From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sun, 30 Mar 2014 17:05:19 +0200
Message-ID: <87y4zrn2vk.fsf@fencepost.gnu.org>
In-Reply-To: <87wqfblq58.fsf@igel.home> (Andreas Schwab's message of "Sun, 30 Mar 2014 16:25:39 +0200")
To: Andreas Schwab
Cc: emacs-devel@gnu.org

Andreas Schwab writes:

> David Kastrup writes:
>
>> I don't think it gets much more transparent than "unibyte flag only
>> marks the valid Unicode-in-Emacs character range".  I'm for the range
>> 0..255, Andreas for something like 0..127 U 4194176..4194303 which
>> I find cumbersome for little return.
>
> Before decoding there is no charset information yet, so using anything
> other than the eight-bit charset would be wrong.

When "right" does not buy you anything but trouble, why bother?

> After decoding, the eight-bit charset is used only for undecodable
> bytes.  That preserves the distinction between encoded and decoded
> strings/buffers (except for the uninteresting trivial ASCII decoding)
> in a world without unibyte flag.

The "uninteresting trivial ASCII" listens to case-fold-search just as
much as the latin-1 code page does.  So being "right" for half of the
coding range does not really buy anything.

-- 
David Kastrup
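
For reference, the two ranges argued over in this message can be written out explicitly. What follows is a minimal Python sketch, not Emacs code: the function names are hypothetical, the offset 0x3FFF00 is inferred from the quoted bounds (4194176 = 0x3FFF80, 4194303 = 0x3FFFFF), and Python's own case mapping is used only as a stand-in for Emacs's case tables to illustrate the case-fold-search remark.

```python
# Illustrative sketch only; names and the case test are assumptions,
# not taken from the Emacs sources.

# Assumed offset: 0x3FFF00 + 0x80 == 4194176 and 0x3FFF00 + 0xFF == 4194303,
# matching the bounds quoted in the message.
RAW_BYTE_OFFSET = 0x3FFF00


def code_in_0_255(byte: int) -> int:
    """Range 0..255: every byte keeps its own value as character code."""
    return byte


def code_in_split_range(byte: int) -> int:
    """Range 0..127 U 4194176..4194303: ASCII stays, high bytes move up."""
    return byte if byte < 0x80 else RAW_BYTE_OFFSET + byte


def has_case(code: int) -> bool:
    """Stand-in for 'is affected by case folding', using Python's mapping.

    Codes above the Unicode maximum (such as the raw-byte range) are
    treated as caseless.
    """
    if code > 0x10FFFF:
        return False
    ch = chr(code)
    return ch.lower() != ch.upper()


if __name__ == "__main__":
    for byte in (0x41, 0xC0):
        a = code_in_0_255(byte)
        b = code_in_split_range(byte)
        print(f"byte {byte:#04x}: 0..255 -> {a} (cased: {has_case(a)}), "
              f"split -> {b} (cased: {has_case(b)})")
```

In this sketch, byte 0x41 is a cased character under both readings, while byte 0xC0 is cased only under the 0..255 reading; the split range keeps the upper half out of case folding but leaves the ASCII half subject to it.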