From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 18:00:07 +0100
Message-ID: <877g7dos88.fsf@fencepost.gnu.org>
To: "Stephen J. Turnbull"
Cc: emacs-devel@gnu.org
In-Reply-To: <87eh1lnf4q.fsf@uwakimon.sk.tsukuba.ac.jp> (Stephen J. Turnbull's message of "Sun, 30 Mar 2014 01:28:21 +0900")

"Stephen J. Turnbull" writes:

> David Kastrup writes:

[...]

> > And since Unicode 128..255 happens to be the latin-1 plane where
> > the latin-1 plane is defined at all, this will mean that the result
> > will behave like the latin-1 plane.
>
> That's not necessarily true.

Sure. It depends on whether you value your users' sanity.

> It just requires a slightly more complex design, which would be
> appropriate for Emacsen (as compared to Python).

If the "slightly more complex" design bites in unexpected places, it is
going to end up a liability. When characters themselves carry no
charset specification, having more than one charset to work with
affects a lot of code that can then conceivably behave in more than one
way. Unicode meaningfully uses the values 128..255, and bytes
meaningfully use the values 128..255 as well.
If one wants to work without surprises in both cases, converting
strings to characters will yield 128..255 in either case.
Differentiating is, of course, possible. One reasonably cute choice
would be to map bytes 128..255 (as opposed to characters) to the
integers -128..-1. But if you are talking about case-fold-search
semantics, you will actually need to remap 0..127 as well (those values
matter more than 128..255). And then things get really ugly.

-- 
David Kastrup
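A small illustrative sketch of the argument above, written in Python rather than Emacs Lisp and not reflecting how Emacs actually stores text. It assumes a "character" is just an integer with no charset tag attached, and all function names (`chars_from_multibyte`, `chars_from_unibyte`, `chars_from_unibyte_negative`, `casefold_eq`) are hypothetical. It shows that a Unicode character and a raw byte in 128..255 become the same integer, that the -128..-1 remapping separates them, and that case folding still leaks through 0..127 unless those are remapped too.

```python
# Hypothetical model: characters are plain ints with no charset tag.

def chars_from_multibyte(s: str) -> list[int]:
    """Unicode code points of a text string."""
    return [ord(c) for c in s]

def chars_from_unibyte(b: bytes) -> list[int]:
    """Raw bytes taken as-is: values 128..255 stay 128..255."""
    return list(b)

def chars_from_unibyte_negative(b: bytes) -> list[int]:
    """The 'cute' alternative: bytes 128..255 become -128..-1,
    while bytes 0..127 are left unchanged."""
    return [x - 256 if x >= 128 else x for x in b]

def casefold_eq(a: int, b: int) -> bool:
    """Case-insensitive comparison of two integer 'characters'.
    Assumption: non-negative values are Unicode code points;
    negative values are raw bytes and only match themselves."""
    if a < 0 or b < 0:
        return a == b
    return chr(a).casefold() == chr(b).casefold()

if __name__ == "__main__":
    # U+00E9 and the raw byte 0xE9 are indistinguishable as integers.
    print(chars_from_multibyte("é"), chars_from_unibyte(b"\xe9"))
    # Remapping keeps high bytes apart from Unicode characters...
    print(casefold_eq(chars_from_unibyte_negative(b"\xc9")[0], ord("é")))
    # ...but a raw byte 0x41 still folds together with the character "a".
    print(casefold_eq(chars_from_unibyte_negative(b"A")[0], ord("a")))
```

With the plain mapping, a raw byte 0xC9 would case-fold to match "é", which is rarely what one wants for binary data. The negative mapping fixes that for 128..255, but since 0..127 are untouched, a raw byte 0x41 still matches "a" under case folding, which is the remaining problem the message points out.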