From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sun, 30 Mar 2014 21:13:01 +0900
Message-ID: <878urropf6.fsf@uwakimon.sk.tsukuba.ac.jp>
To: David Kastrup
Cc: emacs-devel@gnu.org

David Kastrup writes:

> I don't think it gets much more transparent than "unibyte flag only
> marks the valid Unicode-in-Emacs character range".  I'm for the
> range 0..255,

It's easy to be more transparent in that case: no unibyte flag.
However, that delays detection of out-of-range characters from the
insert step until encoding.

> Andreas for something like 0..127 U 4194176..4194303 which
> I find cumbersome for little return.

Agreed.  If bytes are going to be non-characters, having a half-ASCII
type is just going to cause surprises when US English apps get
internationalized.

> > Maybe it wouldn't work; maybe it would be inefficient.  But one
> > thing it wouldn't do is present a charset other than Unicode to
> > Lisp.
>
> Neither does the above.
> Abolishing unibyte just means that
> buffers/strings have only one possible character range.

That's not really true.  Encoding and decoding will still constrain
ranges.  As pointed out above, on the one hand that delays detection;
on the other, it avoids spurious errors when the user really does
want to add characters outside the prespecified range for some
reason.

> That does not really give any "transparency" per se from the Lisp
> level.

I disagree, based primarily on the experience of XEmacs: we can do
everything (with characters and bytes) that Emacs does[1], and I
can't recall any new bugs caused by the lack of unibyte.  (Other
bugs, yes, but not bugs from adapting code that used unibyte to
XEmacs, where there is no unibyte.)

> The interesting level is the C level.  You need a byte stream
> representation in C at some point anyway, and not being able to
> call this representation either "string" or "buffer" may be neat in
> some manners but will end up cumbersome in others.

I don't see why you need that, actually.  Of course you need C-level
streams for I/O, but I don't see why that representation needs to
persist past decoding into a buffer or string.

Footnotes:
[1] OK, we don't have a representation of "undecodable bytes".  But
that's not conceptually hard, just tedious enough that nobody's done
it yet.
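For comparison with footnote [1]: GNU Emacs represents an undecodable byte 0x80..0xFF as a character code in #x3FFF80..#x3FFFFF (4194176..4194303, the upper part of the range attributed to Andreas above), so decoding and re-encoding is lossless. The following is a minimal Python sketch of that idea under stated assumptions, not Emacs's (or XEmacs's) actual C implementation. RAW_BYTE_BASE and both function names are hypothetical, and decoded text is a list of ints because Python strings cannot hold code points above U+10FFFF.

```python
# Minimal sketch (assumption, not Emacs's C code): decode UTF-8 so that each
# undecodable byte becomes a distinct "raw byte" character code, then encode
# those codes back to the original bytes. Python str cannot hold code points
# above U+10FFFF, so decoded text is represented as a list of ints.

RAW_BYTE_BASE = 0x3FFF00  # hypothetical name; raw byte b (0x80..0xFF) -> b + 0x3FFF00


def decode_utf8_with_raw_bytes(data: bytes) -> list[int]:
    """Hypothetical decoder: valid UTF-8 -> code points, bad bytes -> raw-byte codes."""
    chars: list[int] = []
    i = 0
    while i < len(data):
        # Try each possible UTF-8 sequence length (1..4 bytes) starting at i.
        for length in (1, 2, 3, 4):
            try:
                text = data[i:i + length].decode("utf-8")
            except UnicodeDecodeError:
                continue  # invalid or incomplete at this length; try a longer one
            if len(text) == 1:
                chars.append(ord(text))
                i += length
                break
        else:
            # No valid sequence starts here: keep the byte as a raw-byte char.
            chars.append(RAW_BYTE_BASE + data[i])
            i += 1
    return chars


def encode_utf8_with_raw_bytes(chars: list[int]) -> bytes:
    """Hypothetical encoder: raw-byte codes go out unchanged, the rest as UTF-8."""
    out = bytearray()
    for c in chars:
        if RAW_BYTE_BASE + 0x80 <= c <= RAW_BYTE_BASE + 0xFF:
            out.append(c - RAW_BYTE_BASE)  # restore the original byte
        else:
            out += chr(c).encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    data = b"caf\xc3\xa9 \xff!"  # valid UTF-8 "café " plus a stray 0xFF byte
    chars = decode_utf8_with_raw_bytes(data)
    print([hex(c) for c in chars])
    assert encode_utf8_with_raw_bytes(chars) == data  # lossless round trip
```

The stray 0xFF survives as a single non-Unicode character code rather than an error, which is the "tedious but not conceptually hard" part: every encoder has to recognize and pass through that reserved range.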