From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sun, 30 Mar 2014 21:13:01 +0900
Message-ID: <878urropf6.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <831txozsqa.fsf@gnu.org> <jwv4n2j2141.fsf-monnier+emacs@gnu.org>
	<83ppl7y30l.fsf@gnu.org> <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<8361myyac6.fsf@gnu.org> <87a9capqfr.fsf@uwakimon.sk.tsukuba.ac.jp>
	<83eh1mfd09.fsf@gnu.org> <87ob0pnyt6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<m2y4ztuuvc.fsf@linux-m68k.org>
	<87ioqxnhhk.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87bnwpov7b.fsf@fencepost.gnu.org>
	<87eh1lnf4q.fsf@uwakimon.sk.tsukuba.ac.jp>
	<877g7dos88.fsf@fencepost.gnu.org>
	<87a9c8o2yq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87a9c8njqf.fsf@fencepost.gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1396181618 13701 80.91.229.3 (30 Mar 2014 12:13:38 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 30 Mar 2014 12:13:38 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: David Kastrup <dak@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Mar 30 14:13:30 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WUEcE-0006of-68
	for ged-emacs-devel@m.gmane.org; Sun, 30 Mar 2014 14:13:30 +0200
Original-Received: from localhost ([::1]:44140 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WUEcD-0007WF-JW
	for ged-emacs-devel@m.gmane.org; Sun, 30 Mar 2014 08:13:29 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:43532)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen@xemacs.org>) id 1WUEc4-0007UD-0f
	for emacs-devel@gnu.org; Sun, 30 Mar 2014 08:13:27 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stephen@xemacs.org>) id 1WUEbw-00076J-4n
	for emacs-devel@gnu.org; Sun, 30 Mar 2014 08:13:19 -0400
Original-Received: from mgmt2.sk.tsukuba.ac.jp ([130.158.97.224]:55063)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stephen@xemacs.org>)
	id 1WUEbn-00074Q-PK; Sun, 30 Mar 2014 08:13:04 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mgmt2.sk.tsukuba.ac.jp (Postfix) with ESMTP id 74FDA9707DD;
	Sun, 30 Mar 2014 21:13:01 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id 61BA41A28DC; Sun, 30 Mar 2014 21:13:01 +0900 (JST)
In-Reply-To: <87a9c8njqf.fsf@fencepost.gnu.org>
X-Mailer: VM undefined under 21.5  (beta34) "kale" 2a0f42961ed4 XEmacs Lucid
	(x86_64-unknown-linux)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 130.158.97.224
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:171195
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/171195>

David Kastrup writes:

 > I don't think it gets much more transparent than "unibyte flag only
 > marks the valid Unicode-in-Emacs character range".  I'm for the
 > range 0..255,

It's easy to be more transparent in that case: no unibyte flag.
However, that delays detection of out-of-range characters to encoding
rather than the insert step.

 > Andreas for something like 0..127 U 4194176..4194303 which
 > I=C2=A0find cumbersome for little return.

Agreed.  If bytes are going to be non-characters, having a half-ASCII
type is just going to cause surprises when US English apps get
internationalized.

 > > Maybe it wouldn't work; maybe it would be inefficient.  But one
 > > thing it wouldn't do is present a charset other than Unicode to
 > > Lisp.
 >=20
 > Neither does the above.  Abolishing unibyte just means that
 > buffers/strings have only one possible character range.

That's not really true.  Encoding and decoding will still constrain
ranges; as pointed out above, it delays detection on the one hand, on
the other avoids spurious errors when the user really does want to add
characters outside of the prespecified range for some reason.

 > That does not really give any "transparency" per se from the Lisp
 > level.

I disagree, based primarily on the experience of XEmacs that we can do
everything (with characters and bytes) that Emacs does[1], without
randomly injecting new bugs due to lack of unibyte that I can recall.
(Other bugs, yes, but bugs due to adapting code that used unibyte to
XEmacs where there is no unibyte, no.)

 > The interesting level is the C level.  You need a byte stream
 > representation in C at some point anyway, and not being able to
 > call this representation either "string" or "buffer" may be neat in
 > some manners but will end up cumbersome in others.

I don't see why you need that, actually.  Of course you need C level
streams for I/O, but I don't see why it needs to persist past decoding
into a buffer or string.


Footnotes:=20
[1]  OK, we don't have a representation of "undecodable bytes".  But
that's not conceptually hard, just tedious enough that nobody's done
it yet.