From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 12:25:43 +0300
Message-ID: <83r45le4q0.fsf@gnu.org>
References: <831txozsqa.fsf@gnu.org> <jwv4n2j2141.fsf-monnier+emacs@gnu.org>
	<83ppl7y30l.fsf@gnu.org> <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<8361myyac6.fsf@gnu.org> <87a9capqfr.fsf@uwakimon.sk.tsukuba.ac.jp>
	<83eh1mfd09.fsf@gnu.org> <87wqfeqkl1.fsf@fencepost.gnu.org>
	<834n2ifa50.fsf@gnu.org> <87siq2qg6a.fsf@fencepost.gnu.org>
	<83zjk9ec92.fsf@gnu.org> <87d2h5qxhm.fsf@fencepost.gnu.org>
	<83siq1e7kq.fsf@gnu.org> <8761mxqty4.fsf@fencepost.gnu.org>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: plane.gmane.org
X-Trace: ger.gmane.org 1396085155 5255 80.91.229.3 (29 Mar 2014 09:25:55 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 29 Mar 2014 09:25:55 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: David Kastrup <dak@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Mar 29 10:26:02 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTpWc-0007Eu-GH
	for ged-emacs-devel@m.gmane.org; Sat, 29 Mar 2014 10:26:02 +0100
Original-Received: from localhost ([::1]:38261 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTpWc-0003Ua-03
	for ged-emacs-devel@m.gmane.org; Sat, 29 Mar 2014 05:26:02 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:33615)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1WTpWU-0003Sn-J5
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 05:26:00 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1WTpWO-0007Zs-MI
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 05:25:54 -0400
Original-Received: from mtaout20.012.net.il ([80.179.55.166]:41175)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1WTpWH-0007Xo-9v; Sat, 29 Mar 2014 05:25:41 -0400
Original-Received: from conversion-daemon.a-mtaout20.012.net.il by
	a-mtaout20.012.net.il (HyperSendmail v2007.08) id
	<0N3600E00Y6EHQ00@a-mtaout20.012.net.il>;
	Sat, 29 Mar 2014 12:25:39 +0300 (IDT)
Original-Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il
	(HyperSendmail v2007.08) with ESMTPA id
	<0N3600ERBY6RH600@a-mtaout20.012.net.il>;
	Sat, 29 Mar 2014 12:25:39 +0300 (IDT)
In-reply-to: <8761mxqty4.fsf@fencepost.gnu.org>
X-012-Sender: halo1@inter.net.il
X-detected-operating-system: by eggs.gnu.org: Solaris 10
X-Received-From: 80.179.55.166
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:171125
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/171125>

> From: David Kastrup <dak@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 29 Mar 2014 09:40:03 +0100
> 
> >> It means a buffer where each _character_ has the same value that the
> >> no-longer-available unibyte buffer would have in its bytes/characters.
> >
> > This doesn't seem to be a complete description of what is suggested.
> > E.g., just by looking at the values of characters, it is impossible to
> > distinguish between Latin characters below 256 and raw bytes.  In a
> > unibyte buffer, we know how to make that distinction,
> 
> Uh, what?  The point of a unibyte buffer is that it does not make the
> distinction.

Yes, it does: it treats every character as a raw byte.  So the dilemma
is resolved there by definition.  How to make that distinction without
unibyte buffers remains to be defined; until it is, plans to remove
unibyte buffers are impractical.

> > but if there are no unibyte buffers, something else is needed for
> > doing that.
> 
> >> You can do that whether or not the conceptual array of 0..255 characters
> >> is internally encoded in unibyte or multibyte encodings.
> >
> > What do you mean by "multibyte encodings" in this context?  Are you
> > suggesting to store the bytes 128..255 as Latin-1 characters,
> > i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
> > characters?
> 
> That would make the most sense, yes.

Then the above distinction is impossible, and all kinds of subtly
incorrect behaviors creep in.
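To make the lost distinction concrete, here is a minimal sketch.  It
assumes, purely for illustration, that a Python str stands in for a
multibyte buffer (UTF-8 when serialized) and that a hypothetical helper
stores each raw byte as the Latin-1 character with the same code.  This
is not Emacs code.  It shows that a raw byte 0xE9 and a genuine U+00E9
become the same character, and that an ordinary text operation such as
upcasing then silently changes the "byte".

```python
# Hypothetical model: raw bytes 0x80..0xFF stored as Latin-1 characters
# U+0080..U+00FF inside a multibyte (UTF-8-based) string.

def store_raw_byte_as_latin1(b: int) -> str:
    """Represent raw byte b as the Latin-1 character with the same code."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte")
    return chr(b)

raw = store_raw_byte_as_latin1(0xE9)  # a raw byte, e.g. from a binary file
latin1 = "\u00e9"                     # a real LATIN SMALL LETTER E WITH ACUTE

# Both are the same character with the same UTF-8 form, so nothing in the
# buffer records which one started out as a raw byte.
print(raw == latin1)            # True
print(raw.encode("utf-8"))      # b'\xc3\xa9'

# A text operation that is correct for characters corrupts the byte:
# upcasing turns the "byte" 0xE9 into 0xC9.
print(hex(ord(raw.upper())))    # 0xc9
```

In a unibyte buffer the same value is a byte by definition, so case
conversion and similar character operations do not reinterpret it.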

> > Or are you suggesting something else?
> 
> You could also use the "raw byte" character encodings we use for not
> losing information when reading not properly formed utf-8 files into a
> multibyte buffer, but that seems less practical when working with the
> character codes.

Why less practical?
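For reference, the "raw byte" convention can be modeled as below.  This
is a sketch, not Emacs's implementation: it assumes the mapping described
in the Emacs Lisp manual (raw bytes 0x80..0xFF become the characters
#x3FFF80..#x3FFFFF) and the two-byte internal form with a 0xC0 or 0xC1
lead byte.  The function names are hypothetical, loosely modeled on the
BYTE8_TO_CHAR, CHAR_TO_BYTE8 and CHAR_BYTE8_P macros in Emacs's C
sources.  Converting between a raw-byte character and its byte value is a
single addition or subtraction, and the result never collides with a
Latin-1 character.

```python
# Hypothetical model of raw-byte characters: byte b (0x80..0xFF) maps to
# character b + 0x3FFF00, which lies outside the Unicode range.

RAW_BYTE_BASE = 0x3FFF00    # assumed offset between a byte and its character
MAX_5_BYTE_CHAR = 0x3FFF7F  # assumed largest code that is not a raw byte

def byte8_to_char(b: int) -> int:
    """Character code used for the raw byte b."""
    if not 0x80 <= b <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF need a raw-byte character")
    return b + RAW_BYTE_BASE

def char_byte8_p(c: int) -> bool:
    """True if c is a raw-byte character rather than a real character."""
    return c > MAX_5_BYTE_CHAR

def char_to_byte8(c: int) -> int:
    """Recover the original byte from a raw-byte character."""
    if not char_byte8_p(c):
        raise ValueError("not a raw-byte character")
    return c - RAW_BYTE_BASE

def internal_bytes(c: int) -> bytes:
    """Assumed two-byte internal form: lead byte 0xC0 or 0xC1."""
    b = char_to_byte8(c)
    return bytes([0xC0 | ((b >> 6) & 0x01), 0x80 | (b & 0x3F)])

c = byte8_to_char(0xE9)
print(hex(c))                           # 0x3fffe9
print(internal_bytes(c).hex())          # c1a9
print(char_byte8_p(0xE9), char_byte8_p(c))  # False True
print(hex(char_to_byte8(c)))            # 0xe9
```

The raw byte 0xE9 and the Latin-1 character U+00E9 stay distinct here,
both as character codes and in their internal bytes (C1 A9 versus the
UTF-8 C3 A9), which is exactly the distinction the Latin-1 storage loses.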