From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: David Kastrup <dak@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 09:40:03 +0100
Message-ID: <8761mxqty4.fsf@fencepost.gnu.org>
References: <831txozsqa.fsf@gnu.org> <jwv4n2j2141.fsf-monnier+emacs@gnu.org>
	<83ppl7y30l.fsf@gnu.org> <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<8361myyac6.fsf@gnu.org> <87a9capqfr.fsf@uwakimon.sk.tsukuba.ac.jp>
	<83eh1mfd09.fsf@gnu.org> <87wqfeqkl1.fsf@fencepost.gnu.org>
	<834n2ifa50.fsf@gnu.org> <87siq2qg6a.fsf@fencepost.gnu.org>
	<83zjk9ec92.fsf@gnu.org> <87d2h5qxhm.fsf@fencepost.gnu.org>
	<83siq1e7kq.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: ger.gmane.org 1396082414 10400 80.91.229.3 (29 Mar 2014 08:40:14 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 29 Mar 2014 08:40:14 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Mar 29 09:40:24 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTooR-0006Oo-QE
	for ged-emacs-devel@m.gmane.org; Sat, 29 Mar 2014 09:40:23 +0100
Original-Received: from localhost ([::1]:37969 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTooR-00064j-GE
	for ged-emacs-devel@m.gmane.org; Sat, 29 Mar 2014 04:40:23 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:54774)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <dak@gnu.org>)
	id 1WTooM-00060S-IX
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 04:40:21 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dak@gnu.org>) id 1WTooH-0001xt-Gp
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 04:40:18 -0400
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:50556)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <dak@gnu.org>)
	id 1WTooH-0001xn-Em
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 04:40:13 -0400
Original-Received: from localhost ([127.0.0.1]:57732 helo=lola)
	by fencepost.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dak@gnu.org>)
	id 1WTooG-0003bY-NA; Sat, 29 Mar 2014 04:40:13 -0400
Original-Received: by lola (Postfix, from userid 1000)
	id 322DBE0497; Sat, 29 Mar 2014 09:40:03 +0100 (CET)
In-Reply-To: <83siq1e7kq.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 29 Mar
	2014 11:24:05 +0300")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2001:4830:134:3::e
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:171123
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/171123>

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Cc: emacs-devel@gnu.org
>> Date: Sat, 29 Mar 2014 08:23:33 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> From: David Kastrup <dak@gnu.org>
>> >> Cc: emacs-devel@gnu.org
>> >> Date: Fri, 28 Mar 2014 20:25:17 +0100
>> >> 
>> >> >> > Then what do you call a buffer whose "text" is encoded?
>> >> >> 
>> >> >> I can't speak for Stephen, of course, but my impression was he would
>> >> >> call it "a bad idea".
>> >> >
>> >> > Then what other approach should Lisp code use when it needs to
>> >> > encode or decode text manually?
>> >> 
>> >> Redecode right to a "binary" coding system would be my guess.
>> >
>> > Sorry, I don't follow.  Can you say more about what that means?
>> 
>> It means a buffer where each _character_ has the same value that the
>> no-longer-available unibyte buffer would have in its bytes/characters.
>
> This doesn't seem to be a complete description of what is suggested.
> E.g., just by looking at the values of characters, it is impossible to
> distinguish between Latin characters below 256 and raw bytes.  In a
> unibyte buffer, we know how to make that distinction,

Uh, what?  The point of a unibyte buffer is that it does not make the
distinction.

> but if there are no unibyte buffers, something else is needed for
> doing that.

>> You can do that whether or not the conceptual array of 0..255 characters
>> is internally encoded in unibyte or multibyte encodings.
>
> What do you mean by "multibyte encodings" in this context?  Are you
> suggesting to store the bytes 128..255 as Latin-1 characters,
> i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
> characters?

That would make the most sense, yes.
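
As a rough illustration (in Python rather than Emacs Lisp, and saying
nothing about how Emacs would actually implement it), this is what
"bytes 128..255 as Latin-1 characters" amounts to.  The variable names
are made up for the example:

```python
# Illustration only: Latin-1 maps byte value N to the character with code N.
raw = bytes(range(256))

# Decode so that each character has the same value as the original byte.
text = raw.decode("latin-1")
assert [ord(ch) for ch in text] == list(raw)

# Stored as UTF-8, values 0..127 take one byte each and 128..255 take two.
utf8_lengths = [len(ch.encode("utf-8")) for ch in text]
assert utf8_lengths[:128] == [1] * 128
assert utf8_lengths[128:] == [2] * 128

# Going back to the original byte sequence is lossless.
assert text.encode("latin-1") == raw
```

The character codes stay in 0..255, so code that inspects them sees the
same numbers it would have seen in a unibyte buffer.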

> Or are you suggesting something else?

You could also use the "raw byte" characters that we use to avoid losing
information when reading malformed UTF-8 files into a multibyte buffer,
but that seems less practical when working with the character codes.
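
For comparison, a loose analogue of the "raw byte" approach, again in
Python and only as a sketch: the "surrogateescape" error handler is
Python's mechanism, not the Emacs one, and the sample data is invented.
It shows why the character codes become less convenient: a stray byte
no longer has its own value as its code.

```python
# Python analogue only: undecodable bytes are mapped to reserved code points.
# A stray 0xE9 byte, a space, then well-formed UTF-8 for U+00E9.
data = b"caf\xe9 \xc3\xa9"

text = data.decode("utf-8", errors="surrogateescape")

stray = ord(text[3])   # the byte that was not valid UTF-8
real = ord(text[5])    # the properly encoded character

# The two are distinguishable, but the stray byte is no longer simply 0xE9.
assert stray == 0xDC00 + 0xE9
assert real == 0xE9

# The original bytes are still recoverable.
assert text.encode("utf-8", errors="surrogateescape") == data
```

Here the distinction between raw bytes and Latin characters is kept, at
the price of having to translate codes back before treating them as
byte values.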

-- 
David Kastrup