From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: David Kastrup <dak@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 18:16:39 +0100
Organization: Organization?!?
Message-ID: <8738i1orgo.fsf@fencepost.gnu.org>
References: <831txozsqa.fsf@gnu.org> <jwv4n2j2141.fsf-monnier+emacs@gnu.org>
	<83ppl7y30l.fsf@gnu.org> <87r45nouvx.fsf@uwakimon.sk.tsukuba.ac.jp>
	<8361myyac6.fsf@gnu.org> <87a9capqfr.fsf@uwakimon.sk.tsukuba.ac.jp>
	<83eh1mfd09.fsf@gnu.org> <87ob0pnyt6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87a9c9aqhu.fsf@nbtrap.com>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: ger.gmane.org 1396113436 1566 80.91.229.3 (29 Mar 2014 17:17:16 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 29 Mar 2014 17:17:16 +0000 (UTC)
To: emacs-devel@gnu.org
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Mar 29 18:17:10 2014
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTwsX-0006N6-Fg
	for ged-emacs-devel@m.gmane.org; Sat, 29 Mar 2014 18:17:09 +0100
Original-Received: from localhost ([::1]:40242 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1WTwsX-00078k-1K
	for ged-emacs-devel@m.gmane.org; Sat, 29 Mar 2014 13:17:09 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:51792)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <ged-emacs-devel@m.gmane.org>) id 1WTwsN-00070C-Lw
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 13:17:05 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <ged-emacs-devel@m.gmane.org>) id 1WTwsI-0002OP-6G
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 13:16:59 -0400
Original-Received: from plane.gmane.org ([80.91.229.3]:39204)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <ged-emacs-devel@m.gmane.org>) id 1WTwsI-0002OI-0Y
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 13:16:54 -0400
Original-Received: from list by plane.gmane.org with local (Exim 4.69)
	(envelope-from <ged-emacs-devel@m.gmane.org>) id 1WTwsD-00069W-Uw
	for emacs-devel@gnu.org; Sat, 29 Mar 2014 18:16:49 +0100
Original-Received: from x2f4094b.dyn.telefonica.de ([2.244.9.75])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <emacs-devel@gnu.org>; Sat, 29 Mar 2014 18:16:49 +0100
Original-Received: from dak by x2f4094b.dyn.telefonica.de with local (Gmexim 0.1
	(Debian)) id 1AlnuQ-0007hv-00
	for <emacs-devel@gnu.org>; Sat, 29 Mar 2014 18:16:49 +0100
X-Injected-Via-Gmane: http://gmane.org/
Original-Lines: 41
Original-X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: x2f4094b.dyn.telefonica.de
X-Face: 2FEFf>]>q>2iw=B6,
	xrUubRI>pR&Ml9=ao@P@i)L:\urd*t9M~y1^:+Y]'C0~{mAl`oQuAl
	\!3KEIp?*w`|bL5qr,H)LFO6Q=qx~iH4DN; i";
	/yuIsqbLLCh/!U#X[S~(5eZ41to5f%E@'ELIi$t^
	Vc\LWP@J5p^rst0+('>Er0=^1{]M9!p?&:\z]|;&=NP3AhB!B_bi^]Pfkw
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)
Cancel-Lock: sha1:49YKBoCcb0IBUOPg0bfBCRdLd0Q=
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 80.91.229.3
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:171163
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/171163>

Nathan Trapuzzano <nbtrap@nbtrap.com> writes:

> "Stephen J. Turnbull" <stephen@xemacs.org> writes:
>
>> What is relevant is how to represent byte streams in Emacs.  The
>> obvious non-unibyte way is a one-to-one mapping of bytes to Unicode
>> characters.  It is *extremely* convenient if the first 128 of those
>> bytes correspond to the ASCII coded character set, because so many
>> wire protocols use ASCII "words" syntactically.  The other 128 don't
>> matter much, so why not just use the extremely convenient Latin-1 set
>> for them?
>
> Sorry if someone brought this up already, but one reason raw bytes
> shouldn't be represented as Latin-1 characters is that the "raw
> bytes"-ness would be lost when writing them back to disk if the stream
> also contained characters outside the Latin-1 range.

No.

> For example, say we decode a stream of raw bytes as utf8, but that the
> stream contains some non-utf8 sequences.  IIUC, Emacs will interpret
> those as "raw bytes", so that when it goes to encode the string to write
> it back, they will be written back verbatim.

"Raw bytes" here are represented as dedicated characters outside of the
Unicode range.  They are representable in multibyte buffers.  They never
were representable in unibyte buffers.  While it is conceivable to map
bytes 128..255 in unibyte strings/buffers to the respective character
codes outside of the Unicode range, that would make programmatic
manipulation of bytes cumbersome.
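
To put numbers on this, here is a small Python model of the mapping.  It
is only a sketch, not Emacs code: the function names are made up, and it
assumes the layout used by current Emacs, where bytes 0x80..0xFF become
the characters 0x3FFF80..0x3FFFFF, above the Unicode maximum 0x10FFFF.

```python
# Toy model of how Emacs numbers its "raw byte" characters.
# Assumption: bytes 0x80..0xFF map to 0x3FFF80..0x3FFFFF, which lies
# above the highest Unicode code point, 0x10FFFF.
MAX_UNICODE = 0x10FFFF
RAW_BYTE_OFFSET = 0x3FFF00


def is_raw_byte_char(char: int) -> bool:
    """True if the character code stands for a raw byte, not for text."""
    return 0x3FFF80 <= char <= 0x3FFFFF


def byte_to_raw_char(byte: int) -> int:
    """Return the character code that represents a raw byte 128..255."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 128..255 need a raw-byte character")
    return byte + RAW_BYTE_OFFSET


def raw_char_to_byte(char: int) -> int:
    """Inverse of byte_to_raw_char."""
    if not is_raw_byte_char(char):
        raise ValueError("not a raw-byte character")
    return char - RAW_BYTE_OFFSET


# The Latin-1 character U+00FF and the raw byte 0xFF are different
# characters, so the two cannot be confused in a multibyte buffer.
latin1_y_diaeresis = 0xFF
raw_ff = byte_to_raw_char(0xFF)
```

In this model, code working on a unibyte string sees plain integers
0..255.  If those were mapped to the raw-byte characters instead, every
byte comparison or arithmetic operation would have to go through
something like raw_char_to_byte first.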

> Whereas, if they had been interpreted as Latin-1 characters, they
> would get written back as the UTF8 equivalents.  Hence you have the
> odd situation where you can decode and then encode and end up with a
> different string.

No, you can't, unless you decode into a unibyte buffer, and then all
bets are off regarding re-encoding.
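
The difference between the two schemes can be tried out in Python, as an
analogy only.  Python's "surrogateescape" error handler keeps
undecodable bytes as lone surrogates U+DC80..U+DCFF, which plays the
same role as the raw-byte characters in a multibyte buffer (Emacs uses
codes above the Unicode range instead).  The "latin1-fallback" handler
is invented here to model the scheme being worried about.

```python
import codecs


def _latin1_fallback(err):
    """Hypothetical handler: turn undecodable bytes into Latin-1 text."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end


codecs.register_error("latin1-fallback", _latin1_fallback)

# Valid UTF-8 for "café", followed by 0xFF, which is never valid UTF-8.
data = b"caf\xc3\xa9\xff"

# Scheme A: the stray byte becomes the ordinary character U+00FF.
# On output it is encoded like any other character, so the bytes change.
lossy = data.decode("utf-8", errors="latin1-fallback")
lossy_out = lossy.encode("utf-8")

# Scheme B: the stray byte becomes a dedicated non-text character.
# On output it is written back verbatim, so the round trip is exact.
kept = data.decode("utf-8", errors="surrogateescape")
kept_out = kept.encode("utf-8", errors="surrogateescape")

print(lossy_out == data)  # False
print(kept_out == data)   # True
```

Scheme B round-trips because the decoded text never confuses the stray
byte with the character U+00FF.  That is the property the raw-byte
characters provide.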

-- 
David Kastrup