From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: =?utf-8?Q?Toke_H=C3=B8iland-J=C3=B8rgensen?= <toke@toke.dk>
Newsgroups: gmane.emacs.devel
Subject: Re: distinguishing multibyte/unibyte ASCII
Date: Fri, 09 Sep 2016 22:17:58 +0200
Message-ID: <87fup87rpl.fsf@toke.dk>
References: <20160907153014.15752-1-toke@toke.dk>
	<jwvshtb4qc5.fsf-monnier+gmane.emacs.devel@gnu.org>
	<87inu7k5z4.fsf@toke.dk> <83bmzzaawr.fsf@gnu.org>
	<877fank1oc.fsf@toke.dk>
	<bb5f2760-fc3f-630c-290d-8108cad08405@yandex.ru>
	<87inu6iim8.fsf@toke.dk>
	<2563921f-d20d-753b-09eb-c8671bc5b6d6@yandex.ru>
	<87a8fiidso.fsf@toke.dk> <86d1kdq7cs.fsf@realize.ch>
	<83bmzwaopr.fsf@gnu.org> <8660q4ria9.fsf@realize.ch>
	<8360q4amyx.fsf@gnu.org> <jwvd1kc7t4v.fsf-monnier+Inbox@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: blaine.gmane.org 1473452350 16013 195.159.176.226 (9 Sep 2016 20:19:10 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Fri, 9 Sep 2016 20:19:10 +0000 (UTC)
Cc: Eli Zaretskii <eliz@gnu.org>, Alain Schneble <a.s@realize.ch>,
	dgutov@yandex.ru, emacs-devel@gnu.org
To: Stefan Monnier <monnier@IRO.UMontreal.CA>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Sep 09 22:19:06 2016
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1biSGN-0003Pq-TP
	for ged-emacs-devel@m.gmane.org; Fri, 09 Sep 2016 22:19:04 +0200
Original-Received: from localhost ([::1]:60027 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1biSGL-0002au-U1
	for ged-emacs-devel@m.gmane.org; Fri, 09 Sep 2016 16:19:01 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:53118)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <toke@toke.dk>) id 1biSFW-0002Z0-DZ
	for emacs-devel@gnu.org; Fri, 09 Sep 2016 16:18:11 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <toke@toke.dk>) id 1biSFU-0002WZ-Dx
	for emacs-devel@gnu.org; Fri, 09 Sep 2016 16:18:09 -0400
Original-Received: from mail2.tohojo.dk ([77.235.48.147]:49252)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <toke@toke.dk>)
	id 1biSFO-0002Vd-Lm; Fri, 09 Sep 2016 16:18:02 -0400
X-Virus-Scanned: amavisd-new at mail2.tohojo.dk
DKIM-Filter: OpenDKIM Filter v2.10.3 mail2.tohojo.dk AFECC40D5E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=toke.dk; s=201310;
	t=1473452279; bh=iSyeKeA4ekWlebulAQnyPuaQOj5l2Z1nT6R96vJJbts=;
	h=From:To:Cc:Subject:References:Date:In-Reply-To:From;
	b=mXSwxqCHhzugTmm97aux+TUm89w1t2cjq3AouqNbnO3lwzUku5WJJf0/1R3jBJx4E
	wt96Jn23dF5nev+Ez+TMCJh1wDCUAsEw23mdIp1wLvAylRYm80GJf3T++lfp14zq9u
	xuQpmGHJXwDnCwTsL5wMKUH6D/yQzVuc/6NSNQV4=
Original-Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000)
	id BD95F8261; Fri,  9 Sep 2016 22:17:58 +0200 (CEST)
In-Reply-To: <jwvd1kc7t4v.fsf-monnier+Inbox@gnu.org> (Stefan Monnier's message
	of "Fri, 09 Sep 2016 16:01:57 -0400")
X-Clacks-Overhead: GNU Terry Pratchett
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-Received-From: 77.235.48.147
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:207338
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/207338>

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> If you just generate an ASCII string from ASCII characters, it will
>> usually be unibyte.  If you take it as a substring from a multibyte
>> buffer, it will usually be multibyte.
>
> And it's arguably a wart in Emacs's handling of chars-vs-bytes.
> But it's kind of hard to fix now.
>
> At some point I tried to change this handling (not exactly fix it) by
> treating multibyte ASCII strings specially (it's easy to recognize by
> checking that the char length is equal to the byte length and both are
> readily available in the "struct Lisp_String" object).  Then when we
> read an ASCII string, instead of making it unibyte, I'd keep it as
> multibyte.  And then change things like "concat" so that those "ASCII
> multibyte" strings don't force the result to be multibyte.
>
> My local Emacs still runs with those changes, but in the end I don't
> think the result is really better (or sufficiently better to justify
> the subtle incompatibilities it introduces).
>
> [ Also, I wouldn't be surprised to hear that such a change causes real
>   problems with utf-7 or EBCDIC, or other systems where decoding/encoding
>   a string of bytes/chars all <127 is not a no-op.  ]
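
For anyone following along, the check described above can be sketched
roughly as below.  This is only an illustration: "struct toy_string" and
both function names are made up for this example, standing in for the real
"struct Lisp_String", which has more fields and its own conventions.  It
also assumes a UTF-8-like internal encoding in which every non-ASCII
character takes at least two bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Emacs's struct Lisp_String.
   Field names are invented for this sketch.  */
struct toy_string
{
  ptrdiff_t nchars;            /* length in characters */
  ptrdiff_t nbytes;            /* length in bytes */
  bool multibyte;              /* true if the string is flagged multibyte */
  const unsigned char *data;   /* the bytes themselves (not inspected here) */
};

/* True for an "ASCII multibyte" string: flagged multibyte, yet every
   character occupies exactly one byte, so the two lengths agree.
   No scan of the data is needed.  */
static bool
ascii_multibyte_p (const struct toy_string *s)
{
  assert (s->nchars <= s->nbytes);
  return s->multibyte && s->nchars == s->nbytes;
}

/* Sketch of the modified "concat" rule: the result is multibyte only
   if some argument is multibyte and contains non-ASCII characters.  */
static bool
concat_result_multibyte_p (const struct toy_string *a,
                           const struct toy_string *b)
{
  return (a->multibyte && !ascii_multibyte_p (a))
         || (b->multibyte && !ascii_multibyte_p (b));
}
```

With that rule, concatenating a unibyte string with an ASCII-only
multibyte one would stay unibyte, and only a string with real non-ASCII
characters would force a multibyte result.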

Isn't Unicode fun? :)

-Toke