From: Alain Schneble
Newsgroups: gmane.emacs.devel
Subject: Re: distinguishing multibyte/unibyte ASCII
Date: Fri, 9 Sep 2016 23:02:23 +0200
Message-ID: <86wpikpz1c.fsf@realize.ch>
References: <20160907153014.15752-1-toke@toke.dk> <87inu7k5z4.fsf@toke.dk>
 <83bmzzaawr.fsf@gnu.org> <877fank1oc.fsf@toke.dk> <87inu6iim8.fsf@toke.dk>
 <2563921f-d20d-753b-09eb-c8671bc5b6d6@yandex.ru> <87a8fiidso.fsf@toke.dk>
 <86d1kdq7cs.fsf@realize.ch> <83bmzwaopr.fsf@gnu.org>
 <8660q4ria9.fsf@realize.ch> <8360q4amyx.fsf@gnu.org>
In-Reply-To: (Stefan Monnier's message of "Fri, 09 Sep 2016 16:01:57 -0400")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (windows-nt)
To: Stefan Monnier
Cc: Eli Zaretskii , toke@toke.dk, emacs-devel@gnu.org, dgutov@yandex.ru

Stefan Monnier writes:

>> If you just generate an ASCII string from ASCII characters, it will
>> usually be unibyte.  If you take it as a substring from a multibyte
>> buffer, it will usually be multibyte.
>
> And it's arguably a wart in Emacs's handling of chars-vs-bytes.
> But it's kind of hard to fix now.
>
> At some point I tried to change this handling (not exactly fix it) by
> treating multibyte ASCII strings specially (it's easy to recognize by
> checking that the char length is equal to the byte length and both are
> readily available in the "struct Lisp_String" object).  Then when we
> read an ASCII string, instead of making it unibyte, I'd keep it as
> multibyte.  And then change things like "concat" so that those "ASCII
> multibyte" strings don't force the result to be multibyte.
>
> My local Emacs still runs with those changes, but in the end I don't
> think the result is really better (or sufficiently better to justify
> the subtle incompatibilities it introduces).

I'm relieved to hear that :)  Thanks for sharing it.
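
For anyone following the thread, the check Stefan describes can be
sketched roughly as below.  This is only an illustration, not Emacs
source: `struct sketch_string` and the `sketch_*` functions are
hypothetical names.  The message supplies only two things: the char
length and the byte length are both stored in the string object, and
equal lengths identify an ASCII-only multibyte string.  Using a
negative byte length to mark a unibyte string is an assumption of this
sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a Lisp string object.
   Assumption of this sketch: size_byte < 0 marks a unibyte string.  */
struct sketch_string {
    ptrdiff_t size;             /* length in characters */
    ptrdiff_t size_byte;        /* length in bytes, or -1 if unibyte */
    const unsigned char *data;  /* string contents */
};

/* True if the string uses the multibyte representation.  */
static bool sketch_multibyte_p(const struct sketch_string *s)
{
    return s->size_byte >= 0;
}

/* True if the string is multibyte but holds only ASCII characters.
   Every non-ASCII character needs at least two bytes, so equal char
   and byte lengths imply that all characters are ASCII.  */
static bool sketch_ascii_multibyte_p(const struct sketch_string *s)
{
    return sketch_multibyte_p(s) && s->size == s->size_byte;
}

/* The experimental "concat" rule from the message: ASCII-only
   multibyte arguments do not force the result to be multibyte.  */
static bool sketch_concat_result_multibyte(const struct sketch_string *args,
                                           size_t nargs)
{
    for (size_t i = 0; i < nargs; i++)
        if (sketch_multibyte_p(&args[i]) && !sketch_ascii_multibyte_p(&args[i]))
            return true;
    return false;
}
```

Under this rule, concatenating a multibyte "abc" with a unibyte "abc"
would give a unibyte result, whereas any argument containing a
non-ASCII character would still make the result multibyte.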