From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap
 cookie headers in url-http--encode-string.)
Date: Fri, 09 Sep 2016 16:01:57 -0400
References: <20160907153014.15752-1-toke@toke.dk> <87inu7k5z4.fsf@toke.dk>
 <83bmzzaawr.fsf@gnu.org> <877fank1oc.fsf@toke.dk> <87inu6iim8.fsf@toke.dk>
 <2563921f-d20d-753b-09eb-c8671bc5b6d6@yandex.ru> <87a8fiidso.fsf@toke.dk>
 <86d1kdq7cs.fsf@realize.ch> <83bmzwaopr.fsf@gnu.org>
 <8660q4ria9.fsf@realize.ch> <8360q4amyx.fsf@gnu.org>
In-Reply-To: <8360q4amyx.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 09
 Sep 2016 22:32:06 +0300")
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)
To: Eli Zaretskii
Cc: Alain Schneble, toke@toke.dk, dgutov@yandex.ru, emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:207336

> If you just generate an ASCII string from ASCII characters, it will
> usually be unibyte.  If you take it as a substring from a multibyte
> buffer, it will usually be multibyte.

And it's arguably a wart in Emacs's handling of chars-vs-bytes.
But it's kind of hard to fix now.

At some point I tried to change this handling (not exactly fix it) by
treating multibyte ASCII strings specially (they're easy to recognize:
the char length is equal to the byte length, and both are readily
available in the "struct Lisp_String" object).  Then when we read an
ASCII string, instead of making it unibyte, I'd keep it as multibyte.
And then I'd change things like "concat" so that those "ASCII multibyte"
strings don't force the result to be multibyte.
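
For illustration only, here is a minimal C sketch of the length check and
of the modified "concat" rule described above.  The struct and every
toy_* name are hypothetical stand-ins, not Emacs's actual definitions,
and the sketch assumes that a unibyte string is marked by a negative
byte length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a Lisp string header.
   This is NOT the real "struct Lisp_String".  */
struct toy_string {
    ptrdiff_t size;            /* length in characters */
    ptrdiff_t size_byte;       /* length in bytes; -1 marks unibyte (assumption) */
    const unsigned char *data; /* the string's bytes */
};

/* Build a unibyte string: one byte per element, no separate byte count.  */
static struct toy_string toy_unibyte(const char *bytes)
{
    struct toy_string s = { (ptrdiff_t) strlen(bytes), -1,
                            (const unsigned char *) bytes };
    return s;
}

/* Build a multibyte string from UTF-8 text holding NCHARS characters.  */
static struct toy_string toy_multibyte(const char *utf8, ptrdiff_t nchars)
{
    ptrdiff_t nbytes = (ptrdiff_t) strlen(utf8);
    assert(nchars >= 0 && nchars <= nbytes);
    struct toy_string s = { nchars, nbytes, (const unsigned char *) utf8 };
    return s;
}

/* "ASCII multibyte": a multibyte string in which every character takes
   exactly one byte, so the char length equals the byte length.  */
static bool toy_ascii_multibyte_p(const struct toy_string *s)
{
    return s->size_byte >= 0 && s->size == s->size_byte;
}

/* Modified "concat" rule: only a multibyte argument that holds some
   non-ASCII character forces the result to be multibyte.  */
static bool toy_concat_forces_multibyte(const struct toy_string *const *args,
                                        size_t nargs)
{
    for (size_t i = 0; i < nargs; i++)
        if (args[i]->size_byte >= 0 && !toy_ascii_multibyte_p(args[i]))
            return true;
    return false;
}
```

Under this rule, concatenating an "ASCII multibyte" string with a unibyte
string no longer forces a multibyte result, whereas any argument that
contains a non-ASCII character still does.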

My local Emacs still runs with those changes, but in the end I don't
think the result is really better (or sufficiently better to justify the
subtle incompatibilities it introduces).

[ Also, I wouldn't be surprised to hear that such a change causes real
  problems with utf-7 or EBCDIC, or other systems where
  decoding/encoding a string of bytes/chars all <128 is not a no-op.  ]


        Stefan