From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: emacs-26 8f18d12: Improve documentation of decoding into a unibyte buffer
Date: Tue, 28 May 2019 13:43:47 -0400
References: <20190525191039.14136.23307@vcs0.savannah.gnu.org> <20190525191040.CCD6C207F5@vcs0.savannah.gnu.org> <88F01F35-BE24-4F6E-B832-64AFE28CD06B@gnu.org> <83woiazjyo.fsf@gnu.org>
In-Reply-To: <83woiazjyo.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 28 May 2019 18:18:07 +0300")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

> "Use the source, Luke!"

But the dark side is so enticing!

>   (let* ((str1 (string-as-multibyte (string char)))
>          (str2 (string-as-multibyte (string char char)))

Why on earth do we call string-as-multibyte here?
AFAIK, the only case where `string` returns a unibyte string is when
char < 128 (it could make sense to also do that for char ≥ 128 and < 160,
but we don't seem to do that currently), and those strings are better
turned into multibyte via string-TO-multibyte (tho here we don't even
need that, since the unibyte string works just as well for what we do)
than via string-AS-multibyte.  I think this is an error.  The patch
below seems in order.

>          (found (find-coding-systems-string str1))
>          enc1 enc2 i1 i2)
>     (if (and (consp found)
>              (eq (car found) 'undecided))
>         str1   <<<<<<<<<<<<<<<<<<<<<<<<<
>
> If we return here, the value is str1, which is a multibyte string, see
> how it was calculated.

I think it's a bug.  Largely harmless, since it only applies to ASCII
chars, for which we conflate the char/byte status, but still, it's a wart.

> I didn't think enough about this to figure out if there can be less
> trivial use cases.  If you can describe all the cases where
> find-coding-systems-string will return a list whose 'car' is
> 'undecided', my hat off to you.

AFAIK it only happens for pure-ASCII strings.


        Stefan


diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index 2b0aaca664..391efbedc8 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -2926,12 +2926,11 @@ encode-coding-char
 If CODING-SYSTEM can't safely encode CHAR, return nil.
 The 3rd optional argument CHARSET, if non-nil, is a charset preferred
 on encoding."
-  (let* ((str1 (string-as-multibyte (string char)))
-         (str2 (string-as-multibyte (string char char)))
+  (let* ((str1 (string char))
+         (str2 (string char char))
          (found (find-coding-systems-string str1))
          enc1 enc2 i1 i2)
-    (if (and (consp found)
-             (eq (car found) 'undecided))
+    (if (not (multibyte-string-p str1))
         str1
       (when (memq (coding-system-base coding-system) found)
         ;; We must find the encoded string of CHAR.  But, just encoding
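For anyone who wants to double-check the claims about `string` and find-coding-systems-string, here is a small sketch to evaluate with M-x eval-buffer or `emacs --batch -l`.  It assumes Emacs 26/27-era behaviour (in particular that `string` returns a unibyte string for ASCII chars) and uses `cl-assert` from cl-lib; it is a sanity check, not part of the patch.

```elisp
;; Sketch only: assumes Emacs 26/27 behaviour of `string' and
;; `find-coding-systems-string'.  Any failing check signals an error.
(require 'cl-lib)

;; Every ASCII char gives a unibyte string, and for each of them
;; `find-coding-systems-string' returns (undecided), so the old test
;; (eq (car found) 'undecided) and the new test
;; (not (multibyte-string-p str1)) agree on this whole range.
(dotimes (c 128)
  (let ((s (string c)))
    (cl-assert (not (multibyte-string-p s)))
    (cl-assert (equal (find-coding-systems-string s) '(undecided)))))

;; Non-ASCII chars, including the 128..159 range, give multibyte strings.
(cl-assert (multibyte-string-p (string 128)))
(cl-assert (multibyte-string-p (string #xe9)))

;; string-AS-multibyte reinterprets the bytes as Emacs's internal
;; representation, while string-TO-multibyte maps each byte to one char.
(let ((bytes "\303\251"))               ; unibyte: the UTF-8 bytes of U+00E9
  (cl-assert (equal (string-as-multibyte bytes) (string #xe9)))
  (cl-assert (= (length (string-to-multibyte bytes)) 2)))
```

A silent run means both tests in encode-coding-char pick the early-return branch for exactly the ASCII chars, which is why the bug is largely harmless.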