From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: emacs-26 8f18d12: Improve documentation of decoding into a unibyte buffer
Date: Mon, 27 May 2019 09:32:11 -0400
References: <20190525191039.14136.23307@vcs0.savannah.gnu.org> <20190525191040.CCD6C207F5@vcs0.savannah.gnu.org>
To: emacs-devel@gnu.org

> Almost all uses of string-as-unibyte are gone now, but the one I was
> looking at is this one in international/mule-cmds.el:
>
> (defun encoded-string-description (str coding-system)
>   "Return a pretty description of STR that is encoded by CODING-SYSTEM."
>   (setq str (string-as-unibyte str))
>   (mapconcat
>    (if (and coding-system (eq (coding-system-type coding-system) 'iso-2022))
>        ;; Try to get a pretty description for ISO 2022 escape sequences.
>        (function (lambda (x) (or (cdr (assq x iso-2022-control-alist))
>                                  (format "#x%02X" x))))
>      (function (lambda (x) (format "#x%02X" x))))
>    str " "))
>
> If I take a string of say "β", and replace string-as-unibyte with
> (encode-coding-string 'emacs-internal), `encoded-string-description'
> prints "#xCE #xB2", which is the correct UTF-8 encoded
> value. 'raw-text works too. Iʼm certain that there are subtle
> differences between the two that I donʼt understand.

But "β" is not a "STR that is encoded by CODING-SYSTEM", so this output
is neither correct nor incorrect in any case.

I think the right thing to do here is one of:
- signal an error if `str` is multibyte.
- signal an error if `str` is multibyte and contains non-byte chars.
- if multibyte, encode `str` with `coding-system`.
- just don't bother looking at whether `str` is unibyte or not, just
  pass it as is to `mapconcat`.
- just don't bother looking at whether `str` is unibyte or not, just
  pass it as is to `mapconcat` but in the lambda, do catch the case
  where `x` is an "eight bit raw-byte char" and if so pass it to
  multibyte-char-to-unibyte.
- ...

But encoding `str` with any coding system like raw-text or
emacs-internal doesn't seem to make much sense.


        Stefan
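
As an illustration only, the last option in the list (pass `str` through unchanged and convert raw-byte chars inside the lambda) could be sketched roughly as below. This is an untested sketch, not code from the thread or from mule-cmds.el: the name `my-encoded-string-description` is hypothetical, and the `#x3FFF80` test assumes Emacs's usual representation of raw bytes as chars in the range #x3FFF80..#x3FFFFF.

```elisp
;; Hypothetical variant of `encoded-string-description'; the name and the
;; exact structure are assumptions made for illustration.
(defun my-encoded-string-description (str coding-system)
  "Return a pretty description of STR that is encoded by CODING-SYSTEM.
STR is passed to `mapconcat' as is, whether unibyte or multibyte."
  (let ((iso-2022-p (and coding-system
                         (eq (coding-system-type coding-system) 'iso-2022))))
    (mapconcat
     (lambda (x)
       ;; In a multibyte string, a raw byte #x80..#xFF shows up as an
       ;; "eight-bit" char in #x3FFF80..#x3FFFFF; map it back to the byte.
       (when (>= x #x3FFF80)
         (setq x (multibyte-char-to-unibyte x)))
       ;; Same formatting as the original: a pretty name for ISO 2022
       ;; control codes when applicable, otherwise a hex value.
       (or (and iso-2022-p (cdr (assq x iso-2022-control-alist)))
           (format "#x%02X" x)))
     str " ")))
```

With this variant, a string holding the two raw bytes #xCE #xB2 is described as "#xCE #xB2" whether it is unibyte or multibyte, while a multibyte "β" is described as "#x3B2" rather than being silently encoded.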