From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: Re: emacs-26 8f18d12: Improve documentation of decoding into a unibyte buffer
Date: Mon, 27 May 2019 15:02:42 +0200
References: <20190525191039.14136.23307@vcs0.savannah.gnu.org> <20190525191040.CCD6C207F5@vcs0.savannah.gnu.org>
To: Stefan Monnier
Cc: emacs-devel@gnu.org
In-Reply-To: (Stefan
 Monnier's message of "Mon, 27 May 2019 08:24:46 -0400")

>>>>> On Mon, 27 May 2019 08:24:46 -0400, Stefan Monnier said:

    >> A related issue: C-h f string-as-unibyte
    >>
    >> string-as-unibyte is a built-in function in `src/fns.c'.
    >>
    >> (string-as-unibyte STRING)
    >>
    >> This function is obsolete since 26.1;
    >> use `encode-coding-string'.
    >> Probably introduced at or before Emacs version 20.3.
    >> This function does not change global state, including the match data.
    >>
    >> Having trawled through the elisp manual, for the life of me itʼs not
    >> clear which coding system I should use. 'raw-text'? 'us-ascii'?
    >> Something Else?

    Stefan> The coding that most closely corresponds to what string-as-unibyte does
    Stefan> is `emacs-internal`.  In 90% of the cases, it's not what you want, tho
    Stefan> because the code shouldn't have used string-as-unibyte in the
    Stefan> first place, so you'll need to find out what the code *really* needs.

Almost all uses of string-as-unibyte are gone now, but the one I was
looking at is this one in international/mule-cmds.el:

    (defun encoded-string-description (str coding-system)
      "Return a pretty description of STR that is encoded by CODING-SYSTEM."
      (setq str (string-as-unibyte str))
      (mapconcat
       (if (and coding-system (eq (coding-system-type coding-system) 'iso-2022))
           ;; Try to get a pretty description for ISO 2022 escape sequences.
           (function (lambda (x) (or (cdr (assq x iso-2022-control-alist))
                                     (format "#x%02X" x))))
         (function (lambda (x) (format "#x%02X" x))))
       str " "))

If I take a string such as "β" and replace the string-as-unibyte call
with (encode-coding-string str 'emacs-internal),
`encoded-string-description' prints "#xCE #xB2", which is the correct
UTF-8 encoded value. 'raw-text works too. Iʼm certain that there are
subtle differences between the two that I donʼt understand.

Robert
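
For anyone who wants to compare the two coding systems side by side, a
minimal sketch along the following lines may help. Note that
`my-encoded-bytes-description' is a hypothetical helper written for
illustration, not something in Emacs; it only mirrors the non-ISO-2022
branch of `encoded-string-description' and takes the encoding step as
an argument instead of calling string-as-unibyte.

```elisp
;; Sketch only: `my-encoded-bytes-description' is a hypothetical name,
;; not part of Emacs.  It encodes STR explicitly and then formats each
;; byte of the resulting unibyte string, as the original function does.
(defun my-encoded-bytes-description (str coding-system)
  "Encode STR with CODING-SYSTEM and return its bytes as spaced hex."
  (mapconcat (lambda (byte) (format "#x%02X" byte))
             (encode-coding-string str coding-system)
             " "))

;; The two candidates discussed in the message above.
(my-encoded-bytes-description "β" 'emacs-internal)
(my-encoded-bytes-description "β" 'raw-text)
```

Per the message above, both calls should give "#xCE #xB2" for "β".
Evaluating the same two forms on other inputs is one way to go looking
for the differences between the two coding systems.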