From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: master d57bb0c: Treat passed strings as raw-text when percent-escaping in epg
Date: Thu, 12 Dec 2019 08:58:33 -0500
References: <20191212073737.19923.49084@vcs0.savannah.gnu.org> <20191212073738.9C7A3200E3@vcs0.savannah.gnu.org>
In-Reply-To: <20191212073738.9C7A3200E3@vcs0.savannah.gnu.org> (Robert Pluim's message of "Thu, 12 Dec 2019 02:37:38 -0500 (EST)")
To: Robert Pluim
Cc: emacs-devel@gnu.org

Hi Robert,

> The strings contained in gpg keys can contain UTF-8 data, but can also
> use percent-escapes to encode non-ASCII chars.  When converting those
> escapes, use 'raw-text' coding system rather than 'string-to-unibyte',
> since the latter signals an error for non-ASCII characters.

I don't quite understand: "can contain UTF-8 data" seems odd here since
you're calling `encode-coding-string`, whose input argument is a sequence
of characters, whereas "UTF-8 data" can only be found in sequences of
bytes.  Did you mean "can contain non-ASCII characters"?

The other problem with the above description is "raw-text", since it's
far from clear what it means.  Personally I really have no idea what
"raw text" is, and the way Emacs understands "raw text" is more or less
"EOL-separated lines of bytes", which does not seem to match your
description since `string-to-unibyte` doesn't signal errors when
encountering bytes.

Looking at the code, I see that the only caller of
`epg--decode-percent-escape` seems to be
`epg--decode-percent-escape-as-utf-8`, which decodes the bytes returned
by `epg--decode-percent-escape` using `utf-8`, so I think it makes more
sense to encode using `utf-8` than `raw-text`, WDYT?


        Stefan


diff --git a/lisp/epg.el b/lisp/epg.el
index 5466716e34..e2ce68e161 100644
--- a/lisp/epg.el
+++ b/lisp/epg.el
@@ -2032,7 +2032,7 @@ epg-edit-key
       (epg-reset context)))
 
 (defun epg--decode-percent-escape (string)
-  (setq string (encode-coding-string string 'raw-text))
+  ;; `string' is assumed to be a sequence of *bytes*.
   (let ((index 0))
     (while (string-match "%\\(\\(%\\)\\|\\([[:xdigit:]][[:xdigit:]]\\)\\)"
 			 string index)
@@ -2047,7 +2047,10 @@ epg--decode-percent-escape
     string))
 
 (defun epg--decode-percent-escape-as-utf-8 (string)
-  (decode-coding-string (epg--decode-percent-escape string) 'utf-8))
+  (decode-coding-string
+   (epg--decode-percent-escape
+    (encode-coding-string string 'utf-8))
+   'utf-8))
 
 (defun epg--decode-hexstring (string)
   (let ((index 0))
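
To make the characters-versus-bytes point concrete, here is a small
sketch that can be evaluated in *scratch*.  It is only an illustration
under assumptions: `my-percent-decode-bytes' is a hypothetical stand-in
written for this example, not the real `epg--decode-percent-escape'
(it ignores the "%%" case, for instance), and the input string is made
up.

```elisp
;; Sketch only: `my-percent-decode-bytes' is a hypothetical helper,
;; not the actual `epg--decode-percent-escape'.
(defun my-percent-decode-bytes (bytes)
  "Replace each %XX escape in the unibyte string BYTES with that byte."
  (replace-regexp-in-string
   "%\\([[:xdigit:]][[:xdigit:]]\\)"
   (lambda (m)
     ;; The match data here refers to M, the matched text itself.
     (unibyte-string (string-to-number (match-string 1 m) 16)))
   bytes t t))

(defun my-percent-decode-demo ()
  "Return what happens to a string mixing a literal é and an escaped é."
  (let ((chars "\u00e9%C3%A9"))       ; made-up input: characters, not bytes
    (list
     ;; On characters, `string-to-unibyte' refuses the non-ASCII char.
     (condition-case nil
         (string-to-unibyte chars)
       (error 'signaled))
     ;; On bytes: encode first, expand the escapes, then decode once.
     (decode-coding-string
      (my-percent-decode-bytes (encode-coding-string chars 'utf-8))
      'utf-8))))
```

Evaluating `(my-percent-decode-demo)' should give a list whose first
element is the symbol `signaled' and whose second element is a string of
two "é" characters: the escapes are expanded at the byte level, and the
single `utf-8' decode at the end treats the literal and the escaped
character the same way.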