From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: Re: master d57bb0c: Treat passed strings as raw-text when percent-escaping in epg
Date: Thu, 12 Dec 2019 16:19:46 +0100
To: Stefan Monnier
Cc: emacs-devel@gnu.org

>>>>> On Thu, 12 Dec 2019 08:58:33 -0500, Stefan Monnier said:

Stefan> Hi Robert,
>> The strings contained in gpg keys can contain UTF-8 data, but can also
>> use percent-escapes to encode non-ASCII chars. When converting those
>> escapes, use 'raw-text' coding system rather than 'string-to-unibyte',
>> since the latter signals an error for non-ASCII characters.

Stefan> I don't quite understand: "can contain UTF-8 data" seems odd here since
Stefan> you're calling `encode-coding-string` whose input argument is a sequence
Stefan> of characters whereas "UTF-8 data" can only be found in sequences of bytes.
Stefan> Did you mean "can contain non-ASCII characters"?

I meant "can contain non-ASCII characters encoded using UTF-8", which
means they end up in a multibyte string in Emacs.

Stefan> The other problem with the above description is the "raw-text" since
Stefan> it's far from clear what it means (personally I really have no idea
Stefan> what is "raw text" and the way Emacs understands "raw text" is more or
Stefan> less "EOL-separated lines of bytes" which does not seem to match your
Stefan> description since string-to-unibyte doesn't signal errors when
Stefan> encountering bytes).

The change replaces string-to-unibyte, which signals an error when given
a multibyte string containing non-ASCII characters, with
encode-coding-string using 'raw-text, which just produces the sequence
of bytes. My other choices were 'binary or 'no-conversion, which do the
same but have even less meaningful names.
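A rough Python analogy of that difference, for illustration only (this is not epg code). The hypothetical `to_bytes_strict` stands in for string-to-unibyte, which accepts ASCII and raw bytes but errors on other non-ASCII characters; Python strings have no raw-byte characters, so it is approximated by an ASCII-only conversion. The hypothetical `to_bytes_raw` stands in for the 'raw-text encoding step and assumes non-ASCII characters should come out as their UTF-8 bytes.

```python
def to_bytes_strict(s: str) -> bytes:
    """Stand-in for string-to-unibyte (assumption: ASCII-only in Python).

    Raises UnicodeEncodeError as soon as a non-ASCII character is seen.
    """
    return s.encode("ascii")


def to_bytes_raw(s: str) -> bytes:
    """Stand-in for encoding with 'raw-text (assumption: UTF-8 bytes).

    Never fails: every character is written out as one or more bytes.
    """
    return s.encode("utf-8")


name = "José %C3%A9"  # literal non-ASCII char plus a percent-escape

try:
    to_bytes_strict(name)
except UnicodeEncodeError as err:
    print("strict conversion failed:", err.reason)

print(to_bytes_raw(name))  # b'Jos\xc3\xa9 %C3%A9'
```

The strict conversion refuses the literal "é", while the permissive one yields bytes in which the percent-escapes can still be found and replaced.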
Stefan> Looking at the code, I see that the only caller of
Stefan> `epg--decode-percent-escape` seems to be
Stefan> `epg--decode-percent-escape-utf-8` which decodes the bytes returned by
Stefan> `epg--decode-percent-escape` using `utf-8` so I think it makes more
Stefan> sense to encode using `utf-8` than `raw-text`, WDYT?

No. The string that is passed to epg--decode-percent-escape can contain
non-ASCII characters encoded as UTF-8, plus percent-escaped
representations of non-ASCII characters. In order to convert those
percent-escaped characters correctly, the string has to be treated as a
unibyte string (an array of bytes), then converted back to multibyte by
decoding it with utf-8 afterwards.

Robert
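A minimal Python sketch of that approach, for illustration only. The hypothetical `decode_percent_escape` first turns the whole string into bytes (UTF-8 is assumed for the non-ASCII characters, standing in for the 'raw-text step), then replaces each %XX escape with the byte it names; the hypothetical `decode_percent_escape_utf8` then decodes the result with UTF-8. It handles only %XX hex escapes; any other escape rules the real epg functions apply are out of scope here.

```python
import re

# Matches a percent sign followed by exactly two hex digits, on bytes.
_PERCENT_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_percent_escape(s: str) -> bytes:
    """Convert S to bytes, then replace every %XX with the byte 0xXX."""
    raw = s.encode("utf-8")  # stand-in for the 'raw-text encoding step
    return _PERCENT_ESCAPE.sub(
        lambda m: bytes([int(m.group(1).decode("ascii"), 16)]), raw
    )


def decode_percent_escape_utf8(s: str) -> str:
    """Decode the byte sequence from decode_percent_escape as UTF-8."""
    return decode_percent_escape(s).decode("utf-8")


# Mixed input: a literal "é" plus the same character escaped as %C3%A9.
print(decode_percent_escape_utf8("José / Jos%C3%A9"))
```

Working at the byte level is what makes the mixed input decode correctly: %C3 and %A9 are not characters on their own, only the byte pair C3 A9 decodes to "é", so the escapes cannot be resolved one character at a time.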