From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
Date: Thu, 11 Aug 2016 05:52:42 +0300
References: <83d1ltq3p6.fsf@gnu.org> <83popsocg8.fsf@gnu.org>
 <7fb3540a-7b74-68cf-2c63-66474de26640@yandex.ru> <83mvkvmbv2.fsf@gnu.org>
 <27168f12-32d2-cb38-45c0-27d3339c75aa@yandex.ru> <83twf0lb5s.fsf@gnu.org>
 <83lh07i6g3.fsf@gnu.org> <83k2fri5kc.fsf@gnu.org>
 <87oa53i3si.fsf@linux-m68k.org> <83bn13i2x2.fsf@gnu.org>
 <87fuqfhy0q.fsf@linux-m68k.org> <837fbqise6.fsf@gnu.org>
 <834m6uhu87.fsf@gnu.org> <65f6508f-a464-7f66-fd14-1372dce86aa7@yandex.ru>
 <83bn10hetr.fsf@gnu.org>
In-Reply-To: <83bn10hetr.fsf@gnu.org>
To: Eli Zaretskii
Cc: stakemorii@gmail.com, larsi@gnus.org, schwab@linux-m68k.org, 24117@debbugs.gnu.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:47.0) Gecko/20100101 Thunderbird/47.0
X-GNU-PR-Message: followup 24117
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:122052
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------F9F4DF02E330CE03629D055E"

This is a multi-part message in MIME format.
--------------F9F4DF02E330CE03629D055E
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

On 08/10/2016 05:35 PM, Eli Zaretskii wrote:

> Are you saying that url-generic-parse-url performs this encoding, and
> that using a unibyte buffer causes that to fail?

No, url-generic-parse-url contains logic that distinguishes between the
domain and the path parts of a URL. So apparently it might have to work
on multibyte URLs.

That's not strictly necessary, however, given how url-encode-url uses it
currently (it performs encode-coding-string and decode-coding-string on
the URL string). That approach seems flawed to me, but either way,
someone will have to choose how url-encode-url should use
url-generic-parse-url.

If we intend to leave it as-is, then the proposed patch using
set-buffer-multibyte actually works fine, even on master, with
multibyte URLs.

>> So I think the encoding of the URL parts should be performed inside
>> url-http-create-request.
>
> Fine with me, but when I suggested that, you didn't like the
> suggestion. If you changed your mind, let's do that.

See below. But yes, I'm more inclined toward this approach now, after
Lars's objection, and after looking at the code in master.

>> On the master branch, host is passed through IDNA encoding, but
>> real-fname is untouched. On emacs-25, I think we should convert both
>> to unibyte.
>
> Not sure I understand why there should be a difference between the two
> branches. Encoding an ASCII string doesn't do any harm.

Since it's ASCII, using utf-8 there seems misleading to me. It's a
question of readability. As a bonus, using us-ascii will validate that
the strings indeed do not contain any unexpected characters.

>> (Why doesn't (encode-coding-string "aaaa" 'ascii) work?)
>
> It's 'us-ascii, not 'ascii.

Thanks. I'm attaching a patch; it seems to work well enough.

I'd like to wait for Lars's response now, but someone will have to make
an executive decision.
Both patches (this one and the set-buffer-multibyte one) work in the
cases I've tested. This one seems more conservative, but it'll require a
manual merge to master. The other one is trivial and will merge
automatically, but might cause problems for potential less-careful uses
of url-generic-parse-url.

--------------F9F4DF02E330CE03629D055E
Content-Type: text/x-patch;
 name="url-http--encode-string.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="url-http--encode-string.diff"

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 7156e6f..860e652 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -235,7 +235,7 @@ url-http-create-request
                               'url-http-proxy-basic-auth-storage))
                          (url-get-authentication url-http-proxy nil 'any nil))))
          (real-fname (url-filename url-http-target-url))
-         (host (url-host url-http-target-url))
+         (host (url-http--encode-string (url-host url-http-target-url)))
          (auth (if (cdr-safe (assoc "Authorization" url-http-extra-headers))
                    nil
                  (url-get-authentication (or
@@ -278,7 +278,8 @@ url-http-create-request
           (concat
              ;; The request
              (or url-http-method "GET") " "
-             (if using-proxy (url-recreate-url url-http-target-url) real-fname)
+             (url-http--encode-string
+              (if using-proxy (url-recreate-url url-http-target-url) real-fname))
              " HTTP/" url-http-version "\r\n"
              ;; Version of MIME we speak
              "MIME-Version: 1.0\r\n"
@@ -360,6 +361,11 @@ url-http-create-request
     (url-http-debug "Request is: \n%s" request)
     request))
 
+(defun url-http--encode-string (s)
+  (if (multibyte-string-p s)
+      (encode-coding-string s 'us-ascii)
+    s))
+
 ;; Parsing routines
 (defun url-http-clean-headers ()
   "Remove trailing \r from header lines.

--------------F9F4DF02E330CE03629D055E--
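
On the point about us-ascii validating the strings: as far as I know,
encode-coding-string does not itself signal an error for characters the
target coding system cannot represent, so if an explicit check is
wanted it would have to be added. Below is a minimal sketch of such a
stricter variant. The name my-url-http--encode-string-strict is
hypothetical and is not part of url-http.el or of the attached patch;
it relies only on the built-ins multibyte-string-p,
unencodable-char-position and encode-coding-string.

```elisp
;; Hypothetical helper for illustration only; not part of url-http.el
;; and not part of the attached patch.
(defun my-url-http--encode-string-strict (s)
  "Return S as a unibyte string, signaling an error if S is not pure ASCII.
Unibyte input is returned unchanged, as in the patch's helper."
  (if (not (multibyte-string-p s))
      s
    ;; With a non-nil STRING argument, START and END are string indices.
    ;; The result is the index of the first character that `us-ascii'
    ;; cannot encode, or nil when every character is encodable.
    (let ((pos (unencodable-char-position 0 (length s) 'us-ascii nil s)))
      (when pos
        (error "Non-ASCII character at index %d in %S" pos s))
      (encode-coding-string s 'us-ascii))))

;; An ASCII-only multibyte string comes back unibyte.
(multibyte-string-p
 (my-url-http--encode-string-strict (string-to-multibyte "/index.html")))

;; A string with a non-ASCII character signals an error instead of
;; being sent on the wire.
(condition-case err
    (my-url-http--encode-string-strict "/caf\u00e9")
  (error (message "rejected: %s" (error-message-string err))))
```

Whether a hard error is preferable to silently passing the string
through is a design choice for url-http-create-request; the sketch only
shows what an explicit check could look like.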