From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
Date: Mon, 8 Aug 2016 04:56:58 +0300
References: <83d1ltq3p6.fsf@gnu.org> <83popsocg8.fsf@gnu.org> <7fb3540a-7b74-68cf-2c63-66474de26640@yandex.ru> <83mvkvmbv2.fsf@gnu.org> <27168f12-32d2-cb38-45c0-27d3339c75aa@yandex.ru> <83twf0lb5s.fsf@gnu.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Cc: stakemorii@gmail.com, larsi@gnus.org, 24117@debbugs.gnu.org
To: Eli Zaretskii
In-Reply-To: <83twf0lb5s.fsf@gnu.org>

Hi Eli,

On 08/04/2016 08:02 PM, Eli Zaretskii wrote:
> Hmm, but url-generic-parse-url is called from gazillion other places,
> so maybe this is not safe.

Only about 40 places, all of them either in lisp/url or lisp/gnus.
Sadly, Lars is being silent on the matter.

It might not be 100% safe, but maybe doing TRT (the Right Thing) could
be enough.
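The change being weighed here, making the URL parsing buffer unibyte, matters because text copied out of a buffer takes on the buffer's multibyteness, not the content's: even pure-ASCII text comes back as a multibyte string from an ordinary (multibyte) buffer. A minimal sketch of that behavior in isolation; `my/copied-text-multibyte-p` is a hypothetical helper for illustration, not part of the URL package.

```elisp
;; Hypothetical helper for illustration only; not part of url.el.
(defun my/copied-text-multibyte-p (text unibyte)
  "Insert TEXT into a temp buffer and report whether the copy is multibyte.
If UNIBYTE is non-nil, the buffer is switched to unibyte before inserting."
  (with-temp-buffer
    (when unibyte
      ;; New buffers are multibyte by default; opt out explicitly.
      (set-buffer-multibyte nil))
    (insert text)
    ;; `buffer-string' follows the buffer's representation, so
    ;; ASCII-only text still comes back multibyte from a multibyte buffer.
    (multibyte-string-p (buffer-string))))

(my/copied-text-multibyte-p "http://example.com/" nil) ; => t
(my/copied-text-multibyte-p "http://example.com/" t)   ; => nil
```

The sketch deliberately uses ASCII-only input; what a unibyte buffer does with non-ASCII characters is a separate question it does not cover.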
> No, I meant that since RFC 3986 doesn't allow non-ASCII characters,

Indeed.

> and url-generic-parse-url doesn't do anything about that, it is either
> already broken for non-ASCII characters, or already copes with them.
> So we don't need to worry about that.

I imagined that some code that uses the return value of
url-http-create-request might perform the escaping. But that doesn't
seem to be the case, see below.

> However, a safer change would be to do something like this:
>
>   (or (not (multibyte-string-p url-http-target-url))
>       (setq url-http-target-url
>             (decode-coding-string url-http-target-url 'utf-8)))
>
> in url-http-create-request. Can you try this?

I'll try it if you insist, but that choice of encoding seems rather
arbitrary. I think we should go with your previous suggestion: make the
URL parsing buffer unibyte.

But we do try to handle non-ASCII URLs on the level above
url-generic-parse-url. See url-retrieve-internal: one of the first
things it does is (setq url (url-encode-url url)). And only after that,
(setq url (url-generic-parse-url url)).

The URL package doesn't seem to support international domains anyway.
This fails:

  (url-retrieve-synchronously "http://банки.рф")

However, the error it fails with is a bit more comprehensible if the
URL parsing buffer is unibyte:

  Debugger entered--Lisp error: (error "банки.рф/80 Name or service not known")

Instead of:

  Debugger entered--Lisp error: (error "\301\220\300\261\301\220\300\260\301\220\300\275\301\220\300\272\301\220\300\270.\301\221\300\200\301\221\300\204/80 Name or service not known")
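The encode-then-parse order in url-retrieve-internal can be reproduced directly, without fetching anything. A minimal sketch, assuming Emacs 25's url-util and url-parse libraries; `my/encode-then-parse` is a hypothetical helper that only mirrors the two setq steps quoted above, and http://example.com/путь is an arbitrary illustrative URL (a non-ASCII path, not an international domain).

```elisp
(require 'url-util)   ; url-encode-url
(require 'url-parse)  ; url-generic-parse-url and the url struct accessors

;; Hypothetical helper for illustration only; not part of url.el.
(defun my/encode-then-parse (raw)
  "Encode RAW the way url-retrieve-internal does, then parse the result.
Return a list (ENCODED HOST FILENAME)."
  (let* (;; Step 1: percent-encode non-ASCII characters as UTF-8 bytes.
         (encoded (url-encode-url raw))
         ;; Step 2: parse the string, which is now ASCII-only.
         (parsed (url-generic-parse-url encoded)))
    (list encoded (url-host parsed) (url-filename parsed))))

(my/encode-then-parse "http://example.com/путь")
```

After step 1 the path consists of %XX escapes for the UTF-8 bytes of путь, so by the time url-generic-parse-url sees the string there is nothing non-ASCII left for its parsing buffer to mangle.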