From: Ted Zlatanov
Newsgroups: gmane.emacs.bugs
Subject: bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
Date: Thu, 11 Aug 2016 08:57:50 -0400
Organization: Теодор Златанов @ Cienfuegos
Message-ID: <874m6rjwdt.fsf_-_@lifelogs.com>
In-Reply-To: <50426141-3483-e5e4-a252-20b1198cde30@yandex.ru> (Dmitry Gutov's message of "Thu, 11 Aug 2016 15:31:11 +0300, Thu, 11 Aug 2016 13:05:12 +0200")
To: Dmitry Gutov
Cc: stakemorii@gmail.com, Lars Ingebrigtsen, schwab@linux-m68k.org, 24117@debbugs.gnu.org

On Thu, 11 Aug 2016 15:31:11 +0300 Dmitry Gutov wrote:

DG> On 08/11/2016 11:53 AM, Ted Zlatanov wrote:
>> Could you add to your patch the cases you've tested? There's a specific
>> place for URL parsing tests in test/lisp/url/url-parse-tests.el that
>> would help everyone.

DG> Sure, but only one of the patches affects URL parsing (and Lars prefers the
DG> other one).

Maybe the tests should be in a separate patch then. Neither your Russian
example nor Lars' example has a parallel in the tests AFAICS. I'd also
add the example hostname that Katsumi Yamaoka gave from the w3m source.

Somewhat related: it would be nice if the URL parser also listed the
non-ASCII scripts used in the domain name. Then eww and other programs
could do one of the typical defenses: either ensure only one script is
used; or allow only scripts that match the user's locale; or catch any
non-ASCII domain names.
Typically they'd use Punycode to display such suspicious domain names:
https://en.wikipedia.org/wiki/IDN_homograph_attack

I bring it up since explicitly allowing non-ASCII domain names
automatically opens up these security concerns, and it's a bit hard to
collect the confusables externally:
https://elpa.gnu.org/packages/uni-confusables.html

On Thu, 11 Aug 2016 13:05:12 +0200 Lars Ingebrigtsen wrote:

LI> Yes, the fix here should be in url-http-create-request, not in the URL
LI> parsing functions. The main issue here is that the URL request buffer
LI> is a multibyte buffer and (as with all network connection buffers), it
LI> shouldn't be. (Or, rather, that function just creates a string instead
LI> of a buffer, but the same principle applies.)

I think this is correct: the URL parsing should not care about the
provenance or potential use of that URL to make an HTTP request or
otherwise. But maybe the URL parsing can be smart enough to return both
the IDNA version and the original domain name, plus some parsing
information like the list of scripts I suggested above, to save user
agents from doing that extra work?

Ted
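
To make the suggestion above concrete, here is a rough sketch of the kind
of parse result meant: the original host, its IDNA form, and the scripts
seen in each label. It is written in Python rather than Emacs Lisp and is
not the url.el API. The names label_scripts, describe_host and
build_request are hypothetical, and deriving the script from the first
word of the Unicode character name is only an approximation of the real
Unicode Script property. The last function illustrates the point about
the request being bytes rather than multibyte text.

```python
import unicodedata


def label_scripts(label):
    """Approximate the scripts in one DNS label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isascii():
            if ch.isalpha():
                scripts.add("LATIN")
            continue  # ASCII digits and hyphens carry no script information
        name = unicodedata.name(ch, "")
        if name:
            # Heuristic: "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(name.split()[0])
    return sorted(scripts)


def describe_host(host):
    """Return the original host, its IDNA form, and per-label script info."""
    labels = []
    for label in host.split("."):
        scripts = label_scripts(label)
        labels.append(
            {"label": label, "scripts": scripts, "mixed": len(scripts) > 1}
        )
    return {
        "original": host,
        # Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).
        "idna": host.encode("idna").decode("ascii"),
        "non_ascii": not host.isascii(),
        "labels": labels,
    }


def build_request(host, path="/"):
    """Build the request as bytes so no multibyte text reaches the connection."""
    ascii_host = host.encode("idna").decode("ascii")
    request = "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (path, ascii_host)
    # Raises UnicodeEncodeError if the path still holds non-ASCII text,
    # which would have to be percent-encoded by the caller first.
    return request.encode("ascii")
```

A user agent could then apply any of the defenses listed earlier: reject a
label whose "mixed" flag is set, compare "scripts" against the user's
locale, or show the "idna" form whenever "non_ascii" is true.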