From: "J.P."
Newsgroups: gmane.emacs.bugs
Subject: bug#46342: 28.0.50; socks-send-command munges IP address bytes to UTF-8
Date: Wed, 10 Feb 2021 05:16:58 -0800
Message-ID: <87ft24njud.fsf@neverwas.me>
In-Reply-To: <831rdpkyl6.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 09 Feb 2021 18:14:29 +0200")
References: <875z355sh9.fsf@neverwas.me> <83pn1do008.fsf@gnu.org> <87r1lt2s8k.fsf@neverwas.me> <83czxdns61.fsf@gnu.org> <874kils22e.fsf@neverwas.me> <831rdpkyl6.fsf@gnu.org>
To: Eli Zaretskii
Cc: 46342@debbugs.gnu.org

Eli Zaretskii writes:

> what kind of string can this ADDRESS be?  My reading of RFC 1928 is
> that it normally is an IP address, in which case encoding is not
> relevant, as it's an ASCII string.

But it can also be a domain, right? This patch only affects IP
addresses, but I'm happy to look into the domain name form as well.

> If so, what form can this domain take?  If the domain has non-ASCII
> characters, shouldn't it be hex-encoded, or run through IDNA?  I mean,
> are non-ASCII characters in that place at all allowed?

At first glance, both tor and ssh appear to call getaddrinfo() on the
remote end without accounting for the sender's locale or passing any
special IDN-related flags. But I'm still looking into these.

For now, if we're allowing anecdotal caveman logic, I'd wager the
answer is ASCII only. Here's why: it seems that feeding tor and ssh
the hostname Яндекс.рф (Yandex) as the UTF-8 encoded byte string

  \xd0\xaf\xd0\xbd\xd0\xb4\xd0\xb5\xd0\xba\xd1\x81.\xd1\x80\xd1\x84

results in failure, both when forwarding via CONNECT and when resolving
via tor's nonstandard RESOLVE command. (This is direct, no Emacs.)
However, passing the punycoded "xn--d1acpjx3f.xn--p1ai" works as
intended, forwarding to (or, in the case of RESOLVE, producing) an IP
from a Yandex-registered A record (for me, 77.88.55.66).

To try this at home (on separate ttys):

  $ ssh -TND 4711 my.sshd
  # tcpdump -i lo -nnX "port 4711"
  $ curl --verbose --proxy socks5h://localhost:4711 Яндекс.рф

Here's a trace for curl's actual call to the hostname conversion
function idn2_lookup_ul() [1], which is provided by GNU libidn2 [2].
It's hard to see without context, but this happens before any
connection is established (tcpdump will confirm this).
  #0  Curl_idnconvert_hostname at lib/url.c:1566
  #1  create_conn at lib/url.c:3583
  #2  Curl_connect at lib/url.c:4027
  #3  multi_runsingle at lib/multi.c:1671
  #4  curl_multi_perform at lib/multi.c:2412
  #5  easy_transfer at lib/easy.c:606
  #6  easy_perform at lib/easy.c:696
  #7  curl_easy_perform at lib/easy.c:715
  #8  serial_transfers at src/tool_operate.c:2327
  #9  run_all_transfers at src/tool_operate.c:2505
  #10 operate at src/tool_operate.c:2621
  #11 main at src/tool_main.c:277

On my machine, curl was configured to pass these flags to
idn2_lookup_ul() [3]:

  /*
   * IDN2_NFC_INPUT: Normalize input string using normalization form C.
   * IDN2_NONTRANSITIONAL: Perform Unicode TR46 non-transitional processing.
   */
  int flags = IDN2_NFC_INPUT | IDN2_NONTRANSITIONAL;

Apparently there are two IDNA standards: 2003 and 2008 [4]. Curl uses
the latter, but I'm not sure which, if any, puny.el favors. In the case
of Yandex,

  (puny-encode-domain "Яндекс.рф")

produces "xn--d1acpjx9e.xn--p1ai", which tor and ssh both reject
(though it's very possible I'm missing something). Anyway, passing the
version above provided by libidn2 to socks-send-command works fine.

[1] https://github.com/curl/curl/blob/ec5d9b44a2e837fc7b82d1c60d5fae3f851620dc/lib/url.c#L1559
[2] https://www.gnu.org/software/libidn/libidn2/reference/libidn2-idn2.html#idn2-lookup-ul
[3] https://www.gnu.org/software/libidn/libidn2/reference/libidn2-idn2.html#idn2-flags
[4] https://www.unicode.org/reports/tr46/#Table_Example_Processing
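For comparison with the two address forms discussed above, here is a minimal sketch of an RFC 1928 CONNECT request in Python. It is illustrative only and is not how socks.el does it; the helper name socks5_connect_request is hypothetical. It shows IP addresses going on the wire as raw octets (ATYP 0x01 or 0x04), so no text encoding should ever touch them, and hostnames (ATYP 0x03) going as a length-prefixed ASCII name. The ASCII form is produced here with Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490), not the IDNA 2008/TR46 processing curl requests from libidn2; that substitution is an assumption of the sketch.

```python
import ipaddress
import struct

# Field values from RFC 1928, section 4.
SOCKS_VERSION = 0x05
CMD_CONNECT = 0x01
ATYP_IPV4 = 0x01
ATYP_DOMAIN = 0x03
ATYP_IPV6 = 0x04


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request (hypothetical helper, not socks.el)."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None

    if ip is not None:
        # IP addresses travel as raw network-order octets. Treating them
        # as text and encoding to UTF-8 would turn e.g. 0xC0 into 0xC3 0x80.
        atyp = ATYP_IPV4 if ip.version == 4 else ATYP_IPV6
        addr = ip.packed
    else:
        # Hostnames travel as a length-prefixed byte string. Python's
        # built-in "idna" codec (IDNA 2003 ToASCII) case-folds and
        # punycodes each non-ASCII label, so only ASCII is sent.
        name = host.encode("idna")
        if len(name) > 255:
            raise ValueError("hostname too long for a SOCKS5 request")
        atyp = ATYP_DOMAIN
        addr = bytes([len(name)]) + name

    header = struct.pack("!BBBB", SOCKS_VERSION, CMD_CONNECT, 0x00, atyp)
    return header + addr + struct.pack("!H", port)


if __name__ == "__main__":
    # 05 01 00 01 4d 58 37 42 00 50
    print(socks5_connect_request("77.88.55.66", 80).hex(" "))
    print(socks5_connect_request("Яндекс.рф", 443))
```

For this particular name, the IDNA 2003 codec yields the same 22-byte "xn--d1acpjx3f.xn--p1ai" that libidn2 produced above, although the two standards can disagree on other inputs.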