From: "J.P."
Newsgroups: gmane.emacs.bugs
Subject: bug#46342: 28.0.50; socks-send-command munges IP address bytes to UTF-8
Date: Thu, 11 Feb 2021 06:58:00 -0800
Message-ID: <87eehmlkhz.fsf@neverwas.me>
In-Reply-To: <83o8grj4d3.fsf@gnu.org>
To: Eli Zaretskii
Cc: 46342@debbugs.gnu.org

Eli Zaretskii writes:

> Then I don't understand why we need to worry about encoding.  IP
> addresses are pure ASCII strings, so they need no encoding whatsoever.

Clearly, I'm failing you here. Between my dearth of communication
skills and lack of Emacs know-how, I've obviously managed to deceive
you into thinking that SOCKS IP address fields ought to contain ASCII
text characters such as the following:

  1  9  2  .  1  6  8  .  1  .  1
  31 39 32 2e 31 36 38 2e 31 2e 31

However, this is not the case [a]. In version 4, all addresses are
four-byte sequences, one byte for each component of an IPv4 address,
ordered left to right. For example:

  192 168 1  1
  c0  a8  01 01

In version 5, the one covered by RFC 1928, this is extended to include
16-byte IPv6 addresses as well as ASCII domain names. The three forms
are mutually exclusive but occupy the same field, in a union of sorts.
The byte preceding that field, the "ATYP" flag, denotes which of the
three to expect, and it appears as the "atype" argument to
socks-send-command.

> I guess I will have to ask you to back up and describe what problems
> you saw with the original code, and show me the details of the strings
> involved in that.

The Elisp manual distinguishes between multibyte and unibyte "sources"
of strings [1]. For these (SOCKS 4) IP address strings, the function
socks--open-network-stream is that source (it creates them). When such
a string includes characters with code points between 128 and 255 (the
upper half of the latin-1/iso-8859-1 range), each such character is
sent as two UTF-8 encoded bytes, which the SOCKS service rejects as a
protocol violation.
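For illustration, the difference between the text form and the wire
form of an address can be checked at the Lisp level. This is only a
sketch: demo-address-forms is a made-up helper, not part of socks.el,
and it merely reports sizes for three renderings of the same octets.

```elisp
;; Hypothetical helper for illustration only; not part of socks.el.
(defun demo-address-forms (octets)
  "Return size facts about three renderings of OCTETS, a list of four bytes."
  (let ((as-text  (mapconcat #'number-to-string octets "."))  ; "192.168.1.1"
        (as-bytes (apply #'unibyte-string octets))            ; raw wire form
        (as-chars (apply #'format "%c%c%c%c" octets)))        ; one char per octet
    (list (string-bytes as-text)          ; bytes in the ASCII text form
          (string-bytes as-bytes)         ; bytes in the unibyte form
          (multibyte-string-p as-bytes)   ; nil: unibyte
          (multibyte-string-p as-chars)   ; t when any octet is above 127
          (string-bytes as-chars))))      ; bytes behind the multibyte form

(demo-address-forms '(192 168 1 1))
;; => (11 4 nil t 6)
```

Only the unibyte form has the four bytes SOCKS 4 expects. With octets
that are all below 128, the last value would also be 4, which is
presumably why the problem is easy to miss with some addresses.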
Specifically, when a user passes "example.com" to the entry-point
function socks-open-network-stream, its internal helper
socks--open-network-stream resolves the host name into an IP address
in list form and then converts this to a string by calling

  (apply #'format "%c%c%c%c" '(93 184 216 34))

This produces a multibyte string of the same character length:

  "]¸Ø\""

However, when socks-send-command passes this to process-send-string
for a process whose coding system is (binary . binary), the underlying
six-byte sequence is emitted verbatim:

  "]\302\270\303\230\""

My initial idea was to leverage the function unibyte-string to ensure
every character can be encoded in 8 bits before transmission.
Regardless, performing some combination of validating and converting
before sending may be worthwhile since it'll only run once per
connection.

Sorry for the extended play-by-play. I certainly hope none of it came
off as insulting or pedantic. I'm quite certain your grasp of such
concepts long ago outpaced any understanding I could ever hope to
attain.

J.P.

[a] My versions of tor and ssh definitely honor requests like

      curl --proxy socks5h://localhost:1080 http://93.184.216.34

    passing the IP address as a domain name. Although this defies
    RFC 1928, which specifies FQDNs only [2], I'm getting the sense
    that influential projects treat the latter more as a living
    standard. (Note: in its unit tests, tor only includes this form
    for its extension commands [3].)

[1] (elisp) Non-ASCII in Strings, second paragraph
[2] https://tools.ietf.org/html/rfc1928#section-5
[3] https://gitweb.torproject.org/tor.git/tree/src/test/test_socks.c#n335
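
A rough sketch of the validate-and-convert idea mentioned above, using
the example address from this message. The name demo-ipv4-field is
hypothetical and does not exist in socks.el, and the explicit UTF-8
encoding is only an assumption used to reproduce the six bytes
described as reaching the wire.

```elisp
;; Hypothetical helper for illustration only; not part of socks.el.
(defun demo-ipv4-field (octets)
  "Return OCTETS, a list of four integers in 0..255, as a unibyte string."
  (unless (= (length octets) 4)
    (error "Expected four octets, got %S" octets))
  (dolist (o octets)
    (unless (and (integerp o) (<= 0 o 255))
      (error "Octet out of range: %S" o)))
  ;; `unibyte-string' builds the string from raw bytes, so it can
  ;; never turn into a multibyte string the way `format' with %c does.
  (apply #'unibyte-string octets))

(let ((munged (encode-coding-string
               (apply #'format "%c%c%c%c" '(93 184 216 34))
               'utf-8))                        ; stand-in for what gets sent today
      (fixed (demo-ipv4-field '(93 184 216 34))))
  (list (string-bytes munged) (string-bytes fixed)))
;; => (6 4)
```

The checks run once per address, so their cost should be negligible
next to opening the connection.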