unofficial mirror of bug-gnu-emacs@gnu.org 
From: "J.P." <jp@neverwas.me>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 46342@debbugs.gnu.org
Subject: bug#46342: 28.0.50; socks-send-command munges IP address bytes to UTF-8
Date: Thu, 11 Feb 2021 06:58:00 -0800
Message-ID: <87eehmlkhz.fsf@neverwas.me>
In-Reply-To: <83o8grj4d3.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 10 Feb 2021 18:04:56 +0200")

Eli Zaretskii <eliz@gnu.org> writes:

> Then I don't understand why we need to worry about encoding.  IP
> addresses are pure ASCII strings, so they need no encoding whatsoever.

Clearly, I'm failing you here. Between my poor communication skills
and my lack of Emacs know-how, I've obviously misled you into thinking
that SOCKS IP address fields ought to contain ASCII text characters such
as the following:

  1  9  2  .  1  6  8  .  1  .  1
  31 39 32 2e 31 36 38 2e 31 2e 31

However, this is not the case [a]. In version 4, every address is a
four-byte sequence, one byte for each component of an IPv4 address, in
network byte order (left to right, most significant first). For example:

  192 168 1  1
  c0  a8  01 01
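To make the contrast concrete, here is a small Python sketch (not part
of socks.el) comparing the ASCII text form of 192.168.1.1 with the
four-byte form SOCKS 4 expects, and placing the latter in a SOCKS 4
CONNECT request. The helper name socks4_connect_request and the empty
default user ID are illustrative assumptions.

```python
import ipaddress
import struct


def socks4_connect_request(ip: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS 4 CONNECT request (hypothetical helper for illustration).

    Layout: VN=4, CD=1 (CONNECT), DSTPORT (2 bytes, big-endian),
    DSTIP (4 bytes, most significant octet first), USERID, NUL.
    """
    dst_ip = ipaddress.IPv4Address(ip).packed  # four raw octets, not text
    return struct.pack(">BBH", 4, 1, port) + dst_ip + user_id + b"\x00"


text_form = "192.168.1.1".encode("ascii")            # what SOCKS does NOT want
binary_form = ipaddress.IPv4Address("192.168.1.1").packed

print(text_form.hex(" "))    # 31 39 32 2e 31 36 38 2e 31 2e 31
print(binary_form.hex(" "))  # c0 a8 01 01
print(socks4_connect_request("192.168.1.1", 80).hex(" "))  # 04 01 00 50 c0 a8 01 01 00
```

The address field is always exactly four bytes, whatever the number of
digits in the dotted-decimal text.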

In version 5, the one covered by RFC 1928, this is extended to include
16-byte IPv6 addresses as well as domain names, which are sent as ASCII
text prefixed with a one-byte length. The three forms are mutually
exclusive but occupy the same field, a union of sorts. A one-byte "ATYP"
flag immediately preceding that field (1 for IPv4, 3 for a domain name,
4 for IPv6) denotes which of the three to expect, and it appears as the
"atype" argument to socks-send-command.
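A comparable Python sketch for SOCKS 5, following the request layout in
RFC 1928 (VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT), shows how the
address type changes both the size and the meaning of the address bytes.
The helpers socks5_address and socks5_connect_request are hypothetical
and do not correspond to anything in socks.el.

```python
import ipaddress
import struct

# ATYP values defined in RFC 1928, section 4.
ATYP_IPV4 = 0x01
ATYP_DOMAINNAME = 0x03
ATYP_IPV6 = 0x04


def socks5_address(atype: int, addr: str) -> bytes:
    """Return the ATYP octet followed by DST.ADDR (hypothetical helper)."""
    if atype == ATYP_IPV4:
        body = ipaddress.IPv4Address(addr).packed   # 4 raw octets
    elif atype == ATYP_IPV6:
        body = ipaddress.IPv6Address(addr).packed   # 16 raw octets
    elif atype == ATYP_DOMAINNAME:
        name = addr.encode("ascii")                 # raises on non-ASCII input
        if len(name) > 255:
            raise ValueError("domain name longer than 255 octets")
        body = bytes([len(name)]) + name            # length-prefixed, no NUL
    else:
        raise ValueError(f"unknown ATYP {atype:#x}")
    return bytes([atype]) + body


def socks5_connect_request(atype: int, addr: str, port: int) -> bytes:
    """VER=5, CMD=1 (CONNECT), RSV=0, then ATYP, DST.ADDR, DST.PORT."""
    return bytes([5, 1, 0]) + socks5_address(atype, addr) + struct.pack(">H", port)


for atype, addr in [(ATYP_IPV4, "93.184.216.34"),
                    (ATYP_IPV6, "::1"),
                    (ATYP_DOMAINNAME, "example.com")]:
    print(socks5_connect_request(atype, addr, 80).hex(" "))
```

An IP literal passed with ATYP_DOMAINNAME, as in footnote [a], comes out
as length-prefixed text (03 0d 39 33 2e ...) rather than four raw octets.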

> I guess I will have to ask you to back up and describe what problems
> you saw with the original code, and show me the details of the strings
> involved in that.

The Elisp manual distinguishes between multibyte and unibyte "sources"
of strings [1]. For these (SOCKS 4) IP address strings, the source is
the function socks--open-network-stream, which creates them. When such a
string includes characters with code points between 128 and 255 (the
Latin-1/ISO-8859-1 range), each of those characters is sent as two
UTF-8-encoded bytes, which the SOCKS service rejects as a protocol
violation.

Specifically, when a user passes "example.com" to the entry-point
function socks-open-network-stream, its internal helper
socks--open-network-stream resolves the host name into an IP address in
list form and then converts this to a string by calling
  (apply #'format "%c%c%c%c" '(93 184 216 34))

This produces a multibyte string four characters long, one character per
octet:

  "]¸Ø\""

However, when socks-send-command passes this string to
process-send-string for a process whose coding system is
(binary . binary), the string's internal six-byte representation is
emitted verbatim:

  "]\302\270\303\230\""

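The munging can be reproduced outside Emacs. In this Python sketch, a
str stands in for the multibyte string and UTF-8 encoding stands in for
emitting its internal representation; the analogy holds because Emacs's
internal encoding of characters 128 through 255 coincides with UTF-8.
The variable names are illustrative only.

```python
octets = [93, 184, 216, 34]

# Rough equivalent of (apply #'format "%c%c%c%c" octets):
# one character per octet, so the string is 4 characters long.
text = "".join(chr(o) for o in octets)

# Emitting the characters' UTF-8 (i.e. Emacs-internal) bytes verbatim:
# code points 128-255 take two bytes each, so 4 characters become 6 bytes.
munged = text.encode("utf-8")

# What a SOCKS 4 server expects in the DSTIP field: the raw octets.
intended = bytes(octets)

# Render the munged bytes the way Emacs prints them: octal escapes for
# non-ASCII bytes (the closing quote is left unescaped here).
octal_view = "".join(chr(b) if 32 <= b < 127 else f"\\{b:03o}" for b in munged)

print(len(text), len(munged), len(intended))  # 4 6 4
print(octal_view)                             # ]\302\270\303\230"
```
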
My initial idea was to use the function unibyte-string to ensure that
every character fits in 8 bits before transmission. Whichever approach
we take, some combination of validation and conversion before sending
seems worthwhile, and its cost is negligible since it runs only once per
connection.
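A minimal sketch of that validate-then-convert step, again in Python and
assuming the address arrives as a string of characters: encoding to
Latin-1 maps each code point from 0 to 255 onto the byte of the same
value and fails on anything larger, so one call both checks and
converts. The name to_unibyte is hypothetical; on the Emacs side the
corresponding tool would be something like unibyte-string.

```python
def to_unibyte(s: str) -> bytes:
    """Return S as one byte per character, or raise ValueError.

    Hypothetical helper, loosely analogous to applying Emacs's
    unibyte-string to the characters of S.
    """
    try:
        # Latin-1 maps code points 0-255 to identical byte values and
        # raises UnicodeEncodeError for any code point above 255.
        return s.encode("latin-1")
    except UnicodeEncodeError as err:
        bad = s[err.start]
        raise ValueError(
            f"character {bad!r} (U+{ord(bad):04X}) does not fit in one byte"
        ) from err


print(to_unibyte("]\xb8\xd8\"").hex(" "))  # 5d b8 d8 22

try:
    to_unibyte("\u20ac")  # EURO SIGN, code point 8364
except ValueError as e:
    print(e)
```

Raising an error rather than silently truncating means bad input is
caught once, at connection time, before anything reaches the wire.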

Sorry for the extended play-by-play. I certainly hope none of it came
off as insulting or pedantic. I'm quite certain your grasp of such
concepts long ago outpaced any understanding I could ever hope to
attain.

J.P.


[a] The versions of tor and ssh I have installed definitely honor
requests like

  curl --proxy socks5h://localhost:1080 http://93.184.216.34

passing the IP address as a domain name. Although this departs from
RFC 1928, which specifies fully qualified domain names only [2], I get
the sense that influential projects treat the RFC more as a living
standard. (Note: in its unit tests, tor only includes this form for its
extension commands [3].)

[1] (elisp) Non-ASCII in Strings, second paragraph
[2] https://tools.ietf.org/html/rfc1928#section-5
[3] https://gitweb.torproject.org/tor.git/tree/src/test/test_socks.c#n335





