Eli Zaretskii writes:

>> Re appropriate encoding: correct me if I'm wrong (internet), but among
>> the Emacs coding systems, it'd be latin-1.
>
> That depends on what the other end expects.  Does it expect latin-1 in
> this case?

From the point of view of Emacs, I'd say yes: the other end, meaning the
proxy service, expects latin-1. From the service's point of view, it
only speaks byte sequences and doesn't interpret any fields as text [1].
This continues after proxying has commenced; incoming byte sequences are
forwarded verbatim as opaque payloads.

> Does emitting the single byte \330 produce the correct result in this
> case?  Then by all means please use
>
>   (encode-coding-string address 'latin-1)

It does indeed produce the correct result [2], and I've updated the
patch to reflect this. I wasn't sure whether you wanted me to replace
all the vectors in the tests with strings and/or annotate them with
comments explaining the protocol, so I just left them as is for now.

My main concern (based on sheer ignorance) was any possible side effects
that may occur from encode-coding-string setting the variable
last-coding-system-used to latin-1. I investigated a little by stepping
through the subsequent send_process() call and found that the variable's
value as latin-1 appears short lived because it's quickly reassigned to
binary. I tried to demonstrate this in the attached log of my debug
session (and also show that no conversion is performed). Please pardon
my sad debugging skills.

>> Re program on the other end: this would be any program offering a proxy
>> service that speaks the same protocol. Popular ones include tor and ssh.
>> [...]
>
> And those expect Latin-1 encoding in this case?

I'd say yes, insofar as these programs are examples of a proxy service
of the sort mentioned in the first answer above.

Thanks again

[1] Although, in the case of SOCKS 4A/5, non-numeric addresses, i.e.,
    domain names, should probably be expressed in characters a resolver
    can understand, like the Punycode ASCII subset.

[2] There is one tiny difference in behavior from the previous iteration
    of this patch, but it's not worth anyone's time, so I'll just note
    it here for the record: when called in the manner shown in the
    patch, encode-coding-string silently replaces multibyte characters
    with spaces. The only edge case I could think of in which
    accidentally passing a multibyte character might be harder to debug
    than a normal typo would be when hitting an address like

      ec2-13-56-13-123.us-west-1.compute.amazonaws.com

    and accidentally passing 13.256.13.123 (as "\15\u0100\15\173"),
    which would be routed to 13.32.13.123 (flickr/cloudflare). One way
    to avoid this would be with validation like that performed by
    unibyte-string or, alternatively, by purposefully violating the
    protocol and sending, say, "\15\15{" instead of "\15 \15{" (and
    thereby triggering an error response from the service). All in all,
    this seems unlikely enough not to warrant special attention.
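
For the record, the validation idea in [2] could look roughly like the
following. This is only a sketch and not part of the patch: the function
name is made up for illustration, and instead of unibyte-string it uses
unencodable-char-position to check the address before encoding, so that
a stray multibyte character signals an error rather than being silently
replaced.

```elisp
;; Hypothetical helper, not in the patch: refuse to encode an address
;; that contains a character latin-1 cannot represent.
(defun my-socks-encode-address (address)
  "Return ADDRESS encoded as latin-1, or signal an error.
An error is signaled if ADDRESS contains any character that
latin-1 cannot encode, instead of letting it be substituted."
  ;; With a fifth argument, START and END are indexes into that
  ;; string; the result is nil when every character is encodable.
  (when (unencodable-char-position 0 (length address) 'latin-1
                                   nil address)
    (error "Address not encodable as latin-1: %S" address))
  (encode-coding-string address 'latin-1))

;; The mistyped address from [2] would then fail early:
;;   (my-socks-encode-address "\15\u0100\15\173")
```

Whether a check like this is worth the extra code is a separate
question; as said in [2], the case seems unlikely.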