unofficial mirror of help-gnu-emacs@gnu.org
* Very basic question regarding encoding and `open-network-stream'
From: Eric Abrahamsen @ 2018-11-27 21:34 UTC
  To: help-gnu-emacs

I have an embarrassingly basic question about how encoding works in
relation to `open-network-stream'. I don't have a strong grasp of
encoding issues, particularly where processes are concerned.

Long story short, I'm trying to get Gnus to use decoded group names
internally as much as possible. I'm investigating whether that means the
process Gnus uses to talk to remote servers should use a different
coding system.

Presently, when Gnus talks to an nntp server, it does so in a process
buffer in which multibyte has been disabled. It wraps the call to
`open-network-stream' in a let which sets `coding-system-for-read/write'
to 'binary. So far so clear.
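
The setup described above might look roughly like the following. This
is a minimal sketch, not the actual Gnus code: the names
`my-prepare-binary-buffer' and `my-open-binary-stream' and the buffer
naming scheme are hypothetical.

```elisp
;; Hypothetical helper names; this is not the actual Gnus code.

(defun my-prepare-binary-buffer (name)
  "Return a buffer called NAME with multibyte characters disabled."
  (let ((buf (get-buffer-create name)))
    (with-current-buffer buf
      ;; In a unibyte buffer every character is exactly one raw byte.
      (set-buffer-multibyte nil))
    buf))

(defun my-open-binary-stream (name host service)
  "Open a stream to HOST on SERVICE that passes bytes through undecoded."
  ;; Bind both coding-system variables around the call so that the new
  ;; process neither decodes what it reads nor encodes what it writes.
  (let ((coding-system-for-read 'binary)
        (coding-system-for-write 'binary))
    (open-network-stream name
                         (my-prepare-binary-buffer (format " *%s*" name))
                         host service)))
```

With this arrangement, whatever arrives in the process buffer is the
server's bytes, untouched.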

Gnus also mostly leaves group names as unibyte internally, so reading
group names as bytes works out okay -- but this is what I'm looking at
changing.

The NNTP RFC notes that the default character set for the protocol has
changed from ascii to utf-8.
(https://tools.ietf.org/html/rfc3977#section-1)

What I don't grasp is: if the process buffers are left multibyte, and
the `coding-system-for-read/write' variables are changed to 'utf-8 (or
'undecided?), will this simply do the right thing?

More specifically, is the remote nntp server expected to send along some
information about the encoding it is using for its data? Or is the data
always binary, and we simply know via convention that it can be safely
decoded as 'utf-8?

Or maybe I should just be leaving the process buffer as-is, but doing
the decoding immediately after the `accept-process-output'?

I would very much appreciate it if someone could explain step-by-step,
using small words if possible, how the process encoding is negotiated,
and what might be a reasonable approach to this problem.

Thanks,
Eric





* Re: Very basic question regarding encoding and `open-network-stream'
From: Stefan Monnier @ 2018-11-27 23:17 UTC
  To: help-gnu-emacs

> The NNTP RFC notes that the default character set for the protocol has
> changed from ascii to utf-8.
> (https://tools.ietf.org/html/rfc3977#section-1)
>
> What I don't grasp is: if the process buffers are left multibyte, and
> the `coding-system-for-read/write' variables are changed to 'utf-8 (or
> 'undecided?), will this simply do the right thing?

No: the stream of bytes includes NNTP protocol commands as well as other
contents (typically actual messages) and they don't all use utf-8.
So the stream process needs to communicate in bytes (aka "unibyte"), and
then the Elisp code needs to decode/encode each part manually according
to the coding system that applies to each part.


        Stefan
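
To illustrate the per-part decoding described in the reply above, here
is a small sketch. It assumes the response has already been read as a
unibyte string, that the status line is UTF-8, and that the caller
knows the coding system of the body (for an article this would have to
come from its headers). The function name `my-decode-response' is
hypothetical and is not part of Gnus.

```elisp
;; Hypothetical function; a sketch of decoding each part separately.

(defun my-decode-response (raw body-coding)
  "Split unibyte RAW into status line and body and decode each part.
The status line is decoded as UTF-8 and the body with BODY-CODING.
Return a cons cell (STATUS . BODY) of decoded strings."
  (let* ((eol (or (string-match "\r\n" raw) (length raw)))
         (status (substring raw 0 eol))
         ;; Skip the CRLF that ends the status line, if there is one.
         (body (substring raw (min (length raw) (+ eol 2)))))
    (cons (decode-coding-string status 'utf-8)
          (decode-coding-string body body-coding))))
```

For example, (my-decode-response "222 body follows\r\ncaf\351"
'iso-8859-1) decodes the status line as UTF-8 and turns the Latin-1
byte \351 in the body into the character é.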





* Re: Very basic question regarding encoding and `open-network-stream'
From: Eric Abrahamsen @ 2018-11-27 23:41 UTC
  To: help-gnu-emacs

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> The NNTP RFC notes that the default character set for the protocol has
>> changed from ascii to utf-8.
>> (https://tools.ietf.org/html/rfc3977#section-1)
>>
>> What I don't grasp is: if the process buffers are left multibyte, and
>> the `coding-system-for-read/write' variables are changed to 'utf-8 (or
>> 'undecided?), will this simply do the right thing?
>
> No: the stream of bytes includes NNTP protocol commands as well as other
> contents (typically actual messages) and they don't all use utf-8.
> So the stream process needs to communicate in bytes (aka "unibyte"), and
> then the Elisp code needs to decode/encode each part manually according
> to the coding system that applies to each part.

Awesome, this will save me from wandering down some garden paths, thank
you. I will make the Minimum Viable Changes, and just focus on group names.
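
For the group-name case specifically, the conversion at the process
boundary could be as small as the sketch below. It assumes group names
on the wire are UTF-8, which is the RFC 3977 default but may not hold
for every server. The names `my-decode-group-name' and
`my-encode-group-name' are hypothetical, not existing Gnus functions.

```elisp
;; Hypothetical helpers; they assume group names are UTF-8 on the wire.

(defun my-decode-group-name (raw-name)
  "Decode RAW-NAME, a unibyte string of bytes from the server, as UTF-8."
  (decode-coding-string raw-name 'utf-8))

(defun my-encode-group-name (name)
  "Encode the decoded string NAME as UTF-8 bytes to send to the server."
  (encode-coding-string name 'utf-8))
```

The process itself stays binary. Only the bytes known to be a group
name are decoded when read, and encoded again just before being sent.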



