From: Eric Abrahamsen
Newsgroups: gmane.emacs.help
To: help-gnu-emacs@gnu.org
Subject: Very basic question regarding encoding and `open-network-stream'
Date: Tue, 27 Nov 2018 13:34:43 -0800
Message-ID: <87efb65hn0.fsf@ericabrahamsen.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

I have an embarrassingly basic question about how encoding works in
relation to `open-network-stream'. I don't have a strong grasp of
encoding issues, particularly regarding processes.

Long story short, I'm trying to get Gnus to internally use decoded group
names as much as possible. I'm investigating whether that means that the
process Gnus uses to talk to remote servers should be encoded
differently.

Presently, when Gnus talks to an NNTP server, it does so in a process
buffer in which multibyte has been disabled. It wraps the call to
`open-network-stream' in a let which sets `coding-system-for-read/write'
to 'binary. So far so clear. Gnus also mostly leaves group names as
unibyte internally, so reading group names as bytes works out okay --
but this is what I'm looking at changing.

The NNTP RFC notes that the default character set for the protocol has
changed from ASCII to UTF-8.
(https://tools.ietf.org/html/rfc3977#section-1)

What I don't grasp is: if the process buffers are left multibyte, and
the `coding-system-for-read/write' variables are changed to 'utf-8 (or
'undecided?), will this simply do the right thing? More specifically, is
the remote NNTP server expected to send along some information about the
encoding it is using for its data?
Or is the data always binary, and we simply know via convention that it
can be safely decoded as 'utf-8? Or maybe I should just be leaving the
process buffer as-is, but doing the decoding immediately after the
`accept-process-output'?

I would very much appreciate it if someone could explain step-by-step,
using small words if possible, how the process encoding is negotiated,
and what might be a reasonable approach to this problem.

Thanks,
Eric
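
P.S. To make the alternatives above concrete, here is a rough sketch of
what I mean. None of this is actual Gnus code: the `my-sketch-' names,
the buffer name, and port 119 are placeholders made up for illustration,
and the sketch assumes the bytes in question really are UTF-8. It only
shows where the decoding would happen in each case, not which one is
correct.

```elisp
;; Illustration only; every `my-sketch-' name is hypothetical.

(defun my-sketch-open-binary (host)
  "Open a connection to HOST whose process buffer holds raw bytes.
Roughly the current arrangement: unibyte buffer, binary coding."
  (let ((buf (generate-new-buffer " *nntp-sketch*"))
        (coding-system-for-read 'binary)
        (coding-system-for-write 'binary))
    (with-current-buffer buf
      (set-buffer-multibyte nil))
    (open-network-stream "nntp-sketch" buf host 119)))

(defun my-sketch-open-utf8 (host)
  "Open a connection to HOST and let Emacs decode input as UTF-8.
The alternative being asked about: multibyte buffer, utf-8 coding."
  (let ((buf (generate-new-buffer " *nntp-sketch*"))
        (coding-system-for-read 'utf-8)
        (coding-system-for-write 'utf-8))
    (open-network-stream "nntp-sketch" buf host 119)))

(defun my-sketch-decode-group-name (raw)
  "Decode RAW, a unibyte string read from a binary connection.
Assumes RAW is UTF-8; this is the decode-after-reading option."
  (decode-coding-string raw 'utf-8))

;; No network needed for this part: six raw bytes, as they would sit
;; in a unibyte buffer, become a two-character multibyte string.
(let* ((raw (unibyte-string #xe6 #x97 #xa5 #xe6 #x9c #xac))
       (name (my-sketch-decode-group-name raw)))
  (list (length raw) (multibyte-string-p raw)
        (length name) (multibyte-string-p name)))
```

With the first opener, the decoding step would have to run on each group
name after `accept-process-output' returns; with the second, the text in
the process buffer would already be decoded by the time it is read.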