bug#22901: drain-input doesn't decode

unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed

From: Zefram <zefram@fysh.org>
To: 22901@debbugs.gnu.org
Subject: bug#22901: drain-input doesn't decode
Date: Fri, 4 Mar 2016 03:09:44 +0000	[thread overview]
Message-ID: <20160304030944.GA1318@fysh.org> (raw)

The documentation for drain-input says that it returns a string of
characters, implying that the result is equivalent to what you'd get
from calling read-char some number of times.  In fact it differs in a
significant respect: whereas read-char decodes input octets according to
the port's selected encoding, drain-input ignores the selected encoding
and always decodes according to ISO-8859-1 (thus preserving the octet
values in character form).

$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (write (map char->integer (let r ((l '\''())) (let ((c (read-char (current-input-port)))) (if (eof-object? c) (reverse l) (r (cons c l))))))) (newline)'
"UCS-2BE"
(353 610 867)
$ echo -n $'\1a\2b\3c' | guile-2.0 -c '(set-port-encoding! (current-input-port) "UCS-2BE") (write (port-encoding (current-input-port))) (newline) (peek-char (current-input-port)) (write (map char->integer (string->list (drain-input (current-input-port))))) (newline)'
"UCS-2BE"
(1 97 2 98 3 99)

The practical upshot is that the input returned by drain-input can't
be used in the same way as regular input from read-char.  It can still
be used if the code doing the reading is totally aware of the encoding,
so that it can perform the decoding manually, but this seems a failure
of abstraction.  The value returned by drain-input ought to be coherent
with the abstraction level at which it is specified.

I can see that there is a reason for drain-input to avoid performing
decoding: the problem that occurs if the buffer ends in the middle
of a character.  If drain-input is to return decoded characters then
presumably in this case it would have to read further octets beyond the
buffer contents, in an unbuffered manner, until it reaches a character
boundary.  If this is too unpalatable, perhaps drain-input should be
permitted only on ports configured for single-octet character encodings.

If, on the other hand, it is decided to endorse the current non-decoding
behaviour, then the break of abstraction needs to be documented.

-zefram

next             reply	other threads:[~2016-03-04  3:09 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-04  3:09 Zefram [this message]
2016-06-20 16:12 ` bug#22901: drain-input doesn't decode Andy Wingo
2017-02-26 17:46 ` Matt Wette
2017-02-26 17:58   ` Matt Wette
2021-05-16 17:55 ` Taylan Kammer
2021-05-19 11:41 ` Taylan Kammer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160304030944.GA1318@fysh.org \
    --to=zefram@fysh.org \
    --cc=22901@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).