From: David Kastrup <dak@gnu.org>
To: 18520@debbugs.gnu.org
Subject: bug#18520: string ports should not have an encoding
Date: Mon, 22 Sep 2014 01:34:39 +0200
Message-ID: <87iokgmttc.fsf@fencepost.gnu.org>

In Guile 2.0, at the time a string port is opened, the value of the
fluid %default-port-encoding is used for deciding how to encode the
string into a byte stream, and set-port-encoding! may then be used for
deciding how to decode that byte stream back into characters.

This does not make sense, as ports deliver characters and strings
contain characters. There is no point in going through bytes.

Guile 2.2 does not consult %default-port-encoding but uses UTF-8
consistently (though I guess that calling set-port-encoding! will
change that again).

That still is not satisfactory. For example, using ftell on the input
port will not report the string index of the string connected to the
string port but rather a byte index into a UTF-8 encoded version of the
string. This is a number that has nothing to do with the original
string and cannot be used for correlating string and port.
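
To make the mismatch concrete, here is a minimal sketch that computes
the UTF-8 byte offset after each character of the test string used
further down. The helper names utf8-offset and utf8-offsets are made up
for this illustration; string->utf8 and bytevector-length come from
(rnrs bytevectors). It does not touch ports at all and only shows the
arithmetic, so it says nothing about what a particular Guile version
reports from ftell.

```scheme
(use-modules (rnrs bytevectors))   ; string->utf8, bytevector-length

(define s (list->string (map integer->char '(20 200 2000 20000))))

;; Hypothetical helper: byte length of the UTF-8 encoding of the
;; first N characters of STR.
(define (utf8-offset str n)
  (bytevector-length (string->utf8 (substring str 0 n))))

;; Hypothetical helper: list of UTF-8 byte offsets after each
;; character of STR, in order.
(define (utf8-offsets str)
  (let loop ((n (string-length str)) (acc '()))
    (if (zero? n)
        acc
        (loop (- n 1) (cons (utf8-offset str n) acc)))))

(display (utf8-offsets s))   ; prints (1 3 5 8)
(newline)
```

The character positions are 1, 2, 3, 4, while the byte offsets are
1, 3, 5, 8, because the four characters take 1, 2, 2 and 3 bytes in
UTF-8.
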

Ports fundamentally deliver characters, and so reading and writing from
a string source/sink should not involve _any_ coding system.

Files fundamentally deliver bytes, so a conversion is required. The same
would be the case when opening a port on a _bytevector_. Here an
encoding would equally make sense, and ftell/fseek offsets would
naturally be in bytes. But a port on a string delivers and consumes
characters. Any conversion, even a fixed UTF-8 conversion, will destroy
the predictable nature of with-output-to-string and
with-input-from-string and the respective uses of string ports.

In code like the following, the results should not depend on either the
fluid-set! or the set-port-encoding!, and the ftell should always output
successive integers independent of either fluid-set! or
set-port-encoding!. set-port-encoding! should probably flag an error,
like an fseek on an unseekable device.

;; ~d is not supported by the default simple-format
(use-modules (ice-9 format))

(fluid-set! %default-port-encoding "UTF-8")
(define s (list->string (map integer->char '(20 200 2000 20000))))
(with-input-from-string s
  (lambda ()
    (set-port-encoding! (current-input-port) "ISO-8859-1")
    ;; Print each character code with the port position after reading it.
    (let loop ((ch (read-char (current-input-port))))
      (if (not (eof-object? ch))
          (begin
            (format #t "~d, pos=~d\n"
                    (char->integer ch)
                    (ftell (current-input-port)))
            (loop (read-char (current-input-port))))))))

Again, things are quite different for bytevectors, which could be
accepted instead of a string for opening ports with the string-port
commands, or could have their own port open/close commands. The
respective ports then definitely would want to obey set-port-encoding!
(defaulting to %default-port-encoding) for _decoding_ the bytevector.
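
As a sketch of the bytevector case, Guile's (rnrs io ports) module
provides open-bytevector-input-port, and the encoding set on such a
port decides how its bytes are decoded. The helper name decode-bytes is
made up for this illustration, and the sketch assumes that
set-port-encoding! followed by read-char behaves on a bytevector port
the same way it does on a file port.

```scheme
(use-modules (rnrs bytevectors)   ; u8-list->bytevector
             (rnrs io ports))     ; open-bytevector-input-port

;; Hypothetical helper: decode the list of byte values BYTES using
;; ENCODING and return the resulting character codes as a list.
(define (decode-bytes bytes encoding)
  (let ((port (open-bytevector-input-port (u8-list->bytevector bytes))))
    (set-port-encoding! port encoding)
    (let loop ((ch (read-char port)) (acc '()))
      (if (eof-object? ch)
          (reverse acc)
          (loop (read-char port) (cons (char->integer ch) acc))))))

;; The same two bytes read as two Latin-1 characters ...
(display (decode-bytes '(195 136) "ISO-8859-1"))
(newline)
;; ... or as a single UTF-8 encoded character.
(display (decode-bytes '(195 136) "UTF-8"))
(newline)
```

Here the encoding is meaningful because the source really is a sequence
of bytes: the two bytes 195 and 136 are the UTF-8 encoding of character
200, so the interpretation depends on the chosen encoding.
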

I don't know what R7RS might think here. But to me, associating an
encoding with the connection between a string and a port does not make
sense. The relation is one of characters to characters.

--
David Kastrup