From: David Kastrup <dak@gnu.org>
To: ludo@gnu.org (Ludovic Courtès)
Cc: 18520@debbugs.gnu.org
Subject: bug#18520: string ports should not have an encoding
Date: Mon, 22 Sep 2014 15:09:25 +0200
Message-ID: <87wq8vls3e.fsf@fencepost.gnu.org>
In-Reply-To: <87k34vrhu2.fsf@gnu.org> ("Ludovic Courtès"'s message of "Mon, 22 Sep 2014 13:54:29 +0200")
ludo@gnu.org (Ludovic Courtès) writes:
> This has been addressed in two ways:
No, it hasn't.
> 1. In 2.0, (srfi srfi-6) uses Unicode-capable string ports (commit
> ecb48dc.)
This issue report is not about adding more optional functionality on
top. It is about _removing_ unwarranted redirection and complication
from existing core functionality.
The artifacts of making with-input-from-string and with-output-to-string
go through an additional character->bytevector->character
encoding/recoding layer are not invisible.
> 2. In 2.2, string ports are always Unicode-capable, and
> ‘%default-port-encoding’ is ignored (commit 6dce942.)
String ports should not be "Unicode capable" but transparent.
Characters in, characters out. ftell/fseek should be based on character
position in strings rather than offsets in a magically created
bytestream of some particular encoding.
> So for 2.0, the workaround is to either use (srfi srfi-6), or force
> ‘%default-port-encoding’ to "UTF-8".
Which is what the latter _only_ does. It still interprets
set-port-encoding! with respect to a byte stream meaning, and it still
calculates positions according to a byte stream meaning not related to
string positions:

(use-modules (srfi srfi-6)
             (ice-9 format))   ; ~d needs the full format, not simple-format

(define s (list->string (map integer->char '(20 200 2000 20000))))

(let ((port (open-input-string s)))
  (let loop ((ch (read-char port)))
    (if (not (eof-object? ch))
        (begin
          (format #t "~d, pos=~d\n" (char->integer ch) (ftell port))
          (loop (read-char port))))))

20, pos=1
200, pos=3
2000, pos=5
20000, pos=8

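
The positions printed above are consistent with running UTF-8 byte
offsets rather than string indices. The following is only a sketch to
check that reading, assuming the port encodes its contents as UTF-8;
the helper names utf8-length and utf8-offsets are made up for this
illustration and are not part of Guile.

```scheme
(use-modules (rnrs bytevectors))

(define s (list->string (map integer->char '(20 200 2000 20000))))

;; Number of bytes one character occupies when encoded as UTF-8.
;; (Hypothetical helper, not a Guile primitive.)
(define (utf8-length ch)
  (bytevector-length (string->utf8 (string ch))))

;; List of the cumulative byte offsets reached after each character.
;; (Hypothetical helper, not a Guile primitive.)
(define (utf8-offsets str)
  (let loop ((chars (string->list str)) (pos 0) (acc '()))
    (if (null? chars)
        (reverse acc)
        (let ((next (+ pos (utf8-length (car chars)))))
          (loop (cdr chars) next (cons next acc))))))

(display (utf8-offsets s))   ; prints (1 3 5 8)
(newline)
```

The string indices after each character would be 1, 2, 3, 4, so the
two numberings diverge as soon as a non-ASCII character has been read.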
Tying string ports to an artificial bytevector presentation in a manner
bleeding through like that means that it is not possible to synchronize
string positions and stream positions when parts of the source string
are _not_ processed from within the stream.
Which is precisely the problem I am currently dealing with while porting
LilyPond: it has its own lexer working on a (utf-8 encoded) byte stream
which is at the same time available as a string port. Whenever embedded
Scheme is interpreted, the string port is moved to the proper position,
GUILE reads an expression and is told what to do with it, the string
port position is picked off and the LilyPond lexer is moved to the
respective position to continue.
If you take a look at
<URL:http://git.savannah.gnu.org/cgit/lilypond.git/tree/scm/parser-ly-from-scheme.scm>,
ftell on a string port is here used for correlating the positions of
parsed subexpressions with the original data. Reencoding strings in
utf-8 is not going to make this work with string indexing since ftell
does not bear a useful relation to string positions.
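
To illustrate what that missing relation costs: recovering a string
index from such a byte offset means walking the string from its start
and re-encoding every character on the way. This is a sketch only,
assuming the offsets are UTF-8 byte counts; byte-offset->string-index
is a hypothetical helper, not something Guile provides.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: map a UTF-8 byte offset to an index into STR.
;; Returns #f when OFFSET falls inside a multi-byte character or lies
;; beyond the end of the encoded string.
(define (byte-offset->string-index str offset)
  (let loop ((i 0) (pos 0))
    (cond ((= pos offset) i)
          ((or (> pos offset) (= i (string-length str))) #f)
          (else
           (loop (+ i 1)
                 (+ pos (bytevector-length
                         (string->utf8 (string (string-ref str i))))))))))

(define s (list->string (map integer->char '(20 200 2000 20000))))

(display (byte-offset->string-index s 5))   ; prints 3
(newline)
(display (byte-offset->string-index s 4))   ; prints #f
(newline)
```

The walk is linear in the string length for every lookup, which is
work that a port reporting character positions would not need at all.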
The behavior of ftell and port-encoding is perfectly fine for reading
from bytevectors or files, and reading from bytevectors or files also
does not incur an encode-when-open action governed by
%default-port-encoding in GUILE-2.0 and by hardwired UTF-8 in GUILE-2.2.
But strings are already decoded characters. Reencoding makes no sense
and detaches things like ftell and fseek from the actual input into the
port.
--
David Kastrup