From: Andy Wingo <wingo@pobox.com>
To: ludo@gnu.org (Ludovic Courtès)
Cc: guile-devel@gnu.org
Subject: Re: string port encodings
Date: Thu, 31 Jan 2013 12:04:56 +0100
Message-ID: <87mwvpvciv.fsf@pobox.com>
In-Reply-To: <87txqhdmdj.fsf@pobox.com> (Andy Wingo's message of "Wed, 16 Jan 2013 19:16:24 +0100")
Hi,
On Wed 16 Jan 2013 19:16, Andy Wingo <wingo@pobox.com> writes:
> On Wed 16 Jan 2013 18:37, ludo@gnu.org (Ludovic Courtès) writes:
>
>> I just think [string port encodings] may have to wait until 2.2.
>
> Oh yes, agreed here. Anyway let's let it simmer for a while. Another
> two or three of these threads should be enough to either reaffirm or
> change the current state of things :)
OK that was simmering long enough ;)
I just merged stable-2.0 to master. There is now a failing test.
  (pass-if-equal
      '(*TOP* (foo "\xA0"))
    (xml->sxml "<foo>&nbsp;</foo>"
               #:entities '((nbsp . "\xA0"))))
This one fails, with (encoding-error "scm_to_stringn" "cannot convert
narrow string to output locale" 84 #f #f).
It passes in stable-2.0 because "ASCII" is erroneously treated the same
as "ISO-8859-1". In master, attempting to write a character above
#\x7F to an ASCII port will cause an encoding error. That seems more
correct than the 2.0 behavior. This error would have happened in
stable-2.0 too if I had chosen an entity with a character above #\xFF.
Looking further, the cause is in sxml/upstream/SSAX.scm:
  (define (ssax:handle-parsed-entity port name entities
                                     content-handler str-handler seed)
    ...
    (call-with-input-string ent-body
      (lambda (port) (content-handler port new-entities seed)))
    ...)
Here is where I think this code goes wrong: its correctness appears to
depend on the default port encoding. That is totally bogus. It was
written long before we had such a thing.
Again, I think the default encoding for a string port should be one that
can represent all characters, and we should change this in master.
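For illustration only, here is a minimal sketch of how a caller could make the string port independent of the ambient default. It assumes that a newly opened string port takes its encoding from the `%default-port-encoding` fluid, as in Guile 2.0. The helper name `call-with-utf8-input-string` is hypothetical and is not part of Guile or SSAX.

```scheme
;; Hypothetical helper (not part of Guile or SSAX): open an input
;; string port while the default port encoding is bound to UTF-8,
;; so that every character in STR can be represented.
(define (call-with-utf8-input-string str proc)
  (with-fluids ((%default-port-encoding "UTF-8"))
    (call-with-input-string str proc)))

;; Read back a string holding U+00A0, as in the failing test; the
;; result should not depend on the locale or the default encoding.
(call-with-utf8-input-string "\xA0"
  (lambda (port) (read-char port)))
```

A wrapper like this only works around the problem at one call site; changing the default for string ports, as proposed above, would cover every caller.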
Andy
--
http://wingolog.org/