unofficial mirror of guile-devel@gnu.org 
From: Andy Wingo <wingo@pobox.com>
To: ludo@gnu.org (Ludovic Courtès)
Cc: guile-devel@gnu.org
Subject: Re: string port encodings
Date: Thu, 31 Jan 2013 12:04:56 +0100
Message-ID: <87mwvpvciv.fsf@pobox.com>
In-Reply-To: <87txqhdmdj.fsf@pobox.com> (Andy Wingo's message of "Wed, 16 Jan 2013 19:16:24 +0100")

Hi,

On Wed 16 Jan 2013 19:16, Andy Wingo <wingo@pobox.com> writes:

> On Wed 16 Jan 2013 18:37, ludo@gnu.org (Ludovic Courtès) writes:
>
>> I just think [string port encodings] may have to wait until 2.2.
>
> Oh yes, agreed here.  Anyway let's let it simmer for a while.  Another
> two or three of these threads should be enough to either reaffirm or
> change the current state of things :)

OK that was simmering long enough ;)

I just merged stable-2.0 to master.  There is now a failing test.

    (pass-if-equal
      '(*TOP* (foo "\xA0"))
      (xml->sxml "<foo>&nbsp;</foo>"
                 #:entities '((nbsp . "\xA0"))))

This one fails, with (encoding-error "scm_to_stringn" "cannot convert
narrow string to output locale" 84 #f #f).

It passes in stable-2.0 because "ASCII" is erroneously treated the same
as "ISO-8859-1".  In master, attempting to write a character above
#\x7F to an ASCII port will cause an encoding error, which seems more
correct than the 2.0 behavior.  This error would have happened in
stable-2.0 too if I had chosen an entity with a character above #\xFF.

Looking further, the cause is in sxml/upstream/SSAX.scm:

   (define (ssax:handle-parsed-entity port name entities
                                      content-handler str-handler seed)
    ...
           (call-with-input-string ent-body
             (lambda (port) (content-handler port new-entities seed)))
    ...)

Here is where I think this code goes wrong: its correctness appears to
depend on the default port encoding.  That is totally bogus.  It was
written long before we had such a thing.

Again, I think the default encoding for a string port should be one that
can represent all characters, and we should change this in master.
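
To illustrate the dependence, here is a minimal sketch of a caller-side
workaround, not a proposed fix.  It assumes a Guile in which string
ports take their encoding from the %default-port-encoding fluid, and it
binds that fluid to "UTF-8" (my choice of an encoding that can represent
every character) around the failing call.  The xml->sxml call is the one
from the test above; nothing in SSAX.scm is changed.

```scheme
(use-modules (sxml simple))

;; Assumption: string ports created while this binding is in effect pick
;; up "UTF-8" from %default-port-encoding, so the string port that SSAX
;; opens on the entity body can represent #\xA0.
(with-fluids ((%default-port-encoding "UTF-8"))
  (xml->sxml "<foo>&nbsp;</foo>"
             #:entities '((nbsp . "\xA0"))))
```

Having to wrap the call like this is exactly the problem: the caller
should not need to know that the parser opens a string port internally.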

Andy
-- 
http://wingolog.org/



