unofficial mirror of bug-guile@gnu.org 
From: David Kastrup <dak@gnu.org>
To: Mark H Weaver <mhw@netris.org>
Cc: 20109@debbugs.gnu.org
Subject: bug#20109: Incompatible API change in 2.0 series for string port encoding
Date: Wed, 18 Mar 2015 13:32:55 +0100
Message-ID: <87r3smeb9k.fsf@fencepost.gnu.org>
In-Reply-To: <87pp87fdmm.fsf@netris.org> (Mark H. Weaver's message of "Tue, 17 Mar 2015 18:44:17 -0400")

Mark H Weaver <mhw@netris.org> writes:

> David Kastrup <dak@gnu.org> writes:
>
>> Mark H Weaver <mhw@netris.org> writes:
>>
>>> This hack of giving Guile a buffer containing UTF-8, but claiming that
>>> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
>>> characters as garbage.
>>
>> For one thing we are talking about an external file here that is
>> mainly parsed by LilyPond.  LilyPond provides sensible pinpointing of
>> UTF-8 encoding errors, something which GUILE cannot do with its UTF-8
>> representation since it has no transparent or reproducible
>> representation of bad bytes.  Emacs uses overlong encodings for 0-127
>> to represent badly encoded bytes (which includes any overlong
>> sequences) in the range 128-255, making 128-255 encode as patterns
>> 0xc0 0x80 to 0xc1 0xbf.
>
> I intend to add a similar mechanism to Guile, but it is not yet done.

I think it would be pretty important since it makes it possible to treat
problems at those points in processing where it makes most sense.
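
To make the Emacs scheme quoted above concrete, here is a minimal sketch in plain C of how a stray byte in the range 128-255 could be mapped to the two-byte patterns 0xc0 0x80 to 0xc1 0xbf and back.  The function names are hypothetical, and the exact bit layout is an assumption chosen to reproduce the stated range; it is not taken from Guile and is not a claim about what Guile will implement.

```c
#include <assert.h>

/* Hypothetical helper: store the raw byte BYTE (0x80..0xFF) as a
   two-byte overlong sequence in the range 0xC0 0x80 .. 0xC1 0xBF.
   A strict UTF-8 decoder rejects these sequences, so they cannot
   collide with validly encoded text.  */
void
encode_raw_byte (unsigned char byte, unsigned char out[2])
{
  assert (byte >= 0x80);
  out[0] = 0xC0 | ((byte >> 6) & 0x01);   /* 0xC0 or 0xC1 */
  out[1] = 0x80 | (byte & 0x3F);          /* 0x80 .. 0xBF */
}

/* Hypothetical inverse: return 1 and store the original byte if
   IN holds such a sequence, otherwise return 0.  */
int
decode_raw_byte (const unsigned char in[2], unsigned char *byte)
{
  if ((in[0] & 0xFE) != 0xC0 || (in[1] & 0xC0) != 0x80)
    return 0;
  *byte = 0x80 | ((in[0] & 0x01) << 6) | (in[1] & 0x3F);
  return 1;
}
```

With this layout the byte 0x80 becomes 0xc0 0x80 and 0xff becomes 0xc1 0xbf, so a bad byte survives a round trip through the string and can still be reported at the point where reporting it makes most sense.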

However, it would also seem important to have GUILE handle UTF-8
strings natively.  At present, its only native string representations
are what it calls "latin-1" and, likely, "UTF-32", which does not make
much sense given that its string ports are unconditionally UTF-8.

Concatenating a string from smaller pieces sequentially via string
operations is O(n^2), so string ports are a natural way to assemble
large strings.  They are also nice for reading from strings.  Not
requiring conversions for most of that would be nice.
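
The cost argument can be sketched in plain C, without any Guile API.  Everything here is hypothetical and for illustration only: `strbuf` stands in for a string output port that grows an internal buffer, `concat_fresh` stands in for a string operation that builds a new string on every call, and the `copied` counters exist only to compare how many bytes each strategy moves.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator standing in for a string output port.  */
struct strbuf
{
  char *data;      /* NUL-terminated contents, NULL while empty */
  size_t len;      /* bytes used, not counting the NUL */
  size_t cap;      /* bytes allocated */
  size_t copied;   /* upper bound on bytes copied so far */
};

/* Append PIECE, doubling the capacity when full: amortized O(1)
   per byte, so O(n) for the whole string.  Returns 0 on success.  */
int
strbuf_append (struct strbuf *b, const char *piece)
{
  size_t n = strlen (piece);
  if (b->len + n + 1 > b->cap)
    {
      size_t cap = b->cap ? b->cap : 16;
      while (cap < b->len + n + 1)
        cap *= 2;
      char *p = realloc (b->data, cap);
      if (!p)
        return -1;
      b->copied += b->len;   /* worst case: realloc moved the data */
      b->data = p;
      b->cap = cap;
    }
  memcpy (b->data + b->len, piece, n + 1);
  b->len += n;
  b->copied += n;
  return 0;
}

/* Naive concatenation: every call allocates a fresh string and
   copies both operands, so building n bytes piecewise is O(n^2).  */
char *
concat_fresh (const char *a, const char *b, size_t *copied)
{
  size_t la = strlen (a), lb = strlen (b);
  char *r = malloc (la + lb + 1);
  if (!r)
    return NULL;
  memcpy (r, a, la);
  memcpy (r + la, b, lb + 1);
  *copied += la + lb;
  return r;
}
```

Appending a short piece a thousand times through `strbuf_append` copies a small constant multiple of the final length, while doing the same through `concat_fresh` copies on the order of the square of the piece count.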

>>> However, if you insist on doing this, I would
>>> suggest using a bytevector input port instead, like this: (untested)
>>>
>>>   char *buf = c_str ();
>>>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>>>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>>>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);
>>
>> dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port v2.0.11
>> dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port origin/stable-2.0
>> dak@lola:/usr/local/tmp/guile$
>
> You have misspelled the name of the function.  The following (untested)
> code should work on Guile 2.0.5 or later:
>
>    char *buf = c_str ();
>    size_t len = strlen (buf);
>    SCM bv = scm_c_make_bytevector (len);
>    memcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf, len);
>    str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

One would expect that I'd be able to do a simple copy&paste of a
function name.  Sorry for messing this up.

Yes, this looks like it should indeed provide a better match of
"encoding intentions" to our original code.  I'll have to see whether
I can make this approach work with the rest of our code.

I somehow missed that r6rs ports were more than just a compatibility
wrapper written in Scheme.

-- 
David Kastrup




