From: David Kastrup <dak@gnu.org>
To: Mark H Weaver <mhw@netris.org>
Cc: 20109@debbugs.gnu.org
Subject: bug#20109: Incompatible API change in 2.0 series for string port encoding
Date: Tue, 17 Mar 2015 09:39:46 +0100
Message-ID: <874mpkf25p.fsf@fencepost.gnu.org>
In-Reply-To: <87zj7cznb5.fsf@netris.org> (Mark H. Weaver's message of "Mon, 16 Mar 2015 16:42:38 -0400")
Mark H Weaver <mhw@netris.org> writes:
> David Kastrup <dak@gnu.org> writes:
>
>> In 2.0.9, the following patch/code for getting what amounts to a binary
>> string port worked.
>>
>> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
>> Author: David Kastrup <dak@gnu.org>
>> Date: Sun Sep 21 18:40:06 2014 +0200
>>
>> Source_file::init_port: Keep GUILEv2 from redecoding string input
>>
>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>> index 1118b9d..75ed0d9 100644
>> --- a/lily/source-file.cc
>> +++ b/lily/source-file.cc
>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>> // we do our own utf8 encoding and verification in the parser, so we
>> // use the no-conversion equivalent of latin1
>> SCM str = scm_from_latin1_string (c_str ());
>> - str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>> + scm_dynwind_begin ((scm_t_dynwind_flags)0);
>> + // Why doesn't scm_set_port_encoding_x work here?
>> + scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>> + str_port_ = scm_open_input_string (str);
>> + scm_dynwind_end ();
>> scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>> }
>
> This hack of giving Guile a buffer containing UTF-8, but claiming that
> it is Latin-1, is not good. It will cause Guile to see non-ASCII
> characters as garbage.

For one thing, we are talking about an external file here that is mainly
parsed by LilyPond. LilyPond provides sensible pinpointing of UTF-8
encoding errors, something GUILE cannot do with its UTF-8
representation since it has no transparent or reproducible
representation of bad bytes. Emacs, in contrast, uses the overlong
encodings of 0-127 (which valid UTF-8 never contains) to represent
undecodable bytes in the range 128-255, including the bytes of any
overlong sequence in the input. The bytes 128-255 are thus stored as
the two-byte patterns 0xc0 0x80 to 0xc1 0xbf. Since this encoding is
reproducible, one always retains the information required for
resynchronization, even in the presence of encoding errors.

For another, synchronizing the GUILE and LilyPond parsers requires that
both can use byte offsets for positioning. GUILE's mandatory recoding
when opening the port does not provide that: once the input has been
decoded, character positions no longer correspond to byte offsets.
> However, if you insist on doing this, I would
> suggest using a bytevector input port instead, like this: (untested)
>
> char *buf = c_str ();
> SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
> strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
> str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);
dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port v2.0.11
dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port origin/stable-2.0
dak@lola:/usr/local/tmp/guile$

The idea would seem nice, but we are still talking about GUILE 2.0.11
here. Calling a facility "not good" is not helpful when, unpretty as it
may seem, it was changed _within_ a stable release series without a
functionally equivalent replacement.

The whole point of a stable release series is to provide dependable
functionality. Changes based on the "we don't want people to use that
since it is not nice" rationale belong between stable release series,
not within one.

As it looks, we'll have to use one mechanism for versions 2.0.5 to
2.0.9, find out whether to reject 2.0.10, reject 2.0.11, and pray that
2.0.12 provides scm_open_byte_vector_input_port.

And depending on whether the dynamic library versions have been bumped,
we might have to make this decision at runtime.
--
David Kastrup
Thread overview: 10+ messages
2015-03-15 13:15 bug#20109: Incompatible API change in 2.0 series for string port encoding David Kastrup
2015-03-16 20:42 ` Mark H Weaver
2015-03-16 20:46 ` Mark H Weaver
2015-03-17 8:39 ` David Kastrup [this message]
2015-03-17 22:44 ` Mark H Weaver
2015-03-18 12:32 ` David Kastrup
2015-04-17 5:17 ` Mark H Weaver
2016-06-23 16:23 ` Andy Wingo
2016-06-23 16:46 ` David Kastrup
2016-06-23 17:58 ` Andy Wingo