From: David Kastrup
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#20109: Incompatible API change in 2.0 series for string port encoding
Date: Tue, 17 Mar 2015 09:39:46 +0100
Message-ID: <874mpkf25p.fsf@fencepost.gnu.org>
References: <87mw3eh04z.fsf@fencepost.gnu.org> <87zj7cznb5.fsf@netris.org>
In-Reply-To: <87zj7cznb5.fsf@netris.org> (Mark H. Weaver's message of "Mon, 16 Mar 2015 16:42:38 -0400")
To: Mark H Weaver
Cc: 20109@debbugs.gnu.org

Mark H Weaver writes:

> David Kastrup writes:
>
>> In 2.0.9, the following patch/code for getting what amounts to a binary
>> string port worked.
>>
>> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
>> Author: David Kastrup
>> Date:   Sun Sep 21 18:40:06 2014 +0200
>>
>>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>>
>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>> index 1118b9d..75ed0d9 100644
>> --- a/lily/source-file.cc
>> +++ b/lily/source-file.cc
>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>>    // we do our own utf8 encoding and verification in the parser, so we
>>    // use the no-conversion equivalent of latin1
>>    SCM str = scm_from_latin1_string (c_str ());
>> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
>> +  // Why doesn't scm_set_port_encoding_x work here?
>> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>> +  str_port_ = scm_open_input_string (str);
>> +  scm_dynwind_end ();
>>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>>  }
>
> This hack of giving Guile a buffer containing UTF-8, but claiming that
> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
> characters as garbage.

For one thing, we are talking about an external file here that is mainly
parsed by LilyPond.  LilyPond provides sensible pinpointing of UTF-8
encoding errors, something which GUILE cannot do with its UTF-8
representation since it has no transparent or reproducible
representation of bad bytes.  Emacs uses the overlong two-byte encodings
of 0-127 to represent badly encoded bytes in the range 128-255 (which
includes the bytes of any overlong sequences), so that 128-255 encode as
the patterns 0xc0 0x80 to 0xc1 0xbf.  Since this leads to a reproducible
encoding, one always has the information required for resynchronization
even in the case of encoding errors.

For another, synchronization of the GUILE and LilyPond parsers requires
that both can make use of byte offsets for positioning.  GUILE's
mandatory recoding on opening the port does not provide that.

> However, if you insist on doing this, I would
> suggest using a bytevector input port instead, like this: (untested)
>
>   char *buf = c_str ();
>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port v2.0.11
dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port origin/stable-2.0
dak@lola:/usr/local/tmp/guile$

The idea would seem nice, but we are still talking about GUILE 2.0.11
here.

Saying "it is not good" about a facility that, unpretty as it may seem,
was changed _within_ a stable version series without a functionally
equivalent replacement is not helpful.  The whole point of a stable
release series is to provide dependable functionality.  Any changes
based on the "we don't want people to use that since it is not nice"
rationale should happen between stable release series.
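
To make the Emacs mapping mentioned above concrete, here is a minimal
sketch of it in C++.  It is written purely from the description in this
message and is not taken from the Emacs sources; the function names
encode_raw_byte and decode_raw_byte are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: represent one undecodable input byte (0x80..0xFF)
// as the overlong two-byte form of (byte - 0x80), i.e. of a value 0..127.
// The result lies in the range 0xC0 0x80 .. 0xC1 0xBF.
std::string encode_raw_byte(std::uint8_t b) {
    assert(b >= 0x80);  // ASCII bytes are never "bad" and stay as they are
    const unsigned n = static_cast<unsigned>(b) - 0x80u;   // 0..127
    std::string out;
    out.push_back(static_cast<char>(0xC0u | (n >> 6)));    // 0xC0 or 0xC1
    out.push_back(static_cast<char>(0x80u | (n & 0x3Fu))); // 0x80..0xBF
    return out;
}

// Hypothetical inverse: return the original byte (0x80..0xFF) if the pair
// is one of the overlong patterns above, or -1 if it is anything else.
int decode_raw_byte(unsigned char lead, unsigned char trail) {
    if (lead != 0xC0 && lead != 0xC1)
        return -1;                      // not an overlong lead byte
    if ((trail & 0xC0) != 0x80)
        return -1;                      // not a continuation byte
    return 0x80 + (((lead & 0x01) << 6) | (trail & 0x3F));
}
```

Because the mapping is one-to-one, every bad byte in the input can be
recovered from the internal representation, which is what makes
resynchronization after an encoding error possible.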

The way it looks, we'll have to use one mechanism for versions 2.0.5 to
2.0.9, have to find out whether to reject 2.0.10, have to reject 2.0.11
and pray for 2.0.12 to provide scm_open_byte_vector_input_port.  And
depending on whether the dynamic library versions have been bumped, we
might have to do this at runtime.

-- 
David Kastrup