From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Kastrup
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#20109: Incompatible API change in 2.0 series for string port encoding
Date: Wed, 18 Mar 2015 13:32:55 +0100
Message-ID: <87r3smeb9k.fsf@fencepost.gnu.org>
References: <87mw3eh04z.fsf@fencepost.gnu.org> <87zj7cznb5.fsf@netris.org> <874mpkf25p.fsf@fencepost.gnu.org> <87pp87fdmm.fsf@netris.org>
In-Reply-To: <87pp87fdmm.fsf@netris.org> (Mark H. Weaver's message of "Tue, 17 Mar 2015 18:44:17 -0400")
To: Mark H Weaver
Cc: 20109@debbugs.gnu.org

Mark H Weaver writes:

> David Kastrup writes:
>
>> Mark H Weaver writes:
>>
>>> This hack of giving Guile a buffer containing UTF-8, but claiming that
>>> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
>>> characters as garbage.
>>
>> For one thing we are talking about an external file here that is
>> mainly parsed by LilyPond.  LilyPond provides sensible pinpointing of
>> UTF-8 encoding errors, something which GUILE cannot do with its UTF-8
>> representation since it has no transparent or reproducible
>> representation of bad bytes.  Emacs uses overlong encodings for 0-127
>> to represent badly encoded bytes (which includes any overlong
>> sequences) in the range 128-255, making 128-255 encode as patterns
>> 0xc0 0x80 to 0xc1 0xbf.
>
> I intend to add a similar mechanism to Guile, but it is not yet done.

I think it would be pretty important since it makes it possible to treat
problems at those points in processing where it makes most sense.

However, it would also seem important to have GUILE handle utf-8
strings.  At the current point of time, its only native types are what
it calls "latin-1" and likely "UTF-32".  Which does not make much sense
in connection with its string ports being unconditionally UTF-8 instead.

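To make the byte patterns mentioned in the quoted paragraph concrete,
here is a minimal sketch in C of mapping a raw byte in the range 128-255
to the two-byte overlong form 0xc0 0x80 .. 0xc1 0xbf and back.  The
function names are hypothetical and chosen only for illustration; this
is neither Emacs nor Guile code, just the arithmetic implied by the
description above.

```c
#include <assert.h>

/* Hypothetical helper (not an Emacs or Guile API): store the two-byte
   overlong sequence for RAW, which must lie in the range 128-255.  */
void
encode_raw_byte (unsigned char raw, unsigned char out[2])
{
  assert (raw >= 0x80);
  unsigned char v = raw & 0x7f;   /* 0..127, the "overlong" code point */
  out[0] = 0xc0 | (v >> 6);       /* lead byte: 0xc0 or 0xc1 */
  out[1] = 0x80 | (v & 0x3f);     /* continuation byte: 0x80..0xbf */
}

/* Hypothetical inverse: return 1 and store the raw byte in *RAW if SEQ
   is one of the overlong sequences above, otherwise return 0.  */
int
decode_raw_byte (const unsigned char seq[2], unsigned char *raw)
{
  if ((seq[0] != 0xc0 && seq[0] != 0xc1) || (seq[1] & 0xc0) != 0x80)
    return 0;
  *raw = 0x80 | ((seq[0] & 0x01) << 6) | (seq[1] & 0x3f);
  return 1;
}
```

Since valid UTF-8 never uses the lead bytes 0xc0 and 0xc1, such
sequences cannot collide with properly encoded text, which is what makes
the representation reversible.
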
Concatenating a string from smaller pieces sequentially via string
operations is O(n^2), so string ports are a natural way to assemble
large strings.  They are also nice for reading from strings.  Not
requiring conversions for most of that would be nice.

>>> However, if you insist on doing this, I would
>>> suggest using a bytevector input port instead, like this: (untested)
>>>
>>>   char *buf = c_str ();
>>>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>>>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>>>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);
>>
>> dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port v2.0.11
>> dak@lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port origin/stable-2.0
>> dak@lola:/usr/local/tmp/guile$
>
> You have misspelled the name of the function.  The following (untested)
> code should work on Guile 2.0.5 or later:
>
>   char *buf = c_str ();
>   size_t len = strlen (buf);
>   SCM bv = scm_c_make_bytevector (len);
>   memcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf, len);
>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

One would expect that I'd be able to do a simple copy&paste of a
function name.  Sorry for messing this up.

Yes, this looks like it should indeed provide a better match of
"encoding intentions" to our original code.  I'll have to see whether
I can make this approach work with the rest of our code.  I somehow
missed that r6rs ports were more than just a compatibility wrapper
written in Scheme.

-- 
David Kastrup