From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Linas Vepstas Newsgroups: gmane.lisp.guile.bugs Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string Date: Wed, 1 Mar 2017 14:18:54 -0600 Message-ID: References: <87y3yj99hs.fsf@pobox.com> <87y3wpdmqx.fsf@pobox.com> Reply-To: linasvepstas@gmail.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=94eb2c071e9a96ad5f0549b105a9 X-Trace: blaine.gmane.org 1488399614 758 195.159.176.226 (1 Mar 2017 20:20:14 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 1 Mar 2017 20:20:14 +0000 (UTC) Cc: "25397@debbugs.gnu.org" <25397@debbugs.gnu.org> To: Andy Wingo Original-X-From: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Wed Mar 01 21:20:09 2017 Return-path: Envelope-to: guile-bugs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cjAjI-0007zm-A2 for guile-bugs@m.gmane.org; Wed, 01 Mar 2017 21:20:08 +0100 Original-Received: from localhost ([::1]:48596 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjAjO-0004de-3t for guile-bugs@m.gmane.org; Wed, 01 Mar 2017 15:20:14 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:48158) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cjAjH-0004bB-4s for bug-guile@gnu.org; Wed, 01 Mar 2017 15:20:08 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cjAjE-0001W6-DH for bug-guile@gnu.org; Wed, 01 Mar 2017 15:20:07 -0500 Original-Received: from debbugs.gnu.org ([208.118.235.43]:36671) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1cjAjE-0001W2-9A for bug-guile@gnu.org; Wed, 01 Mar 2017 15:20:04 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from 
) id 1cjAjC-0000wl-JZ for bug-guile@gnu.org; Wed, 01 Mar 2017 15:20:04 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Linas Vepstas Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Wed, 01 Mar 2017 20:20:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 25397 X-GNU-PR-Package: guile X-GNU-PR-Keywords: Original-Received: via spool by 25397-submit@debbugs.gnu.org id=B25397.14883995433563 (code B ref 25397); Wed, 01 Mar 2017 20:20:02 +0000 Original-Received: (at 25397) by debbugs.gnu.org; 1 Mar 2017 20:19:03 +0000 Original-Received: from localhost ([127.0.0.1]:34870 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cjAiE-0000vO-K7 for submit@debbugs.gnu.org; Wed, 01 Mar 2017 15:19:02 -0500 Original-Received: from mail-qk0-f181.google.com ([209.85.220.181]:35687) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cjAiC-0000uu-Vs for 25397@debbugs.gnu.org; Wed, 01 Mar 2017 15:19:01 -0500 Original-Received: by mail-qk0-f181.google.com with SMTP id u188so89817965qkc.2 for <25397@debbugs.gnu.org>; Wed, 01 Mar 2017 12:19:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:reply-to:in-reply-to:references:from:date:message-id :subject:to:cc; bh=yOm5zg0CNT2fR2AlKikOmNLAkOrT/1jA4uyaZFEqRu4=; b=hlw7REaUZ57DzIJKxMVWxA7VwUV5YpaNg8HykVU6/dk0RjxY+JGj2f0AbjpyADpWwG wMKHp/1FjLe/wi+fT/H+tGhW9Ik8/vcP98xHMpxuaFmlIJyJpCzUe15u6hiFlncGEqS/ soeQzl9rZeGgYGFLzeQMJRoIjVTl7s/j27tKRtqxlJG4hRqQ9Udin7zDyKvYRG1Gb9zg PTbX2aEnY1dJeKIi2qzcLp3TP6zmttiQt9PP0+J8c+0jW9IvUInqZ37HyAYDeBWDHweR THcy3RFoIk9QSOO31WZGHxIV4OGkKKepLV7kigOQl1mt2GZFnQF74wMPFjzkEHJb9Ec1 rdEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:reply-to:in-reply-to:references :from:date:message-id:subject:to:cc; bh=yOm5zg0CNT2fR2AlKikOmNLAkOrT/1jA4uyaZFEqRu4=; 
b=AGA1W/73mOLmDLTrFv4rOUFdIm0C54BgftbmpHYkZ2xofCpCmw3+i14rP1UrMoQVAY MCidfDqyamjEIM/zuYG0cpOWWn1OJIv8kvEr5/6cRV/v7QY5uYU2tCSlejf451i1Y3X6 HwZQlLE13lZiy0LPxO4o7lEkf14KRLjbguuRbhxwyvoEeJyfV2rx2I7cQ3eFgzKHkjiV TgeRLIhkyuC8sXIAHsCQbtIM0esG3J/p/Kv/CYe2gbLheJFZGSBcZTepZuXUn0qpwRuY zEpIIHtYdsj01UflfVCAmV7njCfcxOmY0kFXQliJGilB+Iw+jT/KfZRKL2M9jnrLosvc cOWg== X-Gm-Message-State: AMke39kp8zrfLM68lJQM81QvRkfArU1xFanJA1piWZCHPv56ZoS5y0JgwTZnHS097MT1OBdhbT5kHz0yVyMcqg== X-Received: by 10.55.131.4 with SMTP id f4mr11959349qkd.1.1488399535351; Wed, 01 Mar 2017 12:18:55 -0800 (PST) Original-Received: by 10.12.174.231 with HTTP; Wed, 1 Mar 2017 12:18:54 -0800 (PST) In-Reply-To: <87y3wpdmqx.fsf@pobox.com> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-guile@gnu.org List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Original-Sender: "bug-guile" Xref: news.gmane.org gmane.lisp.guile.bugs:8640 Archived-At: --94eb2c071e9a96ad5f0549b105a9 Content-Type: text/plain; charset=UTF-8 In the bad old days, not every thing was documented ... My use of scm_puts dates back to guile-1.8. I only ever send it utf8. I can change my code, no problem,... I just thought I'd report a regression in case .... others are affected. 
Linas

On Wednesday, March 1, 2017, Andy Wingo <wingo@pobox.com> wrote:

> On Tue 10 Jan 2017 04:34, Linas Vepstas <linasvepstas@gmail.com> writes:
>
> > void *wrap_puts(void* p)
> > {
> >    char *wtf = p;
> >
> >    SCM port = scm_current_output_port ();
> >
> >    scm_puts("the port-encoding is=", port);
> >    scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);
> >
> >    scm_puts("\nThe string to display is =", port);
> >    scm_puts (wtf, port);
> >
> >    scm_puts("\nWas expecting to see this=", port);
> >    SCM str = scm_from_utf8_string(wtf);
> >    scm_display(str, port);
> >    scm_puts("\n\n", port);
> >
> >    return NULL;
> > }
>
> So, there are a few questions here.  scm_puts and scm_lfwrite are not
> documented, so we need to do basic science on them to see what they are
> supposed to do.
>
> Firstly, is scm_puts() a textual interface or a binary interface?
> I.e. does it write a sequence of characters or a sequence of bytes?
>
> If I look at uses of scm_puts in Guile sources, it seems clear that it's
> a textual interface.  That is to say, at all points, the intention seems
> to be to write characters on a Guile port.  All of the uses are of
> strings.  Please do a "git grep" on your source to see if your
> perceptions correspond.
>
> Now the question is, what encoding is the argument in?  If the port is
> UTF-16, that byte string should be decoded to characters, and that
> character sequence encoded to UTF-16.
>
> All of the scm_puts calls in Guile are of one-byte characters with
> codepoints less than 128, so when doing some port refactoring I chose to
> interpret the argument as latin1.
>
> FTR, in Guile 2.0, this was effectively a binary interface.  Guile 2.0's
> scm_lfwrite interpreted the incoming bytes as ISO-8859-1 codepoints for
> the purposes of updating line and column, but scm_puts and scm_lfwrite
> just wrote out the bytes to the port directly, regardless of the
> encoding.  That was the wrong thing.
>
> Are you arguing that the byte string given to scm_puts should be decoded
> from UTF-8?  That would be OK.
>
> Andy
>
--94eb2c071e9a96ad5f0549b105a9--