From: Andy Wingo <wingo@pobox.com>
To: Linas Vepstas <linasvepstas@gmail.com>
Cc: 25397@debbugs.gnu.org
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Wed, 01 Mar 2017 16:45:26 +0100
Message-ID: <87y3wpdmqx.fsf@pobox.com>
In-Reply-To: <CAHrUA36xT1x4xW-xgYw_-3zpfSDHFQU4kqtETFk7ScKW-5u0pQ@mail.gmail.com> (Linas Vepstas's message of "Mon, 9 Jan 2017 21:34:36 -0600")

On Tue 10 Jan 2017 04:34, Linas Vepstas <linasvepstas@gmail.com> writes:

> void *wrap_puts(void* p)
> {
>    char *wtf = p;
>
>    SCM port = scm_current_output_port ();
>
>    scm_puts("the port-encoding is=", port);
>    scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);
>
>    scm_puts("\nThe string to display is =", port);
>    scm_puts (wtf, port);
>
>    scm_puts("\nWas expecting to see this=", port);
>    SCM str = scm_from_utf8_string(wtf);
>    scm_display(str, port);
>    scm_puts("\n\n", port);
>
>    return NULL;
> }

So, there are a few questions here.  scm_puts and scm_lfwrite are not
documented, so we need to do basic science on them to see what they are
supposed to do.

Firstly, is scm_puts() a textual interface or a binary interface?
I.e. does it write a sequence of characters or a sequence of bytes?

If I look at uses of scm_puts in Guile sources, it seems clear that it's
a textual interface.  That is to say, at all points, the intention seems
to be to write characters on a Guile port.  All of the uses are of
strings.  Please do a "git grep" on your source to see if your
perceptions correspond.

Now the question is, what encoding is the argument in?  If the port is
UTF-16, for example, the byte string should first be decoded to
characters, and that character sequence then encoded as UTF-16.
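
To make the decode-then-encode step concrete, here is a minimal sketch in
plain C, assuming the argument is UTF-8 and the port wants UTF-16.  It
does not use libguile at all; utf8_decode and utf8_to_utf16 are
hypothetical helper names invented for this illustration, and the decoder
does only basic validation (it does not reject overlong forms or
surrogate code points).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a libguile function.  Decode one UTF-8
   sequence starting at s, with len bytes available.  Stores the code
   point in *cp and returns the number of bytes consumed, or 0 if the
   input is malformed. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need;
    unsigned char b;

    if (len == 0)
        return 0;
    b = s[0];
    if (b < 0x80) { *cp = b; return 1; }
    else if ((b & 0xE0) == 0xC0) { *cp = b & 0x1F; need = 1; }
    else if ((b & 0xF0) == 0xE0) { *cp = b & 0x0F; need = 2; }
    else if ((b & 0xF8) == 0xF0) { *cp = b & 0x07; need = 3; }
    else return 0;                      /* stray continuation or invalid lead byte */

    if (len < need + 1)
        return 0;                       /* truncated sequence */
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return need + 1;
}

/* Hypothetical helper, not a libguile function.  Step 1: decode the
   UTF-8 bytes to code points.  Step 2: encode each code point as UTF-16
   code units.  out must have room for len code units.  Returns the
   number of code units written, or (size_t)-1 on malformed input. */
size_t utf8_to_utf16(const unsigned char *in, size_t len, uint16_t *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        uint32_t cp;
        size_t used = utf8_decode(in + i, len - i, &cp);
        if (used == 0 || cp > 0x10FFFF)
            return (size_t)-1;
        i += used;
        if (cp < 0x10000) {
            out[n++] = (uint16_t)cp;    /* one code unit */
        } else {
            cp -= 0x10000;              /* surrogate pair */
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return n;
}
```

The point of the sketch is only that the bytes handed to scm_puts cannot
be copied to the port as-is: they have to pass through characters on the
way, so the encoding of the argument must be decided.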

All of the scm_puts calls in Guile are of one-byte characters with
codepoints less than 128, so when doing some port refactoring I chose to
interpret the argument as latin1.
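
As an illustration of what the latin1 interpretation does to a UTF-8
argument on a UTF-8 port, here is a small sketch in plain C.  It is not
the libguile implementation; latin1_to_utf8 is a hypothetical helper that
just treats every input byte as one ISO-8859-1 code point and encodes it
as UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not a libguile function.  Treat each input byte
   as an ISO-8859-1 code point and encode it as UTF-8.  out must have
   room for 2 * len bytes.  Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;               /* ASCII passes through unchanged */
        } else {
            /* U+0080..U+00FF take two bytes in UTF-8 */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

ASCII input comes out byte-for-byte identical, which is why the calls
inside Guile are unaffected.  The two UTF-8 bytes of "é" (0xC3 0xA9),
however, are read as the two characters U+00C3 and U+00A9 and come out as
the four bytes 0xC3 0x83 0xC2 0xA9.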

For the record, in Guile 2.0 this was effectively a binary interface.
Guile 2.0's scm_lfwrite interpreted the incoming bytes as ISO-8859-1
codepoints for the purposes of updating line and column, but scm_puts
and scm_lfwrite just wrote out the bytes to the port directly,
regardless of the port's encoding.  That was the wrong thing.

Are you arguing that the byte string given to scm_puts should be decoded
from UTF-8?  That would be OK.

Andy




