From: Linas Vepstas <linasvepstas@gmail.com>
To: Andy Wingo <wingo@pobox.com>
Cc: "25397@debbugs.gnu.org" <25397@debbugs.gnu.org>
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Wed, 1 Mar 2017 14:18:54 -0600
Message-ID: <CAHrUA35TFqxtZuJ23huU1=5FOE-_Cr55R1HJHXjNrcqCtqQMGA@mail.gmail.com>
In-Reply-To: <87y3wpdmqx.fsf@pobox.com>
In the bad old days, not everything was documented. My use of scm_puts
dates back to guile-1.8, and I only ever pass it UTF-8. I can change my
code, no problem; I just thought I'd report the regression in case others
are affected.
Linas
On Wednesday, March 1, 2017, Andy Wingo <wingo@pobox.com> wrote:
> On Tue 10 Jan 2017 04:34, Linas Vepstas <linasvepstas@gmail.com> writes:
>
> > #include <stdlib.h>   /* free */
> > #include <libguile.h>
> >
> > void *wrap_puts(void *p)
> > {
> >   char *wtf = p;
> >
> >   SCM port = scm_current_output_port ();
> >
> >   scm_puts("the port-encoding is=", port);
> >   /* scm_to_utf8_string returns malloc'd memory; free it after use. */
> >   char *enc = scm_to_utf8_string(scm_port_encoding(port));
> >   scm_puts(enc, port);
> >   free(enc);
> >
> >   scm_puts("\nThe string to display is =", port);
> >   scm_puts (wtf, port);
> >
> >   scm_puts("\nWas expecting to see this=", port);
> >   SCM str = scm_from_utf8_string(wtf);
> >   scm_display(str, port);
> >   scm_puts("\n\n", port);
> >
> >   return NULL;
> > }
>
> So, there are a few questions here. scm_puts and scm_lfwrite are not
> documented, so we need to do basic science on them to see what they are
> supposed to do.
>
> Firstly, is scm_puts() a textual interface or a binary interface?
> I.e. does it write a sequence of characters or a sequence of bytes?
>
> If I look at uses of scm_puts in Guile sources, it seems clear that it's
> a textual interface. That is to say, at all points, the intention seems
> to be to write characters to a Guile port, and every use passes a
> string. Please do a "git grep" on your source to see if your
> perceptions correspond.
>
> Now the question is, what encoding is the argument in? If the port is
> UTF-16, that byte string should be decoded to characters, and that
> character sequence encoded to UTF-16.
>
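To make the decode-then-encode step concrete, here is a minimal sketch in C using POSIX iconv(3), assuming the argument is UTF-8 and the port is UTF-16LE. The helper name utf8_to_utf16le is hypothetical and this is not how Guile's port code is implemented; iconv simply performs the decoding to characters and the re-encoding in a single call.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode the NUL-terminated UTF-8 string IN to
   characters and re-encode them as UTF-16LE into OUT.  Returns the
   number of bytes written, or (size_t) -1 if IN is not valid UTF-8,
   OUT is too small, or the conversion is unavailable. */
size_t utf8_to_utf16le(const char *in, char *out, size_t outsize)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");  /* to, from */
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    /* iconv takes char ** and advances the pointer, but does not
       modify the input bytes themselves. */
    char *inbuf = (char *) in;
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t) -1)
        return (size_t) -1;

    return outsize - outleft;  /* bytes produced */
}
```

For example, the UTF-8 input "A\xc3\xa9" ("Aé", 3 bytes) becomes the 4 bytes 41 00 E9 00, while a truncated sequence such as "\xc3" is rejected.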
> All of the scm_puts calls in Guile pass ASCII text (one-byte characters
> with codepoints less than 128), so when doing some port refactoring I
> chose to interpret the argument as latin1.
>
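Why a latin1 reading garbles UTF-8 input can be shown with a minimal sketch, assuming the port's encoding is UTF-8. Under latin1 each byte is taken as the code point with the same value, and that code point is then re-encoded for the port. The helper latin1_to_utf8 is hypothetical and only illustrates the effect; it is not Guile's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: treat each input byte as an ISO-8859-1 (latin1)
   code point (code point == byte value) and encode it as UTF-8.
   OUT must hold at least 2 * strlen(in) + 1 bytes.  Returns the number
   of bytes written, not counting the terminating NUL. */
size_t latin1_to_utf8(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *) in; *p != '\0'; p++) {
        if (*p < 0x80) {
            out[n++] = (char) *p;                    /* ASCII: one byte */
        } else {
            out[n++] = (char) (0xC0 | (*p >> 6));    /* U+0080..U+00FF: */
            out[n++] = (char) (0x80 | (*p & 0x3F));  /* two bytes       */
        }
    }
    out[n] = '\0';
    return n;
}
```

Passing the UTF-8 bytes for "é" (C3 A9) yields C3 83 C2 A9, which displays as "Ã©". ASCII bytes pass through unchanged, which is why ASCII-only calls like Guile's own are unaffected.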
> FTR, in Guile 2.0, this was effectively a binary interface. Guile 2.0's
> scm_lfwrite interpreted the incoming bytes as ISO-8859-1 codepoints for
> the purposes of updating line and column, but scm_puts and scm_lfwrite
> just wrote out the bytes to the port directly, regardless of the
> encoding. That was the wrong thing.
>
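The line/column discrepancy described above can be sketched the same way: counting every byte as one ISO-8859-1 character versus counting UTF-8 characters. Both helper names are hypothetical; the sketch assumes valid UTF-8 and ignores tabs, newlines and double-width characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Column advance if every byte is taken as one ISO-8859-1 character
   (the Guile 2.0 behavior described above). */
size_t columns_as_latin1(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Column advance if the bytes are decoded as UTF-8: count only the
   bytes that start a character, skipping continuation bytes of the
   form 10xxxxxx.  Assumes valid UTF-8. */
size_t columns_as_utf8(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

"h\xc3\xa9llo" ("héllo") is 6 bytes, so the per-byte count advances the column by 6 although only 5 characters were written.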
> Are you arguing that the byte string given to scm_puts should be decoded
> from UTF-8? That would be OK.
>
> Andy
>
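Decoding the scm_puts argument from UTF-8, as proposed above, amounts to turning the byte string into code points before the port encodes them in its own encoding. Below is a minimal validating decoder sketch; the function names are hypothetical and the code is independent of Guile's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at S.
   Stores the code point in *CP and returns the number of bytes
   consumed (1-4), or 0 if the sequence is malformed (truncated,
   overlong, a surrogate, or above U+10FFFF). */
size_t utf8_decode_one(const unsigned char *s, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    size_t len;
    uint32_t c, min;

    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation byte or 0xF8+ */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* also stops at a NUL terminator */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Decode the NUL-terminated UTF-8 string IN into OUT (capacity MAX).
   Returns the number of code points, or (size_t) -1 on bad input or
   if OUT is too small. */
size_t utf8_decode(const char *in, uint32_t *out, size_t max)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t n = 0;
    while (*p != '\0') {
        uint32_t cp;
        size_t k = utf8_decode_one(p, &cp);
        if (k == 0 || n == max)
            return (size_t) -1;
        out[n++] = cp;
        p += k;
    }
    return n;
}
```

This sketch rejects overlong forms such as C0 AF, surrogate code points, and values above U+10FFFF instead of passing them on to the port as characters.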
Thread overview: 5+ messages
2017-01-08 18:16 bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string Linas Vepstas
2017-01-09 22:03 ` Andy Wingo
2017-01-10 3:34 ` Linas Vepstas
2017-03-01 15:45 ` Andy Wingo
2017-03-01 20:18 ` Linas Vepstas [this message]
List information: https://www.gnu.org/software/guile/