unofficial mirror of bug-guile@gnu.org 
From: Linas Vepstas <linasvepstas@gmail.com>
To: Andy Wingo <wingo@pobox.com>
Cc: 25397@debbugs.gnu.org
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Mon, 9 Jan 2017 21:34:36 -0600
Message-ID: <CAHrUA36xT1x4xW-xgYw_-3zpfSDHFQU4kqtETFk7ScKW-5u0pQ@mail.gmail.com>
In-Reply-To: <87y3yj99hs.fsf@pobox.com>

This short C program illustrates the issue.  The locale, the output port, etc.
are all UTF-8.  The bad results are no surprise: the code currently in git for
scm_puts etc. explicitly ignores the locale setting and always assumes
latin1; it's hard-coded in there.

--linas

#include <libguile.h>

void *wrap_eval(void* p)
{
   char *wtf = "(setlocale LC_ALL \"\")";
   SCM eval_str = scm_from_utf8_string(wtf);
   scm_eval_string(eval_str);

   return NULL;
}

void *wrap_puts(void* p)
{
   char *wtf = p;

   SCM port = scm_current_output_port ();

   scm_puts("the port-encoding is=", port);
   scm_puts(scm_to_utf8_string(scm_port_encoding(port)), port);

   scm_puts("\nThe string to display is =", port);
   scm_puts (wtf, port);

   scm_puts("\nWas expecting to see this=", port);
   SCM str = scm_from_utf8_string(wtf);
   scm_display(str, port);
   scm_puts("\n\n", port);

   return NULL;
}

int main(int argc, char* argv[])
{
   scm_with_guile(wrap_eval, 0x0);

   char * wtf = "Ćićolina";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Thủ Dầu Một";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Småland";
   scm_with_guile(wrap_puts, wtf);

   wtf = "Hòa Phú Phú Tân";
   scm_with_guile(wrap_puts, wtf);

   wtf = "係 拉 丁 字 母";
   scm_with_guile(wrap_puts, wtf);
}

The output is always this:

the port-encoding is=UTF-8
The string to display is =Ćićolina
Was expecting to see this=Ćićolina

the port-encoding is=UTF-8
The string to display is =Thủ Dầu Một
Was expecting to see this=Thủ Dầu Một

the port-encoding is=UTF-8
The string to display is =Småland
Was expecting to see this=Småland

the port-encoding is=UTF-8
The string to display is =Hòa Phú Phú Tân
Was expecting to see this=Hòa Phú Phú Tân

the port-encoding is=UTF-8
The string to display is =[mangled; only a trailing "æ¯" survives]
Was expecting to see this=係 拉 丁 字 母


What's cool is that all this stuff works in email!

--linas

On Mon, Jan 9, 2017 at 4:03 PM, Andy Wingo <wingo@pobox.com> wrote:
> On Sun 08 Jan 2017 19:16, Linas Vepstas <linasvepstas@gmail.com> writes:
>
>> There appears to be a regression in guile-2.2 with utf8 handling
>> in the scm_puts() scm_lfwrite() and scm_c_put_string() functions.
>>
>> In guile-2.0, one could give these utf8-encoded strings, and these
>> would display just fine.  In 2.2 they get mangled.
>
> Could it be this from NEWS:
>
>   ** Better locale support in Guile scripts
>
>   When Guile is invoked directly, either from the command line or via a
>   hash-bang line (e.g. "#!/usr/bin/guile"), it now installs the current
>   locale via a call to `(setlocale LC_ALL "")'.  For users with a unicode
>   locale, this makes all ports unicode-capable by default, without the
>   need to call `setlocale' in your program.  This behavior may be
>   controlled via the GUILE_INSTALL_LOCALE environment variable; see the
>   manual for more.
