From: Linas Vepstas <linasvepstas@gmail.com>
To: Andy Wingo <wingo@pobox.com>
Cc: 25397@debbugs.gnu.org
Subject: bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string
Date: Mon, 9 Jan 2017 21:34:36 -0600
Message-ID: <CAHrUA36xT1x4xW-xgYw_-3zpfSDHFQU4kqtETFk7ScKW-5u0pQ@mail.gmail.com>
In-Reply-To: <87y3yj99hs.fsf@pobox.com>
This short C program illustrates the issue. The locale, the output port, etc.
are all UTF-8. The bad results are no surprise: the code currently in git for
scm_puts() and friends ignores the locale setting unconditionally and always
assumes Latin-1; it's hard-coded in there.
--linas
#include <stdlib.h>
#include <libguile.h>

/* Install the locale from the environment, i.e. (setlocale LC_ALL ""). */
void *wrap_eval(void *unused)
{
  (void) unused;
  scm_eval_string(scm_from_utf8_string("(setlocale LC_ALL \"\")"));
  return NULL;
}

/* Print the UTF-8 C string p twice: first with scm_puts(), then with
   scm_display() on a Scheme string decoded via scm_from_utf8_string(). */
void *wrap_puts(void *p)
{
  const char *text = p;
  SCM port = scm_current_output_port();

  /* scm_to_utf8_string() returns malloc'd memory; free it after use. */
  char *encoding = scm_to_utf8_string(scm_port_encoding(port));
  scm_puts("the port-encoding is=", port);
  scm_puts(encoding, port);
  free(encoding);

  scm_puts("\nThe string to display is =", port);
  scm_puts(text, port);
  scm_puts("\nWas expecting to see this=", port);
  scm_display(scm_from_utf8_string(text), port);
  scm_puts("\n\n", port);
  return NULL;
}

int main(void)
{
  static char *samples[] = {
    "Ćićolina", "Thủ Dầu Một", "Småland", "Hòa Phú Phú Tân", "係 拉 丁 字 母"
  };
  size_t i;

  scm_with_guile(wrap_eval, NULL);
  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    scm_with_guile(wrap_puts, samples[i]);
  return 0;
}
The output is always this:
the port-encoding is=UTF-8
The string to display is =Ćićolina
Was expecting to see this=Ćićolina
the port-encoding is=UTF-8
The string to display is =Thủ Dầu Một
Was expecting to see this=Thủ Dầu Một
the port-encoding is=UTF-8
The string to display is =Småland
Was expecting to see this=Småland
the port-encoding is=UTF-8
The string to display is =Hòa Phú Phú Tân
Was expecting to see this=Hòa Phú Phú Tân
the port-encoding is=UTF-8
Was expecting to see this=係 拉 丁 字 母 æ¯
What's cool is that all this stuff works in email!
--linas
On Mon, Jan 9, 2017 at 4:03 PM, Andy Wingo <wingo@pobox.com> wrote:
> On Sun 08 Jan 2017 19:16, Linas Vepstas <linasvepstas@gmail.com> writes:
>
>> There appears to be a regression in guile-2.2 with utf8 handling
>> in the scm_puts() scm_lfwrite() and scm_c_put_string() functions.
>>
>> In guile-2.0, one could give these utf8-encoded strings, and these
>> would display just fine. In 2.2 they get mangled.
>
> Could it be this from NEWS:
>
> ** Better locale support in Guile scripts
>
> When Guile is invoked directly, either from the command line or via a
> hash-bang line (e.g. "#!/usr/bin/guile"), it now installs the current
> locale via a call to `(setlocale LC_ALL "")'. For users with a unicode
> locale, this makes all ports unicode-capable by default, without the
> need to call `setlocale' in your program. This behavior may be
> controlled via the GUILE_INSTALL_LOCALE environment variable; see the
> manual for more.
Thread overview: 5+ messages
2017-01-08 18:16 bug#25397: guile-2.2 regression in utf8 support in scm_puts scm_lfwrite scm_c_put_string Linas Vepstas
2017-01-09 22:03 ` Andy Wingo
2017-01-10 3:34 ` Linas Vepstas [this message]
2017-03-01 15:45 ` Andy Wingo
2017-03-01 20:18 ` Linas Vepstas