From: Andy Wingo <wingo@pobox.com>
To: Mike Gran <spk121@yahoo.com>
Cc: bug-guile@gnu.org, linasvepstas@gmail.com,
	Guile Development <guile-devel@gnu.org>
Subject: Re: UTF-8 regression in guile 1.9.5
Date: Sat, 09 Jan 2010 19:07:38 +0100
Message-ID: <m3aawnrw2d.fsf@pobox.com>
In-Reply-To: <188729.99650.qm@web37904.mail.mud.yahoo.com> (Mike Gran's message of "Fri, 11 Dec 2009 07:05:55 -0800 (PST)")

Hi,

Reviving an old thread...

On Fri 11 Dec 2009 16:05, Mike Gran <spk121@yahoo.com> writes:

>> On Sun 06 Dec 2009 21:43, Linas Vepstas writes:
>>
>> > 2009/12/6 Mike Gran :
>> >>
>> >>> > need to call (setlocale LC_ALL "")
>> >>
>> >> But for Guile to store characters as codepoints, declaring a locale
>> >> is pretty much a requirement now.
>> >
>> > Would it make sense to add (setlocale LC_ALL "") to some default,
>> > e.g. boot-9.scm  ?
>
> If we always call setlocale, legacy code that used UTF-8 and other
> non-Latin locales will just work.  Legacy code that used strings to
> contain binary data would break.
>
> (Of course, UTF-8 strings only worked on Guile 1.8.x so long
> as you either never looked at substrings or chars, or did
> UTF-8 parsing yourself.)
>
> As it is now, the opposite is true: legacy code with strings
> containing binary data will just work; strings containing non-8-bit
> locale encoded strings will break.
>
> | 1.8.x strings     | setlocale |
> | contain           | called in | Guile 2.0 will
> |                   | 1.8 | 2.0 |
> -----------------------------------------------------------------
> | ASCII             | Y/N | Y/N | just work
> ----------------------------------------------------------------- 
> | locale-encoded    | Y/N | Y   | just work
> | strings           |     |     |
> -----------------------------------------------------------------
> | locale-encoded    | Y/N | N   | interpret string bytes as
> | strings           |     |     | Latin-1
> -----------------------------------------------------------------
> | binary data       | Y/N | Y   | if locale is Latin-1: just work
> |                   |     |     |
> |                   |     |     | if locale is not latin-1:
> |                   |     |     | interpret string bytes using
> |                   |     |     | locale encoding
> -----------------------------------------------------------------
> | binary data       | Y/N | N   | just work
> |                   |     |     |
>
> I think I prefer that the coder take the responsibility of calling
> setlocale, but, I only think that because it is how C works.  I'm used
> to that convention.

I would still prefer ponies and magic, but I realized: if we do a
setlocale(LC_ALL, "") at the beginning, might that not change e.g. the
floating point format, or some other locale-related variable, which
would make Guile modules unreadable, or otherwise semantically different
or invalid?

I'm asking because I ran into this bug now:

    scheme@(guile-user)> ,pr (resolve-module '(gnome gtk))
    Throw to key `wrong-type-arg' with args `("procedure-name" "Wrong type argument in position ~A: ~S" (1 #<dynamic-object "libgw-guile-gnome-pango">) (#<dynamic-object "libgw-guile-gnome-pango">))'.
    Entering the debugger. Type `bt' for a backtrace or `c' to continue.
    0 debug> bt
    In current input:
    <unknown-location>: 13 ERROR: cannot convert to output locale "NONE": ""dynamic-wind""

So I guess we need a special case for NONE there, or something. I really
don't understand i18n/l10n.

FWIW, it seems that both ruby and python require the user to call
setlocale.

Regards,

Andy
-- 
http://wingolog.org/




