unofficial mirror of guile-devel@gnu.org 
* Accessing the environment's locale encoding settings
@ 2011-11-16  0:13 Ludovic Courtès
  2011-11-16  2:00 ` Bruno Haible
  0 siblings, 1 reply; 16+ messages in thread
From: Ludovic Courtès @ 2011-11-16  0:13 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-libunistring, guile-devel

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

Hi Bruno,

In Guile, strings coming from the C world are assumed to be encoded in
the current locale encoding.  Like in C, the current locale is set using
‘setlocale’, and it’s up to the user to write (setlocale LC_ALL "") to
set the locale according to the relevant environment variables.

The problem comes with command-line arguments: the user hasn’t yet had a
chance to call ‘setlocale’, yet they most likely have to be converted
from locale encoding.  Up to 2.0.3, they were instead assumed to be
ASCII, and we’re looking into fixing it [0].

The trick we came up with is to look at $LANG, $LC_ALL, & co. and try to
determine what the locale encoding “would be” if (setlocale LC_ALL "")
were called [1].

To do that, I use a variant of ‘localcharset’ in Gnulib, with a
special case for the C locale:


[-- Attachment #2: Type: text/x-csrc, Size: 1403 bytes --]

/* Return the name of the locale encoding suggested by environment
   variables, even if it's not current, or NULL if no encoding is
   defined.  Based on Gnulib's `localcharset.c'.  */
static const char *
locale_encoding (void)
{
  static char buf[2 + 10 + 1];
  const char *locale, *codeset = NULL;

  /* Allow user to override the codeset, as set in the operating system,
     with standard language environment variables.  */
  locale = getenv ("LC_ALL");
  if (locale == NULL || locale[0] == '\0')
    {
      locale = getenv ("LC_CTYPE");
      if (locale == NULL || locale[0] == '\0')
        locale = getenv ("LANG");
    }
  if (locale != NULL && locale[0] != '\0')
    {
      /* If the locale name contains an encoding after the dot, return it.  */
      const char *dot = strchr (locale, '.');

      if (dot != NULL)
        {
          const char *modifier;

          dot++;
          /* Look for the possible @... trailer and remove it, if any.  */
          modifier = strchr (dot, '@');
          if (modifier == NULL)
            return dot;
          if (modifier - dot < sizeof (buf))
            {
              memcpy (buf, dot, modifier - dot);
              buf [modifier - dot] = '\0';
              return buf;
            }
        }
      else if (strcmp (locale, "C") == 0)
	{
	  strcpy (buf, "ASCII");
	  return buf;
	}

      codeset = locale;
    }

  return codeset;
}
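
To make the parsing rule above concrete, here is a minimal sketch of the
"codeset after the dot, minus any @modifier" step as a re-entrant helper.
The name codeset_of_locale_name and its buffer-passing interface are
assumptions made for illustration; they are not part of Guile or Gnulib.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the codeset part of LOCALE_NAME
   ("lang_TERRITORY.codeset@modifier") into BUF.  Returns BUF, or NULL
   when the name has no ".codeset" part or BUF is too small.  */
static const char *
codeset_of_locale_name (const char *locale_name, char *buf, size_t size)
{
  const char *dot = strchr (locale_name, '.');
  const char *end;
  size_t len;

  if (dot == NULL)
    return NULL;
  dot++;

  /* Drop the optional "@modifier" trailer, if any.  */
  end = strchr (dot, '@');
  len = (end != NULL) ? (size_t) (end - dot) : strlen (dot);
  if (len == 0 || len >= size)
    return NULL;

  memcpy (buf, dot, len);
  buf[len] = '\0';
  return buf;
}
```

With a 16-byte buffer, "fr_FR.UTF-8@euro" yields "UTF-8", while a plain
"C" yields NULL, which a caller could then map to ASCII as the attachment
does.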

[-- Attachment #3: Type: text/plain, Size: 307 bytes --]


What do you think of this approach?

Should we be checking for charset aliases?  If so, we’d need help from
Gnulib since ‘get_charset_aliases’ is internal.

Thanks,
Ludo’.

[0] http://thread.gmane.org/gmane.lisp.guile.devel/12887
[1] http://thread.gmane.org/gmane.lisp.guile.devel/12900

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Accessing the environment's locale encoding settings
  2011-11-16  0:13 Accessing the environment's locale encoding settings Ludovic Courtès
@ 2011-11-16  2:00 ` Bruno Haible
  2011-11-16 10:35   ` Ludovic Courtès
  0 siblings, 1 reply; 16+ messages in thread
From: Bruno Haible @ 2011-11-16  2:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

[Dropping bug-libunistring from the CC.]

Hi Ludo',

> Should we be checking for charset aliases?

Yes, without the system-dependent aliases table the locale_charset()
function is buggy on nearly all platforms. Cf. gnulib/lib/config.charset.
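
As a rough illustration of why an aliases table matters: the codeset
spelled in a locale name is often a platform-specific spelling that must
be mapped to a canonical name before it can be handed to a converter.
The sketch below uses a tiny hard-coded table; the entries, the name
canonical_charset, and the case-insensitive matching are assumptions for
illustration only, whereas the real tables are generated per platform.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

struct charset_alias
{
  const char *alias;
  const char *canonical;
};

/* Illustrative entries only; real tables differ per platform.  */
static const struct charset_alias aliases[] = {
  { "utf8", "UTF-8" },
  { "ISO8859-1", "ISO-8859-1" },
  { "646", "ASCII" },
  { "ANSI_X3.4-1968", "ASCII" },
};

/* Return the canonical name for NAME, or NAME itself if no alias
   matches.  */
static const char *
canonical_charset (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcasecmp (name, aliases[i].alias) == 0)
      return aliases[i].canonical;
  return name;
}
```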

> In Guile, strings coming from the C world are assumed to be encoded in
> the current locale encoding.  Like in C, the current locale is set using
> ‘setlocale’, and it’s up to the user to write (setlocale LC_ALL "") to
> set the locale according to the relevant environment variables.
> 
> The problem comes with command-line arguments: the user hasn’t yet had a
> chance to call ‘setlocale’, yet they most likely have to be converted
> from locale encoding. ...

I would recommend having setlocale(...) happen *before* the command-line
arguments are parsed, not *after*. For two reasons:
  1) The parsing of command-line arguments can provoke errors, and errors
     should be displayed in the user's language, that is, depend on $LANG,
     $LC_MESSAGES, $LC_ALL.
  2) As you noticed, if setlocale(...) happens too late, you want to
     simulate the effects "as if" setlocale(LC_ALL, "") had been called.
     But you have thought only about the locale encoding (part of the
     LC_CTYPE category of the locale), not about LC_MESSAGES which is needed
     when you print an error message.
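
For comparison, the behavior being simulated looks roughly like this at
the C level: adopt the environment's locale first, then ask libc which
codeset is in effect. This is a sketch using the standard setlocale and
the POSIX nl_langinfo; the wrapper name adopt_environment_locale is made
up for the example.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Adopt the locale named by LC_ALL, LC_xxx, and LANG, then return the
   codeset of the resulting LC_CTYPE category.  If the environment names
   an unsupported locale, setlocale returns NULL and the previous locale
   (initially "C") stays in effect.  */
static const char *
adopt_environment_locale (void)
{
  (void) setlocale (LC_ALL, "");
  return nl_langinfo (CODESET);
}
```

Called at the very start of main, before any argument is decoded, this
covers both points above: messages and encoding follow the user's
settings.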

You wrote:
> > Unfortunately, I don't see a way for the user to call setlocale before a
> > Guile script converts the command-line arguments to Scheme strings, at
> > least not without providing their own `main' function in C.
>
> Hmm, very good point.

That is precisely the point. Only in C, C++, Objective C, PHP, and Guile,
it is the user's responsibility to set the locale. Look at the many
internationalization samples ("hello world" samples) in GNU gettext:
In all other languages (and even many GUI toolkits based on C, C++, or
Objective C) the setlocale call is implicit.

The user should *not* have to worry about conversion of strings from/to
locale encoding, because
  1) This is what people expect from a scripting language nowadays.
  2) In Guile strings are sequences of Unicode characters [1][2].

The fact that in C and C++ the default locale inside a program (that is,
the locale in effect when the program is started) is *not* the locale
specified by the user is only due to backward compatibility:
  - In C, because C started as a system programming language and the
    locale facilities were not there in the beginning,
  - In C++, because C++ has strong backward compatibility links with C.

So my suggestion is to do (setlocale LC_ALL "") as part of the Guile
initialization, very early. Yes, this might lead to some complexity
in the Guile implementation if you have the concept of locale also at
the Guile level and need to make sure that the locale at the C level and
the locale at the Guile level are consistent as soon as the latter is
defined. But this is manageable.

Bruno

[1] http://www.gnu.org/software/guile/manual/html_node/Strings.html
[2] http://www.gnu.org/software/guile/manual/html_node/Characters.html
-- 
In memoriam Kurt Gerron <http://en.wikipedia.org/wiki/Kurt_Gerron>




* Re: Accessing the environment's locale encoding settings
  2011-11-16  2:00 ` Bruno Haible
@ 2011-11-16 10:35   ` Ludovic Courtès
  2011-11-16 16:11     ` Noah Lavine
  2011-11-20 16:55     ` Bruno Haible
  0 siblings, 2 replies; 16+ messages in thread
From: Ludovic Courtès @ 2011-11-16 10:35 UTC (permalink / raw)
  To: Bruno Haible; +Cc: guile-devel

Hi Bruno,

Thanks for your quick and insightful feedback.

Bruno Haible <bruno@clisp.org> skribis:

> That is precisely the point. Only in C, C++, Objective C, PHP, and Guile,
> it is the user's responsibility to set the locale. Look at the many
> internationalization samples ("hello world" samples) in GNU gettext:
> In all other languages (and even many GUI toolkits based on C, C++, or
> Objective C) the setlocale call is implicit.

It seems to me that the implicit call is often desirable, but at the
same time, it imposes a policy on the application.  In C, Guile, & co.,
the application can choose to ignore the locale, or to just honor
LC_CTYPE, or to set something different.  Perhaps this point is moot if
the other languages allow the locale to be set afterward without any
loss of functionality, though...

> The user should *not* have to worry about conversion of strings from/to
> locale encoding, because
>   1) This is what people expect from a scripting language nowadays.
>   2) In Guile strings are sequences of Unicode characters [1][2].

Agreed.

[...]

> So my suggestion is to do (setlocale LC_ALL "") as part of the Guile
> initialization, very early. Yes, this might lead to some complexity
> in the Guile implementation if you have the concept of locale also at
> the Guile level and need to make sure that the locale at the C level and
> the locale at the Guile level are consistent as soon as the latter is
> defined. But this is manageable.

Are you suggesting that we could arrange to have Guile’s ‘main’ call
setlocale(LC_ALL, "") while still giving Scheme code the impression that
it’s started under the C locale as is currently the case?

Just adding setlocale(LC_ALL, "") in Guile’s ‘main’ would be an
incompatible change, which would break Scheme applications relying on
the current behavior–e.g., applications intended to be all-English.

A reasonable option would be to setlocale(LC_CTYPE, "") from Guile’s
‘main’, so that scm_from_locale_string & co. would DTRT.  But again that
would change the value of %default-port-encoding, leading to potential
application breakage.

Thanks,
Ludo’.




* Re: Accessing the environment's locale encoding settings
  2011-11-16 10:35   ` Ludovic Courtès
@ 2011-11-16 16:11     ` Noah Lavine
  2011-11-16 16:32       ` Peter Brett
  2011-11-20 16:55     ` Bruno Haible
  1 sibling, 1 reply; 16+ messages in thread
From: Noah Lavine @ 2011-11-16 16:11 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Bruno Haible, guile-devel

Hello,

It seems like the right thing to do might be to do setlocale(LC_ALL,
"") in Guile's main(). Let me argue that this accomplishes two goals
which we want to accomplish:
  - it does the right thing by default: you want your program to be
able to talk to the user in the user's own language. This accomplishes
that, and also fixes the bug that has been discussed.
  - it lets the user customize Guile's behavior: the user has two
options for going back to the old behavior. First of all, they can do
setlocale(LC_ALL, "C") as soon as their program starts to get back the
default behavior. Second, if they really don't want any locale
changes, they can provide their own main function. We might even be
able to offer a configuration option, so that they could use Guile's
main except for the setlocale call.

However, it seems like in this case, setlocale(LC_ALL, "") is the
right thing in almost every case, and it should just be Guile's
default.

Noah


* Re: Accessing the environment's locale encoding settings
  2011-11-16 16:11     ` Noah Lavine
@ 2011-11-16 16:32       ` Peter Brett
  2011-11-18 22:17         ` Mark H Weaver
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Brett @ 2011-11-16 16:32 UTC (permalink / raw)
  To: guile-devel

Noah Lavine <noah.b.lavine@gmail.com> writes:

> It seems like the right thing to do might be to do setlocale(LC_ALL,
> "") in Guile's main(). Let me argue that this accomplishes two goals
> which we want to accomplish

That seems entirely reasonable to me, as long as libguile users can
still set a non-environment locale before first entering Guile mode and
have libguile respect that.

However, I'd still be concerned about the case where filenames are
passed on the command line in an encoding that differs from the
environment encoding.  A few months ago, Mark mentioned that he was
looking at "safe" encodings for filenames.  I wonder if that's relevant
to command-line argument processing?

                            Peter

-- 
Peter Brett <peter@peter-b.co.uk>
Remote Sensing Research Group
Surrey Space Centre





* Re: Accessing the environment's locale encoding settings
  2011-11-16 16:32       ` Peter Brett
@ 2011-11-18 22:17         ` Mark H Weaver
  0 siblings, 0 replies; 16+ messages in thread
From: Mark H Weaver @ 2011-11-18 22:17 UTC (permalink / raw)
  To: Peter Brett; +Cc: guile-devel

Noah Lavine <noah.b.lavine@gmail.com> writes:
> It seems like the right thing to do might be to do setlocale(LC_ALL,
> "") in Guile's main(). Let me argue that this accomplishes two goals
> which we want to accomplish

I agree wholeheartedly; this should be the default behavior.  In the
rare cases where the user needs to avoid calling setlocale, they can
provide a different main().

Regarding backward compatibility: the current behavior already breaks
it.  For most practical purposes, Guile 1.8 effectively accepts
command-line arguments in the current locale encoding (although it is
certainly broken with regard to its handling of character indices within
such strings).  Guile 2.0 currently accepts only ASCII.

I don't see any sane way to preserve 100% backward compatibility with
the broken behavior of 1.8.  It seems to me that calling setlocale in
Guile's main() is not only the right thing moving forward, but also the
most backward-compatible option that is reasonably sane.

Peter Brett <peter@peter-b.co.uk> writes:
> That seems entirely reasonable to me, as long as libguile users can
> still set a non-environment locale before first entering Guile mode and
> have libguile respect that.

Setlocale would only be called within main() of the Guile executable.
It would never be automatically called within libguile, so any program
that contains its own main() in C would be responsible for calling
setlocale (if it chooses to).

> However, I'd still be concerned about the case where filenames are
> passed on the command line in an encoding that differs from the
> environment encoding.  A few months ago, Mark mentioned that he was
> looking at "safe" encodings for filenames.  I wonder if that's relevant
> to command-line argument processing?

I apologize for dragging my feet on this for so long.  A few months ago
I argued forcefully for a particular approach to this thorny issue, and
then later had second thoughts.  It is important to get this right,
because we will have to live with our chosen approach for a long time.
I hope to write another email on this subject soon.

     Mark




* Re: Accessing the environment's locale encoding settings
  2011-11-16 10:35   ` Ludovic Courtès
  2011-11-16 16:11     ` Noah Lavine
@ 2011-11-20 16:55     ` Bruno Haible
  2011-11-20 17:41       ` Ludovic Courtès
  1 sibling, 1 reply; 16+ messages in thread
From: Bruno Haible @ 2011-11-20 16:55 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

Ludovic Courtès wrote:
> It seems to me that the implicit call is often desirable, but at the
> same time, it imposes a policy on the application.  In C, Guile, & co.,
> the application can choose to ignore the locale, or to just honor
> LC_CTYPE, or to set something different.

What it actually does, then, is use a mix of the user-provided
locale (for some locale categories) and the "C" locale (for some other
locale categories). When the default locale is the user-provided one,
rather than "C", this is still possible; it just needs different code to
do so.

Instead of
    (setlocale LC_CTYPE "")
the user will now need to write
    (setlocale LC_NUMERIC "C")
    (setlocale LC_MONETARY "C")
    (setlocale LC_COLLATE "C")
    (setlocale LC_MESSAGES "C")
    (setlocale LC_TIME "C")
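
At the C level, the same "environment LC_CTYPE, everything else C" setup
can be sketched as follows. The function name is hypothetical, and
LC_MESSAGES is a POSIX category rather than an ISO C one.

```c
#include <locale.h>
#include <stddef.h>

/* Start from the user's locale, then reset every category except
   LC_CTYPE to "C".  Returns 0 on success, -1 if a setlocale call fails
   (in which case the categories already processed keep their new
   values).  */
static int
use_environment_ctype_only (void)
{
  static const int categories[] = {
    LC_NUMERIC, LC_MONETARY, LC_COLLATE, LC_MESSAGES, LC_TIME
  };
  size_t i;

  if (setlocale (LC_ALL, "") == NULL)
    return -1;
  for (i = 0; i < sizeof categories / sizeof categories[0]; i++)
    if (setlocale (categories[i], "C") == NULL)
      return -1;
  return 0;
}
```

The end state should match a lone setlocale (LC_CTYPE, "") issued from
the "C" locale; only the starting point differs.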

> > So my suggestion is to do (setlocale LC_ALL "") as part of the Guile
> > initialization, very early. Yes, this might lead to some complexity
> > in the Guile implementation if you have the concept of locale also at
> > the Guile level and need to make sure that the locale at the C level and
> > the locale at the Guile level are consistent as soon as the latter is
> > defined. But this is manageable.
> 
> Are you suggesting that we could arrange to have Guile’s ‘main’ call
> setlocale(LC_ALL, "") while still giving Scheme code the impression that
> it’s started under the C locale as is currently the case?

No, I'm suggesting to let the Scheme code know that it is using the user's
locale.

Yes, this is a backward-incompatible change, so probably you won't want to
do it on the guile 2.0.x branch, and you will want to advertise it in the
release notes or NEWS file.

> A reasonable option would be to setlocale(LC_CTYPE, "") from Guile’s
> ‘main’, so that scm_from_locale_string & co. would DTRT.  But again that
> would change the value of %default-port-encoding, leading to potential
> application breakage.

In the long run, this would be more complex: users who see that LC_CTYPE
from the environment is respected would also expect that LC_COLLATE etc.
from the environment is respected.

I would just do the change and make it clear that old applications that
really want to stay in the C locale should call
  (setlocale LC_ALL "C")

Bruno
-- 
In memoriam Kerem Yılmazer <http://en.wikipedia.org/wiki/Kerem_Yılmazer>




* Re: Accessing the environment's locale encoding settings
  2011-11-20 16:55     ` Bruno Haible
@ 2011-11-20 17:41       ` Ludovic Courtès
  2011-11-20 19:44         ` Mike Gran
  2011-11-20 20:12         ` Bruno Haible
  0 siblings, 2 replies; 16+ messages in thread
From: Ludovic Courtès @ 2011-11-20 17:41 UTC (permalink / raw)
  To: Bruno Haible; +Cc: guile-devel

Hi Bruno,

Bruno Haible <bruno@clisp.org> skribis:

> No, I'm suggesting to let the Scheme code know that it is using the user's
> locale.
>
> Yes, this is a backward-incompatible change, so probably you won't want to
> do it on the guile 2.0.x branch, and you will want to advertise it in the
> release notes or NEWS file.

I’m now convinced that an implicit setlocale(LC_ALL, "") is the right
thing for ‘master’.

For 2.0, though, this brings us back to the hack I proposed at the
beginning of this thread, namely trying to honor LC_CTYPE without
actually calling setlocale, so that command-line arguments are suitably
converted.

Could Gnulib’s get_charset_aliases be exported?

Thanks,
Ludo’.




* Re: Accessing the environment's locale encoding settings
  2011-11-20 17:41       ` Ludovic Courtès
@ 2011-11-20 19:44         ` Mike Gran
  2011-11-23 23:28           ` Ludovic Courtès
  2011-11-20 20:12         ` Bruno Haible
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Gran @ 2011-11-20 19:44 UTC (permalink / raw)
  To: Ludovic Courtès, Bruno Haible; +Cc: guile-devel@gnu.org

>From: Ludovic Courtès <ludo@gnu.org>
>>Bruno Haible <bruno@clisp.org> skribis:
>
>> No, I'm suggesting to let the Scheme code know that it is using the user's
>> locale.
>>
>> Yes, this is a backward-incompatible change, so probably you won't want to
>> do it on the guile 2.0.x branch, and you will want to advertise it in the
>> release notes or NEWS file.
>
>I’m now convinced that an implicit setlocale(LC_ALL, "") is the right
>thing for ‘master’.
>
>For 2.0, though, this brings us back to the hack I proposed at the
>beginning of this thread, namely trying to honor LC_CTYPE without
>actually calling setlocale, so that command-line arguments are suitably
>converted.

I probably shouldn't express an opinion on this, since my SLOC count is so
low these days, but, I'll do it anyway. ;-)
 
As I am snarkily wont to say, everyone wants Unicode but no one really uses
it.  If you were to make that change in 2.0.x, I can almost guarantee that
no one will be adversely affected.  2.0.x programs that are already trying to
use locale will just have a redundant call to setlocale. 2.0.x programs that
ignore locale are de facto C locale anyway.  The scripts that would be at risk
are those that try to read binary data as string input instead of binary input.
 
Here's a suggestion.  One could add an option to the guile interpreter's command
line args (--locale=ARG perhaps) that has the effect of calling
setlocale(LC_ALL,"ARG") first thing.  If --locale is called with no ARG
specified, it would call setlocale(LC_ALL, "").
 
That way, people could start future-proofing their code now by adding
--locale or --locale=C to the top of their scripts.  For now, the default can
be, in effect, --locale=C and for 2.2 it can be --locale.
 
That would slightly complicate getopt processing, since you'd have to scan for
the --locale before processing other getopt arguments.
 
-Mike




* Re: Accessing the environment's locale encoding settings
  2011-11-20 17:41       ` Ludovic Courtès
  2011-11-20 19:44         ` Mike Gran
@ 2011-11-20 20:12         ` Bruno Haible
  2011-12-15  0:41           ` Ludovic Courtès
  1 sibling, 1 reply; 16+ messages in thread
From: Bruno Haible @ 2011-11-20 20:12 UTC (permalink / raw)
  To: Ludovic Courtès, bug-gnulib; +Cc: guile-devel

[CCing bug-gnulib. This is a reply to
 <http://lists.gnu.org/archive/html/guile-devel/2011-11/msg00053.html>].

Hi Ludovic,

> I’m now convinced that an implicit setlocale(LC_ALL, "") is the right
> thing for ‘master’.

Good, glad that I could help with my opinion :)

> For 2.0, though, this brings us back to the hack I proposed at the
> beginning of this thread, namely trying to honor LC_CTYPE without
> actually calling setlocale, so that command-line arguments are suitably
> converted.
> 
> Could Gnulib’s get_charset_aliases be exported?

As documented in [1], not every possible code modification is suitable for
gnulib proper. In this case, in particular, I don't think that code
that returns the locale encoding as _if_ setlocale (LC_CTYPE, "") had been
called, without really calling it, is a frequent enough use-case.

If I were you, I would start using the gnulib-tool option --local-dir
with a local modification of the 'localcharset' module, as documented in [1].
This means:

  1) Hack your local copy of localcharset.c so that it not only defines
     the locale_charset() function, but also an additional function
     environ_locale_charset() that looks only at the environment variables.

  2) Store this file in guile:

       $ cp ludo-localcharset.c guile/gnulib-local/lib/localcharset.c

     or if there are few changes just the differences:

       $ diff -u gnulib/lib/localcharset.c ludo-localcharset.c \
         > guile/gnulib-local/lib/localcharset.c.diff

  3) Pass the option
        --local-dir gnulib-local 
     to the gnulib-tool invocation in autogen.sh.

Bruno

[1] http://www.gnu.org/s/hello/manual/gnulib/Extending-Gnulib.html
-- 
In memoriam Kerem Yılmazer <http://en.wikipedia.org/wiki/Kerem_Yılmazer>




* Re: Accessing the environment's locale encoding settings
  2011-11-20 19:44         ` Mike Gran
@ 2011-11-23 23:28           ` Ludovic Courtès
  2011-11-24  4:42             ` Mike Gran
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Ludovic Courtès @ 2011-11-23 23:28 UTC (permalink / raw)
  To: Mike Gran; +Cc: Bruno Haible, guile-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 375 bytes --]

Hi Mike,

Mike Gran <spk121@yahoo.com> skribis:

> Here's a suggestion.  One could add an option to the guile interpreter's command
> line args (--locale=ARG perhaps) that has the effect of calling
> setlocale(LC_ALL,"ARG") first thing.  If --locale is called with no ARG
> specified, it would call setlocale(LC_ALL, "").

I tried something along these lines:


[-- Attachment #2: Type: text/x-patch, Size: 1446 bytes --]

diff -ubB --show-c-function /home/ludo/src/guile/libguile/script.c /home/ludo/src/guile/libguile/script.c.locale.bak
--- guile/libguile/script.c	2011-11-21 21:41:02.000000000 +0100
+++ guile/libguile/script.c.locale.bak	2011-11-21 21:41:00.000000000 +0100
@@ -26,6 +26,7 @@
 #include <stdio.h>
 #include <errno.h>
 #include <ctype.h>
+#include <locale.h>
 
 #include "libguile/_scm.h"
 #include "libguile/eval.h"
@@ -369,6 +370,15 @@ scm_shell_usage (int fatal, char *messag
 }
 
 
+static int
+terminating_argument (const char *arg)
+{
+  return (strcmp (arg, "--") == 0
+	  || strcmp (arg, "-c") == 0
+	  || strcmp (arg, "-ds") == 0
+	  || strcmp (arg, "-s") == 0);
+}
+
 /* Given an array of command-line switches, return a Scheme expression
    to carry out the actions specified by the switches.
  */
@@ -376,6 +386,22 @@ scm_shell_usage (int fatal, char *messag
 SCM
 scm_compile_shell_switches (int argc, char **argv)
 {
+  int i;
+
+  for (i = 0; i < argc && !terminating_argument (argv[i]); i++)
+    {
+      if (strncmp (argv[i], "--locale", sizeof "--locale") == 0)
+	{
+	  const char *equal;
+
+	  equal = strchr (argv[i], '=');
+	  if (equal != NULL)
+	    setlocale (LC_ALL, &argv[i][equal + 1]);
+	  else
+	    setlocale (LC_ALL, "");
+	}
+    }
+
   return scm_call_2 (scm_c_public_ref ("ice-9 command-line",
                                        "compile-shell-switches"),
                      scm_makfromstrs (argc, argv),


[-- Attachment #3: Type: text/plain, Size: 310 bytes --]


WDYT?

It’s not completely satisfying either because --locale is not in
2.0.[0-3], so users who really need it will need some configury;
furthermore, from 2.2.x on, it will be mostly unneeded.

Yet, a choice has to be made between this hack and the other one.  :-)

Thoughts?

Thanks,
Ludo’.


* Re: Accessing the environment's locale encoding settings
  2011-11-23 23:28           ` Ludovic Courtès
@ 2011-11-24  4:42             ` Mike Gran
  2011-11-24 13:16             ` Peter Brett
  2011-11-25  2:11             ` Mark H Weaver
  2 siblings, 0 replies; 16+ messages in thread
From: Mike Gran @ 2011-11-24  4:42 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Bruno Haible, guile-devel@gnu.org

> From: Ludovic Courtès <ludo@gnu.org>

>> Here's a suggestion.  One could add an option to the guile
>> interpreter's command line args (--locale=ARG perhaps) that has the
>> effect of calling setlocale(LC_ALL,"ARG") first thing.  If --locale
>> is called with no ARG specified, it would call setlocale(LC_ALL,
>> "").
> 
> I tried something along these lines:

Looks good to me, but I didn't have much chance to test it.

A basic check, in a UTF-8 locale, would be whether

   guile --locale -c "(write 'キ)" | wc -c

reports 3 bytes (キ encoded in UTF-8) or 1.
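
The byte-versus-character distinction behind this check can be seen
without Guile at all. The sketch below counts code points in a UTF-8
string by skipping continuation bytes; the helper name is made up, and
it assumes its input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in the UTF-8 string S.  Continuation bytes have the
   bit pattern 10xxxxxx and are not counted.  Assumes valid UTF-8.  */
static size_t
utf8_code_points (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;
  return n;
}

/* KATAKANA LETTER KI (U+30AD) encoded in UTF-8.  */
static const char katakana_ki[] = "\xE3\x82\xAD";
```

Here strlen (katakana_ki) is 3 while utf8_code_points (katakana_ki) is 1,
which is the difference the pipeline above is meant to expose.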

One big check would be if we can run a script with a non-ASCII
file name -- which would be in the system's locale encoding.

That script could contain code, and a "coding:" declaration, in
a different non-ASCII encoding, just to make it fun.

-Mike





* Re: Accessing the environment's locale encoding settings
  2011-11-23 23:28           ` Ludovic Courtès
  2011-11-24  4:42             ` Mike Gran
@ 2011-11-24 13:16             ` Peter Brett
  2011-11-25  2:11             ` Mark H Weaver
  2 siblings, 0 replies; 16+ messages in thread
From: Peter Brett @ 2011-11-24 13:16 UTC (permalink / raw)
  To: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:

> Hi Mike,
>
> Mike Gran <spk121@yahoo.com> skribis:
>
>> Here's a suggestion.  One could add an option to the guile
>> interpreter's command line args (--locale=ARG perhaps) that has the
>> effect of calling setlocale(LC_ALL,"ARG") first thing.  If --locale
>> is called with no ARG specified, it would call setlocale(LC_ALL,
>> "").
>
> I tried something along these lines:
>

My main objection is that I don't know of any other interpreter that
finds it necessary to have a --locale switch (Perl and Python both
manage without quite comfortably).  How is:

  guile --locale="foobar"

Better than:

  LANG="foobar" guile  ?

                               Peter

-- 
Peter Brett <peter@peter-b.co.uk>
Remote Sensing Research Group
Surrey Space Centre





* Re: Accessing the environment's locale encoding settings
  2011-11-23 23:28           ` Ludovic Courtès
  2011-11-24  4:42             ` Mike Gran
  2011-11-24 13:16             ` Peter Brett
@ 2011-11-25  2:11             ` Mark H Weaver
  2011-12-15 19:08               ` Ludovic Courtès
  2 siblings, 1 reply; 16+ messages in thread
From: Mark H Weaver @ 2011-11-25  2:11 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Bruno Haible, guile-devel

ludo@gnu.org (Ludovic Courtès) writes:
> It’s not completely satisfying either because --locale is not in
> 2.0.[0-3], so users who really need it will need some configury;
> furthermore, from 2.2.x on, it will be mostly unneeded.
>
> Yet, a choice has to be made between this hack and the other one.  :-)
>
> Thoughts?

I like the --locale argument, except that the default (if there is no
--locale argument) should be to either (A) call setlocale(LC_ALL, "") or
(B) do something like the other hack, where the arguments are decoded as
if the locale had been temporarily set according to the environment
variables.

My strong preference would be Option A.

If you are trying to maintain backward compatibility, keep in mind that
for most practical purposes, Guile 1.8 acts closer to option A than the
other options.  Furthermore, Option A is the Right Thing moving forward,
and matches what Guile 2.2 will do.

If we accept only ASCII arguments by default, then _most_ Guile scripts
will need to add the --locale argument, but only for versions after
2.0.3.  More likely, many authors won't bother with this ugliness, and
their scripts will be broken for non-ASCII locales.

If we choose option A, then 2.0.[0123] simply have a bug that was fixed
in 2.0.4, and apart from those versions, things will mostly work the
right way by default, and fairly close to how Guile 1.8 worked.

Also, I see a problem with your code:

> @@ -376,6 +386,22 @@ scm_shell_usage (int fatal, char *messag
>  SCM
>  scm_compile_shell_switches (int argc, char **argv)
>  {
> +  int i;
> +
> +  for (i = 0; i < argc && !terminating_argument (argv[i]); i++)
> +    {
> +      if (strncmp (argv[i], "--locale", sizeof "--locale") == 0)

(sizeof "--locale") is 9, not 8.  It includes the NUL terminator, so
this test will not recognize "--locale=...", as you apparently intended
based on the code that follows.
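
To make the off-by-one concrete, here is a minimal sketch.  Both helper
names are hypothetical and exist only for illustration; neither appears
in the patch or in Guile.  The first mirrors the test in the patch, the
second compares only the 8 visible characters:

```c
#include <string.h>

/* Hypothetical helper mirroring the test in the quoted patch.
   sizeof "--locale" is 9 (8 characters plus the terminating NUL),
   so the NUL takes part in the comparison and only the bare
   "--locale" can match.  */
static int
matches_with_sizeof (const char *arg)
{
  return strncmp (arg, "--locale", sizeof "--locale") == 0;
}

/* Hypothetical helper comparing only the 8 visible characters, so
   "--locale=..." matches as well.  */
static int
matches_prefix (const char *arg)
{
  return strncmp (arg, "--locale", sizeof "--locale" - 1) == 0;
}
```

Note that the prefix test alone also accepts something like
"--localefoo", so the character following the prefix still has to be
checked.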

    Best,
     Mark


> +	{
> +	  const char *equal;
> +
> +	  equal = strchr (argv[i], '=');
> +	  if (equal != NULL)
> +	    setlocale (LC_ALL, &argv[i][equal + 1]);
> +	  else
> +	    setlocale (LC_ALL, "");
> +	}
> +    }
> +
>    return scm_call_2 (scm_c_public_ref ("ice-9 command-line",
>                                         "compile-shell-switches"),
>                       scm_makfromstrs (argc, argv),
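
As far as I can tell, the quoted hunk has a second issue: `equal' is a
pointer, so `&argv[i][equal + 1]' uses a pointer as an array index and
should not compile; `equal + 1' looks like what was meant.  A minimal
sketch of the option parsing with both points addressed follows.  The
helper name `locale_option_value' is hypothetical and this is not the
code that was committed to Guile:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Guile.  Return the locale name
   requested by ARG, "" for a bare "--locale" (meaning: use the
   environment), or NULL if ARG is not a --locale option at all.  */
static const char *
locale_option_value (const char *arg)
{
  static const char opt[] = "--locale";
  size_t len = sizeof opt - 1;   /* 8: exclude the terminating NUL */

  if (strncmp (arg, opt, len) != 0)
    return NULL;
  if (arg[len] == '\0')
    return "";                   /* bare "--locale" */
  if (arg[len] == '=')
    return arg + len + 1;        /* text after the '=' */
  return NULL;                   /* e.g. "--localefoo" */
}
```

A non-NULL result can be passed directly as the second argument of
setlocale (LC_ALL, ...), since "" already means "take the locale from
the environment".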




* Re: Accessing the environment's locale encoding settings
  2011-11-20 20:12         ` Bruno Haible
@ 2011-12-15  0:41           ` Ludovic Courtès
  0 siblings, 0 replies; 16+ messages in thread
From: Ludovic Courtès @ 2011-12-15  0:41 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-gnulib, guile-devel

Hi,

Since adding an implicit setlocale(LC_ALL, "") cannot be done in Guile
2.0 (it would change the default port encoding as a side effect, etc.),
I committed something similar to the hack proposed in [0].

Bruno Haible <bruno@clisp.org> skribis:

> If I were you, I would start using the gnulib-tool option --local-dir
> with a local modification of the 'localcharset' module, as documented in [1].

I followed your suggestion:

  http://git.sv.gnu.org/cgit/guile.git/tree/gnulib-local/lib?h=stable-2.0

Thanks!

Ludo’.

[0] http://lists.gnu.org/archive/html/guile-devel/2011-11/msg00039.html




* Re: Accessing the environment's locale encoding settings
  2011-11-25  2:11             ` Mark H Weaver
@ 2011-12-15 19:08               ` Ludovic Courtès
  0 siblings, 0 replies; 16+ messages in thread
From: Ludovic Courtès @ 2011-12-15 19:08 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Bruno Haible, guile-devel

Hi,

Apologies for not answering earlier, and for making changes without
continuing this discussion.  So here’s an after-the-fact reply.

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>> It’s not completely satisfying either because --locale is not in
>> 2.0.[0-3], so users who really need it will need some configury;
>> furthermore, from 2.2.x on, it will be mostly unneeded.
>>
>> Yet, a choice has to be made between this hack and the other one.  :-)
>>
>> Thoughts?
>
> I like the --locale argument, except that the default (if there is no
> --locale argument) should be to either (A) call setlocale(LC_ALL, "") or
> (B) do something like the other hack, where the arguments are decoded as
> if the locale had been temporarily set according to the environment
> variables.
>
> My strong preference would be Option A.

The problem I see with changing the default to call setlocale(LC_ALL,
"") is that %default-port-encoding would no longer match the current
locale, as is currently the case.  Or we’d have to change its default
value.  But I thought this would be too disruptive in either case, and
thus I stick to Option B.

> If you are trying to maintain backward compatibility, keep in mind that
> for most practical purposes, Guile 1.8 acts closer to option A than the
> other options.  Furthermore, Option A is the Right Thing moving forward,
> and matches what Guile 2.2 will do.
>
> If we accept only ASCII arguments by default, then _most_ Guile scripts
> will need to add the --locale argument, but only for versions after
> 2.0.3.  More likely, many authors won't bother with this ugliness, and
> their scripts will be broken for non-ASCII locales.

Yes, I thought about that too, and thought it counted against Option A.

So the current situation is:

  • In 2.0, one still has to call ‘setlocale’ explicitly, but
    command-line arguments are converted from the locale encoding
    suggested by the environment variables (commit
    ed4c3739668b4b111b38555b8bc101cb74c87c1c).

  • In 2.1, the ‘setlocale’ call is implicit when the ‘guile’ executable
    is used (commit bb9b357e2b98cea82f844115ee5f66704bb6a1c8).  The
    %default-port-encoding value is still Latin-1, but presumably that
    would need to be changed to the current locale encoding.  Thoughts?
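
For readers who have not followed the whole thread, the idea behind
"the locale encoding suggested by the environment variables" can be
sketched roughly as below.  This is only an illustration under
simplifying assumptions, not the committed code: both helper names are
hypothetical, and the real implementation (based on Gnulib's
localcharset) also handles the C locale and locale names that carry no
explicit codeset.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pick the locale name the environment suggests,
   using the precedence LC_ALL, then LC_CTYPE, then LANG.  Returns
   NULL if none of them is set to a non-empty value.  */
static const char *
env_locale_name (void)
{
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  size_t i;

  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}

/* Hypothetical helper: copy the codeset part of a locale name such as
   "fr_FR.UTF-8@euro" into BUF.  Returns BUF, or NULL if LOCALE has no
   codeset part or the codeset does not fit in SIZE bytes.  */
static char *
codeset_of_locale_name (const char *locale, char *buf, size_t size)
{
  const char *dot = strchr (locale, '.');
  size_t len;

  if (dot == NULL)
    return NULL;
  dot++;
  len = strcspn (dot, "@");      /* stop at an optional "@modifier" */
  if (len == 0 || len >= size)
    return NULL;
  memcpy (buf, dot, len);
  buf[len] = '\0';
  return buf;
}
```

Feeding the result of env_locale_name () to codeset_of_locale_name ()
gives the encoding to decode argv with, without ever calling
‘setlocale’.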

Nothing’s set in stone and I’m glad to revisit the situation.

Thanks,
Ludo’.




Thread overview: 16+ messages
2011-11-16  0:13 Accessing the environment's locale encoding settings Ludovic Courtès
2011-11-16  2:00 ` Bruno Haible
2011-11-16 10:35   ` Ludovic Courtès
2011-11-16 16:11     ` Noah Lavine
2011-11-16 16:32       ` Peter Brett
2011-11-18 22:17         ` Mark H Weaver
2011-11-20 16:55     ` Bruno Haible
2011-11-20 17:41       ` Ludovic Courtès
2011-11-20 19:44         ` Mike Gran
2011-11-23 23:28           ` Ludovic Courtès
2011-11-24  4:42             ` Mike Gran
2011-11-24 13:16             ` Peter Brett
2011-11-25  2:11             ` Mark H Weaver
2011-12-15 19:08               ` Ludovic Courtès
2011-11-20 20:12         ` Bruno Haible
2011-12-15  0:41           ` Ludovic Courtès
