unofficial mirror of guile-devel@gnu.org 
* Accessing the environment's locale encoding settings
@ 2011-11-16  0:13 Ludovic Courtès
  2011-11-16  2:00 ` Bruno Haible
From: Ludovic Courtès @ 2011-11-16  0:13 UTC (permalink / raw)
  To: Bruno Haible; +Cc: bug-libunistring, guile-devel

Hi Bruno,

In Guile, strings coming from the C world are assumed to be encoded in
the current locale encoding.  As in C, the current locale is set with
‘setlocale’, and it’s up to the user to call (setlocale LC_ALL "") to
set the locale according to the relevant environment variables.
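
For reference, here is a minimal sketch of the C side of this, assuming
a POSIX system that provides ‘nl_langinfo’.  The function name
‘current_locale_codeset’ is made up for illustration and is not part of
Guile.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: return the codeset name of the current locale.
   If FROM_ENVIRONMENT is nonzero, first switch to the locale named by
   the environment (LC_ALL, LC_CTYPE, LANG, ...).  Return NULL if that
   switch fails, e.g. because the environment names an unknown locale.  */
const char *
current_locale_codeset (int from_environment)
{
  if (from_environment && setlocale (LC_ALL, "") == NULL)
    return NULL;

  /* nl_langinfo reports on the locale that is current right now, so
     without the setlocale call above this describes the "C" locale a
     program starts in.  */
  return nl_langinfo (CODESET);
}
```

Calling it with 0 at program start-up describes the initial "C" locale;
calling it with 1 describes whatever the environment asks for.  The
exact names returned vary from one libc to another.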

The problem comes with command-line arguments: they most likely need to
be converted from the locale encoding, but the user hasn’t yet had a
chance to call ‘setlocale’ by the time they are read.  Up to 2.0.3,
they were instead assumed to be ASCII, and we’re looking into fixing
that [0].
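
To make the conversion step concrete, here is a rough sketch that
decodes one argument from a named encoding into UTF-8.  It uses POSIX
‘iconv’ purely for illustration, which is an assumption on my part and
not necessarily how Guile does it; ‘arg_to_utf8’ is a hypothetical
name.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string STR from the
   encoding FROM_CODE to UTF-8.  Return a freshly allocated string that
   the caller must free, or NULL on failure (unknown encoding, invalid
   input, output buffer too small, or out of memory).  */
char *
arg_to_utf8 (const char *str, const char *from_code)
{
  iconv_t cd = iconv_open ("UTF-8", from_code);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t in_left = strlen (str);
  /* Room for 4 output bytes per input byte plus the terminating NUL.
     If that is not enough, iconv fails with E2BIG and we give up.  */
  size_t out_size = 4 * in_left + 1;
  size_t out_left = out_size - 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  /* iconv takes a non-const input pointer but does not modify the
     input bytes.  */
  char *in_ptr = (char *) str;
  char *out_ptr = out;
  size_t rc = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (out);
      return NULL;
    }

  *out_ptr = '\0';
  return out;
}
```

The open question in this message is what to pass as FROM_CODE before
‘setlocale’ has been called.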

The trick we came up with is to look at $LC_ALL, $LC_CTYPE, and $LANG
and try to determine what the locale encoding “would be” if
(setlocale LC_ALL "") were called [1].

To do that, I use a variant of ‘locale_charset’ from Gnulib, with a
special case for the C locale:


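#include <stdlib.h>   /* getenv */
#include <string.h>   /* strchr, strcmp, strcpy, memcpy */

/* Return the name of the locale encoding suggested by environment
   variables, even if it's not current, or NULL if no locale is set in
   the environment.  When no codeset can be extracted from the locale
   name and the locale is not "C", the locale name itself is returned.
   Based on Gnulib's `localcharset.c'.  */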
static const char *
locale_encoding (void)
{
  static char buf[2 + 10 + 1];
  const char *locale, *codeset = NULL;

  /* Allow user to override the codeset, as set in the operating system,
     with standard language environment variables.  */
  locale = getenv ("LC_ALL");
  if (locale == NULL || locale[0] == '\0')
    {
      locale = getenv ("LC_CTYPE");
      if (locale == NULL || locale[0] == '\0')
        locale = getenv ("LANG");
    }
  if (locale != NULL && locale[0] != '\0')
    {
      /* If the locale name contains an encoding after the dot, return it.  */
      const char *dot = strchr (locale, '.');

      if (dot != NULL)
        {
          const char *modifier;

          dot++;
          /* Look for the possible @... trailer and remove it, if any.  */
          modifier = strchr (dot, '@');
          if (modifier == NULL)
            return dot;
          if ((size_t) (modifier - dot) < sizeof (buf))
            {
              memcpy (buf, dot, modifier - dot);
              buf [modifier - dot] = '\0';
              return buf;
            }
        }
      else if (strcmp (locale, "C") == 0)
        {
          strcpy (buf, "ASCII");
          return buf;
        }

      /* No codeset could be extracted: fall back to the locale name.  */
      codeset = locale;
    }

  return codeset;
}

What do you think of this approach?

Should we be checking for charset aliases?  If so, we’d need help from
Gnulib since ‘get_charset_aliases’ is internal.
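
To illustrate what I mean by alias checking, here is a sketch of a
lookup over a small hand-written table.  The table entries and the name
‘canonical_codeset’ are hypothetical; they are not taken from Gnulib’s
alias data and say nothing about how ‘get_charset_aliases’ works
internally.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias table, for illustration only.  */
static const struct
{
  const char *alias;
  const char *canonical;
} codeset_aliases[] =
  {
    { "utf8",      "UTF-8" },
    { "ISO8859-1", "ISO-8859-1" },
    { "646",       "ASCII" },
  };

/* Return the canonical name for the codeset NAME if it appears in the
   table above (compared case-insensitively), else return NAME itself.  */
const char *
canonical_codeset (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof codeset_aliases / sizeof codeset_aliases[0]; i++)
    if (strcasecmp (name, codeset_aliases[i].alias) == 0)
      return codeset_aliases[i].canonical;

  return name;
}
```

The result of ‘locale_encoding’ would be passed through such a function
before being used.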

Thanks,
Ludo’.

[0] http://thread.gmane.org/gmane.lisp.guile.devel/12887
[1] http://thread.gmane.org/gmane.lisp.guile.devel/12900

