From: ludo@gnu.org (Ludovic Courtès)
To: Bruno Haible <bruno@clisp.org>
Cc: bug-libunistring@gnu.org, guile-devel@gnu.org
Subject: Accessing the environment's locale encoding settings
Date: Wed, 16 Nov 2011 01:13:51 +0100
Message-ID: <877h30exfk.fsf@gnu.org>

Hi Bruno,

In Guile, strings coming from the C world are assumed to be encoded in
the current locale encoding.  Like in C, the current locale is set using
‘setlocale’, and it’s up to the user to write (setlocale LC_ALL "") to
set the locale according to the relevant environment variables.
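
For reference, here is a minimal C sketch of what that call amounts to,
assuming a POSIX system that provides ‘nl_langinfo’.  The helper name
‘environment_codeset’ is made up for this illustration and is not part
of Guile:

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Switch the process to the locale named by the environment variables
   and return the codeset name of that locale.  Return NULL if the
   environment names a locale that the system does not support.  */
static const char *
environment_codeset (void)
{
  if (setlocale (LC_ALL, "") == NULL)
    return NULL;

  return nl_langinfo (CODESET);
}
```

Note that this changes the process-wide locale as a side effect, which
is exactly what the program is not supposed to do on the user’s behalf.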

The problem comes with command-line arguments: the user hasn’t yet had a
chance to call ‘setlocale’ when they are read, yet the arguments most
likely have to be converted from the locale encoding.  Up to 2.0.3, they
were instead assumed to be ASCII, and we’re looking into fixing it [0].

The trick we came up with is to look at $LANG, $LC_ALL, & co. and try to
determine what the locale encoding “would be” if (setlocale LC_ALL "")
were called [1].

To do that, I use a variant of ‘locale_charset’ from Gnulib’s
‘localcharset’ module, with a special case for the C locale:

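#include <stdlib.h>   /* getenv */
#include <string.h>   /* strchr, strcmp, strcpy, memcpy */

/* Return the name of the locale encoding suggested by environment
   variables, even if it's not current, or NULL if none of LC_ALL,
   LC_CTYPE, and LANG is set.  When the locale name has no usable
   codeset part and is not "C", the locale name itself is returned.
   Based on Gnulib's `localcharset.c'.  */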
static const char *
locale_encoding (void)
{
  static char buf[2 + 10 + 1];
  const char *locale, *codeset = NULL;

  /* Allow user to override the codeset, as set in the operating system,
     with standard language environment variables.  */
  locale = getenv ("LC_ALL");
  if (locale == NULL || locale[0] == '\0')
    {
      locale = getenv ("LC_CTYPE");
      if (locale == NULL || locale[0] == '\0')
        locale = getenv ("LANG");
    }
  if (locale != NULL && locale[0] != '\0')
    {
      /* If the locale name contains an encoding after the dot, return it.  */
      const char *dot = strchr (locale, '.');

      if (dot != NULL)
        {
          const char *modifier;

          dot++;
          /* Look for the possible @... trailer and remove it, if any.  */
          modifier = strchr (dot, '@');
          if (modifier == NULL)
            return dot;
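          /* MODIFIER points past DOT, so the difference is non-negative;
             cast it to avoid a signed/unsigned comparison.  */
          if ((size_t) (modifier - dot) < sizeof (buf))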
            {
              memcpy (buf, dot, modifier - dot);
              buf [modifier - dot] = '\0';
              return buf;
            }
        }
      else if (strcmp (locale, "C") == 0)
        {
          strcpy (buf, "ASCII");
          return buf;
        }

      codeset = locale;
    }

  return codeset;
}

What do you think of this approach?

Should we be checking for charset aliases?  If so, we’d need help from
Gnulib since ‘get_charset_aliases’ is internal.
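
To make the question concrete, here is a rough sketch of the kind of
lookup I have in mind.  The table and the name ‘canonical_codeset’ are
hypothetical and only cover a few spellings for illustration; a real
implementation would use Gnulib’s alias data rather than a hard-coded
list:

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias table, for illustration only.  It maps a few
   spellings found in locale names to a canonical codeset name.  */
static const struct { const char *alias; const char *canonical; }
codeset_aliases[] =
  {
    { "utf8",       "UTF-8" },
    { "ISO8859-1",  "ISO-8859-1" },
    { "ISO8859-15", "ISO-8859-15" }
  };

/* Return the canonical name for NAME if it matches a known alias,
   ignoring case; otherwise return NAME unchanged.  */
static const char *
canonical_codeset (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof codeset_aliases / sizeof codeset_aliases[0]; i++)
    if (strcasecmp (name, codeset_aliases[i].alias) == 0)
      return codeset_aliases[i].canonical;

  return name;
}
```

With something like this, the result of ‘locale_encoding’ for a locale
such as en_US.utf8 would be mapped to UTF-8 before being used.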

Thanks,
Ludo’.

[0] http://thread.gmane.org/gmane.lisp.guile.devel/12887
[1] http://thread.gmane.org/gmane.lisp.guile.devel/12900
