From: ludo@gnu.org (Ludovic Courtès)
Newsgroups: gmane.comp.gnu.libunistring.bugs,gmane.lisp.guile.devel
Subject: Accessing the environment's locale encoding settings
Date: Wed, 16 Nov 2011 01:13:51 +0100
Message-ID: <877h30exfk.fsf@gnu.org>
To: Bruno Haible
Cc: bug-libunistring@gnu.org, guile-devel@gnu.org

Hi Bruno,

In Guile, strings coming from the C world are assumed to be encoded in
the current locale encoding.  Like in C, the current locale is set using
‘setlocale’, and it’s up to the user to write

  (setlocale LC_ALL "")

to set the locale according to the relevant environment variables.

The problem comes with command-line arguments: the user hasn’t yet had a
chance to call ‘setlocale’, yet they most likely have to be converted
from locale encoding.  Up to 2.0.3, they were instead assumed to be
ASCII, and we’re looking into fixing it [0].

The trick we came up with is to look at $LANG, $LC_ALL, & co. and try to
determine what the locale encoding “would be” if (setlocale LC_ALL "")
were called [1].

To do that, I use a variant of ‘localcharset’ in Gnulib, with a special
case for the C locale:

#include <stdlib.h>
#include <string.h>

/* Return the name of the locale encoding suggested by environment
   variables, even if it's not current, or NULL if no encoding is
   defined.  Based on Gnulib's `localcharset.c'.  */
static const char *
locale_encoding (void)
{
  static char buf[2 + 10 + 1];
  const char *locale, *codeset = NULL;

  /* Allow user to override the codeset, as set in the operating system,
     with standard language environment variables.  */
  locale = getenv ("LC_ALL");
  if (locale == NULL || locale[0] == '\0')
    {
      locale = getenv ("LC_CTYPE");
      if (locale == NULL || locale[0] == '\0')
        locale = getenv ("LANG");
    }

  if (locale != NULL && locale[0] != '\0')
    {
      /* If the locale name contains an encoding after the dot, return it.  */
      const char *dot = strchr (locale, '.');

      if (dot != NULL)
        {
          const char *modifier;

          dot++;

          /* Look for the possible @... trailer and remove it, if any.  */
          modifier = strchr (dot, '@');
          if (modifier == NULL)
            return dot;

          /* MODIFIER always follows DOT, so the difference is
             non-negative and the cast is safe.  */
          if ((size_t) (modifier - dot) < sizeof (buf))
            {
              memcpy (buf, dot, modifier - dot);
              buf [modifier - dot] = '\0';
              return buf;
            }
        }
      else if (strcmp (locale, "C") == 0)
        {
          strcpy (buf, "ASCII");
          return buf;
        }

      /* No usable codeset part: fall back to the locale name itself.  */
      codeset = locale;
    }

  return codeset;
}

What do you think of this approach?

Should we be checking for charset aliases?  If so, we’d need help from
Gnulib since ‘get_charset_aliases’ is internal.

Thanks,
Ludo’.

[0] http://thread.gmane.org/gmane.lisp.guile.devel/12887
[1] http://thread.gmane.org/gmane.lisp.guile.devel/12900
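
For comparison with the environment-parsing code above, here is a
minimal sketch of a different way to answer the same question: switch
LC_CTYPE to the environment's locale just long enough to read
nl_langinfo (CODESET), then restore the previous locale.  This is not
what the code above does.  It leaves the parsing and any alias handling
to the C library, at the cost of briefly changing global locale state,
so it is not thread-safe.  The name `environment_codeset' and the
caller-supplied buffer are assumptions made for illustration; neither
exists in Guile or Gnulib.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Guile or Gnulib.  Copy into BUF the
   codeset that would be current after (setlocale LC_CTYPE ""), and
   return BUF.  The process locale is restored before returning.  If the
   environment names an unusable locale, the codeset of the current
   locale is reported instead.  Not thread-safe: `setlocale' changes
   global state.  */
static const char *
environment_codeset (char *buf, size_t size)
{
  const char *current;
  char *saved = NULL;

  assert (buf != NULL && size > 0);

  /* The string returned by `setlocale' may be overwritten by later
     calls, so keep a private copy of the current locale name.  */
  current = setlocale (LC_CTYPE, NULL);
  if (current != NULL)
    {
      saved = malloc (strlen (current) + 1);
      if (saved != NULL)
        strcpy (saved, current);
    }

  /* On failure the locale is left unchanged, so nl_langinfo below
     simply describes the locale that was already in effect.  */
  setlocale (LC_CTYPE, "");

  /* Copy the result out before restoring: the pointer returned by
     nl_langinfo may be invalidated by the next `setlocale' call.  */
  strncpy (buf, nl_langinfo (CODESET), size - 1);
  buf[size - 1] = '\0';

  if (saved != NULL)
    {
      setlocale (LC_CTYPE, saved);
      free (saved);
    }

  return buf;
}
```

Called before any `setlocale', with LANG=fr_FR.UTF-8 and that locale
installed, this would typically report UTF-8.  When the named locale is
not installed, the call falls back to the current locale's codeset,
which is one case where parsing the variables by hand gives a different
answer.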