From: ludovic.courtes@laas.fr (Ludovic Courtès)
Newsgroups: gmane.lisp.guile.devel
Subject: Re: SRFI-14 and locale settings
Date: Thu, 14 Sep 2006 15:22:48 +0200
Organization: LAAS-CNRS
Message-ID: <87ac52d1lj.fsf@laas.fr>
In-Reply-To: <87fyevs42r.fsf@zip.com.au> (Kevin Ryde's message of "Thu, 14 Sep 2006 10:07:56 +1000")

Hi,

Kevin Ryde writes:

> ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>>
>> An example to illustrate what
>> I was trying to say: Both French and Castellano can be written using
>> Latin-1; however, letter `ñ' (`n' with tilde) is not a French letter
>> (thus, `isalpha ()' would return false with a Latin-1 `fr_FR' locale)
>
> In glibc fr_FR and es_ES have the same isalpha for all chars 0 to 255,
> it appears to be a property of the charset, not the language or
> location.

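A check of this kind can be sketched in C roughly as follows.  This is
only an illustration, not the code that was actually run: the function
names `classify_alpha' and `count_alpha_differences' are made up here,
and the locale names passed to them are assumptions that depend on which
locales are installed on the system.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Record the result of isalpha () for every char value 0..255 under the
   LC_CTYPE locale NAME.  Return 0 on success, -1 if setlocale () rejects
   NAME (e.g. because the locale is not installed).  */
int
classify_alpha (const char *name, unsigned char table[256])
{
  int c;

  if (setlocale (LC_CTYPE, name) == NULL)
    return -1;

  for (c = 0; c < 256; c++)
    table[c] = isalpha (c) ? 1 : 0;

  /* Switch back to the "C" locale.  Note that this does not restore
     whatever LC_CTYPE locale the caller had selected before.  */
  setlocale (LC_CTYPE, "C");
  return 0;
}

/* Count the char values whose `alpha' membership differs between
   locales A and B.  Return -1 if either locale is unavailable.  */
int
count_alpha_differences (const char *a, const char *b)
{
  unsigned char ta[256], tb[256];
  int c, differences = 0;

  if (classify_alpha (a, ta) != 0 || classify_alpha (b, tb) != 0)
    return -1;

  for (c = 0; c < 256; c++)
    if (ta[c] != tb[c])
      differences++;

  return differences;
}
```

Calling `count_alpha_differences ("fr_FR.ISO-8859-1", "es_ES.ISO-8859-1")'
(assuming both locales exist under those names) would return 0 if the two
locales classify all 256 values identically, which is what Kevin reports
for glibc.
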
Indeed: I tested the same thing yesterday evening and found exactly
that.  So my whole theory just seems to be falling apart!  ;-)

I did some research to try to understand whether this is a
glibc-specific behavior, or whether it is mandated by some standard.
Since I am not very knowledgeable about all these issues, I made a whole
lot of discoveries.

SUSv2 [0] explains that the `LC_CTYPE' category defines various
character classes (Section 7.3.1), notably the `alpha' class, that are
dependent on the "locale", without specifying whether they are dependent
specifically on the language.

On Debian GNU/Linux, the glibc-provided locale definition files are
available under `/usr/share/i18n/locales'.  Both the `fr_FR' and `es_ES'
files contain a line, in the `LC_CTYPE' section, that reads:

  copy "i18n"

Actually, running the following command shows that a large number of
locales (those for western languages) contain this line:

  $ grep -A1 '^LC_CTYPE' /usr/share/i18n/locales/*_*

This "i18n" file contains a character classification definition
(`LC_CTYPE' section) whose contents are defined in ISO 14652 [1] as part
of a "generic" FDCC-set (Set of Formal Definitions of Cultural
Conventions).  The introduction to Section 4 of ISO 14652 reads:

  This Technical Report also defines an FDCC-set named "i18n" with
  values for some of the above categories in order to simplify FDCC-set
  descriptions for a number of cultures.  The contents of "i18n"
  categories should not necessarily be considered as the most commonly
  accepted values, while in many cases it could be the recommended
  values.

The "i18n" character classification (listed in Section 4.3.2) is
actually very broad: it considers at least all Latin, Greek and Cyrillic
letters as part of the `alpha' character class.

My understanding (take it with a grain of salt...)
of the above quotation is that including "i18n" in various locales can
be thought of as a good way to get things "roughly working" first;
however, actual locale definitions could be refined to reflect more
"commonly accepted values".  So, for instance, one could refine the
`LC_CTYPE' section of glibc's `fr_FR' locale definition to make sure it
only includes French letters.

To summarize, using `isalpha ()' to determine the contents of
`char-set:letter' will probably yield correct results on most platforms,
at least on current glibc-based systems.  However, it seems that it is
"theoretically" incorrect, in that character classes are
language-dependent.  Therefore, explicitly listing all Latin-1 letters
in `srfi-14.c' as Neil suggested might be the best way.

Thanks,
Ludovic.

[0] http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
[1] http://www.open-std.org/jtc1/sc22/wg20/docs/projects#14652
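
The last suggestion, listing the Latin-1 letters explicitly instead of
asking `isalpha ()', could look roughly like this in C.  This is only a
sketch and not the actual contents of `srfi-14.c': the function names
are hypothetical, and the choice of code points (the ASCII letters plus
the ISO-8859-1 code points that Unicode classifies as letters, which
includes 0xAA, 0xB5 and 0xBA) is an assumption about what
`char-set:letter' should contain.

```c
/* Return 1 if C, interpreted as an ISO-8859-1 code point in 0..255, is
   a letter, and 0 otherwise.  The answer does not depend on the
   current locale.  */
int
latin1_letter_p (int c)
{
  return (c >= 0x41 && c <= 0x5A)      /* A-Z */
    || (c >= 0x61 && c <= 0x7A)        /* a-z */
    || c == 0xAA                       /* feminine ordinal indicator */
    || c == 0xB5                       /* micro sign */
    || c == 0xBA                       /* masculine ordinal indicator */
    || (c >= 0xC0 && c <= 0xD6)        /* A-grave .. O-diaeresis */
    || (c >= 0xD8 && c <= 0xF6)        /* O-stroke .. o-diaeresis */
    || (c >= 0xF8 && c <= 0xFF);       /* o-stroke .. y-diaeresis */
}

/* Count how many of the 256 code points are letters according to
   latin1_letter_p ().  */
int
latin1_letter_count (void)
{
  int c, count = 0;

  for (c = 0; c < 256; c++)
    if (latin1_letter_p (c))
      count++;

  return count;
}
```

With this table, `latin1_letter_p (0xF1)' (`ñ') is true whatever locale
is in effect, while the multiplication sign (0xD7) and the division sign
(0xF7), which sit in the middle of the accented letters, are excluded.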