From: Kevin Ryde
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Text collation
Date: Thu, 30 Nov 2006 10:08:03 +1100
Message-ID: <87fyc1df70.fsf@zip.com.au>
References: <877j00cirs.fsf@laas.fr> <87hcz3mqhr.fsf@zip.com.au> <87r6x0qjyy.fsf@laas.fr>
In-Reply-To: <87r6x0qjyy.fsf@laas.fr> (Ludovic Courtès's message of "Sun, 22 Oct 2006 20:33:09 +0200")

ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> I have come up with an `(ice-9 i18n)' module that contains
> locale-dependent text collation functions and also character case
> mapping and functions to read numbers.  There would be a lot more things
> to add, like `strfmon ()', but I think that's a good start.

I would worry that r6rs may address these things too, leaving
guile-specifics as, well, a dead-end.  Though I can see this stuff is
of use now.

Myself I've been using a couple of bits from localeconv and
nl_langinfo.  Some way to get at that would be a good addition (though
hopefully in a cleaner way than the C level).

> [0] http://sources.redhat.com/ml/libc-alpha/2006-09/msg00033.html

You could stick that link and perhaps the tllocale.ps.gz one in i18n.c
for reference, since it's not in the glibc manual.

> +@node The ice-9 i18n Module

See if you can think of a better section name.
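As a rough illustration of the localeconv and nl_langinfo data
mentioned above, here is a minimal C sketch of what a Scheme-level
accessor would ultimately have to wrap.  The helper names
numeric_punctuation and locale_codeset are made up for this example
and are not part of the patch or of Guile, and reading only the
process-wide locale is an assumption made to keep it short.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper (not from the patch): copy the decimal point and
   the thousands separator of the current process-wide locale into the
   caller's buffers.  Returns 0 on success, -1 if a buffer is too
   small.  */
static int
numeric_punctuation (char *decimal, size_t decimal_len,
                     char *thousands, size_t thousands_len)
{
  const struct lconv *lc;

  assert (decimal != NULL && thousands != NULL);

  /* localeconv() returns a pointer to static storage, so copy out
     what is needed before any further locale call.  */
  lc = localeconv ();

  if (strlen (lc->decimal_point) >= decimal_len
      || strlen (lc->thousands_sep) >= thousands_len)
    return -1;

  strcpy (decimal, lc->decimal_point);
  strcpy (thousands, lc->thousands_sep);
  return 0;
}

/* Hypothetical helper: name of the character encoding of the current
   locale.  nl_langinfo() covers items localeconv() does not; the
   returned string is owned by the C library.  */
static const char *
locale_codeset (void)
{
  return nl_langinfo (CODESET);
}
```

Calling numeric_punctuation after setlocale (LC_ALL, "") would report
the user's conventions; in the default "C" locale the decimal point is
"." and the thousands separator is empty.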
> +@deffn {Scheme Procedure} make-locale category_mask locale_name [base_locale]
> ...
> +A @code{system-error} exception (@pxref{Handling Errors}) is raised by
> +@code{make-locale} when @var{locale_name} does not match any of the
> +locales compiled on the system.

This bit could be moved to earlier in the description.  And perhaps
something non-committal like "locale_name must be known to the system".

> +@deffn {Scheme Procedure} string-locale<? s1 s2 [locale]
> +@deffn {Scheme Procedure} string-locale>? s1 s2 [locale]
> +@deffn {Scheme Procedure} string-locale-ci<? s1 s2 [locale]
> +@deffn {Scheme Procedure} string-locale-ci>? s1 s2 [locale]
> +@deffn {Scheme Procedure} string-locale-ci=? s1 s2 [locale]

These could be described in one block I think, to avoid five very
similar descriptions.  Likewise the char ones.

> +... Note that SRFI-13 provides procedures that
> +look similar (@pxref{Alphabetic Case Mapping}).  However, the SRFI-13
> +procedures are locale-independent.

That's the intention of the srfi I guess, but it's not true currently
is it?  Don't they use toupper() and therefore get whatever nonsense
the current setlocale() gives?  Perhaps better leave the description
of srfi-13 to that section.

> +@deffn {Scheme Procedure} string-locale-upcase str [locale]
> +@deffn {Scheme Procedure} string-locale-downcase str [locale]

Do you need a caveat about multibyte characters there, for now?  Like
"Note that in the current implementation Guile has no notion of
multibyte characters and in a multibyte locale characters may not be
converted correctly."

> +@deffn {Scheme Procedure} locale-string->integer str [base [locale]]
> +@deffn {Scheme Procedure} locale-string->inexact str [locale]

I think you should cross-reference strtol and strtod here, since their
parsing is rather idiosyncratic.  I'd even be a bit tempted to name
them strtol and strtod in guile, to make it clear they're only one
possible way of parsing.  Except those names aren't very nice ...

> +...
Return two values:

Consider @pxref{Multiple Values}, since multi-values are (thankfully)
fairly rare.

> @c Local Variables:
> @c TeX-master: "guile.texi"
> +@c ispell-local-dictionary: "american"

Best leave that out please, it'll only annoy those of us who don't
have that dictionary installed.

> +Note that @code{setlocale} affects locale settings for the whole
> +process.  For a safer, thread-safe and reentrant alternative,

Go easy on the advertising! :)

> -  scmconfig.h.top gettext.h
> +  scmconfig.h.top libgettext.h

I don't think that's good.  Best leave gettext.h the gettext one, and
use another name for guile.  Gettext got there first, and it doesn't
really matter which guile header has which prototypes.

> +/* This mutex is used to serialize invocations of `setlocale ()' on non-GNU
> +   systems (i.e., systems where a reentrant locale API is not available).
> +   See `i18n.c' for details.  */
> +scm_i_pthread_mutex_t scm_i_locale_mutex;

There's an scm_i_misc_mutex for use when protection is (or should be)
rarely needed.

> +++ mod/libguile/i18n.c
> +
> +#ifndef USE_GNU_LOCALE_API
> +# include "libguile/posix.h"  /* for `scm_i_locale_mutex' */
> +#endif

No need to conditionalize that, it's ok if it's only used sometimes,
it does no harm.

> +/* Provide the locale category masks as found in glibc (copied from
> +   as found in glibc 2.3.6).  This must be kept in sync with
> +   `locale-categories.h'.  */
> +# define LC_CTYPE_MASK     (1 << LC_CTYPE)
> +# define LC_COLLATE_MASK   (1 << LC_COLLATE)
> +# define LC_MESSAGES_MASK  (1 << LC_MESSAGES)
> +# define LC_MONETARY_MASK  (1 << LC_MONETARY)
> +# define LC_NUMERIC_MASK   (1 << LC_NUMERIC)
> +# define LC_TIME_MASK      (1 << LC_TIME)

I think you should put some privately selected bits there, not depend
on LC_CTYPE etc being in range 0 to 31.

> +/* Alias for glibc's locale type.  */
> +typedef locale_t scm_t_locale;

I suppose the emulation could provide locale_t.  Might make it hard to
exercise on an actual gnu system.
A #define locale_t would likely be ok.

> +SCM_DEFINE (scm_locale_p, "locale?", 1, 0, 0,
> ...
> +  if (SCM_SMOB_PREDICATE (scm_tc16_locale_smob_type, obj))
> +    return SCM_BOOL_T;
> +  return SCM_BOOL_F;

scm_from_bool perhaps.

> +#ifdef USE_GNU_LOCALE_API
> +  freelocale ((locale_t)c_locale);
> +#else
> +  c_locale->base_locale = SCM_UNDEFINED;
> +  free (c_locale->locale_name);
> +  scm_gc_free (c_locale, sizeof (* c_locale), "locale");
> +#endif

A possibility there, and with other funcs, would be to implement a
compatible freelocale(), instead of sticking conditionals in each
usage.

> +#ifdef USE_GNU_LOCALE_API
> +
> +  c_locale = newlocale (c_category_mask, c_locale_name, c_base_locale);
> +  if (!c_locale)
> +    locale = SCM_BOOL_F;

Your docs call for an exception on unknown locale don't they?

And should you tell the gc something about the size of a locale_t, and
perhaps extra for its underlying data?  To approximate memory used,
for the gc triggers.

> +void
> +scm_init_i18n ()
> +{
> +  scm_add_feature ("ice-9-i18n");

Is there any point adding a feature after the module is loaded? :)
I expect a better name would be possible too.

> +(define (under-french-locale-or-unresolved thunk)
> +  ;; On non-GNU systems, an exception may be raised only when the locale is
> +  ;; actually used rather than at `make-locale'-time.  Thus, we must guard
> +  ;; against both.
> +  (if %french-locale
> +      (catch 'system-error thunk
> +             (lambda (key . args)
> +               (throw 'unresolved)))
> +      (throw 'unresolved)))

Do you mean 'unsupported rather than 'unresolved, when fr_FR isn't
available from the system?

> +(with-test-prefix "number parsing"

Some french number parsing too?  Just to show there's a point to
locale dependent parsing :).
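To make the point about locale dependent number parsing concrete, a
small C sketch: strtod() takes its decimal point from LC_NUMERIC, so
the same text is read differently under different locales.  The helper
parse_double_in_locale is hypothetical (it is not from the patch), and
it switches the process-wide locale with setlocale(), so it is not
thread-safe and is only meant as an illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): parse STR with strtod()
   while LC_NUMERIC is temporarily set to LOCALE_NAME.  On success
   store the number in *VALUE and the count of characters consumed in
   *CONSUMED, and return 0.  Return -1 if the locale cannot be
   selected or memory runs out.  */
static int
parse_double_in_locale (const char *locale_name, const char *str,
                        double *value, size_t *consumed)
{
  const char *current;
  char *saved, *end;

  assert (locale_name != NULL && str != NULL);
  assert (value != NULL && consumed != NULL);

  /* The string returned by setlocale() may be overwritten by the next
     call, so keep a private copy of the current setting.  */
  current = setlocale (LC_NUMERIC, NULL);
  if (current == NULL)
    return -1;
  saved = malloc (strlen (current) + 1);
  if (saved == NULL)
    return -1;
  strcpy (saved, current);

  if (setlocale (LC_NUMERIC, locale_name) == NULL)
    {
      free (saved);
      return -1;
    }

  *value = strtod (str, &end);
  *consumed = (size_t) (end - str);

  /* Put the previous LC_NUMERIC setting back.  */
  setlocale (LC_NUMERIC, saved);
  free (saved);
  return 0;
}
```

In the "C" locale, "1,5" stops at the comma and yields 1.0 with one
character consumed.  On a system with a French locale installed (the
exact name, "fr_FR" or similar, varies between systems) the same
string should come back as 1.5.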