From: Kevin Ryde <user42@zip.com.au>
Subject: Re: Text collation
Date: Thu, 30 Nov 2006 10:08:03 +1100
Message-ID: <87fyc1df70.fsf@zip.com.au>
In-Reply-To: <87r6x0qjyy.fsf@laas.fr> (Ludovic Courtès's message of "Sun, 22 Oct 2006 20:33:09 +0200")

ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> I have come up with an `(ice-9 i18n)' module that contains
> locale-dependent text collation functions and also character case
> mapping and functions to read numbers.  There would be a lot more things
> to add, like `strfmon ()', but I think that's a good start.

I worry that R6RS may address these things too, leaving the
Guile-specific versions as, well, a dead end.  Though I can see this
stuff is useful now.

Myself, I've been using a couple of bits from localeconv and
nl_langinfo.  Some way to get at those would be a good addition (though
hopefully in a cleaner way than the C-level interface).

> [0] http://sources.redhat.com/ml/libc-alpha/2006-09/msg00033.html

You could put that link, and perhaps the tllocale.ps.gz one, in a
comment in i18n.c for reference, since this API isn't covered in the
glibc manual.

> +@node The ice-9 i18n Module

See if you can think of a better section name.

> +@deffn {Scheme Procedure} make-locale category_mask locale_name [base_locale]
> ...
> +A @code{system-error} exception (@pxref{Handling Errors}) is raised by
> +@code{make-locale} when @var{locale_name} does not match any of the
> +locales compiled on the system.

This bit could be moved to earlier in the description.  And perhaps
something non-committal like "locale_name must be known to the
system".

> +@deffn {Scheme Procedure} string-locale<? s1 s2 [locale]
> +@deffn {Scheme Procedure} string-locale>? s1 s2 [locale]
> +@deffn {Scheme Procedure} string-locale-ci<? s1 s2 [locale]
> +@deffn {Scheme Procedure} string-locale-ci>? s1 s2 [locale]
> +@deffn {Scheme Procedure} string-locale-ci=? s1 s2 [locale]

These could be described in one block I think, to avoid five very
similar descriptions.  Likewise the char ones.

> +...  Note that SRFI-13 provides procedures that
> +look similar (@pxref{Alphabetic Case Mapping}).  However, the SRFI-13
> +procedures are locale-independent.

That's the intention of the SRFI, I guess, but it's not true currently,
is it?  Don't they use toupper() and therefore get whatever nonsense
the current setlocale() gives?  Perhaps it's better to leave the
description of SRFI-13 to that section.

> +@deffn {Scheme Procedure} string-locale-upcase str [locale]
> +@deffn {Scheme Procedure} string-locale-downcase str [locale]

Do you need a caveat about multibyte characters there, for now?  Like
"Note that in the current implementation Guile has no notion of
multibyte characters and in a multibyte locale characters may not be
converted correctly."
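To make the caveat concrete, here is a small C sketch of what a
toupper()-based upcase does to UTF-8 text.  The helper name
bytewise_upcase is hypothetical, not something from the patch.  Each
byte of a two-byte character such as "é" is handed to toupper()
separately, so in the "C" locale it simply comes back unchanged rather
than as "É".

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upcase STR in place one byte at a time, the way
   a toupper()-based implementation with no notion of multibyte
   characters would.  */
static void
bytewise_upcase (char *str)
{
  size_t i;

  for (i = 0; str[i] != '\0'; i++)
    /* toupper() takes an unsigned char value (or EOF), hence the cast.  */
    str[i] = (char) toupper ((unsigned char) str[i]);
}
```

With "caf\xc3\xa9" ("café" in UTF-8) the result is "CAF\xc3\xa9": the
ASCII letters are converted, the é is not.  Doing it properly would
need per-character decoding (mbrtowc() and towupper(), say) under a
UTF-8 locale.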

> +@deffn {Scheme Procedure} locale-string->integer str [base [locale]]
> +@deffn {Scheme Procedure} locale-string->inexact str [locale]

I think you should cross-reference strtol and strtod here, since their
parsing is rather idiosyncratic.  I'd even be a bit tempted to name
them strtol and strtod in guile, to make it clear they're only one
possible way of parsing.  Except those names aren't very nice ...
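For reference, a rough C sketch of the strtol() behaviour that seems
worth cross-referencing.  The wrapper parse_long is hypothetical; it
returns the value together with the number of characters consumed,
roughly the shape of the "two values" the patch describes.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper over strtol(): stores the parsed value and the
   number of characters consumed.  Returns 0 on success, EINVAL when no
   digits were found, or ERANGE on overflow.  */
static int
parse_long (const char *str, int base, long *value, size_t *consumed)
{
  char *end;

  errno = 0;
  *value = strtol (str, &end, base);
  *consumed = (size_t) (end - str);

  if (end == str)
    return EINVAL;   /* nothing parsed: strtol leaves END == STR */
  if (errno == ERANGE)
    return ERANGE;   /* value was clamped to LONG_MIN or LONG_MAX */
  return 0;
}
```

In the "C" locale, "  42abc" in base 10 gives 42 with 4 characters
consumed (leading blanks are skipped but counted, trailing junk is left
alone), base 0 reads "0x1F" as 31 and "017" as octal 15, and "abc"
consumes nothing at all.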

> +...  Return two values:

Consider adding @pxref{Multiple Values} there, since multiple values
are (thankfully) fairly rare and readers may not remember how to
receive them.

>  @c Local Variables:
>  @c TeX-master: "guile.texi"
> +@c ispell-local-dictionary: "american"

Best leave that out please, it'll only annoy those of us who don't
have that dictionary installed.

> +Note that @code{setlocale} affects locale settings for the whole
> +process.  For a safer, thread-safe and reentrant alternative,

Go easy on the advertising! :)

> -    scmconfig.h.top gettext.h
> +    scmconfig.h.top libgettext.h

I don't think that's good.  Best leave gettext.h the gettext one, and
use another name for guile.  Gettext got there first, and it doesn't
really matter which guile header has which prototypes.

> +/* This mutex is used to serialize invocations of `setlocale ()' on non-GNU
> +   systems (i.e., systems where a reentrant locale API is not available).
> +   See `i18n.c' for details.  */
> +scm_i_pthread_mutex_t scm_i_locale_mutex;

There's already an scm_i_misc_mutex for use when protection is (or
should be) rarely needed; perhaps use that rather than a new mutex.
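For comparison, the non-GNU fallback amounts to something like the
following C sketch: take a mutex, switch the process-wide LC_COLLATE
with setlocale(), do the work, then restore the old setting and
unlock.  Here locale_mutex is a hypothetical stand-in for whichever
shared mutex ends up being used, and collate_in_locale is not a
function from the patch.

```c
#include <locale.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a shared mutex such as scm_i_misc_mutex.  */
static pthread_mutex_t locale_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Compare S1 and S2 with strcoll() under collation locale NAME, without
   newlocale().  Returns 0 and stores the result in *RESULT, or -1 if
   NAME cannot be selected (or memory runs out).  */
static int
collate_in_locale (const char *name, const char *s1, const char *s2,
                   int *result)
{
  int ret = -1;
  const char *prev;
  char *saved;

  pthread_mutex_lock (&locale_mutex);

  /* setlocale()'s returned string may be overwritten by the next call,
     so copy the current setting before switching.  */
  prev = setlocale (LC_COLLATE, NULL);
  saved = malloc (strlen (prev) + 1);
  if (saved != NULL)
    {
      strcpy (saved, prev);
      if (setlocale (LC_COLLATE, name) != NULL)
        {
          *result = strcoll (s1, s2);
          ret = 0;
          setlocale (LC_COLLATE, saved);   /* restore for other users */
        }
      free (saved);
    }

  pthread_mutex_unlock (&locale_mutex);
  return ret;
}
```

The lock only serializes callers that take the same mutex; other code
calling setlocale() directly is not protected.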

> +++ mod/libguile/i18n.c
> +
> +#ifndef USE_GNU_LOCALE_API
> +# include "libguile/posix.h"  /* for `scm_i_locale_mutex' */
> +#endif

No need to conditionalize that; it does no harm to include it even if
it's only used sometimes.

> +/* Provide the locale category masks as found in glibc (copied from
> +   <locale.h> as found in glibc 2.3.6).  This must be kept in sync with
> +   `locale-categories.h'.  */
> +# define LC_CTYPE_MASK		(1 << LC_CTYPE)
> +# define LC_COLLATE_MASK	(1 << LC_COLLATE)
> +# define LC_MESSAGES_MASK	(1 << LC_MESSAGES)
> +# define LC_MONETARY_MASK	(1 << LC_MONETARY)
> +# define LC_NUMERIC_MASK	(1 << LC_NUMERIC)
> +# define LC_TIME_MASK		(1 << LC_TIME)

I think you should use some privately selected bits there, rather than
depending on LC_CTYPE etc. being in the range 0 to 31.
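A sketch of what privately selected bits could look like, assuming
hypothetical SCM_LC_*_MASK names: the bit values are fixed by Guile
itself, and a table maps each bit to the system's LC_* constant, so
nothing depends on the numeric values of LC_CTYPE and friends.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical Guile-private category bits.  Their values are chosen
   here, independently of the numeric values of LC_CTYPE etc., which
   the C and POSIX standards leave unspecified.  */
enum
{
  SCM_LC_CTYPE_MASK    = 1 << 0,
  SCM_LC_COLLATE_MASK  = 1 << 1,
  SCM_LC_MESSAGES_MASK = 1 << 2,
  SCM_LC_MONETARY_MASK = 1 << 3,
  SCM_LC_NUMERIC_MASK  = 1 << 4,
  SCM_LC_TIME_MASK     = 1 << 5
};

/* Map each private bit to the system's category constant.  */
static const struct
{
  int mask;
  int category;
} category_table[] =
{
  { SCM_LC_CTYPE_MASK,    LC_CTYPE    },
  { SCM_LC_COLLATE_MASK,  LC_COLLATE  },
  { SCM_LC_MESSAGES_MASK, LC_MESSAGES },
  { SCM_LC_MONETARY_MASK, LC_MONETARY },
  { SCM_LC_NUMERIC_MASK,  LC_NUMERIC  },
  { SCM_LC_TIME_MASK,     LC_TIME     }
};

/* Store in OUT (at most OUT_LEN entries) the LC_* categories selected
   by MASK, in table order, and return how many were stored.  */
static size_t
mask_to_categories (int mask, int *out, size_t out_len)
{
  size_t i, n = 0;

  for (i = 0; i < sizeof category_table / sizeof category_table[0]; i++)
    if ((mask & category_table[i].mask) != 0 && n < out_len)
      out[n++] = category_table[i].category;
  return n;
}
```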

> +/* Alias for glibc's locale type.  */
> +typedef locale_t scm_t_locale;

I suppose the emulation could provide locale_t itself, though that
might make it hard to exercise on an actual GNU system.  A #define
locale_t would likely be OK.

> +SCM_DEFINE (scm_locale_p, "locale?", 1, 0, 0,
> ...
> +  if (SCM_SMOB_PREDICATE (scm_tc16_locale_smob_type, obj))
> +    return SCM_BOOL_T;
> +  return SCM_BOOL_F;

scm_from_bool perhaps.

> +#ifdef USE_GNU_LOCALE_API
> +  freelocale ((locale_t)c_locale);
> +#else
> +  c_locale->base_locale = SCM_UNDEFINED;
> +  free (c_locale->locale_name);
> +  scm_gc_free (c_locale, sizeof (* c_locale), "locale");
> +#endif

A possibility there, and with other funcs, would be to implement a
compatible freelocale(), instead of sticking conditionals in each
usage.
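Roughly, the idea is a single #ifdef that either maps compatibility
names onto the GNU functions or defines emulations with the same
calling convention, so the call sites carry no conditionals.  All the
names here (compat_newlocale, compat_freelocale, struct emu_locale) are
hypothetical, and the emulation merely records the category mask and
locale name; merging with the base locale is left out of this sketch.

```c
/* Define USE_GNU_LOCALE_API when newlocale() and friends are usable.  */
#ifdef USE_GNU_LOCALE_API
# include <locale.h>
# define compat_locale_t   locale_t
# define compat_newlocale  newlocale
# define compat_freelocale freelocale
#else
# include <stdlib.h>
# include <string.h>

/* Hypothetical emulated locale object: the selected categories and the
   locale name, to be applied later with setlocale().  */
struct emu_locale
{
  int category_mask;
  char *locale_name;
};
typedef struct emu_locale *compat_locale_t;

static void
compat_freelocale (compat_locale_t loc)
{
  if (loc != NULL)
    {
      free (loc->locale_name);
      free (loc);
    }
}

/* Same argument order as newlocale(); returns NULL on failure.  As with
   newlocale(), BASE must not be used after a successful call, so this
   sketch simply frees it.  */
static compat_locale_t
compat_newlocale (int category_mask, const char *locale_name,
                  compat_locale_t base)
{
  size_t len = strlen (locale_name) + 1;
  compat_locale_t loc = malloc (sizeof *loc);

  if (loc == NULL)
    return NULL;
  loc->locale_name = malloc (len);
  if (loc->locale_name == NULL)
    {
      free (loc);
      return NULL;
    }
  memcpy (loc->locale_name, locale_name, len);
  loc->category_mask = category_mask;
  compat_freelocale (base);
  return loc;
}
#endif
```

The smob free function could then just call compat_freelocale() on
both kinds of system, with no #ifdef of its own.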

> +#ifdef USE_GNU_LOCALE_API
> +
> +  c_locale = newlocale (c_category_mask, c_locale_name, c_base_locale);
> +  if (!c_locale)
> +    locale = SCM_BOOL_F;

Your docs call for an exception on unknown locale don't they?
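For what it's worth, the failure is easy to detect: newlocale()
returns (locale_t) 0 and sets errno.  A minimal POSIX C sketch follows;
checked_newlocale is a hypothetical helper, and in the Guile code the
failure branch would presumably raise system-error (via something like
scm_syserror) instead of returning #f.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <locale.h>

/* Hypothetical helper: newlocale() with no base locale that also
   reports the errno value on failure (0 on success).  Guile code would
   raise a system-error where this returns (locale_t) 0.  */
static locale_t
checked_newlocale (int category_mask, const char *locale_name, int *err)
{
  locale_t loc;

  errno = 0;
  loc = newlocale (category_mask, locale_name, (locale_t) 0);
  *err = (loc == (locale_t) 0) ? errno : 0;
  return loc;
}
```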

And should you tell the GC something about the size of a locale_t, and
perhaps extra for its underlying data, so that the GC triggers get an
approximation of the memory in use?

> +void
> +scm_init_i18n ()
> +{
> +  scm_add_feature ("ice-9-i18n");

Is there any point adding a feature after the module is loaded? :)
I expect a better name would be possible too.

> +(define (under-french-locale-or-unresolved thunk)
> +  ;; On non-GNU systems, an exception may be raised only when the locale is
> +  ;; actually used rather than at `make-locale'-time.  Thus, we must guard
> +  ;; against both.
> +  (if %french-locale
> +      (catch 'system-error thunk
> +             (lambda (key . args)
> +               (throw 'unresolved)))
> +      (throw 'unresolved)))

Do you mean 'unsupported rather than 'unresolved, when fr_FR isn't
available from the system?

> +(with-test-prefix "number parsing"

Some French number parsing too?  Just to show there's a point to
locale-dependent parsing :).
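As a sketch of what such a test would exercise at the C level:
uselocale() lets one thread parse with, say, the French decimal comma
without touching the process-wide locale.  parse_double_in_locale is a
hypothetical helper, and which locale names are available depends on
what the system has installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse STR as a double using LOCALE_NAME's
   LC_NUMERIC conventions.  uselocale() switches only the calling
   thread.  Returns 0 on success, -1 if the locale is unavailable;
   *CONSUMED receives the number of characters parsed.  */
static int
parse_double_in_locale (const char *locale_name, const char *str,
                        double *value, size_t *consumed)
{
  char *end;
  locale_t loc, old;

  loc = newlocale (LC_NUMERIC_MASK, locale_name, (locale_t) 0);
  if (loc == (locale_t) 0)
    return -1;

  old = uselocale (loc);
  *value = strtod (str, &end);
  uselocale (old);          /* restore the thread's previous locale */
  freelocale (loc);

  *consumed = (size_t) (end - str);
  return 0;
}
```

Under "C", "3,14" stops at the comma (value 3, one character consumed).
On a system with glibc's fr_FR.UTF-8 installed one would expect the
whole string to be consumed, giving 3.14.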


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel

