* SRFI-14 and locale settings
@ 2006-09-03 16:48 Ludovic Courtès
  2006-09-04  6:41 ` Neil Jerram
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-03 16:48 UTC (permalink / raw)


Hi,

SRFI-14 doesn't take into account the current locale, mostly because
`scm_init_srfi_14 ()' gets invoked before the user has had any chance to
run code like `(setlocale ...)'.  Thus, for instance, `char-set:letter'
is always initialized with the English set of letters.

Since SRFI-13 is initialized in core Guile, SRFI-14 needs to be
initialized there too.  But do you guys have an idea of how we could
work around this?

Thanks,
Ludovic.


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-devel



* Re: SRFI-14 and locale settings
  2006-09-03 16:48 SRFI-14 and locale settings Ludovic Courtès
@ 2006-09-04  6:41 ` Neil Jerram
  2006-09-04  9:08   ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Neil Jerram @ 2006-09-04  6:41 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:

> Hi,
>
> SRFI-14 doesn't take into account the current locale, mostly because
> `scm_init_srfi_14 ()' gets invoked before the user has had any chance to
> run code like `(setlocale ...)'.  Thus, for instance, `char-set:letter'
> is always initialized with the English set of letters.
>
> Since SRFI-13 is initialized in core Guile, SRFI-14 needs to be
> initialized there too.  But do you guys have an idea of how we could
> work around this?

Here's what SRFI 14 says about char-set:letter:

  char-set:letter

  In Unicode, a letter is any character with one of the letter
  categories (Lu, Ll, Lt, Lm, Lo) in the Unicode character database.

  There are 52 ASCII letters
  abcdefghijklmnopqrstuvwxyz
  ABCDEFGHIJKLMNOPQRSTUVWXYZ

  There are 117 Latin-1 letters. These are the 115 characters that are
  members of the Latin-1 char-set:lower-case and char-set:upper-case
  sets, plus

  00AA 	FEMININE ORDINAL INDICATOR
  00BA 	MASCULINE ORDINAL INDICATOR

  (These two letters are considered lower-case by Unicode, but not by
  Java or SRFI 14.)

My reading of this is that it is trying to be locale-independent,
based on Unicode category definitions.  Isn't that correct?

(It may of course be that Guile's current implementation doesn't
return the complete set that is implied by this definition, because
it's buggy or because we don't have Unicode support yet, but that's a
different kind of problem.)

Regards,
     Neil




* Re: SRFI-14 and locale settings
  2006-09-04  6:41 ` Neil Jerram
@ 2006-09-04  9:08   ` Ludovic Courtès
  2006-09-04 23:42     ` Kevin Ryde
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-04  9:08 UTC (permalink / raw)
  Cc: Guile-Devel

Hi Neil,

Neil Jerram <neil@ossau.uklinux.net> writes:

> My reading of this is that it is trying to be locale-independent,
> based on Unicode category definitions.  Isn't that correct?

Indeed.  It also reads:

  This library is designed to be portable across implementations that
  use different character types and representations, especially ASCII,
  Latin-1 and Unicode.

  [...]

  While the exact composition of these sets may vary depending upon the
  character type provided by the underlying Scheme system, here are the
  definitions for some of the sets in an ASCII implementation:

  [...]

Currently, the only charset supported by Guile (or, rather, by its
implementation of this SRFI) is ASCII.  I believe Guile could support
any 8-bit charset, including Latin-1, at little or no cost.

But for this, we'd need a way for the user to tell which 8-bit charset
they are interested in.  The easiest way would be through a startup-time
locale setting, but there might be other options too.

What do you think?

Thanks,
Ludovic.



* Re: SRFI-14 and locale settings
  2006-09-04  9:08   ` Ludovic Courtès
@ 2006-09-04 23:42     ` Kevin Ryde
  2006-09-07  7:21       ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Ryde @ 2006-09-04 23:42 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> But for this, we'd need a way for the user to tell which 8-bit charset
> they are interested in.  The easiest way would be through a startup-time
> locale setting, but there might be other options too.

The setlocale call would be a good way.  Maybe the charset tables
could be reinitialized in scm_setlocale (when setting LC_ALL or
LC_CTYPE).  I suppose that'd be moderately helpful, and would make
char-alphabetic? etc match how 1.6 worked.  But I guess really the
notion of what a character represents beyond ascii isn't specified
yet.
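
A rough sketch of that idea, for illustration only: a 256-entry table
stands in for Guile's real char-set storage and is rebuilt whenever the
LC_ALL or LC_CTYPE category is changed successfully.  The names
`letter_table', `recompute_letter_table' and `set_locale_and_recompute'
are made up for this example and are not Guile functions.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical stand-in for the storage behind `char-set:letter'.  */
static unsigned char letter_table[256];

/* Rebuild the table from the <ctype.h> view of the current locale.  */
void
recompute_letter_table (void)
{
  int c;

  for (c = 0; c < 256; c++)
    letter_table[c] = isalpha (c) ? 1 : 0;
}

/* Hypothetical wrapper around `setlocale ()': on success, and only when
   the character classification may have changed, rebuild the table.
   Returns the new locale name, or NULL on failure.  */
const char *
set_locale_and_recompute (int category, const char *locale)
{
  const char *rv;

  assert (locale != NULL);
  rv = setlocale (category, locale);
  if (rv != NULL && (category == LC_ALL || category == LC_CTYPE))
    recompute_letter_table ();
  return rv;
}

/* Number of byte values currently classified as letters.  */
int
count_letters (void)
{
  int c, n = 0;

  for (c = 0; c < 256; c++)
    n += letter_table[c];
  return n;
}
```

In the "C" locale the table should hold just the 52 ASCII letters; under
a Latin-1 locale it would hold whatever that locale's `alpha' class
contains.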



* Re: SRFI-14 and locale settings
  2006-09-04 23:42     ` Kevin Ryde
@ 2006-09-07  7:21       ` Ludovic Courtès
  2006-09-07 23:22         ` Kevin Ryde
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-07  7:21 UTC (permalink / raw)


Hi,

Kevin Ryde <user42@zip.com.au> writes:

> The setlocale call would be a good way.  Maybe the charset tables
> could be reinitialized in scm_setlocale (when setting LC_ALL or
> LC_CTYPE).  I suppose that'd be moderately helpful, and would make
> char-alphabetic? etc match how 1.6 worked.

In fact, I'm afraid we have a problem, because the `is' functions from
<ctype.h> are fully locale-dependent.  Thus, they don't only depend on
the charset being used but also on the language settings, which makes
them unsuitable for the implementation of `char-set:letter' (because it
should contain _all_ the letters representable with the current charset,
not only those of some particular language).

> But I guess really the
> notion of what a character represents beyond ascii isn't specified
> yet.

I'm not an expert in that domain, but the SRFI seemed to imply that the
notion of a letter is pretty well defined in Unicode (which is fortunate
because all the people using the various scripts do know what a letter
is in their script ;-)).

Thanks,
Ludovic.



* Re: SRFI-14 and locale settings
  2006-09-07  7:21       ` Ludovic Courtès
@ 2006-09-07 23:22         ` Kevin Ryde
  2006-09-12  9:28           ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Ryde @ 2006-09-07 23:22 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> In fact, I'm afraid we have a problem, because the `is' functions from
> <ctype.h> are fully locale-dependent.  Thus, they don't only depend on
> the charset being used but also on the language settings,

I'd be surprised if there was a problem in practice, you'd have to
hope the ctypes were a property of the charset rather than the
language.



* Re: SRFI-14 and locale settings
  2006-09-07 23:22         ` Kevin Ryde
@ 2006-09-12  9:28           ` Ludovic Courtès
  2006-09-12 18:17             ` Neil Jerram
  2006-09-14  0:07             ` Kevin Ryde
  0 siblings, 2 replies; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-12  9:28 UTC (permalink / raw)


Hi,

Kevin Ryde <user42@zip.com.au> writes:

> ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>>
>> In fact, I'm afraid we have a problem, because the `is' functions from
>> <ctype.h> are fully locale-dependent.  Thus, they don't only depend on
>> the charset being used but also on the language settings,
>
> I'd be surprised if there was a problem in practice, you'd have to
> hope the ctypes were a property of the charset rather than the
> language.

I'm not sure I understand what you mean.  An example to illustrate what
I was trying to say: Both French and Castellano can be written using
Latin-1; however, letter `ñ' (`n' with tilde) is not a French letter
(thus, `isalpha ()' would return false with a Latin-1 `fr_FR' locale)
but it _is_ a letter in Castellano (thus, `isalpha ()' would return true
with a Latin-1 `es_ES' locale, although the charset is the same).
Conversely, `ê' is a letter in French but not in Castellano, and it is
also part of Latin-1.

According to SRFI-14, a Latin-1 implementation should contain _both_ `ñ'
and `ê' in `char-set:letter', regardless of the current language
settings, hence the difficulty we might have building `char-set:letter'.

Does that clarify things?

Thanks,
Ludovic.



* Re: SRFI-14 and locale settings
  2006-09-12  9:28           ` Ludovic Courtès
@ 2006-09-12 18:17             ` Neil Jerram
  2006-09-13  8:29               ` Ludovic Courtès
  2006-09-14  0:07             ` Kevin Ryde
  1 sibling, 1 reply; 23+ messages in thread
From: Neil Jerram @ 2006-09-12 18:17 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:

> According to SRFI-14, a Latin-1 implementation should contain _both_ `ñ'
> and `ê' in `char-set:letter', regardless of the current language
> settings, hence the difficulty we might have building `char-set:letter'.
>
> Does that clarify things?

Yes.  So it seems to me, therefore, that we should not be using
isalpha() etc. to construct char-set:letter, but should instead hard
code it as the intersection of (char-set:letter as specified by SRFI
14) with (the set of characters that Guile can represent).

Would that work?
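
For illustration, the hard-coded variant could look roughly like the
sketch below for a Latin-1 implementation.  It encodes the SRFI 14
definition quoted earlier in the thread (the ASCII letters, the Latin-1
lower-case and upper-case characters, plus the two ordinal indicators).
The function names are invented for this example and do not exist in
Guile.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if the Latin-1 code point C belongs to
   `char-set:letter' as SRFI 14 defines it for Latin-1.  */
bool
latin1_letter_p (int c)
{
  assert (c >= 0 && c <= 255);

  if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
    return true;                    /* the 52 ASCII letters */
  if (c == 0xAA || c == 0xBA)
    return true;                    /* feminine and masculine ordinal indicators */
  if (c == 0xB5)
    return true;                    /* MICRO SIGN, lower-case in Latin-1 */
  if (c >= 0xC0 && c <= 0xFF)
    return c != 0xD7 && c != 0xF7;  /* all but MULTIPLICATION and DIVISION SIGN */
  return false;
}

/* Number of Latin-1 code points accepted by the predicate above.  */
int
count_latin1_letters (void)
{
  int c, n = 0;

  for (c = 0; c < 256; c++)
    if (latin1_letter_p (c))
      n++;
  return n;
}
```

The count comes out at the 117 letters that SRFI 14 announces for
Latin-1, and both `ñ' (0xF1) and `ê' (0xEA) are members whatever the
current locale is.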

Regards,
     Neil




* Re: SRFI-14 and locale settings
  2006-09-12 18:17             ` Neil Jerram
@ 2006-09-13  8:29               ` Ludovic Courtès
  2006-09-13 18:07                 ` Neil Jerram
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-13  8:29 UTC (permalink / raw)
  Cc: guile-devel

Hi,

Neil Jerram <neil@ossau.uklinux.net> writes:

> ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
>> According to SRFI-14, a Latin-1 implementation should contain _both_ `ñ'
>> and `ê' in `char-set:letter', regardless of the current language
>> settings, hence the difficulty we might have building `char-set:letter'.
>>
>> Does that clarify things?
>
> Yes.  So it seems to me, therefore, that we should not be using
> isalpha() etc. to construct char-set:letter, but should instead hard
> code it as the intersection of (char-set:letter as specified by SRFI
> 14) with (the set of characters that Guile can represent).

In practice, I can think of two ways to determine the set of _letters_
available in the current encoding (which is what `char-set:letter'
expects).

1. Since SRFI-14 lists all the characters that have to be added to the
   ASCII `char-set:letter' to get the Latin-1 `char-set:letter', we
   could somehow hard-code them.  But this is ugly.

2. Or, we can use a predicate that uses the `is' functions which we
   expect to be language-independent (i.e., those functions that only
   depend on the locale's charset), such as:

     (!isblank (c)) && (!ispunct (c)) && (!isdigit (c)) && (!iscntrl (c))

   This is certainly not perfect, but it should work for Latin-1, and
   hopefully for other 8-bit charsets as well.
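
A minimal sketch of the second option, assuming a C99 `isblank ()' is
available; the function names are hypothetical.  One visible weakness of
defining letters by exclusion is that any byte the locale places in none
of these four classes is reported as a letter.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical predicate: C counts as a letter if the current locale
   classifies it as neither blank, punctuation, digit nor control.  */
bool
letter_by_exclusion_p (int c)
{
  assert (c >= 0 && c <= 255);
  return !isblank (c) && !ispunct (c) && !isdigit (c) && !iscntrl (c);
}

/* Number of ASCII code points accepted by the predicate above.  */
int
count_ascii_letters_by_exclusion (void)
{
  int c, n = 0;

  for (c = 0; c < 128; c++)
    if (letter_by_exclusion_p (c))
      n++;
  return n;
}
```

Restricted to ASCII in the "C" locale, this selects exactly the 52
letters.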

As Kevin mentioned earlier, all the char sets could be re-computed in
`scm_setlocale ()'.

I think I'll give the second option a try in the next few days if
nobody considers it too silly.

Thanks,
Ludovic.



* Re: SRFI-14 and locale settings
  2006-09-13  8:29               ` Ludovic Courtès
@ 2006-09-13 18:07                 ` Neil Jerram
  2006-09-14 15:58                   ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Neil Jerram @ 2006-09-13 18:07 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:

> Hi,
>
> Neil Jerram <neil@ossau.uklinux.net> writes:
>
>> Yes.  So it seems to me, therefore, that we should not be using
>> isalpha() etc. to construct char-set:letter, but should instead hard
>> code it as the intersection of (char-set:letter as specified by SRFI
>> 14) with (the set of characters that Guile can represent).
>
> In practice, I can think of two ways to determine the set of _letters_
> available in the current encoding (which is what `char-set:letter'
> expects).
>
> 1. Since SRFI-14 lists all the characters that have to be added to the
>    ASCII `char-set:letter' to get the Latin-1 `char-set:letter', we
>    could somehow hard-code them.  But this is ugly.

I don't see why you think it's ugly.  If it's the right solution, it's
the right solution.

> 2. Or, we can use a predicate that uses the `is' functions which we
>    expect to be language-independent (i.e., those functions that only
>    depend on the locale's charset), such as:
>
>      (!isblank (c)) && (!ispunct (c)) && (!isdigit (c)) && (!iscntrl (c))

Now this is ugly, IMO!

>    This is certainly not perfect, but it should work for Latin-1, and
>    hopefully for other 8-bit charsets as well.
>
> As Kevin mentioned earlier, all the char sets could be re-computed in
> `scm_setlocale ()'.

This sounds even trickier, and wrong, given that the intention of SRFI
14 is for char-set:letter to be locale-independent.

Regards,
     Neil




* Re: SRFI-14 and locale settings
  2006-09-12  9:28           ` Ludovic Courtès
  2006-09-12 18:17             ` Neil Jerram
@ 2006-09-14  0:07             ` Kevin Ryde
  2006-09-14 13:22               ` Ludovic Courtès
  1 sibling, 1 reply; 23+ messages in thread
From: Kevin Ryde @ 2006-09-14  0:07 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> An example to illustrate what
> I was trying to say: Both French and Castellano can be written using
> Latin-1; however, letter `ñ' (`n' with tilde) is not a French letter
> (thus, `isalpha ()' would return false with a Latin-1 `fr_FR' locale)

In glibc fr_FR and es_ES have the same isalpha for all chars 0 to 255,
it appears to be a property of the charset, not the language or
location.




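#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

/* isalpha () results (normalized to 0 or 1) for each byte value.  */
int fr[256];
int es[256];

int
main (void)
{
  int i;

  /* Record the classification under the French locale.  */
  if (setlocale (LC_ALL, "fr_FR") == NULL)
    abort();
  printf ("%d\n", isalpha (0xEA));   /* 0xEA is `ê' in Latin-1 */
  for (i = 0; i < 256; i++)
    fr[i] = isalpha (i) != 0;

  /* Record the classification under the Spanish locale.  */
  if (setlocale (LC_ALL, "es_ES") == NULL)
    abort();
  printf ("%d\n", isalpha (0xEA));
  for (i = 0; i < 256; i++)
    es[i] = isalpha (i) != 0;

  /* Print every byte value on which the two locales disagree.  */
  for (i = 0; i < 256; i++)
    if (fr[i] != es[i])
      printf ("%d\n", i);

  return 0;
}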




* Re: SRFI-14 and locale settings
  2006-09-14  0:07             ` Kevin Ryde
@ 2006-09-14 13:22               ` Ludovic Courtès
  2006-09-15  0:53                 ` Kevin Ryde
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-14 13:22 UTC (permalink / raw)


Hi,

Kevin Ryde <user42@zip.com.au> writes:

> ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>>
>> An example to illustrate what
>> I was trying to say: Both French and Castellano can be written using
>> Latin-1; however, letter `ñ' (`n' with tilde) is not a French letter
>> (thus, `isalpha ()' would return false with a Latin-1 `fr_FR' locale)
>
> In glibc fr_FR and es_ES have the same isalpha for all chars 0 to 255,
> it appears to be a property of the charset, not the language or
> location.

Indeed: I tested the same thing yesterday evening to discover that.  So
my whole theory just seems to be falling apart!  ;-)

I did some research to try to understand whether this is a
glibc-specific behavior, or whether this is made mandatory by some
standard.  Since I am not very knowledgeable about all these issues, I
made a whole lot of discoveries.


SUSv3 [0] explains that the `LC_CTYPE' category defines various
character classes (Section 7.3.1), notably the `alpha' class, that are
dependent on the "locale", without specifying whether they are dependent
specifically on the language.

On Debian GNU/Linux, the glibc-provided locale definition files are
available under `/usr/share/i18n/locales'.  Both the `fr_FR' and `es_ES'
files contain a line, in the `LC_CTYPE' section, that reads:

  copy "i18n"

Actually, running the following command shows that a large number of
locales (those for western languages) contain this line:

  $ grep -A1 '^LC_CTYPE' /usr/share/i18n/locales/*_*

This "i18n" file contains a character classification definition
(`LC_CTYPE' section) whose contents are defined in ISO 14652 [1] as part
of a "generic" FDCC-set (Set of Formal Definitions of Cultural
Conventions).  The introduction to Section 4 of ISO 14652 reads this:

  This Technical Report also defines an FDCC-set named "i18n" with
  values for some of the above categories in order to simplify FDCC-set
  descriptions for a number of cultures.  The contents of "i18n"
  categories should not necessarily be considered as the most commonly
  accepted values, while in many cases it could be the recommended
  values.

The "i18n" character classification (listed in Section 4.3.2) is
actually very broad: it considers at least all Latin, Greek and Cyrillic
letters as part of the `alpha' character class.

My understanding (take it with a grain of salt...) of the above
quotation is that including "i18n" in various locales can be thought of
as a good way to get things "roughly working" first; however, actual
locale definitions could be refined to reflect more "commonly accepted
values".  So, for instance, one could refine the `LC_CTYPE' section of
glibc's `fr_FR' locale definition to make sure it only includes French
letters.


To summarize, using `isalpha ()' to determine the contents of
`char-set:letter' will probably yield correct results on most platforms,
at least on current glibc-based systems.  However, it seems that it is
"theoretically" incorrect, in that character classes are
language-dependent.

Therefore, explicitly listing all Latin-1 letters in `srfi-14.c' as Neil
suggested might be the best way.

Thanks,
Ludovic.


[0] http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
[1] http://www.open-std.org/jtc1/sc22/wg20/docs/projects#14652



* Re: SRFI-14 and locale settings
  2006-09-13 18:07                 ` Neil Jerram
@ 2006-09-14 15:58                   ` Ludovic Courtès
  0 siblings, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-14 15:58 UTC (permalink / raw)
  Cc: guile-devel

Hi,

Neil Jerram <neil@ossau.uklinux.net> writes:

>> In practice, I can think of two ways to determine the set of _letters_
>> available in the current encoding (which is what `char-set:letter'
>> expects).
>>
>> 1. Since SRFI-14 lists all the characters that have to be added to the
>>    ASCII `char-set:letter' to get the Latin-1 `char-set:letter', we
>>    could somehow hard-code them.  But this is ugly.
>
> I don't see why you think it's ugly.  If it's the right solution, it's
> the right solution.

I'm not sure there's a "right solution".  I said I considered it ugly
because it would be Latin-1-specific and it may be incorrect for other
8-bit charsets.

>> As Kevin mentioned earlier, all the char sets could be re-computed in
>> `scm_setlocale ()'.
>
> This sounds even trickier, and wrong, given that the intention of SRFI
> 14 is for char-set:letter to be locale-independent.

The starting point of this thread was precisely that `char-set:letter'
must reflect the character set supported by Guile at the time it is
used.  As Kevin suggested [0], `setlocale' is currently the only way one
can change the charset supported by Guile, hence this suggestion.

Another option would be to add, say, a `--charset' command-line option
to Guile, or a `set-charset' call, something like that.  Would you
prefer something like this?

Thanks,
Ludovic.

[0] http://lists.gnu.org/archive/html/guile-devel/2006-09/msg00006.html



* Re: SRFI-14 and locale settings
  2006-09-14 13:22               ` Ludovic Courtès
@ 2006-09-15  0:53                 ` Kevin Ryde
  2006-09-15  9:28                   ` Neil Jerram
  2006-09-15 12:03                   ` Ludovic Courtès
  0 siblings, 2 replies; 23+ messages in thread
From: Kevin Ryde @ 2006-09-15  0:53 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> The "i18n" character classification (listed in Section 4.3.2) is
> actually very broad: it considers at least all Latin, Greek and Cyrillic
> letters as part of the `alpha' character class.

I think that makes sense.  Just because some letters in a charset are
not normally used in a particular language is no real reason not to
have them considered letters.

> Therefore, explicitly listing all Latin-1 letters in `srfi-14.c' as Neil
> suggested might be the best way.

Hard coded?  Doesn't sound good.

> Another option would be to add, say, a `--charset' command-line option
> to Guile, or a `set-charset' call, something like that.  Would you
> prefer something like this?

Doesn't sound like fun.  All the locale stuff is pretty horrible
already, better just do something sensible with the posix-ish
selection mechanisms.  Hopefully that'd cooperate best with external
libraries also trying to navigate the locale jungle.



* Re: SRFI-14 and locale settings
  2006-09-15  0:53                 ` Kevin Ryde
@ 2006-09-15  9:28                   ` Neil Jerram
  2006-09-16 13:46                     ` Ludovic Courtès
  2006-09-15 12:03                   ` Ludovic Courtès
  1 sibling, 1 reply; 23+ messages in thread
From: Neil Jerram @ 2006-09-15  9:28 UTC (permalink / raw)


Kevin Ryde <user42@zip.com.au> writes:

> ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>>
>> The "i18n" character classification (listed in Section 4.3.2) is
>> actually very broad: it considers at least all Latin, Greek and Cyrillic
>> letters as part of the `alpha' character class.
>
> I think that makes sense.  Just because some letters in a charset are
> not normally used in a particular language is no real reason not to
> have them considered letters.

Yes; based on Kevin's and Ludovic's latest emails, I'm happy now with
the isalpha() solution if we can make it leverage this "i18n"
classification.

>> Another option would be to add, say, a `--charset' command-line option
>> to Guile, or a `set-charset' call, something like that.  Would you
>> prefer something like this?
>
> Doesn't sound like fun.  All the locale stuff is pretty horrible
> already, better just do something sensible with the posix-ish
> selection mechanisms.  Hopefully that'd cooperate best with external
> libraries also trying to navigate the locale jungle.

Agreed.

Regards,
     Neil




* Re: SRFI-14 and locale settings
  2006-09-15  0:53                 ` Kevin Ryde
  2006-09-15  9:28                   ` Neil Jerram
@ 2006-09-15 12:03                   ` Ludovic Courtès
  1 sibling, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-15 12:03 UTC (permalink / raw)


Hi,

Kevin Ryde <user42@zip.com.au> writes:

> ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>>
>> The "i18n" character classification (listed in Section 4.3.2) is
>> actually very broad: it considers at least all Latin, Greek and Cyrillic
>> letters as part of the `alpha' character class.
>
> I think that makes sense.  Just because some letters in a charset are
> not normally used in a particular language is no real reason not to
> have them considered letters.

For the record, I started a discussion on `libc-locales' on this topic
[0].  The issue at hand is whether character classification in locales
should be (or "has to be", per some standard) language-independent.

Thanks,
Ludovic.

[0] http://sources.redhat.com/ml/libc-locales/2006-q3/msg00086.html



* Re: SRFI-14 and locale settings
  2006-09-15  9:28                   ` Neil Jerram
@ 2006-09-16 13:46                     ` Ludovic Courtès
  2006-09-18 23:48                       ` Kevin Ryde
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-16 13:46 UTC (permalink / raw)
  Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2117 bytes --]

Hi,

Neil Jerram <neil@ossau.uklinux.net> writes:

> Yes; based on Kevin's and Ludovic's latest emails, I'm happy now with
> the isalpha() solution if we can make it leverage this "i18n"
> classification.

Below is a patch that does what we agreed on: keep using the <ctype.h>
functions for character classification, and recompute the standard
SRFI-14 char sets upon successful `setlocale'.  It also makes char set
computation more efficient and fixes `char-set:punctuation' and
`char-set:symbol' in ASCII.

There are still issues.  The bug in `char-set:punctuation' and
`char-set:symbol' I mention above is due to the fact that there is no
<ctype.h> equivalent to those char sets (in particular, `ispunct ()'
does not match `char-set:punctuation').  Fixing it for ASCII was easy,
but it's not so easy for Latin-1.
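
To make the ASCII case concrete, here is a standalone sketch modelled on
the `CSET_SYMBOL_PRED' and `CSET_PUNCT_PRED' macros of the patch below:
`ispunct ()' accepts both what SRFI-14 calls punctuation and what it
calls symbols, so the symbols have to be carved out explicitly.  The
function names are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if C is one of the ASCII characters that SRFI-14 puts in
   `char-set:symbol'.  */
bool
ascii_symbol_p (int c)
{
  assert (c >= 0 && c <= 127);
  /* `strchr ()' would also match the terminating NUL, so exclude it.  */
  return c != '\0' && strchr ("$+<=>^`|~", c) != NULL;
}

/* True if C is ASCII punctuation in the SRFI-14 sense: whatever
   `ispunct ()' accepts, minus the symbols.  */
bool
ascii_punctuation_p (int c)
{
  return ispunct (c) && !ascii_symbol_p (c);
}

/* Number of ASCII code points for which PRED holds.  */
int
count_ascii (bool (*pred) (int))
{
  int c, n = 0;

  for (c = 0; c < 128; c++)
    if (pred (c))
      n++;
  return n;
}
```

In the "C" locale this gives 9 symbols and 23 punctuation characters,
which together make up the 32 characters that `ispunct ()' accepts.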

The reason we can hardly get `char-set:punctuation' and
`char-set:symbol' right for Latin-1 is that we don't want to hard-code
too much Latin-1-specific knowledge: one goal is to have SRFI-14 also
provide sensible results for non-Latin-1 8-bit charsets.

With this patch, all standard char sets are those expected by SRFI-14 in
ASCII.  In Latin-1, `char-set:letter', as well as `lower-case',
`upper-case', and `iso-control' are correct (at least, using current
glibc locales), but `punctuation', for instance, is a superset of what
SRFI-14 expects while `symbol' is (correspondingly) a subset of what it
should be, and `blank' lacks the "no-break space" character (#\0240).

I'm not sure we can do much better than that until Guile fully supports
Unicode.  The right solution, in the end, would be to process the whole
`UnicodeData.txt' and generate a character classification strictly
following the SRFI-14 rules.  In the meantime, I think this patch can be
an acceptable solution.

I'd be glad if some of you could test it, and especially run the test
cases.  I added Latin-1-specific test cases, but they require a Latin-1
locale to be available, and they try to guess which one that is (yes,
it looks quite hackish but I couldn't think of anything better...).

Comments welcome.

Thanks,
Ludovic.


[-- Attachment #2: The SRFI-14 patch against CVS HEAD --]
[-- Type: text/plain, Size: 14387 bytes --]

--- orig/configure.in
+++ mod/configure.in
@@ -598,9 +598,10 @@
 #   readdir_r - recent posix, not on old systems
 #   stat64 - SuS largefile stuff, not on old systems
 #   sysconf - not on old systems
+#   isblank - available as a GNU extension or in C99
 #   _NSGetEnviron - Darwin specific
 #
-AC_CHECK_FUNCS([DINFINITY DQNAN ctermid fesetround ftime fchown getcwd geteuid gettimeofday gmtime_r ioctl lstat mkdir mknod nice readdir_r readlink rename rmdir select setegid seteuid setlocale setpgid setsid sigaction siginterrupt stat64 strftime strptime symlink sync sysconf tcgetpgrp tcsetpgrp times uname waitpid strdup system usleep atexit on_exit chown link fcntl ttyname getpwent getgrent kill getppid getpgrp fork setitimer getitimer strchr strcmp index bcopy memcpy rindex unsetenv _NSGetEnviron])
+AC_CHECK_FUNCS([DINFINITY DQNAN ctermid fesetround ftime fchown getcwd geteuid gettimeofday gmtime_r ioctl lstat mkdir mknod nice readdir_r readlink rename rmdir select setegid seteuid setlocale setpgid setsid sigaction siginterrupt stat64 strftime strptime symlink sync sysconf tcgetpgrp tcsetpgrp times uname waitpid strdup system usleep atexit on_exit chown link fcntl ttyname getpwent getgrent kill getppid getpgrp fork setitimer getitimer strchr strcmp index bcopy memcpy rindex unsetenv isblank _NSGetEnviron])
 
 # Reasons for testing:
 #   netdb.h - not in mingw


--- orig/libguile/posix.c
+++ mod/libguile/posix.c
@@ -34,6 +34,7 @@
 #include "libguile/feature.h"
 #include "libguile/strings.h"
 #include "libguile/srfi-13.h"
+#include "libguile/srfi-14.h"
 #include "libguile/vectors.h"
 #include "libguile/lang.h"
 
@@ -1392,6 +1393,10 @@
       SCM_SYSERROR;
     }
 
+  /* Recompute the standard SRFI-14 character sets in a locale-dependent
+     (actually charset-dependent) way.  */
+  scm_srfi_14_compute_char_sets ();
+
   scm_dynwind_end ();
   return scm_from_locale_string (rv);
 }


--- orig/libguile/srfi-14.c
+++ mod/libguile/srfi-14.c
@@ -17,18 +17,25 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#define _GNU_SOURCE  /* Ask for `isblank ()'.  */
 
 #include <string.h>
 #include <ctype.h>
 
+#include <config.h>
+
 #include "libguile.h"
 #include "libguile/srfi-14.h"
 
 
-#define SCM_CHARSET_SET(cs, idx) \
-  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |= \
+#define SCM_CHARSET_SET(cs, idx)				\
+  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |=	\
     (1L << ((idx) % SCM_BITS_PER_LONG)))
 
+#define SCM_CHARSET_UNSET(cs, idx)				\
+  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] &=	\
+    (~(1L << ((idx) % SCM_BITS_PER_LONG))))
+
 #define BYTES_PER_CHARSET (SCM_CHARSET_SIZE / 8)
 #define LONGS_PER_CHARSET (SCM_CHARSET_SIZE / SCM_BITS_PER_LONG)
 
@@ -1393,6 +1400,9 @@
 }
 #undef FUNC_NAME
 
+\f
+/* Standard character sets.  */
+
 SCM scm_char_set_lower_case;
 SCM scm_char_set_upper_case;
 SCM scm_char_set_title_case;
@@ -1411,48 +1421,100 @@
 SCM scm_char_set_empty;
 SCM scm_char_set_full;
 
-static SCM
-make_predset (int (*pred) (int))
-{
-  int ch;
-  SCM cs = make_char_set (NULL);
-  for (ch = 0; ch < 256; ch++)
-    if (pred (ch))
-      SCM_CHARSET_SET (cs, ch);
-  return cs;
-}
 
-static SCM
-define_predset (const char *name, int (*pred) (int))
+/* Create an empty character set and return it after binding it to NAME.  */
+static inline SCM
+define_charset (const char *name)
 {
-  SCM cs = make_predset (pred);
+  SCM cs = make_char_set (NULL);
   scm_c_define (name, cs);
   return scm_permanent_object (cs);
 }
 
-static SCM
-make_strset (const char *str)
+/* Membership predicates for the `symbol', `blank' and `punct' char sets.
+
+   XXX: The `punctuation' and `symbol' char sets have no direct equivalent in
+   <ctype.h>.  Thus, the predicates below yield correct results for ASCII,
+   but they do not provide the result described by the SRFI for Latin-1.  The
+   correct Latin-1 result could only be obtained by hard-coding the
+   characters listed by the SRFI, but the problem would remain for other
+   8-bit charsets.
+
+   Similarly, character 0xA0 in Latin-1 (unbreakable space, `#\0240') should
+   be part of `char-set:blank'.  However, glibc's current (2006/09) Latin-1
+   locales (which use the ISO 14652 "i18n" FDCC-set) do not consider it
+   `blank' so it ends up in `char-set:punctuation'.  */
+#ifdef HAVE_ISBLANK
+# define CSET_BLANK_PRED(c)  (isblank (c))
+#else
+# define CSET_BLANK_PRED(c)			\
+   (((c) == ' ') || ((c) == '\t'))
+#endif
+
+#define CSET_SYMBOL_PRED(c)					\
+  (((c) != '\0') && (strchr ("$+<=>^`|~", (c)) != NULL))
+#define CSET_PUNCT_PRED(c)					\
+  ((ispunct (c)) && (!CSET_SYMBOL_PRED (c)))
+
+/* False and true predicates.  */
+#define CSET_TRUE_PRED(c)    (1)
+#define CSET_FALSE_PRED(c)   (0)
+
+
+/* Compute the contents of all the standard character sets.  Computation may
+   need to be re-done at `setlocale'-time because some char sets (e.g.,
+   `char-set:letter') need to reflect the character set supported by Guile.
+
+   For instance, at startup time, the "C" locale is used, thus Guile supports
+   only ASCII; therefore, `char-set:letter' only contains English letters.
+   The user can change this by invoking `setlocale' and specifying a locale
+   with an 8-bit charset, thereby augmenting some of the SRFI-14 standard
+   character sets.
+
+   This works because some of the predicates used below to construct
+   character sets (e.g., `isalpha(3)') are locale-dependent (so
+   charset-dependent, though generally not language-dependent).  For details,
+   please see the `guile-devel' mailing list archive of September 2006.  */
+void
+scm_srfi_14_compute_char_sets (void)
 {
-  SCM cs = make_char_set (NULL);
-  while (*str)
+#define UPDATE_CSET(c, cset, pred)		\
+  do						\
+    {						\
+      if (pred (c))				\
+	SCM_CHARSET_SET ((cset), (c));		\
+      else					\
+	SCM_CHARSET_UNSET ((cset), (c));	\
+    }						\
+  while (0)
+
+  register int ch;
+
+  for (ch = 0; ch < 256; ch++)
     {
-      SCM_CHARSET_SET (cs, *str);
-      str++;
+      UPDATE_CSET (ch, scm_char_set_upper_case, isupper);
+      UPDATE_CSET (ch, scm_char_set_lower_case, islower);
+      UPDATE_CSET (ch, scm_char_set_title_case, CSET_FALSE_PRED);
+      UPDATE_CSET (ch, scm_char_set_letter, isalpha);
+      UPDATE_CSET (ch, scm_char_set_digit, isdigit);
+      UPDATE_CSET (ch, scm_char_set_letter_and_digit, isalnum);
+      UPDATE_CSET (ch, scm_char_set_graphic, isgraph);
+      UPDATE_CSET (ch, scm_char_set_printing, isprint);
+      UPDATE_CSET (ch, scm_char_set_whitespace, isspace);
+      UPDATE_CSET (ch, scm_char_set_iso_control, iscntrl);
+      UPDATE_CSET (ch, scm_char_set_punctuation, CSET_PUNCT_PRED);
+      UPDATE_CSET (ch, scm_char_set_symbol, CSET_SYMBOL_PRED);
+      UPDATE_CSET (ch, scm_char_set_hex_digit, isxdigit);
+      UPDATE_CSET (ch, scm_char_set_blank, CSET_BLANK_PRED);
+      UPDATE_CSET (ch, scm_char_set_ascii, isascii);
+      UPDATE_CSET (ch, scm_char_set_empty, CSET_FALSE_PRED);
+      UPDATE_CSET (ch, scm_char_set_full, CSET_TRUE_PRED);
     }
-  return cs;
-}
 
-static SCM
-define_strset (const char *name, const char *str)
-{
-  SCM cs = make_strset (str);
-  scm_c_define (name, cs);
-  return scm_permanent_object (cs);
+#undef UPDATE_CSET
 }
 
-static int false (int ch) { return 0; }
-static int true (int ch) { return 1; }
-
+\f
 void
 scm_init_srfi_14 (void)
 {
@@ -1461,24 +1523,25 @@
   scm_set_smob_free (scm_tc16_charset, charset_free);
   scm_set_smob_print (scm_tc16_charset, charset_print);
 
-  scm_char_set_upper_case = define_predset ("char-set:upper-case", isupper);
-  scm_char_set_lower_case = define_predset ("char-set:lower-case", islower);
-  scm_char_set_title_case = define_predset ("char-set:title-case", false);
-  scm_char_set_letter = define_predset ("char-set:letter", isalpha);
-  scm_char_set_digit = define_predset ("char-set:digit", isdigit);
-  scm_char_set_letter_and_digit = define_predset ("char-set:letter+digit",
-						  isalnum);
-  scm_char_set_graphic = define_predset ("char-set:graphic", isgraph);
-  scm_char_set_printing = define_predset ("char-set:printing", isprint);
-  scm_char_set_whitespace = define_predset ("char-set:whitespace", isspace);
-  scm_char_set_iso_control = define_predset ("char-set:iso-control", iscntrl);
-  scm_char_set_punctuation = define_predset ("char-set:punctuation", ispunct);
-  scm_char_set_symbol = define_strset ("char-set:symbol", "$+<=>^`|~");
-  scm_char_set_hex_digit = define_predset ("char-set:hex-digit", isxdigit);
-  scm_char_set_blank = define_strset ("char-set:blank", " \t");
-  scm_char_set_ascii = define_predset ("char-set:ascii", isascii);
-  scm_char_set_empty = define_predset ("char-set:empty", false);
-  scm_char_set_full = define_predset ("char-set:full", true);
+  scm_char_set_upper_case = define_charset ("char-set:upper-case");
+  scm_char_set_lower_case = define_charset ("char-set:lower-case");
+  scm_char_set_title_case = define_charset ("char-set:title-case");
+  scm_char_set_letter = define_charset ("char-set:letter");
+  scm_char_set_digit = define_charset ("char-set:digit");
+  scm_char_set_letter_and_digit = define_charset ("char-set:letter+digit");
+  scm_char_set_graphic = define_charset ("char-set:graphic");
+  scm_char_set_printing = define_charset ("char-set:printing");
+  scm_char_set_whitespace = define_charset ("char-set:whitespace");
+  scm_char_set_iso_control = define_charset ("char-set:iso-control");
+  scm_char_set_punctuation = define_charset ("char-set:punctuation");
+  scm_char_set_symbol = define_charset ("char-set:symbol");
+  scm_char_set_hex_digit = define_charset ("char-set:hex-digit");
+  scm_char_set_blank = define_charset ("char-set:blank");
+  scm_char_set_ascii = define_charset ("char-set:ascii");
+  scm_char_set_empty = define_charset ("char-set:empty");
+  scm_char_set_full = define_charset ("char-set:full");
+
+  scm_srfi_14_compute_char_sets ();
 
 #include "libguile/srfi-14.x"
 }


--- orig/libguile/srfi-14.h
+++ mod/libguile/srfi-14.h
@@ -106,7 +106,7 @@
 SCM_API SCM scm_char_set_empty;
 SCM_API SCM scm_char_set_full;
 
-SCM_API void scm_c_init_srfi_14 (void);
+SCM_API void scm_srfi_14_compute_char_sets (void);
 SCM_API void scm_init_srfi_14 (void);
 
 #endif /* SCM_SRFI_14_H */


--- orig/test-suite/tests/srfi-14.test
+++ mod/test-suite/tests/srfi-14.test
@@ -1,4 +1,4 @@
-;;;; srfi-14.test --- Test suite for Guile's SRFI-14 functions. -*- scheme -*-
+;;;; srfi-14.test --- Test suite for Guile's SRFI-14 functions.
 ;;;; Martin Grabmueller, 2001-07-16
 ;;;;
 ;;;; Copyright (C) 2001, 2006 Free Software Foundation, Inc.
@@ -18,7 +18,9 @@
 ;;;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
 ;;;; Boston, MA 02110-1301 USA
 
-(use-modules (srfi srfi-14))
+(use-modules (srfi srfi-14)
+             (srfi srfi-1) ;; `every'
+             (test-suite lib))
 
 (define exception:invalid-char-set-cursor
   (cons 'misc-error "^invalid character set cursor"))
@@ -186,3 +188,102 @@
   (pass-if "upper case char set"
      (char-set= (char-set-map char-upcase char-set:lower-case)
 		char-set:upper-case)))
+
+(with-test-prefix "string->char-set"
+
+  (pass-if "some char set"
+     (let ((chars '(#\g #\u #\i #\l #\e)))
+       (char-set= (list->char-set chars)
+		  (string->char-set (apply string chars))))))
+
+;; Make sure we get an ASCII charset and character classification.
+(if (defined? 'setlocale) (setlocale LC_CTYPE "C"))
+
+(with-test-prefix "standard char sets (ASCII)"
+
+  (pass-if "char-set:letter"
+     (char-set= (string->char-set
+		 (string-append "abcdefghijklmnopqrstuvwxyz"
+				"ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
+		char-set:letter))
+
+  (pass-if "char-set:punctuation"
+     (char-set= (string->char-set "!\"#%&'()*,-./:;?@[\\]_{}")
+		char-set:punctuation))
+
+  (pass-if "char-set:symbol"
+     (char-set= (string->char-set "$+<=>^`|~")
+		char-set:symbol)))
+
+\f
+;;;
+;;; 8-bit charsets.
+;;;
+;;; Here, we only test ISO-8859-1 (Latin-1), notably because behavior of
+;;; SRFI-14 for implementations supporting this charset is well-defined.
+;;;
+
+(define (every? pred lst)
+  (not (not (every pred lst))))
+
+(define (find-latin1-locale)
+  ;; Try to find and install an ISO-8859-1 locale.  Return `#f' on failure.
+  (if (defined? 'setlocale)
+      (let loop ((locales (map (lambda (lang)
+				 (string-append lang ".iso88591"))
+			       '("de_DE" "en_GB" "en_US" "es_ES"
+				 "fr_FR" "it_IT"))))
+	(if (null? locales)
+	    #f
+	    (if (false-if-exception (setlocale LC_CTYPE (car locales)))
+		(car locales)
+		(loop (cdr locales)))))
+      #f))
+
+
+(define %latin1 (find-latin1-locale))
+
+(with-test-prefix "Latin-1 (8-bit charset)"
+
+  ;; Note: the membership tests below are not exhaustive.
+
+  (pass-if "char-set:letter (membership)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (let ((letters (char-set->list char-set:letter)))
+	   (every? (lambda (8-bit-char)
+		     (memq 8-bit-char letters))
+		   (append '(#\a #\b #\c)             ;; ASCII
+			   (string->list "çéèâùÉÀÈÊ") ;; French
+			   (string->list "øñÑíßåæðþ"))))))
+
+  (pass-if "char-set:letter (size)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (= (char-set-size char-set:letter) 117)))
+
+  (pass-if "char-set:lower-case (size)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (= (char-set-size char-set:lower-case) (+ 26 33))))
+
+  (pass-if "char-set:upper-case (size)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (= (char-set-size char-set:upper-case) (+ 26 30))))
+
+  (pass-if "char-set:punctuation (membership)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (let ((punctuation (char-set->list char-set:punctuation)))
+	   (every? (lambda (8-bit-char)
+		     (memq 8-bit-char punctuation))
+		   (append '(#\! #\. #\?)            ;; ASCII
+			   (string->list "¡¿")       ;; Castellano
+			   (string->list "«»"))))))) ;; French
+
+
+;; Local Variables:
+;; mode: scheme
+;; coding: latin-1
+;; End:




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: SRFI-14 and locale settings
  2006-09-16 13:46                     ` Ludovic Courtès
@ 2006-09-18 23:48                       ` Kevin Ryde
  2006-09-19 12:28                         ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Ryde @ 2006-09-18 23:48 UTC (permalink / raw)
  Cc: guile-devel

ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> (in particular, `ispunct ()' does not match `char-set:punctuation').

Oh, it's a bit bigger.

> but `punctuation', for instance, is a superset of what
> SRFI-14 expects while `symbol' is (correspondingly) a subset of what it
> should be,

Does the SRFI-specified relation to `graphic' still hold?  I.e.,

	graphic = letter + digit + punctuation + symbol

> -#define SCM_CHARSET_SET(cs, idx) \
> -  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |= \
> +#define SCM_CHARSET_SET(cs, idx)				\
> +  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |=	\
>      (1L << ((idx) % SCM_BITS_PER_LONG)))

Is that a change?

> +      if (pred (c))				\
> +	SCM_CHARSET_SET ((cset), (c));		\
> +      else					\
> +	SCM_CHARSET_UNSET ((cset), (c));	\

It may be possible to do a "set to a value" rather than separate
set/unset macros.

> -(use-modules (srfi srfi-14))
> +(use-modules (srfi srfi-14)
> +             (srfi srfi-1) ;; `every'
> +             (test-suite lib))

A "define-module" there can prevent srfi-1 leaking out to subsequent
tests.

> +(define (find-latin1-locale)
> +  ;; Try to find and install an ISO-8859-1 locale.  Return `#f' on failure.
> +  (if (defined? 'setlocale)
> +      (let loop ((locales (map (lambda (lang)
> +				 (string-append lang ".iso88591"))
> +			       '("de_DE" "en_GB" "en_US" "es_ES"
> +				 "fr_FR" "it_IT"))))

The posix "locale -a" program can print all available locales, if you
wanted to ask nl_langinfo(CODESET) or "locale -k charmap" what the
charset is for each of them, or just try the undotted ones with
8859-1, or whatever.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: SRFI-14 and locale settings
  2006-09-18 23:48                       ` Kevin Ryde
@ 2006-09-19 12:28                         ` Ludovic Courtès
  2006-09-19 22:42                           ` Kevin Ryde
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-19 12:28 UTC (permalink / raw)
  Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2750 bytes --]

Hi,

Kevin Ryde <user42@zip.com.au> writes:

>> but `punctuation', for instance, is a superset of what
>> SRFI-14 expects while `symbol' is (correspondingly) a subset of what it
>> should be,
>
> Does the SRFI-specified relation to `graphic' still hold?  I.e.,
>
> 	graphic = letter + digit + punctuation + symbol

Yes.  I added tests for `char-set:graphic' in both Latin-1 and ASCII.

While I was at it, I modified `srfi-14.c' so that, for all char sets
defined by the SRFI as a union of other char sets, it explicitly uses a
predicate that reflects this (see, e.g., `CSET_GRAPHIC_PRED',
`CSET_PRINTING_PRED'), so that the property is verified "by
construction".
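
To illustrate what "by construction" means here, a minimal standalone
sketch follows.  It is not the patch itself: the helper names
`graphic_mismatches' and `graphic_size' are made up for this example.
It builds the `graphic' predicate as a union of the other predicates,
as the patch does, and compares it with `isgraph ()'.  In the default
"C" locale the two should agree; that expectation comes from the ISO C
description of `ispunct ()' and `isgraph ()', not from testing on every
libc.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Predicates modelled on the CSET_*_PRED macros of the patch.  */
#define SYMBOL_PRED(c)						\
  (((c) != '\0') && (strchr ("$+<=>^`|~", (c)) != NULL))
#define PUNCT_PRED(c)    (ispunct (c) && !SYMBOL_PRED (c))
#define LETTER_PRED(c)   (isalpha (c))
#define DIGIT_PRED(c)    (isdigit (c))

/* `graphic' is defined as a union, so the SRFI relation holds in any
   locale, whatever <ctype.h> reports for `isgraph ()'.  */
#define GRAPHIC_PRED(c)					\
  (LETTER_PRED (c) || DIGIT_PRED (c)			\
   || PUNCT_PRED (c) || SYMBOL_PRED (c))

/* Hypothetical helper: number of 8-bit characters on which the union
   predicate and `isgraph ()' disagree in the current locale.  */
static int
graphic_mismatches (void)
{
  int ch, count = 0;

  for (ch = 0; ch < 256; ch++)
    if ((GRAPHIC_PRED (ch) != 0) != (isgraph (ch) != 0))
      count++;
  return count;
}

/* Hypothetical helper: size of the `graphic' set in the current
   locale.  */
static int
graphic_size (void)
{
  int ch, count = 0;

  for (ch = 0; ch < 256; ch++)
    if (GRAPHIC_PRED (ch))
      count++;
  return count;
}
```

Calling `graphic_mismatches ()' before and after a `setlocale ()' call
is a quick way to see whether the two definitions diverge on a given
system.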

>> -#define SCM_CHARSET_SET(cs, idx) \
>> -  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |= \
>> +#define SCM_CHARSET_SET(cs, idx)				\
>> +  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |=	\
>>      (1L << ((idx) % SCM_BITS_PER_LONG)))
>
> Is that a change?

No, sorry, just re-formatting (applying `c-backslash-region').

>> +      if (pred (c))				\
>> +	SCM_CHARSET_SET ((cset), (c));		\
>> +      else					\
>> +	SCM_CHARSET_UNSET ((cset), (c));	\
>
> It may be possible to do a "set to a value" rather than separate
> set/unset macros.

In theory, but since we either set a bit by or'ing it or clear it by
and'ing its one's complement, it's not easily doable.  ;-)
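
That said, a mask-then-merge form is one possible way to fold both
cases into a single helper.  The sketch below is illustrative only and
is not what the patch does: `charset_assign' and `charset_test' are
hypothetical names, and it works on a plain array of longs rather than
on the smob data accessed by `SCM_CHARSET_SET'.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG  (sizeof (long) * CHAR_BIT)
#define CHARSET_SIZE   256
#define LONGS_PER_SET  (CHARSET_SIZE / BITS_PER_LONG)

/* Hypothetical helper: set bit IDX of BITS to 1 if VAL is non-zero and
   to 0 otherwise.  The bit is first cleared with the complemented mask,
   then merged back in only when VAL is true.  */
static void
charset_assign (unsigned long *bits, int idx, int val)
{
  unsigned long mask = 1UL << (idx % BITS_PER_LONG);
  unsigned long *word = &bits[idx / BITS_PER_LONG];

  *word = (*word & ~mask) | (val ? mask : 0UL);
}

/* Hypothetical helper: return 1 if bit IDX of BITS is set, else 0.  */
static int
charset_test (const unsigned long *bits, int idx)
{
  return (int) ((bits[idx / BITS_PER_LONG] >> (idx % BITS_PER_LONG)) & 1UL);
}
```

With such a helper the body of `UPDATE_CSET' would reduce to a single
call like `charset_assign (bits, c, pred (c))', at the cost of always
doing a read-modify-write with both an `and' and an `or'.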

>> -(use-modules (srfi srfi-14))
>> +(use-modules (srfi srfi-14)
>> +             (srfi srfi-1) ;; `every'
>> +             (test-suite lib))
>
> A "define-module" there can prevent srfi-1 leaking out to subsequent
> tests.

Agreed.  I changed this too.

>> +(define (find-latin1-locale)
>> +  ;; Try to find and install an ISO-8859-1 locale.  Return `#f' on failure.
>> +  (if (defined? 'setlocale)
>> +      (let loop ((locales (map (lambda (lang)
>> +				 (string-append lang ".iso88591"))
>> +			       '("de_DE" "en_GB" "en_US" "es_ES"
>> +				 "fr_FR" "it_IT"))))
>
> The posix "locale -a" program can print all available locales, if you
> wanted to ask nl_langinfo(CODESET) or "locale -k charmap" what the
> charset is for each of them, or just try the undotted ones with
> 8859-1, or whatever.

Yeah, I know, but then we'd have to rely on, say, `(ice-9 popen)' to run
`locale' and parse its output, and `locale' would have to be present and
standard-conforming, etc.  So I thought that hardcoding locales this way
would be no less reliable than running `locale', and simpler.

`nl_langinfo ()' would be great, but we'd need to provide bindings for
it first, and it's an X/Open API, not ISO C, so it may not be available
everywhere (unfortunately).
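
For reference, here is roughly what a C-level check based on
`nl_langinfo ()' could look like.  This is only a sketch under stated
assumptions: `codeset_of_locale' and `locale_is_latin1' are
hypothetical names, <langinfo.h> is assumed to be available, and the
codeset name "ISO-8859-1" is the spelling glibc reports; other systems
may spell it differently.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: install LOCALE for LC_CTYPE and return the name
   of its codeset, or NULL if the locale cannot be installed.  The
   returned string belongs to the C library and may be overwritten by
   later calls.  */
static const char *
codeset_of_locale (const char *locale)
{
  if (setlocale (LC_CTYPE, locale) == NULL)
    return NULL;
  return nl_langinfo (CODESET);
}

/* Hypothetical helper: return 1 if LOCALE could be installed and its
   codeset is reported as "ISO-8859-1" (assumed spelling), else 0.  */
static int
locale_is_latin1 (const char *locale)
{
  const char *codeset = codeset_of_locale (locale);

  return codeset != NULL && strcmp (codeset, "ISO-8859-1") == 0;
}
```

A test could then try candidate locale names one by one and keep the
first for which `locale_is_latin1 ()' returns 1, instead of relying on
the ".iso88591" suffix.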

The updated patch is attached below (only `srfi-14.test' was changed).
Let me know if it's ok to commit.

Thanks,
Ludovic.


[-- Attachment #2: The updated patch --]
[-- Type: text/x-patch, Size: 16442 bytes --]

--- orig/configure.in
+++ mod/configure.in
@@ -598,9 +598,10 @@
 #   readdir_r - recent posix, not on old systems
 #   stat64 - SuS largefile stuff, not on old systems
 #   sysconf - not on old systems
+#   isblank - available as a GNU extension or in C99
 #   _NSGetEnviron - Darwin specific
 #
-AC_CHECK_FUNCS([DINFINITY DQNAN ctermid fesetround ftime fchown getcwd geteuid gettimeofday gmtime_r ioctl lstat mkdir mknod nice readdir_r readlink rename rmdir select setegid seteuid setlocale setpgid setsid sigaction siginterrupt stat64 strftime strptime symlink sync sysconf tcgetpgrp tcsetpgrp times uname waitpid strdup system usleep atexit on_exit chown link fcntl ttyname getpwent getgrent kill getppid getpgrp fork setitimer getitimer strchr strcmp index bcopy memcpy rindex unsetenv _NSGetEnviron])
+AC_CHECK_FUNCS([DINFINITY DQNAN ctermid fesetround ftime fchown getcwd geteuid gettimeofday gmtime_r ioctl lstat mkdir mknod nice readdir_r readlink rename rmdir select setegid seteuid setlocale setpgid setsid sigaction siginterrupt stat64 strftime strptime symlink sync sysconf tcgetpgrp tcsetpgrp times uname waitpid strdup system usleep atexit on_exit chown link fcntl ttyname getpwent getgrent kill getppid getpgrp fork setitimer getitimer strchr strcmp index bcopy memcpy rindex unsetenv isblank _NSGetEnviron])
 
 # Reasons for testing:
 #   netdb.h - not in mingw


--- orig/libguile/posix.c
+++ mod/libguile/posix.c
@@ -34,6 +34,7 @@
 #include "libguile/feature.h"
 #include "libguile/strings.h"
 #include "libguile/srfi-13.h"
+#include "libguile/srfi-14.h"
 #include "libguile/vectors.h"
 #include "libguile/lang.h"
 
@@ -1392,6 +1393,10 @@
       SCM_SYSERROR;
     }
 
+  /* Recompute the standard SRFI-14 character sets in a locale-dependent
+     (actually charset-dependent) way.  */
+  scm_srfi_14_compute_char_sets ();
+
   scm_dynwind_end ();
   return scm_from_locale_string (rv);
 }


--- orig/libguile/srfi-14.c
+++ mod/libguile/srfi-14.c
@@ -17,18 +17,27 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#define _GNU_SOURCE  /* Ask for `isblank ()'.  */
 
 #include <string.h>
 #include <ctype.h>
 
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
 #include "libguile.h"
 #include "libguile/srfi-14.h"
 
 
-#define SCM_CHARSET_SET(cs, idx) \
-  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |= \
+#define SCM_CHARSET_SET(cs, idx)				\
+  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] |=	\
     (1L << ((idx) % SCM_BITS_PER_LONG)))
 
+#define SCM_CHARSET_UNSET(cs, idx)				\
+  (((long *) SCM_SMOB_DATA (cs))[(idx) / SCM_BITS_PER_LONG] &=	\
+    (~(1L << ((idx) % SCM_BITS_PER_LONG))))
+
 #define BYTES_PER_CHARSET (SCM_CHARSET_SIZE / 8)
 #define LONGS_PER_CHARSET (SCM_CHARSET_SIZE / SCM_BITS_PER_LONG)
 
@@ -1393,6 +1402,9 @@
 }
 #undef FUNC_NAME
 
+\f
+/* Standard character sets.  */
+
 SCM scm_char_set_lower_case;
 SCM scm_char_set_upper_case;
 SCM scm_char_set_title_case;
@@ -1411,48 +1423,123 @@
 SCM scm_char_set_empty;
 SCM scm_char_set_full;
 
-static SCM
-make_predset (int (*pred) (int))
-{
-  int ch;
-  SCM cs = make_char_set (NULL);
-  for (ch = 0; ch < 256; ch++)
-    if (pred (ch))
-      SCM_CHARSET_SET (cs, ch);
-  return cs;
-}
 
-static SCM
-define_predset (const char *name, int (*pred) (int))
+/* Create an empty character set and return it after binding it to NAME.  */
+static inline SCM
+define_charset (const char *name)
 {
-  SCM cs = make_predset (pred);
+  SCM cs = make_char_set (NULL);
   scm_c_define (name, cs);
   return scm_permanent_object (cs);
 }
 
-static SCM
-make_strset (const char *str)
+/* Membership predicates for the various char sets.
+
+   XXX: The `punctuation' and `symbol' char sets have no direct equivalent in
+   <ctype.h>.  Thus, the predicates below yield correct results for ASCII,
+   but they do not provide the result described by the SRFI for Latin-1.  The
+   correct Latin-1 result could only be obtained by hard-coding the
+   characters listed by the SRFI, but the problem would remain for other
+   8-bit charsets.
+
+   Similarly, character 0xA0 in Latin-1 (unbreakable space, `#\0240') should
+   be part of `char-set:blank'.  However, glibc's current (2006/09) Latin-1
+   locales (which use the ISO 14652 "i18n" FDCC-set) do not consider it
+   `blank' so it ends up in `char-set:punctuation'.  */
+#ifdef HAVE_ISBLANK
+# define CSET_BLANK_PRED(c)  (isblank (c))
+#else
+# define CSET_BLANK_PRED(c)			\
+   (((c) == ' ') || ((c) == '\t'))
+#endif
+
+#define CSET_SYMBOL_PRED(c)					\
+  (((c) != '\0') && (strchr ("$+<=>^`|~", (c)) != NULL))
+#define CSET_PUNCT_PRED(c)					\
+  ((ispunct (c)) && (!CSET_SYMBOL_PRED (c)))
+
+#define CSET_LOWER_PRED(c)       (islower (c))
+#define CSET_UPPER_PRED(c)       (isupper (c))
+#define CSET_LETTER_PRED(c)      (isalpha (c))
+#define CSET_DIGIT_PRED(c)       (isdigit (c))
+#define CSET_WHITESPACE_PRED(c)  (isspace (c))
+#define CSET_CONTROL_PRED(c)     (iscntrl (c))
+#define CSET_HEX_DIGIT_PRED(c)   (isxdigit (c))
+#define CSET_ASCII_PRED(c)       (isascii (c))
+
+/* Some char sets are explicitly defined by the SRFI as a union of other char
+   sets so we try to follow this closely.  */
+
+#define CSET_LETTER_AND_DIGIT_PRED(c)		\
+  (CSET_LETTER_PRED (c) || CSET_DIGIT_PRED (c))
+
+#define CSET_GRAPHIC_PRED(c)				\
+  (CSET_LETTER_PRED (c) || CSET_DIGIT_PRED (c)		\
+   || CSET_PUNCT_PRED (c) || CSET_SYMBOL_PRED (c))
+
+#define CSET_PRINTING_PRED(c)				\
+  (CSET_GRAPHIC_PRED (c) || CSET_WHITESPACE_PRED (c))
+
+/* False and true predicates.  */
+#define CSET_TRUE_PRED(c)    (1)
+#define CSET_FALSE_PRED(c)   (0)
+
+
+/* Compute the contents of all the standard character sets.  Computation may
+   need to be re-done at `setlocale'-time because some char sets (e.g.,
+   `char-set:letter') need to reflect the character set supported by Guile.
+
+   For instance, at startup time, the "C" locale is used, thus Guile supports
+   only ASCII; therefore, `char-set:letter' only contains English letters.
+   The user can change this by invoking `setlocale' and specifying a locale
+   with an 8-bit charset, thereby augmenting some of the SRFI-14 standard
+   character sets.
+
+   This works because some of the predicates used below to construct
+   character sets (e.g., `isalpha(3)') are locale-dependent (so
+   charset-dependent, though generally not language-dependent).  For details,
+   please see the `guile-devel' mailing list archive of September 2006.  */
+void
+scm_srfi_14_compute_char_sets (void)
 {
-  SCM cs = make_char_set (NULL);
-  while (*str)
+#define UPDATE_CSET(c, cset, pred)		\
+  do						\
+    {						\
+      if (pred (c))				\
+	SCM_CHARSET_SET ((cset), (c));		\
+      else					\
+	SCM_CHARSET_UNSET ((cset), (c));	\
+    }						\
+  while (0)
+
+  register int ch;
+
+  for (ch = 0; ch < 256; ch++)
     {
-      SCM_CHARSET_SET (cs, *str);
-      str++;
+      UPDATE_CSET (ch, scm_char_set_upper_case, CSET_UPPER_PRED);
+      UPDATE_CSET (ch, scm_char_set_lower_case, CSET_LOWER_PRED);
+      UPDATE_CSET (ch, scm_char_set_title_case, CSET_FALSE_PRED);
+      UPDATE_CSET (ch, scm_char_set_letter, CSET_LETTER_PRED);
+      UPDATE_CSET (ch, scm_char_set_digit, CSET_DIGIT_PRED);
+      UPDATE_CSET (ch, scm_char_set_letter_and_digit,
+		   CSET_LETTER_AND_DIGIT_PRED);
+      UPDATE_CSET (ch, scm_char_set_graphic, CSET_GRAPHIC_PRED);
+      UPDATE_CSET (ch, scm_char_set_printing, CSET_PRINTING_PRED);
+      UPDATE_CSET (ch, scm_char_set_whitespace, CSET_WHITESPACE_PRED);
+      UPDATE_CSET (ch, scm_char_set_iso_control, CSET_CONTROL_PRED);
+      UPDATE_CSET (ch, scm_char_set_punctuation, CSET_PUNCT_PRED);
+      UPDATE_CSET (ch, scm_char_set_symbol, CSET_SYMBOL_PRED);
+      UPDATE_CSET (ch, scm_char_set_hex_digit, CSET_HEX_DIGIT_PRED);
+      UPDATE_CSET (ch, scm_char_set_blank, CSET_BLANK_PRED);
+      UPDATE_CSET (ch, scm_char_set_ascii, CSET_ASCII_PRED);
+      UPDATE_CSET (ch, scm_char_set_empty, CSET_FALSE_PRED);
+      UPDATE_CSET (ch, scm_char_set_full, CSET_TRUE_PRED);
     }
-  return cs;
-}
 
-static SCM
-define_strset (const char *name, const char *str)
-{
-  SCM cs = make_strset (str);
-  scm_c_define (name, cs);
-  return scm_permanent_object (cs);
+#undef UPDATE_CSET
 }
 
-static int false (int ch) { return 0; }
-static int true (int ch) { return 1; }
-
+\f
 void
 scm_init_srfi_14 (void)
 {
@@ -1461,24 +1548,25 @@
   scm_set_smob_free (scm_tc16_charset, charset_free);
   scm_set_smob_print (scm_tc16_charset, charset_print);
 
-  scm_char_set_upper_case = define_predset ("char-set:upper-case", isupper);
-  scm_char_set_lower_case = define_predset ("char-set:lower-case", islower);
-  scm_char_set_title_case = define_predset ("char-set:title-case", false);
-  scm_char_set_letter = define_predset ("char-set:letter", isalpha);
-  scm_char_set_digit = define_predset ("char-set:digit", isdigit);
-  scm_char_set_letter_and_digit = define_predset ("char-set:letter+digit",
-						  isalnum);
-  scm_char_set_graphic = define_predset ("char-set:graphic", isgraph);
-  scm_char_set_printing = define_predset ("char-set:printing", isprint);
-  scm_char_set_whitespace = define_predset ("char-set:whitespace", isspace);
-  scm_char_set_iso_control = define_predset ("char-set:iso-control", iscntrl);
-  scm_char_set_punctuation = define_predset ("char-set:punctuation", ispunct);
-  scm_char_set_symbol = define_strset ("char-set:symbol", "$+<=>^`|~");
-  scm_char_set_hex_digit = define_predset ("char-set:hex-digit", isxdigit);
-  scm_char_set_blank = define_strset ("char-set:blank", " \t");
-  scm_char_set_ascii = define_predset ("char-set:ascii", isascii);
-  scm_char_set_empty = define_predset ("char-set:empty", false);
-  scm_char_set_full = define_predset ("char-set:full", true);
+  scm_char_set_upper_case = define_charset ("char-set:upper-case");
+  scm_char_set_lower_case = define_charset ("char-set:lower-case");
+  scm_char_set_title_case = define_charset ("char-set:title-case");
+  scm_char_set_letter = define_charset ("char-set:letter");
+  scm_char_set_digit = define_charset ("char-set:digit");
+  scm_char_set_letter_and_digit = define_charset ("char-set:letter+digit");
+  scm_char_set_graphic = define_charset ("char-set:graphic");
+  scm_char_set_printing = define_charset ("char-set:printing");
+  scm_char_set_whitespace = define_charset ("char-set:whitespace");
+  scm_char_set_iso_control = define_charset ("char-set:iso-control");
+  scm_char_set_punctuation = define_charset ("char-set:punctuation");
+  scm_char_set_symbol = define_charset ("char-set:symbol");
+  scm_char_set_hex_digit = define_charset ("char-set:hex-digit");
+  scm_char_set_blank = define_charset ("char-set:blank");
+  scm_char_set_ascii = define_charset ("char-set:ascii");
+  scm_char_set_empty = define_charset ("char-set:empty");
+  scm_char_set_full = define_charset ("char-set:full");
+
+  scm_srfi_14_compute_char_sets ();
 
 #include "libguile/srfi-14.x"
 }


--- orig/libguile/srfi-14.h
+++ mod/libguile/srfi-14.h
@@ -106,7 +106,7 @@
 SCM_API SCM scm_char_set_empty;
 SCM_API SCM scm_char_set_full;
 
-SCM_API void scm_c_init_srfi_14 (void);
+SCM_API void scm_srfi_14_compute_char_sets (void);
 SCM_API void scm_init_srfi_14 (void);
 
 #endif /* SCM_SRFI_14_H */


--- orig/test-suite/tests/srfi-14.test
+++ mod/test-suite/tests/srfi-14.test
@@ -1,4 +1,4 @@
-;;;; srfi-14.test --- Test suite for Guile's SRFI-14 functions. -*- scheme -*-
+;;;; srfi-14.test --- Test suite for Guile's SRFI-14 functions.
 ;;;; Martin Grabmueller, 2001-07-16
 ;;;;
 ;;;; Copyright (C) 2001, 2006 Free Software Foundation, Inc.
@@ -18,7 +18,11 @@
 ;;;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
 ;;;; Boston, MA 02110-1301 USA
 
-(use-modules (srfi srfi-14))
+(define-module (test-suite test-srfi-14)
+  :use-module (srfi srfi-14)
+  :use-module (srfi srfi-1) ;; `every'
+  :use-module (test-suite lib))
+
 
 (define exception:invalid-char-set-cursor
   (cons 'misc-error "^invalid character set cursor"))
@@ -186,3 +190,128 @@
   (pass-if "upper case char set"
      (char-set= (char-set-map char-upcase char-set:lower-case)
 		char-set:upper-case)))
+
+(with-test-prefix "string->char-set"
+
+  (pass-if "some char set"
+     (let ((chars '(#\g #\u #\i #\l #\e)))
+       (char-set= (list->char-set chars)
+		  (string->char-set (apply string chars))))))
+
+;; Make sure we get an ASCII charset and character classification.
+(if (defined? 'setlocale) (setlocale LC_CTYPE "C"))
+
+(with-test-prefix "standard char sets (ASCII)"
+
+  (pass-if "char-set:letter"
+     (char-set= (string->char-set
+		 (string-append "abcdefghijklmnopqrstuvwxyz"
+				"ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
+		char-set:letter))
+
+  (pass-if "char-set:punctuation"
+     (char-set= (string->char-set "!\"#%&'()*,-./:;?@[\\]_{}")
+		char-set:punctuation))
+
+  (pass-if "char-set:symbol"
+     (char-set= (string->char-set "$+<=>^`|~")
+		char-set:symbol))
+
+  (pass-if "char-set:letter+digit"
+     (char-set= char-set:letter+digit
+                (char-set-union char-set:letter char-set:digit)))
+
+  (pass-if "char-set:graphic"
+     (char-set= char-set:graphic
+                (char-set-union char-set:letter char-set:digit
+                                char-set:punctuation char-set:symbol)))
+
+  (pass-if "char-set:printing"
+      (char-set= char-set:printing
+                 (char-set-union char-set:whitespace char-set:graphic))))
+
+
+\f
+;;;
+;;; 8-bit charsets.
+;;;
+;;; Here, we only test ISO-8859-1 (Latin-1), notably because behavior of
+;;; SRFI-14 for implementations supporting this charset is well-defined.
+;;;
+
+(define (every? pred lst)
+  (not (not (every pred lst))))
+
+(define (find-latin1-locale)
+  ;; Try to find and install an ISO-8859-1 locale.  Return `#f' on failure.
+  (if (defined? 'setlocale)
+      (let loop ((locales (map (lambda (lang)
+				 (string-append lang ".iso88591"))
+			       '("de_DE" "en_GB" "en_US" "es_ES"
+				 "fr_FR" "it_IT"))))
+	(if (null? locales)
+	    #f
+	    (if (false-if-exception (setlocale LC_CTYPE (car locales)))
+		(car locales)
+		(loop (cdr locales)))))
+      #f))
+
+
+(define %latin1 (find-latin1-locale))
+
+(with-test-prefix "Latin-1 (8-bit charset)"
+
+  ;; Note: the membership tests below are not exhaustive.
+
+  (pass-if "char-set:letter (membership)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (let ((letters (char-set->list char-set:letter)))
+	   (every? (lambda (8-bit-char)
+		     (memq 8-bit-char letters))
+		   (append '(#\a #\b #\c)             ;; ASCII
+			   (string->list "çéèâùÉÀÈÊ") ;; French
+			   (string->list "øñÑíßåæðþ"))))))
+
+  (pass-if "char-set:letter (size)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (= (char-set-size char-set:letter) 117)))
+
+  (pass-if "char-set:lower-case (size)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (= (char-set-size char-set:lower-case) (+ 26 33))))
+
+  (pass-if "char-set:upper-case (size)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (= (char-set-size char-set:upper-case) (+ 26 30))))
+
+  (pass-if "char-set:punctuation (membership)"
+     (if (not %latin1)
+	 (throw 'unresolved)
+	 (let ((punctuation (char-set->list char-set:punctuation)))
+	   (every? (lambda (8-bit-char)
+		     (memq 8-bit-char punctuation))
+		   (append '(#\! #\. #\?)            ;; ASCII
+			   (string->list "¡¿")       ;; Spanish
+			   (string->list "«»"))))))  ;; French
+
+  (pass-if "char-set:letter+digit"
+     (char-set= char-set:letter+digit
+                (char-set-union char-set:letter char-set:digit)))
+
+  (pass-if "char-set:graphic"
+     (char-set= char-set:graphic
+                (char-set-union char-set:letter char-set:digit
+                                char-set:punctuation char-set:symbol)))
+
+  (pass-if "char-set:printing"
+     (char-set= char-set:printing
+                (char-set-union char-set:whitespace char-set:graphic))))
+
+;; Local Variables:
+;; mode: scheme
+;; coding: latin-1
+;; End:




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: SRFI-14 and locale settings
  2006-09-19 12:28                         ` Ludovic Courtès
@ 2006-09-19 22:42                           ` Kevin Ryde
  2006-09-20 13:21                             ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Kevin Ryde @ 2006-09-19 22:42 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
> +#ifdef HAVE_CONFIG_H
> +# include <config.h>
> +#endif

No need to conditionalize that, just the #include is enough.  And it
normally should be the first thing in the file, if it isn't already.
Otherwise looks ok.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: SRFI-14 and locale settings
  2006-09-19 22:42                           ` Kevin Ryde
@ 2006-09-20 13:21                             ` Ludovic Courtès
  2006-09-22 20:02                               ` Neil Jerram
  0 siblings, 1 reply; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-20 13:21 UTC (permalink / raw)


Hi,

Kevin Ryde <user42@zip.com.au> writes:

> ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>>
>> +#ifdef HAVE_CONFIG_H
>> +# include <config.h>
>> +#endif
>
> No need to conditionalize that, just the #include is enough.  And it
> normally should be the first thing in the file, if it isn't already.

I left the conditional since (i) all other files have it and (ii) it
makes sense from an Autoconf viewpoint.  :-)

> Otherwise looks ok.

I committed the patch into HEAD, along with the following doc bits
(under ``Standard Character Sets''):

  Currently, the contents of these character sets are recomputed upon a
  successful @code{setlocale} call (@pxref{Locales}) in order to reflect
  the characters available in the current locale's codeset.  For
  instance, @code{char-set:letter} contains 52 characters under an ASCII
  locale (e.g., the default @code{C} locale) and 117 characters under an
  ISO-8859-1 (``Latin-1'') locale.
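
As an illustration only (this is not the committed code), here is a
minimal C sketch of how the size of a letter set can follow the locale,
assuming the set is rebuilt by asking <ctype.h> about each 8-bit code.
The names `count_letters' and `install_latin1_locale' are hypothetical;
the candidate locale names are the ones tried by the test suite earlier
in this thread.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Count the 8-bit character codes that the current LC_CTYPE locale
   classifies as letters.  Hypothetical helper, for illustration.  */
int
count_letters (void)
{
  int c, n = 0;

  for (c = 0; c < 256; c++)
    if (isalpha (c))
      n++;

  return n;
}

/* Try to install an ISO-8859-1 locale for LC_CTYPE.  Return the name
   of the locale that was installed, or NULL if none is available.
   Hypothetical helper, for illustration.  */
const char *
install_latin1_locale (void)
{
  static const char *const candidates[] = {
    "de_DE.iso88591", "en_GB.iso88591", "en_US.iso88591",
    "es_ES.iso88591", "fr_FR.iso88591", "it_IT.iso88591"
  };
  size_t i;

  for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    if (setlocale (LC_CTYPE, candidates[i]) != NULL)
      return candidates[i];

  return NULL;
}
```

Calling `count_letters ()' right after `setlocale (LC_CTYPE, "C")'
yields 52, since the C locale only classifies A-Z and a-z as letters.
After `install_latin1_locale ()' succeeds, the count depends on the C
library's locale data, which is why the tests throw `unresolved' when
no Latin-1 locale can be installed.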

I'll eventually merge it into the 1.8 branch if nobody disagrees.

Thanks,
Ludovic.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: SRFI-14 and locale settings
  2006-09-20 13:21                             ` Ludovic Courtès
@ 2006-09-22 20:02                               ` Neil Jerram
  2006-09-25  8:27                                 ` Ludovic Courtès
  0 siblings, 1 reply; 23+ messages in thread
From: Neil Jerram @ 2006-09-22 20:02 UTC (permalink / raw)


ludovic.courtes@laas.fr (Ludovic Courtès) writes:

> I committed the patch into HEAD, along with the following doc bits
> (under ``Standard Character Sets''):
>
>   Currently, the contents of these character sets are recomputed upon a
>   successful @code{setlocale} call (@pxref{Locales}) in order to reflect
>   the characters available in the current locale's codeset.  For
>   instance, @code{char-set:letter} contains 52 characters under an ASCII
>   locale (e.g., the default @code{C} locale) and 117 characters under an
>   ISO-8859-1 (``Latin-1'') locale.
>
> I'll eventually merge it into the 1.8 branch if nobody disagrees.

As long as you're happy that 1.8-targeting programs can't reasonably
be relying on some detail that has now changed, that sounds fine.

Regards,
     Neil





^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: SRFI-14 and locale settings
  2006-09-22 20:02                               ` Neil Jerram
@ 2006-09-25  8:27                                 ` Ludovic Courtès
  0 siblings, 0 replies; 23+ messages in thread
From: Ludovic Courtès @ 2006-09-25  8:27 UTC (permalink / raw)
  Cc: guile-devel

Hi,

Neil Jerram <neil@ossau.uklinux.net> writes:

> ludovic.courtes@laas.fr (Ludovic Courtès) writes:
>
>> I committed the patch into HEAD, along with the following doc bits
>> (under ``Standard Character Sets''):
>>
>>   Currently, the contents of these character sets are recomputed upon a
>>   successful @code{setlocale} call (@pxref{Locales}) in order to reflect
>>   the characters available in the current locale's codeset.  For
>>   instance, @code{char-set:letter} contains 52 characters under an ASCII
>>   locale (e.g., the default @code{C} locale) and 117 characters under an
>>   ISO-8859-1 (``Latin-1'') locale.
>>
>> I'll eventually merge it into the 1.8 branch if nobody disagrees.
>
> As long as you're happy that 1.8-targeting programs can't reasonably
> be relying on some detail that has now changed, that sounds fine.

Indeed, I'm pretty confident about it (and in any case, only
`setlocale'-using programs would see the difference).

Thus, I just merged it into 1.8.

Thanks,
Ludovic.




^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2006-09-03 16:48 SRFI-14 and locale settings Ludovic Courtès
2006-09-04  6:41 ` Neil Jerram
2006-09-04  9:08   ` Ludovic Courtès
2006-09-04 23:42     ` Kevin Ryde
2006-09-07  7:21       ` Ludovic Courtès
2006-09-07 23:22         ` Kevin Ryde
2006-09-12  9:28           ` Ludovic Courtès
2006-09-12 18:17             ` Neil Jerram
2006-09-13  8:29               ` Ludovic Courtès
2006-09-13 18:07                 ` Neil Jerram
2006-09-14 15:58                   ` Ludovic Courtès
2006-09-14  0:07             ` Kevin Ryde
2006-09-14 13:22               ` Ludovic Courtès
2006-09-15  0:53                 ` Kevin Ryde
2006-09-15  9:28                   ` Neil Jerram
2006-09-16 13:46                     ` Ludovic Courtès
2006-09-18 23:48                       ` Kevin Ryde
2006-09-19 12:28                         ` Ludovic Courtès
2006-09-19 22:42                           ` Kevin Ryde
2006-09-20 13:21                             ` Ludovic Courtès
2006-09-22 20:02                               ` Neil Jerram
2006-09-25  8:27                                 ` Ludovic Courtès
2006-09-15 12:03                   ` Ludovic Courtès
