Hi,

Neil Jerram writes:

> Yes; based on Kevin's and Ludovic's latest emails, I'm happy now with
> the isalpha() solution if we can make it leverage this "i18n"
> classification.

Below is a patch that does what we agreed on: keep using the <ctype.h> functions for character classification, and recompute the standard SRFI-14 char sets upon successful `setlocale'. It also makes char set computation more efficient and fixes `char-set:punctuation' and `char-set:symbol' in ASCII.

There are still issues. The bug in `char-set:punctuation' and `char-set:symbol' that I mention above comes from the fact that there is no <ctype.h> equivalent to those char sets (in particular, `ispunct ()' does not match `char-set:punctuation'). Fixing it for ASCII was easy, but it is not so easy for Latin-1. We can hardly get `char-set:punctuation' and `char-set:symbol' right for Latin-1 because we don't want to hard-code too much Latin-1-specific knowledge: one goal is to have SRFI-14 also provide sensible results for non-Latin-1 8-bit charsets.

With this patch, all standard char sets are those expected by SRFI-14 in ASCII. In Latin-1, `char-set:letter', as well as `lower-case', `upper-case', and `iso-control', are correct (at least with current glibc locales). However, `punctuation' is a superset of what SRFI-14 expects, `symbol' is (correspondingly) a subset of what it should be, and `blank' lacks the "no-break space" character (#\0240).

I'm not sure we can do much better than that until Guile fully supports Unicode. The right solution, in the end, would be to process the whole `UnicodeData.txt' and generate a character classification that strictly follows the SRFI-14 rules.

In the meantime, I think this patch can be an acceptable solution. I'd be glad if some of you could test it, and especially run the test cases.
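To illustrate the approach, here is a minimal sketch of recomputing 8-bit char sets from <ctype.h> after a successful `setlocale'. It is not the patch itself: the `charset8' type and the `cs_add', `cs_contains', `recompute_charsets' and `set_locale_and_recompute' names are hypothetical, and Guile's actual char set representation differs. The ASCII punctuation/symbol split is hard-coded from the SRFI-14 lists; above ASCII the sketch can only fall back on `ispunct ()', which is why `punctuation' ends up a superset and `symbol' a subset there.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 256-bit set for an 8-bit charset (not Guile's real type).  */
typedef struct { unsigned char bits[32]; } charset8;

static charset8 cs_letter, cs_punctuation, cs_symbol;

static void
cs_add (charset8 *cs, int c)
{
  cs->bits[c >> 3] |= (unsigned char) (1u << (c & 7));
}

static int
cs_contains (const charset8 *cs, int c)
{
  return (cs->bits[c >> 3] >> (c & 7)) & 1;
}

/* Rebuild all sets in a single pass over 0..255, using the current
   LC_CTYPE locale.  */
static void
recompute_charsets (void)
{
  /* The nine ASCII characters SRFI-14 lists in char-set:symbol.  */
  static const char ascii_symbols[] = "$+<=>^`|~";
  int c;

  memset (&cs_letter, 0, sizeof cs_letter);
  memset (&cs_punctuation, 0, sizeof cs_punctuation);
  memset (&cs_symbol, 0, sizeof cs_symbol);

  for (c = 0; c < 256; c++)
    {
      if (isalpha (c))
        cs_add (&cs_letter, c);

      if (c != 0 && c < 128 && strchr (ascii_symbols, c) != NULL)
        cs_add (&cs_symbol, c);
      else if (ispunct (c))
        /* Above ASCII this over-approximates: <ctype.h> cannot tell
           punctuation from symbols.  */
        cs_add (&cs_punctuation, c);
    }
}

/* Recompute only when `setlocale' succeeds; on failure the locale is
   unchanged, so the existing sets remain valid.  Returns 1 on success.  */
static int
set_locale_and_recompute (const char *name)
{
  if (setlocale (LC_CTYPE, name) == NULL)
    return 0;
  recompute_charsets ();
  return 1;
}
```

Calling `set_locale_and_recompute ("C")' should yield the ASCII sets that SRFI-14 specifies; with a Latin-1 locale, the sets above ASCII are only as good as the locale's <ctype.h> data.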
I added Latin-1-specific test cases, but they require a Latin-1 locale to be available, and the test code tries to guess what that locale is called (yes, it looks quite hackish, but I couldn't think of anything better...).

Comments welcome.

Thanks,
Ludovic.