unofficial mirror of bug-guile@gnu.org 
* guile 1.8.5 test failure: srfi-14.test
@ 2008-05-15 22:43 Bruno Haible
  2008-05-18 21:04 ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Bruno Haible @ 2008-05-15 22:43 UTC
  To: bug-guile

Hi,

Trying to install guile-1.8.5 on a Linux/x86 system, I get test failures:

$ ./configure --prefix=/packages/gnu CPPFLAGS=-Wall
$ make
$ make check
...
Running ports.test
UNRESOLVED: ports.test: port-for-each: passing freed cell
Running posix.test
...
Running srfi-14.test
FAIL: srfi-14.test: Latin-1 (8-bit charset): char-set:letter (size)
FAIL: srfi-14.test: Latin-1 (8-bit charset): char-set:lower-case (size)
FAIL: srfi-14.test: Latin-1 (8-bit charset): char-set:upper-case (size)
Running srfi-19.test
...
Running syntax.test
UNRESOLVED: syntax.test: while: in empty environment: empty body
UNRESOLVED: syntax.test: while: in empty environment: initially false
UNRESOLVED: syntax.test: while: in empty environment: iterating
Running threads.test
...
Running weaks.test
UNRESOLVED: weaks.test: weak-vector: dies

Totals for this test run:
passes:                 11896
failures:               3
unexpected passes:      0
expected failures:      25
unresolved test cases:  5
untested test cases:    0
unsupported test cases: 9
errors:                 0

FAIL: check-guile
==================================
1 of 1 tests failed
Please report to bug-guile@gnu.org
==================================
make[2]: *** [check-TESTS] Fehler 1

Environment information:
- gcc version is 3.3.1
- glibc version is 2.3.6
- locale is de_DE.UTF-8

What happens in srfi-14.test?

On my system, (find-latin1-locale) returns "de_DE.iso88591". Now look:
$ guile
guile> (char-set-size char-set:letter)
52
guile> (setlocale LC_CTYPE "de_DE.iso88591")
"de_DE.iso88591"
guile> (char-set-size char-set:letter)
124
guile> char-set:letter
#<charset {#\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\246 #\250 #\252 #\264 #\265 #\270 #\272 #\274 #\275 #\276 #\300 #\301 #\302 #\303 #\304 #\305 #\306 #\307 #\310 #\311 #\312 #\313 #\314 #\315 #\316 #\317 #\320 #\321 #\322 #\323 #\324 #\325 #\326 #\330 #\331 #\332 #\333 #\334 #\335 #\336 #\337 #\340 #\341 #\342 #\343 #\344 #\345 #\346 #\347 #\350 #\351 #\352 #\353 #\354 #\355 #\356 #\357 #\360 #\361 #\362 #\363 #\364 #\365 #\366 #\370 #\371 #\372 #\373 #\374 #\375 #\376 #\377}>

So the notion of "letters" in a Latin1 locale may depend on the libc.
It might be safer to change the test code from

    (= (char-set-size char-set:letter) 117)

to

    (>= (char-set-size char-set:letter) 100)

If I make this change in srfi-14.test, this part of the test passes.
Similarly, I get
$ guile
guile> (setlocale LC_CTYPE "de_DE.iso88591")
"de_DE.iso88591"
guile> (char-set-size char-set:lower-case)
62
guile> (char-set-size char-set:upper-case)
60

Here is the complete patch. It makes the testsuite pass.


2008-05-15  Bruno Haible  <bruno@clisp.org>

	* test-suite/tests/srfi-14.test: Relax the check on the size of
	the letters, lower-case, upper-case character sets in Latin1 locales.
	Needed on glibc-2.3.6 systems.

--- guile-1.8.5/test-suite/tests/srfi-14.test	2008-04-07 23:30:03.000000000 +0200
+++ guile-1.8.5/test-suite/tests/srfi-14.test	2008-05-16 00:41:26.000000000 +0200
@@ -1,7 +1,7 @@
 ;;;; srfi-14.test --- Test suite for Guile's SRFI-14 functions.
 ;;;; Martin Grabmueller, 2001-07-16
 ;;;;
-;;;; Copyright (C) 2001, 2006 Free Software Foundation, Inc.
+;;;; Copyright (C) 2001, 2006, 2008 Free Software Foundation, Inc.
 ;;;; 
 ;;;; This program is free software; you can redistribute it and/or modify
 ;;;; it under the terms of the GNU General Public License as published by
@@ -276,17 +276,17 @@
   (pass-if "char-set:letter (size)"
      (if (not %latin1)
 	 (throw 'unresolved)
-	 (= (char-set-size char-set:letter) 117)))
+	 (>= (char-set-size char-set:letter) 100)))
 
   (pass-if "char-set:lower-case (size)"
      (if (not %latin1)
 	 (throw 'unresolved)
-	 (= (char-set-size char-set:lower-case) (+ 26 33))))
+	 (>= (char-set-size char-set:lower-case) (+ 26 30))))
 
   (pass-if "char-set:upper-case (size)"
      (if (not %latin1)
 	 (throw 'unresolved)
-	 (= (char-set-size char-set:upper-case) (+ 26 30))))
+	 (>= (char-set-size char-set:upper-case) (+ 26 30))))
 
   (pass-if "char-set:punctuation (membership)"
      (if (not %latin1)







* Re: guile 1.8.5 test failure: srfi-14.test
  2008-05-15 22:43 guile 1.8.5 test failure: srfi-14.test Bruno Haible
@ 2008-05-18 21:04 ` Ludovic Courtès
  2008-05-27  0:33   ` Bruno Haible
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2008-05-18 21:04 UTC
  To: bug-guile

Hi Bruno,

Bruno Haible <bruno@clisp.org> writes:

> On my system, (find-latin1-locale) returns "de_DE.iso88591". Now look:
> $ guile
> guile> (char-set-size char-set:letter)
> 52
> guile> (setlocale LC_CTYPE "de_DE.iso88591")
> "de_DE.iso88591"
> guile> (char-set-size char-set:letter)
> 124
> guile> char-set:letter
> #<charset {#\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M #\N #\O #\P #\Q #\R #\S #\T #\U #\V #\W #\X #\Y #\Z #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l #\m #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z #\246 #\250 #\252 #\264 #\265 #\270 #\272 #\274 #\275 #\276 #\300 #\301 #\302 #\303 #\304 #\305 #\306 #\307 #\310 #\311 #\312 #\313 #\314 #\315 #\316 #\317 #\320 #\321 #\322 #\323 #\324 #\325 #\326 #\330 #\331 #\332 #\333 #\334 #\335 #\336 #\337 #\340 #\341 #\342 #\343 #\344 #\345 #\346 #\347 #\350 #\351 #\352 #\353 #\354 #\355 #\356 #\357 #\360 #\361 #\362 #\363 #\364 #\365 #\366 #\370 #\371 #\372 #\373 #\374 #\375 #\376 #\377}>
>
> So the notion of "letters" in a Latin1 locale may depend on the libc.
> It might be safer to change the test code from
>
>     (= (char-set-size char-set:letter) 117)
>
> to
>
>     (>= (char-set-size char-set:letter) 100)

The cardinals of these char sets were taken from SRFI-14:

  http://srfi.schemers.org/srfi-14/srfi-14.html#StandardCharsetDefs

This indicates that we should fix our SRFI-14 implementation, not the
test.  ;-)

The system I'm currently using also picks `de_DE.iso88591' but it uses
Glibc 2.7, which doesn't have this problem.  I'm pretty sure Glibc 2.5
didn't have this problem either, and FreeBSD 6.2's libc doesn't either.
I don't have any Glibc 2.3-based system at hand, so I can only try to
guess what's going on.

Glibc's `localedata/locales/i18n' appears to be what defines the
character classes.  According to the ChangeLog it was updated in
Feb. 2007 to match Unicode 5.0, and in Apr. 2002 (by you) to match
Unicode 3.2.  Glibc 2.3.6 was released sometime in 2005 (see
http://sourceware.org/ml/libc-announce/2005/msg00001.html), so it
included the latter.

The SRFI-14 locale-sensitive code in Guile and the corresponding tests
date back to Sept. 2006, so it seems unlikely that the Unicode 5.0
update changed anything.  Any idea what to look at?

(Of course, we should eventually use `UnicodeData.txt' directly but
that's not likely to happen anytime soon...)

Thanks,
Ludovic.






* Re: guile 1.8.5 test failure: srfi-14.test
  2008-05-18 21:04 ` Ludovic Courtès
@ 2008-05-27  0:33   ` Bruno Haible
  2008-05-30 17:06     ` Ludovic Courtès
  2008-05-30 22:50     ` Bruno Haible
  0 siblings, 2 replies; 6+ messages in thread
From: Bruno Haible @ 2008-05-27  0:33 UTC
  To: Ludovic Courtès; +Cc: bug-guile

Ludovic Courtès wrote in
<http://lists.gnu.org/archive/html/bug-guile/2008-05/msg00014.html>:
> > So the notion of "letters" in a Latin1 locale may depend on the libc.
> > It might be safer to change the test code from
> >
> >     (= (char-set-size char-set:letter) 117)
> >
> > to
> >
> >     (>= (char-set-size char-set:letter) 100)
> 
> The cardinals of these char sets were taken from SRFI-14:
> 
>   http://srfi.schemers.org/srfi-14/srfi-14.html#StandardCharsetDefs
> 
> This indicates that we should fix our SRFI-14 implementation, not the
> test.  ;-)

I don't think it's appropriate to take these numbers (117 etc.) as precise
expectations. Unicode is a moving target: At every Unicode version, new
characters are being added, and sometimes also the character classification
into "letters" vs. "non-letters" changes.

The SRFI-14 text to which you point says at various places "... in Unicode 3.0".
This matches the date of origin (1999/2000) of that text.

Note also that the text talking about the Unicode letters and 117 is outside
the section "Specification", which makes me think that it is not normative.
Even if it is normative, it nowhere says that you have to use *exactly*
Unicode 3.0.

So you have a choice between 3 alternatives:

  1) Provide an implementation of char-set:letter that is tied to a particular
     Unicode version and will not evolve. Then you can hardwire specific
     letter counts in the test suite.

  2) Provide an implementation that does not rely on the libc locale system
     but still upgrades to new Unicode versions now and then. Then you have to
     update the letter count in the tests when you upgrade the library.

  3) Provide an implementation that relies on the libc locale system, and
     thus upgrades to new Unicode versions when the libc does. Then you can
     only expect approximate letter counts.

Bruno






* Re: guile 1.8.5 test failure: srfi-14.test
  2008-05-27  0:33   ` Bruno Haible
@ 2008-05-30 17:06     ` Ludovic Courtès
  2008-05-30 22:50     ` Bruno Haible
  1 sibling, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2008-05-30 17:06 UTC
  To: bug-guile; +Cc: bruno

Hi Bruno,

Bruno Haible <bruno@clisp.org> writes:

> Ludovic Courtès wrote in
> <http://lists.gnu.org/archive/html/bug-guile/2008-05/msg00014.html>:

>> The cardinals of these char sets were taken from SRFI-14:
>> 
>>   http://srfi.schemers.org/srfi-14/srfi-14.html#StandardCharsetDefs
>> 
>> This indicates that we should fix our SRFI-14 implementation, not the
>> test.  ;-)
>
> I don't think it's appropriate to take these numbers (117 etc.) as precise
> expectations. Unicode is a moving target: At every Unicode version, new
> characters are being added, and sometimes also the character classification
> into "letters" vs. "non-letters" changes.

I'm unsure about this.  Certainly Unicode is a moving target, but ASCII
and ISO-8859-1 aren't.

OTOH, one could argue that it's Unicode that defines what category each
ASCII/Latin-1 character belongs to.  Nevertheless, it seems reasonable
to expect that character classification for ASCII/Latin-1 won't change
in the future (were there many changes in that area in the past?).

What do you think?

Thanks,
Ludovic.






* Re: guile 1.8.5 test failure: srfi-14.test
  2008-05-27  0:33   ` Bruno Haible
  2008-05-30 17:06     ` Ludovic Courtès
@ 2008-05-30 22:50     ` Bruno Haible
  2008-05-31 20:39       ` Ludovic Courtès
  1 sibling, 1 reply; 6+ messages in thread
From: Bruno Haible @ 2008-05-30 22:50 UTC
  To: Ludovic Courtès; +Cc: bug-guile

> > The cardinals of these char sets were taken from SRFI-14:
> > 
> >   http://srfi.schemers.org/srfi-14/srfi-14.html#StandardCharsetDefs

This text also contains the complete lists of 117 characters for 'letter'.
Comparing it with the 124 characters that I got, the differences are at

 #\246 U+00A6  BROKEN BAR
 #\250 U+00A8  DIAERESIS
 #\264 U+00B4  ACUTE ACCENT
 #\270 U+00B8  CEDILLA
 #\274 U+00BC  VULGAR FRACTION ONE QUARTER
 #\275 U+00BD  VULGAR FRACTION ONE HALF
 #\276 U+00BE  VULGAR FRACTION THREE QUARTERS

The Unicode classification of these characters has not changed between
Unicode 3.0 and Unicode 5.0.

But these are exactly the byte values which differ between ISO-8859-1 and
ISO-8859-15.
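
The overlap can be checked independently of any libc. The following is a minimal sketch in Python rather than Guile; it assumes Python's built-in "latin-1" and "iso-8859-15" codecs, and the helper name differing_bytes is made up for illustration. It lists every byte value that decodes to a different character in the two encodings:

```python
import unicodedata

def differing_bytes(enc_a="latin-1", enc_b="iso-8859-15"):
    """Return (byte, char_a, char_b) for every byte that the two codecs decode differently."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        ca, cb = raw.decode(enc_a), raw.decode(enc_b)
        if ca != cb:
            diffs.append((b, ca, cb))
    return diffs

# Print the octal value in Guile's #\ooo notation, followed by the Unicode names.
for b, ca, cb in differing_bytes():
    print(f"#\\{b:o}  {unicodedata.name(ca)} -> {unicodedata.name(cb)}")
```

Under these assumptions the result contains the seven byte values listed above plus one more, #\244 (CURRENCY SIGN in ISO-8859-1, EURO SIGN in ISO-8859-15), which is not a letter in either encoding and so does not affect the letter count.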

$ LC_ALL=de_DE.iso88591 locale -c LC_CTYPE | head -6
LC_CTYPE
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
1
ISO-8859-15

Oops. My system has a locale called 'de_DE.iso88591' which is in fact using
ISO-8859-15. Not a bug in guile. Sorry for the noise.
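
As a rough cross-check of this diagnosis, the set sizes can be recomputed outside Guile. This is only a sketch under stated assumptions: it uses Python's bundled Unicode database (much newer than Unicode 3.0), approximates "letter", "lower-case" and "upper-case" by the general categories L*, Ll and Lu, and the function name charset_sizes is hypothetical:

```python
import unicodedata

def charset_sizes(encoding):
    """Decode all 256 byte values with `encoding` and count (letters, lower-case, upper-case)."""
    cats = [unicodedata.category(bytes([b]).decode(encoding)) for b in range(256)]
    return (
        sum(c.startswith("L") for c in cats),  # letters: any L* general category
        sum(c == "Ll" for c in cats),          # lower-case letters
        sum(c == "Lu" for c in cats),          # upper-case letters
    )

print("latin-1     ", charset_sizes("latin-1"))
print("iso-8859-15 ", charset_sizes("iso-8859-15"))
```

With a recent Python, the Latin-1 row should come out as (117, 59, 56), the values the test suite expects, and the ISO-8859-15 row as (124, 62, 60), the sizes observed in the mislabelled locale.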

Bruno






* Re: guile 1.8.5 test failure: srfi-14.test
  2008-05-30 22:50     ` Bruno Haible
@ 2008-05-31 20:39       ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2008-05-31 20:39 UTC
  To: bug-guile; +Cc: bruno

Hi,

Bruno Haible <bruno@clisp.org> writes:

> Oops. My system has a locale called 'de_DE.iso88591' which is in fact using
> ISO-8859-15. Not a bug in guile. Sorry for the noise.

Eh, thanks for the investigation anyway!

Ludovic.





