unofficial mirror of bug-guix@gnu.org 
* bug#48114: Disarchive occasionally fails tests
@ 2021-04-30 10:00 Ludovic Courtès
  2021-04-30 19:49 ` Timothy Sample
  0 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2021-04-30 10:00 UTC
  To: 48114

Hi Timothy,

Disarchive 0.2.0 occasionally fails two tests:

  FAIL: tests/kinds/octal.scm - [prop] Writing is reversible
  FAIL: tests/kinds/octal.scm - [prop] Serializing is reversible

(Thanks, Quickcheck! :-))

I added ‘pk’ calls like so:

--8<---------------cut here---------------start------------->8---
(test-assert "[prop] Writing is reversible"
  (quickcheck
   (property ((octal $octal))
     (test-when (valid-octal? octal)
       (begin
         (equal? (pk 'oct octal) (pk 'decode (decode-octal (encode-octal octal)))))))))

(test-assert "[prop] Serializing is reversible"
  (quickcheck
   (property ((octal $octal))
     (test-when (valid-octal? octal)
       (equal? (pk 'OCT octal) (pk 'DECODE (serdeser -octal- octal)))))))
--8<---------------cut here---------------end--------------->8---

and got this output:

--8<---------------cut here---------------start------------->8---
;;; (oct #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)

;;; (decode #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
actual-value: #f
actual-error:
+ (out-of-range
+   #f
+   "Value out of range ~S to ~S: ~S"
+   (8 9 10)
+   (10))
result: FAIL

[…]

;;; (OCT #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)

;;; (DECODE #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
actual-value: #f
actual-error:
+ (out-of-range
+   #f
+   "Value out of range ~S to ~S: ~S"
+   (8 9 10)
+   (10))
result: FAIL
--8<---------------cut here---------------end--------------->8---

I’m not sure where the exception comes from though.

Thoughts?

Thanks,
Ludo’.





* bug#48114: Disarchive occasionally fails tests
  2021-04-30 10:00 bug#48114: Disarchive occasionally fails tests Ludovic Courtès
@ 2021-04-30 19:49 ` Timothy Sample
  2021-05-02 19:57   ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: Timothy Sample @ 2021-04-30 19:49 UTC
  To: Ludovic Courtès; +Cc: 48114

Hey,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> Disarchive 0.2.0 occasionally fails two tests:
>
>   FAIL: tests/kinds/octal.scm - [prop] Writing is reversible
>   FAIL: tests/kinds/octal.scm - [prop] Serializing is reversible

These two tests have a bit of a problem.  They occasionally fail by
“giving up”, which is when too many test cases are discarded rather than
used.  (This happens because you might write a generator for a superset
of the values you’re interested in, and then filter out some values with
“test-when”.)  I don’t think this is happening here, though.  You would
see something like “Gave up! Passed only 0 ests [sic].”
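
To make the "giving up" behaviour concrete, here is a generic sketch of a
test loop that discards filtered-out candidates and stops after too many
discards.  It is not Guile-QuickCheck's implementation; the names
`run-with-discards' and `make-counter' and the thresholds are hypothetical.

```scheme
;; Generic sketch, NOT Guile-QuickCheck's code: all names are hypothetical.
;; GENERATE produces candidates, KEEP? plays the role of the `test-when'
;; filter, WANTED is the number of accepted cases needed, and
;; MAX-DISCARDS is the limit after which the run gives up.
(define (run-with-discards generate keep? wanted max-discards)
  (let loop ((passed 0) (discarded 0))
    (cond ((= passed wanted) (list 'ok passed discarded))
          ((> discarded max-discards) (list 'gave-up passed discarded))
          (else
           (if (keep? (generate))
               (loop (+ passed 1) discarded)
               (loop passed (+ discarded 1)))))))

;; Deterministic generator yielding 0, 1, 2, ... so the outcome is fixed.
(define (make-counter)
  (let ((n -1))
    (lambda () (set! n (+ n 1)) n)))

;; Only multiples of 100 are kept, so nearly every candidate is discarded.
(run-with-discards (make-counter)
                   (lambda (n) (zero? (modulo n 100)))
                   10 50)
;; ⇒ (gave-up 1 51)
```

A generator that produces mostly values the filter rejects ends up in the
`gave-up' branch, which is the situation described above.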

> I added ‘pk’ calls like so:
>
> (test-assert "[prop] Writing is reversible"
>   (quickcheck
>    (property ((octal $octal))
>      (test-when (valid-octal? octal)
>        (begin
>          (equal? (pk 'oct octal) (pk 'decode (decode-octal (encode-octal octal)))))))))
>
> (test-assert "[prop] Serializing is reversible"
>   (quickcheck
>    (property ((octal $octal))
>      (test-when (valid-octal? octal)
>        (equal? (pk 'OCT octal) (pk 'DECODE (serdeser -octal- octal)))))))
>
>
> and got this output:
>
> ;;; (oct #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
>
> ;;; (decode #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
> actual-value: #f
> actual-error:
> + (out-of-range
> +   #f
> +   "Value out of range ~S to ~S: ~S"
> +   (8 9 10)
> +   (10))
> result: FAIL
>
> […]
>
> ;;; (OCT #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
>
> ;;; (DECODE #<<unstructured-octal> value: 0 source: #<<zero-string> value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
> actual-value: #f
> actual-error:
> + (out-of-range
> +   #f
> +   "Value out of range ~S to ~S: ~S"
> +   (8 9 10)
> +   (10))
> result: FAIL
>
> I’m not sure where the exception comes from though.

I can’t seem to reproduce this.  I’ve run the test suite many, many
times, but I also tried:

    ,use (disarchive kinds octal)
    ,use (disarchive kinds zero-string)
    ,use (disarchive serialization)
    (define the-zero-string
      (make-zero-string
       "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b"
       #vu8(172 156 23 48 25 29 159 226 210)))
    (define the-octal
      (make-unstructured-octal 0 the-zero-string))
    (equal? the-octal (decode-octal (encode-octal the-octal)))
    (equal? the-octal (serdeser -octal- the-octal))

Which works fine.  (Does it work for you?)

However, isn’t it possible that these values aren’t the culprits?  With
the “pk” calls you added, isn’t it printing the last OK value without
telling us the value causing the issue?

What if you run it with the following?

    (test-assert "[prop] Writing is reversible"
      (quickcheck
       (property ((octal $octal))
         (test-when (valid-octal? octal)
           (false-if-exception  ; <-- changed!
             (equal? octal (decode-octal (encode-octal octal))))))))

This way, Guile-QuickCheck should print the offending value and the seed
used for the tests, which could be helpful for reproducing.  (The fact
that it doesn’t handle exceptions well is a known bug!)
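
As a small illustration of what the wrapper changes, the sketch below
contrasts a property body that raises with one that returns #f.  The
procedure `risky-round-trip' is a hypothetical stand-in for the
decode/encode round trip, not Disarchive code.

```scheme
;; Hypothetical stand-in for (decode-octal (encode-octal octal)).
;; `substring' raises out-of-range when STR has fewer than 2 characters.
(define (risky-round-trip str)
  (substring str 0 2))

;; With `false-if-exception', an exception inside the body turns into #f,
;; so the property simply fails for that value instead of aborting the run.
(define (safe-property str)
  (false-if-exception
   (equal? (risky-round-trip str) "ab")))

(safe-property "abc")   ; ⇒ #t, the round trip succeeds
(safe-property "a")     ; ⇒ #f, the out-of-range error is swallowed
```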


-- Tim





* bug#48114: Disarchive occasionally fails tests
  2021-04-30 19:49 ` Timothy Sample
@ 2021-05-02 19:57   ` Ludovic Courtès
  2021-05-03  2:24     ` Timothy Sample
  0 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2021-05-02 19:57 UTC
  To: Timothy Sample; +Cc: 48114

Hello!

Timothy Sample <samplet@ngyro.com> skribis:

> I can’t seem to reproduce this.  I’ve run the test suite many, many
> times, but I also tried:

I can reproduce it quickly with:

  while make check TESTS=tests/kinds/octal.scm -j5 ; do : ; done

… in C locale (LC_ALL & co. all unset).

> However, isn’t it possible that these values aren’t the culprits?  With
> the “pk” calls you added, isn’t it printing the last OK value without
> telling us the value causing the issue?

You’re right, the values printed are not the culprit.  The problem comes
from the generator (I had to raise the (quickcheck …) form out of
‘test-assert’ so I could get a backtrace):

--8<---------------cut here---------------start------------->8---
Backtrace:
          13 (primitive-load "/data/src/disarchive/./build-aux/test-driver.scm")
In ice-9/eval.scm:
    619:8 12 (_ #(#(#<directory (guile-user) 7fccb09d9f00> ((() "./tests/kinds/octal.scm") (# . "no") (# . #) ?)) #))
    619:8 11 (_ #(#(#(#(#(#(#(#(#<directory (guile-user) 7fccb09d9f00> ("./tests/kinds/octal?") ?)) ?) ?) ?) ?) ?) ?))
In ice-9/boot-9.scm:
    142:2 10 (dynamic-wind _ _ #<procedure 7fccaf5b81a0 at ice-9/eval.scm:330:13 ()>)
In unknown file:
           9 (primitive-load "./tests/kinds/octal.scm")
In quickcheck.scm:
    118:6  8 (check #<<quickcheck-config> seed: 321557891 stop?: #<procedure 7fccaf8c3540 at ice-9/eval.scm:336:13?> ?)
    98:12  7 (check-results _ #<<property> names: (octal) gen/arbs: (#<<arbitrary> gen: #<<generator> proc: #<proce?>)
In quickcheck/generator.scm:
     65:2  6 (_ 7 #<<rng-state> start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(29?>)
     65:2  5 (_ 7 #<<rng-state> start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(29?>)
    78:17  4 (_ 7 #<<rng-state> start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(28?>)
   105:22  3 (_ _)
In tests/kinds.scm:
    84:22  2 (fix-unstructured-octal-value #<<unstructured-octal> value: 7 source: #<<zero-string> value: "\U0f99aa?>)
    86:47  1 (_ _)
In unknown file:
           0 (substring "\U0f99aa?\U0ff7c1\U0fb97a\U0ff933?\U0fe7a1" 6 8)

ERROR: In procedure substring:
Value out of range 6 to 7: 8
--8<---------------cut here---------------end--------------->8---

Note that this is in C locale, which may mean that ‘regexp-exec’, which
passes strings to libc, gets offsets wrong somehow (see
‘fixup_multibyte_match’ in libguile), though I couldn’t reproduce it
with the string above.

Anyway, ‘guix build disarchive’ builds in en_US.utf8 locale, so the
thing above is probably a wrong lead.

If I switch to en_US.utf8, I occasionally get the following error
instead:

--8<---------------cut here---------------start------------->8---
test-name: [prop] Serializing is reversible
location: tests/kinds/octal.scm:154
source:
+ (test-assert
+   "[prop] Serializing is reversible"
+   (quickcheck
+     (property
+       ((octal $octal))
+       (test-when
+         (valid-octal? octal)
+         (equal?
+           (pk 'OCT octal)
+           (pk 'DECODE (serdeser -octal- octal)))))))

;;; (OCT #<<unstructured-octal> value: 0 source: #<<zero-string> value: "" trailer: "">>)

;;; (DECODE #<<unstructured-octal> value: 0 source: #<<zero-string> value: "" trailer: "">>)
Gave up! Passed only 1 est.
actual-value: #f
result: FAIL
--8<---------------cut here---------------end--------------->8---

This is more in line with what you described.  Any ideas on how to
address that?

Thanks,
Ludo’.





* bug#48114: Disarchive occasionally fails tests
  2021-05-02 19:57   ` Ludovic Courtès
@ 2021-05-03  2:24     ` Timothy Sample
  2021-05-03  4:02       ` Timothy Sample
  0 siblings, 1 reply; 7+ messages in thread
From: Timothy Sample @ 2021-05-03  2:24 UTC
  To: Ludovic Courtès; +Cc: 48114

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> ERROR: In procedure substring:
> Value out of range 6 to 7: 8
>
> Note that this is in C locale, which may mean that ‘regexp-exec’, which
> passes strings to libc, gets offsets wrong somehow (see
> ‘fixup_multibyte_match’ in libguile), though I couldn’t reproduce it
> with the string above.

I’m still looking into this, but I wanted to quickly post this
reproducer for the Guile bug:

    (use-modules (ice-9 regex))
    (define str "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
    (match:substring (string-match "[0-8]+" str))

This triggers the out-of-range error when run with “LC_ALL=C”.


-- Tim





* bug#48114: Disarchive occasionally fails tests
  2021-05-03  2:24     ` Timothy Sample
@ 2021-05-03  4:02       ` Timothy Sample
  2021-05-03  6:19         ` Bengt Richter
  2021-05-03 20:03         ` Ludovic Courtès
  0 siblings, 2 replies; 7+ messages in thread
From: Timothy Sample @ 2021-05-03  4:02 UTC
  To: Ludovic Courtès; +Cc: 48114

Timothy Sample <samplet@ngyro.com> writes:

> I’m still looking into this, but I wanted to quickly post this
> reproducer for the Guile bug:
>
>     (use-modules (ice-9 regex))
>     (define str
> "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
>     (match:substring (string-match "[0-8]+" str))
>
> This triggers the out-of-range error when run with “LC_ALL=C”.

It turns out that all that’s needed is the last code point, which is
“Number Eleven Full Stop”, or ‘⒒’.  When Guile converts this to an ASCII
C string using ‘u32_conv_from_encoding’, it becomes “11.”.  The regex
(“[0-8]+”) matches the “11” part with start index 0 and end index 2.
The ‘fixup_multibyte_match’ function does nothing (it only matters when
the locale encoding is multibyte) [1].  Guile then builds the match
vector with the original string but keeps the ASCII offsets.  In other
words, it thinks the match substring goes from 0 to 2 in a single code
point string:

    ,use (ice-9 regex)
    (string-match "11" "\u2492")
    => #("\u2492" (0 . 2))

I’m not sure there’s any way to solve this nicely in Guile.  It would be
clearer if the match vector included the string as libc matched it, but
it’s still surprising that the match happens with a different string.
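
One possible defensive check on the caller's side, sketched under the
assumption that a match object whose offsets exceed the original string
should just be treated as "no usable match".  The name
`match-offsets-valid?' is hypothetical; `match:start', `match:end' and
`match:string' are the accessors from (ice-9 regex).

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: true when the offsets of match M fall inside the
;; string that M refers to, so `match:substring' cannot go out of range.
(define (match-offsets-valid? m)
  (<= 0
      (match:start m)
      (match:end m)
      (string-length (match:string m))))

;; For an ASCII-only target the offsets are always consistent.
(match-offsets-valid? (string-match "b+" "abbc"))   ; ⇒ #t
```

This only detects the inconsistency; it does not recover the text that
libc actually matched.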

In Disarchive, I can rewrite the generator without regex.  I’ll do that
and see what I can do about the “Gave up!” issue.

[1] It works on the converted-to-ASCII C string, which means that the
byte offsets and code point offsets are the same.  Hence, it has nothing
to do.
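
A minimal sketch of what a regex-free search could look like, assuming
the generator only needs the first run of ASCII octal digits in a string.
The name `first-octal-run' is hypothetical and this is not the actual
Disarchive change.  The character set stops at 7, whereas the regex in
this thread was "[0-8]+", so adjust it if 8 must be accepted.

```scheme
(use-modules (srfi srfi-14))   ; character sets

;; ASCII octal digits only (assumption: 8 is not wanted here).
(define octal-digit-set (string->char-set "01234567"))

;; Return the first maximal run of octal digits in STR, or #f if there is
;; none.  `string-index' and `string-skip' work on the Scheme string's own
;; code points, so no conversion to the locale encoding is involved.
(define (first-octal-run str)
  (let ((start (string-index str octal-digit-set)))
    (and start
         (let ((end (or (string-skip str octal-digit-set start)
                        (string-length str))))
           (substring str start end)))))

(first-octal-run "\u2492")   ; ⇒ #f, the string holds no ASCII digit
(first-octal-run "ab017x")   ; ⇒ "017"
```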


-- Tim





* bug#48114: Disarchive occasionally fails tests
  2021-05-03  4:02       ` Timothy Sample
@ 2021-05-03  6:19         ` Bengt Richter
  2021-05-03 20:03         ` Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Bengt Richter @ 2021-05-03  6:19 UTC
  To: Timothy Sample; +Cc: 48114

Hi Timothy, Ludo,

On +2021-05-03 00:02:09 -0400, Timothy Sample wrote:
> Timothy Sample <samplet@ngyro.com> writes:
> 
> > I’m still looking into this, but I wanted to quickly post this
> > reproducer for the Guile bug:
> >
> >     (use-modules (ice-9 regex))
> >     (define str
> > "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
> >     (match:substring (string-match "[0-8]+" str))
> >
> > This triggers the out-of-range error when run with “LC_ALL=C”.
> 
> It turns out that all that’s needed is the last code point, which is
> “Number Eleven Full Stop”, or ‘⒒’.  When Guile converts this to an ASCII
> C string using ‘u32_conv_from_encoding’, it becomes “11.”.  The regex
> (“[0-8]+”) matches the “11” part with start index 0 and end index 2.
> The ‘fixup_multibyte_match’ function does nothing (it only matters when
> the locale encoding is multibyte) [1].  Guile then builds the match
> vector with the original string but keeps the ASCII offsets.  In other
> words, it thinks the match substring goes from 0 to 2 in a single code
> point string:
> 
>     ,use (ice-9 regex)
>     (string-match "11" "\u2492")
>     => #("\u2492" (0 . 2))
> 
> I’m not sure there’s any way to solve this nicely in Guile.  It would be
> clearer if the match vector included the string as libc matched it, but
> it’s still surprising that the match happens with a different string.
> 
> In Disarchive, I can rewrite the generator without regex.  I’ll do that
> and see what I can do about the “Gave up!” issue.
> 
> [1] It works on the converted-to-ASCII C string, which means that the
> byte offsets and code point offsets are the same.  Hence, it has nothing
> to do.
> 
> 
> -- Tim
>

What happens with these?
(code points in decimal)

    8554 _Ⅺ_ "ROMAN NUMERAL ELEVEN"
    8570 _ⅺ_ "SMALL ROMAN NUMERAL ELEVEN"
    9322 _⑪_ "CIRCLED NUMBER ELEVEN"
    9342 _⑾_ "PARENTHESIZED NUMBER ELEVEN"
    9362 _⒒_ "NUMBER ELEVEN FULL STOP"
    9451 _⓫_ "NEGATIVE CIRCLED NUMBER ELEVEN"
   13155 _㍣_ "IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ELEVEN"
   13290 _㏪_ "IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY ELEVEN"

I would argue that none of these should be "decoded" into ASCII polyglyphs,
since they are atomic character glyphs.  IMO it is an over-eager
transformation to turn them into ASCII polyglyphs.

/Super/sub/-script placement metadata is another thing to consider --
"decode" to ascii art?? ;-)

Unicode characters representing mathematical values in
other languages are different. Those are subject to natural language
translation with locale-dependent semantics.

These might be candidates for that?:
(code points in decimal)

    8544 _Ⅰ_ "ROMAN NUMERAL ONE"
    8545 _Ⅱ_ "ROMAN NUMERAL TWO"
    8546 _Ⅲ_ "ROMAN NUMERAL THREE"
    8547 _Ⅳ_ "ROMAN NUMERAL FOUR"
    8548 _Ⅴ_ "ROMAN NUMERAL FIVE"
    8549 _Ⅵ_ "ROMAN NUMERAL SIX"
    8550 _Ⅶ_ "ROMAN NUMERAL SEVEN"
    8551 _Ⅷ_ "ROMAN NUMERAL EIGHT"
    8552 _Ⅸ_ "ROMAN NUMERAL NINE"
    8553 _Ⅹ_ "ROMAN NUMERAL TEN"
    8554 _Ⅺ_ "ROMAN NUMERAL ELEVEN"
    8555 _Ⅻ_ "ROMAN NUMERAL TWELVE"
    8556 _Ⅼ_ "ROMAN NUMERAL FIFTY"
    8557 _Ⅽ_ "ROMAN NUMERAL ONE HUNDRED"
    8558 _Ⅾ_ "ROMAN NUMERAL FIVE HUNDRED"
    8559 _Ⅿ_ "ROMAN NUMERAL ONE THOUSAND"
    8560 _ⅰ_ "SMALL ROMAN NUMERAL ONE"
    8561 _ⅱ_ "SMALL ROMAN NUMERAL TWO"
    8562 _ⅲ_ "SMALL ROMAN NUMERAL THREE"
    8563 _ⅳ_ "SMALL ROMAN NUMERAL FOUR"
    8564 _ⅴ_ "SMALL ROMAN NUMERAL FIVE"
    8565 _ⅵ_ "SMALL ROMAN NUMERAL SIX"
    8566 _ⅶ_ "SMALL ROMAN NUMERAL SEVEN"
    8567 _ⅷ_ "SMALL ROMAN NUMERAL EIGHT"
    8568 _ⅸ_ "SMALL ROMAN NUMERAL NINE"
    8569 _ⅹ_ "SMALL ROMAN NUMERAL TEN"
    8570 _ⅺ_ "SMALL ROMAN NUMERAL ELEVEN"
    8571 _ⅻ_ "SMALL ROMAN NUMERAL TWELVE"
    8572 _ⅼ_ "SMALL ROMAN NUMERAL FIFTY"
    8573 _ⅽ_ "SMALL ROMAN NUMERAL ONE HUNDRED"
    8574 _ⅾ_ "SMALL ROMAN NUMERAL FIVE HUNDRED"
    8575 _ⅿ_ "SMALL ROMAN NUMERAL ONE THOUSAND"
    8576 _ↀ_ "ROMAN NUMERAL ONE THOUSAND C D"
    8577 _ↁ_ "ROMAN NUMERAL FIVE THOUSAND"
    8578 _ↂ_ "ROMAN NUMERAL TEN THOUSAND"
    8579 _Ↄ_ "ROMAN NUMERAL REVERSED ONE HUNDRED"
    8581 _ↅ_ "ROMAN NUMERAL SIX LATE FORM"
    8582 _ↆ_ "ROMAN NUMERAL FIFTY EARLY FORM"
    8583 _ↇ_ "ROMAN NUMERAL FIFTY THOUSAND"
    8584 _ↈ_ "ROMAN NUMERAL ONE HUNDRED THOUSAND"
   12321 _〡_ "HANGZHOU NUMERAL ONE"
   12322 _〢_ "HANGZHOU NUMERAL TWO"
   12323 _〣_ "HANGZHOU NUMERAL THREE"
   12324 _〤_ "HANGZHOU NUMERAL FOUR"
   12325 _〥_ "HANGZHOU NUMERAL FIVE"
   12326 _〦_ "HANGZHOU NUMERAL SIX"
   12327 _〧_ "HANGZHOU NUMERAL SEVEN"
   12328 _〨_ "HANGZHOU NUMERAL EIGHT"
   12329 _〩_ "HANGZHOU NUMERAL NINE"
   12344 _〸_ "HANGZHOU NUMERAL TEN"
   12345 _〹_ "HANGZHOU NUMERAL TWENTY"
   12346 _〺_ "HANGZHOU NUMERAL THIRTY"

Just my intuitive reaction, no academic creds to back it up ;)

-- 
Regards,
Bengt Richter





* bug#48114: Disarchive occasionally fails tests
  2021-05-03  4:02       ` Timothy Sample
  2021-05-03  6:19         ` Bengt Richter
@ 2021-05-03 20:03         ` Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2021-05-03 20:03 UTC
  To: Timothy Sample; +Cc: 48114

Hi!

Timothy Sample <samplet@ngyro.com> skribis:

> Timothy Sample <samplet@ngyro.com> writes:
>
>> I’m still looking into this, but I wanted to quickly post this
>> reproducer for the Guile bug:
>>
>>     (use-modules (ice-9 regex))
>>     (define str
>> "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
>>     (match:substring (string-match "[0-8]+" str))
>>
>> This triggers the out-of-range error when run with “LC_ALL=C”.
>
> It turns out that all that’s needed is the last code point, which is
> “Number Eleven Full Stop”, or ‘⒒’.

Whaaat? “Number Eleven Full Stop”, I wonder how the Unicode folks came
up with that one.  ㊷ = ㉚ + ⒓

> When Guile converts this to an ASCII C string using
> ‘u32_conv_from_encoding’, it becomes “11.”.  The regex (“[0-8]+”)
> matches the “11” part with start index 0 and end index 2.  The
> ‘fixup_multibyte_match’ function does nothing (it only matters when
> the locale encoding is multibyte) [1].  Guile then builds the match
> vector with the original string but keeps the ASCII offsets.  In other
> words, it thinks the match substring goes from 0 to 2 in a single code
> point string:
>
>     ,use (ice-9 regex)
>     (string-match "11" "\u2492")
>     => #("\u2492" (0 . 2))
>
> I’m not sure there’s any way to solve this nicely in Guile.  It would be
> clearer if the match vector included the string as libc matched it, but
> it’s still surprising that the match happens with a different string.

Yeah, I don’t think there’s much we can do.  It’s a lot of fun anyway.

Thanks for investigating!

Ludo’.





Thread overview: 7+ messages
2021-04-30 10:00 bug#48114: Disarchive occasionally fails tests Ludovic Courtès
2021-04-30 19:49 ` Timothy Sample
2021-05-02 19:57   ` Ludovic Courtès
2021-05-03  2:24     ` Timothy Sample
2021-05-03  4:02       ` Timothy Sample
2021-05-03  6:19         ` Bengt Richter
2021-05-03 20:03         ` Ludovic Courtès
