* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
@ 2020-03-07 12:00 pelzflorian (Florian Pelz)
2020-03-07 15:20 ` pelzflorian (Florian Pelz)
0 siblings, 1 reply; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-07 12:00 UTC (permalink / raw)
To: 39970
[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]
After running
export LC_ALL=tr_TR.utf8
many important Guix commands like 'guix environment', 'guix install'
and 'guix pull' fail.
$ guix environment --ad-hoc hello
Backtrace:
1 (primitive-load "/home/florian/.config/guix/current/bin…")
In guix/ui.scm:
1826:12 0 (run-guix-command _ . _)
guix/ui.scm:1826:12: In procedure run-guix-command:
In procedure string-length: Wrong type argument in position 1 (expecting string): #f
Running guix via ./pre-inst-env gives a more useful backtrace. The
reason is that in guix/store.scm
(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^/gnu/store/([0-9a-df-np-sv-z]{32})-([^/]+)$")
"/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")
evaluates to #f in Turkish, possibly because of the presence of
dotless i (ı) in the range.
The attached patch fixes the issue by including i explicitly, but I
believe enumerating all of [0-9abcdfghijklmnpqrsvwxyz] explicitly
might be more future-proof.
Shall I push the patch modified to list all letters in
[0-9abcdfghijklmnpqrsvwxyz] explicitly? Numbers too? I suppose there
is no downside to listing all without ranges.
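For readers who want to compare the two spellings outside of Guile, the following is a minimal C sketch using the POSIX regex API. The helper name `matches` and the two macros are made up for illustration and are not part of Guix or Guile. The helper is meant to be called after `setlocale (LC_ALL, "")` so that the locale from the environment is in effect. Whether the range form actually fails under tr_TR.utf8 depends on the libc in use, so no result is claimed here.

```c
#include <regex.h>
#include <stddef.h>

/* The two spellings of the hash character class discussed above.  */
#define HASH_RANGES     "^[0-9a-df-np-sv-z]{32}$"
#define HASH_ENUMERATED "^[0-9abcdfghijklmnpqrsvwxyz]{32}$"

/* Hypothetical helper: return 1 if TEXT matches the extended regexp
   PATTERN, 0 if it does not, and -1 if PATTERN fails to compile.  */
int
matches (const char *pattern, const char *text)
{
  regex_t rx;
  int r;

  if (regcomp (&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  r = regexec (&rx, text, 0, NULL, 0);
  regfree (&rx);
  return r == 0 ? 1 : 0;
}
```

Calling `matches (HASH_RANGES, hash)` and `matches (HASH_ENUMERATED, hash)` on the same 32-character hash under different LC_ALL values shows whether the two classes agree in a given locale.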
I wonder what else is affected; the installer maybe? I have not
tested yet.
Regards,
Florian
[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]
[-- Type: text/plain, Size: 1034 bytes --]
From 4445284e9fd40b3e271fa7b511d2856c03c8ccfb Mon Sep 17 00:00:00 2001
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Sat, 7 Mar 2020 11:38:59 +0100
Subject: [PATCH] store: Fix many guix commands failing on some locales.
At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' are affected.
* guix/store.scm (store-regexp*): Avoid dependence on locale.
---
guix/store.scm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..a1d9713c24 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1949,7 +1949,7 @@ valid inputs."
(mlambda (store)
"Return a regexp matching a file in STORE."
(make-regexp (string-append "^" (regexp-quote store)
- "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+ "/([0-9a-df-hij-np-sv-z]{32})-([^/]+)$"))))
(define (store-path-package-name path)
"Return the package name part of PATH, a file name in the store."
--
2.25.0
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-07 12:00 bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales pelzflorian (Florian Pelz)
@ 2020-03-07 15:20 ` pelzflorian (Florian Pelz)
2020-03-08 7:08 ` pelzflorian (Florian Pelz)
0 siblings, 1 reply; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-07 15:20 UTC (permalink / raw)
To: 39970
On Sat, Mar 07, 2020 at 01:00:52PM +0100, pelzflorian (Florian Pelz) wrote:
> Running guix via ./pre-inst-env gives a more useful backtrace. The
> reason is that in guix/store.scm
>
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^/gnu/store/([0-9a-df-np-sv-z]{32})-([^/]+)$")
> "/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")
>
> evaluates to #f in Turkish, possibly because of the presence of
> dotless i (ı) in the range.
>
Actually it seems the issue is that i is missing from the range [a-z].
ı and ğ are missing as well, as are non-Turkish letters like ä that
are included when using the en_US.utf8 locale, even though they are not
English letters either.
(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^([a-z]+)$")
"iyiyim")
fails.
But running a glibc C program
florian@florianmacbook ~$ cat iyiyim.c
#include <regex.h>
#include <stdio.h>
#define STR "iyiyim"
int main (int argc,
char** argv)
{
regex_t only_letters;
int r = regcomp (&only_letters, "[a-z]", 0);
if (r != 0)
printf ("This error does not happen.\n");
r = regexec (&only_letters, STR, 0, NULL, 0);
if (r == 0)
printf ("The string " STR " matched!\n");
else
printf ("No match for " STR ".\n");
}
florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
The string iyiyim matched!
succeeds on tr_TR.utf8 and en_US.utf8 locales (and a native Turkish
speaker confirmed to me that ı and i belong in the alphabet right after h).
Maybe this is a bug in Guile, somehow?
> […]
> I wonder what else is affected; the installer maybe? I have not
> tested yet.
>
I checked; the graphical installer appears unaffected, but the issue
appears on the installed system.
Regards,
Florian
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-07 15:20 ` pelzflorian (Florian Pelz)
@ 2020-03-08 7:08 ` pelzflorian (Florian Pelz)
2020-03-09 17:02 ` Ludovic Courtès
0 siblings, 1 reply; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-08 7:08 UTC (permalink / raw)
To: 39970
This seems similar to <https://bugs.gnu.org/35785>. I think
enumerating all characters explicitly is a similar fix, whether or not
there is a bug in Guile.
Regards,
Florian
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-08 7:08 ` pelzflorian (Florian Pelz)
@ 2020-03-09 17:02 ` Ludovic Courtès
2020-03-12 11:02 ` pelzflorian (Florian Pelz)
0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-03-09 17:02 UTC (permalink / raw)
To: pelzflorian (Florian Pelz); +Cc: 39970
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> This seems similar to <https://bugs.gnu.org/35785>.
Yes, same story.
> I think enumerating all characters explicitly is a similar fix,
> whether or not there is a bug in Guile.
To me it’s not a bug in Guile, but simply the fact that regexps, as
implemented by the C library, are locale-dependent.
The patch you proposed looks good to me, though perhaps we could
explicitly list all the alphabet in the regexp?
A better option is to reimplement ‘store-path-package-name’ in a way
similar to ‘store-path-hash-part’, as in commit
35eb77b09d957019b2437e7681bd88013d67d3cd.
Thoughts?
Ludo’.
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-09 17:02 ` Ludovic Courtès
@ 2020-03-12 11:02 ` pelzflorian (Florian Pelz)
2020-03-12 16:05 ` Ludovic Courtès
2021-05-05 7:04 ` Taylan Kammer
0 siblings, 2 replies; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-12 11:02 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: 39970
[-- Attachment #1: Type: text/plain, Size: 1992 bytes --]
On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
> To me it’s not a bug in Guile, but simply the fact that regexps, as
> implemented by the C library, are locale-dependent.
>
(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^([a-z]+)$")
"iyiyim")
⇒ #f
Guile’s behavior that i is not among [a-z] has been confirmed as
unexpected by a natively Turkish friend of mine. It is different from
the behavior of current glibc:
florian@florianmacbook ~$ cat iyiyim.c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#define STR "iyiyım"
int main (int argc,
char** argv)
{
regex_t only_letters;
int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
if (r != 0)
printf ("This error does not happen.\n");
r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
if (r == 0)
printf ("The string " STR " matched!\n");
else
printf ("No match for " STR ".\n");
}
florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
The string iyiyım matched!
Apparently Guile uses a bundled regular expression library rather than
glibc. I can try making Guile use a newer GNUlib for its regular
expressions, maybe that helps. Shall I file a separate bug for Guile?
> The patch you proposed looks good to me, though perhaps we could
> explicitly list all the alphabet in the regexp?
>
> A better option is to reimplement ‘store-path-package-name’ in a way
> similar to ‘store-path-hash-part’, as in commit
> 35eb77b09d957019b2437e7681bd88013d67d3cd.
I suppose it would be better to cache the compiled regexp. What is
this mcached syntax inside (guix store)? Or do I use Scheme’s 'delay'
and 'force' for caching?
The attached patch fixes the regexp. Shall I push the attached patch
and then try making it cache the compiled regexp or do you still
prefer an implementation without regexps? Why would not using a
regexp be better?
Regards,
Florian
[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]
[-- Type: text/plain, Size: 1028 bytes --]
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 12 Mar 2020 11:08:16 +0100
Subject: [PATCH] store: Fix many guix commands failing on some locales.
Fixes bug #39970 (see: https://bugs.gnu.org/39970).
At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' are affected.
* guix/store.scm (store-regexp*): Avoid dependence on locale.
---
guix/store.scm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..82d7403bb6 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1949,7 +1949,8 @@ valid inputs."
(mlambda (store)
"Return a regexp matching a file in STORE."
(make-regexp (string-append "^" (regexp-quote store)
- "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+ "\
+/([0-9abcdfghijklmnpqrsvwxyz]{32})-([^/]+)$"))))
(define (store-path-package-name path)
"Return the package name part of PATH, a file name in the store."
--
2.25.1
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-12 11:02 ` pelzflorian (Florian Pelz)
@ 2020-03-12 16:05 ` Ludovic Courtès
2020-03-17 9:44 ` pelzflorian (Florian Pelz)
2021-05-05 7:04 ` Taylan Kammer
1 sibling, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-03-12 16:05 UTC (permalink / raw)
To: pelzflorian (Florian Pelz); +Cc: 39970
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
>> To me it’s not a bug in Guile, but simply the fact that regexps, as
>> implemented by the C library, are locale-dependent.
>>
>
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^([a-z]+)$")
> "iyiyim")
> ⇒ #f
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine. It is different from
> the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int argc,
> char** argv)
> {
You’re seeing a different behavior because you forgot a:
setlocale (LC_ALL, "");
call here.
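A minimal sketch of what that looks like in practice, assuming a POSIX libc: the helper below switches the process locale before compiling the regexp and restores it afterwards. The name `match_in_locale` is hypothetical and used only for illustration. Passing "" as the locale name takes the locale from the environment, which is what `setlocale (LC_ALL, "")` does.

```c
#include <locale.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match TEXT against the extended regexp PATTERN
   with LOCALE_NAME in effect ("" means: take it from the environment).
   Return 1 on match, 0 on no match, and -1 if the locale is
   unavailable, memory runs out, or PATTERN does not compile.  */
int
match_in_locale (const char *locale_name, const char *pattern,
                 const char *text)
{
  regex_t rx;
  int result = -1;
  char *saved = NULL;
  const char *current = setlocale (LC_ALL, NULL);

  /* setlocale may reuse its buffer, so keep a private copy.  */
  if (current != NULL)
    {
      size_t n = strlen (current) + 1;
      saved = malloc (n);
      if (saved == NULL)
        return -1;
      memcpy (saved, current, n);
    }

  if (setlocale (LC_ALL, locale_name) != NULL)
    {
      /* Compile only after the switch, so that the requested locale is
         in effect for both compiling and matching.  */
      if (regcomp (&rx, pattern, REG_EXTENDED | REG_NOSUB) == 0)
        {
          result = regexec (&rx, text, 0, NULL, 0) == 0 ? 1 : 0;
          regfree (&rx);
        }
      if (saved != NULL)
        setlocale (LC_ALL, saved);   /* restore the previous locale */
    }

  free (saved);
  return result;
}
```

Comparing `match_in_locale ("C", "^[a-z]+$", "iyiyim")` with the same call for "tr_TR.utf8" (if that locale is installed) is one way to check whether the C library itself treats the range differently.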
>> The patch you proposed looks good to me, though perhaps we could
>> explicitly list all the alphabet in the regexp?
>>
>> A better option is to reimplement ‘store-path-package-name’ in a way
>> similar to ‘store-path-hash-part’, as in commit
>> 35eb77b09d957019b2437e7681bd88013d67d3cd.
>
> I suppose it would be better to cache the compiled regexp. What is
> this mcached syntax inside (guix store)? Or do I use Scheme’s 'delay'
> and 'force' for caching?
I lean towards avoiding regexps altogether, as I wrote above.
WDYT?
> The attached patch fixes the regexp. Shall I push the attached patch
> and then try making it cache the compiled regexp or do you still
> prefer an implementation without regexps? Why would not using a
> regexp be better?
It reduces reliance on libc, reduces complexity, and performs better as
noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.
Thanks,
Ludo’.
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-12 16:05 ` Ludovic Courtès
@ 2020-03-17 9:44 ` pelzflorian (Florian Pelz)
2020-03-17 21:20 ` Ludovic Courtès
0 siblings, 1 reply; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-17 9:44 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: 39970
[-- Attachment #1: Type: text/plain, Size: 696 bytes --]
On Thu, Mar 12, 2020 at 05:05:26PM +0100, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > Why would not using a regexp be better?
>
> It reduces reliance on libc, reduces complexity, and performs better as
> noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.
Thank you for your wisdom. I hope the attached patch is OK.
`LC_ALL=en_US.utf8 make check` is mostly fine (except tests/pack.scm,
which also failed before).
Manual testing of `./pre-inst-env guix environment` works.
`LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
There are many failures. I will continue to investigate later today.
Regards,
Florian
[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]
[-- Type: text/plain, Size: 3531 bytes --]
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 12 Mar 2020 11:08:16 +0100
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [PATCH] store: Fix many guix commands failing on some locales.
Partly fixes bug #39970 (see: https://bugs.gnu.org/39970).
At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' were affected.
* guix/store.scm (store-path-hash-part): Move base path detection to ...
(store-path-base): ... this new exported procedure.
(store-path-package-name): Use it instead of locale-dependent regexps.
(store-regexp*): Remove.
---
guix/store.scm | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..5465204f5f 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -2,6 +2,7 @@
;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2018 Jan Nieuwenhuizen <janneke@gnu.org>
;;; Copyright © 2019 Mathieu Othacehe <m.othacehe@gmail.com>
+;;; Copyright © 2020 Florian Pelz <pelzflorian@pelzflorian.de>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -43,7 +44,6 @@
#:use-module (srfi srfi-35)
#:use-module (srfi srfi-39)
#:use-module (ice-9 match)
- #:use-module (ice-9 regex)
#:use-module (ice-9 vlist)
#:use-module (ice-9 popen)
#:use-module (ice-9 threads)
@@ -172,6 +172,7 @@
store-path?
direct-store-path?
derivation-path?
+ store-path-base
store-path-package-name
store-path-hash-part
direct-store-path
@@ -1943,29 +1944,26 @@ valid inputs."
"Return #t if PATH is a derivation path."
(and (store-path? path) (string-suffix? ".drv" path)))
-(define store-regexp*
- ;; The substituter makes repeated calls to 'store-path-hash-part', hence
- ;; this optimization.
- (mlambda (store)
- "Return a regexp matching a file in STORE."
- (make-regexp (string-append "^" (regexp-quote store)
- "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+(define (store-path-base path)
+ "Return the base path of a path in the store."
+ (and (string-prefix? (%store-prefix) path)
+ (let ((base (string-drop path (+ 1 (string-length (%store-prefix))))))
+ (and (> (string-length base) 33)
+ (not (string-index base #\/))
+ base))))
(define (store-path-package-name path)
"Return the package name part of PATH, a file name in the store."
- (let ((path-rx (store-regexp* (%store-prefix))))
- (and=> (regexp-exec path-rx path)
- (cut match:substring <> 2))))
+ (let ((base (store-path-base path)))
+ (string-drop base (+ 32 1)))) ;32 hash part + 1 hyphen
(define (store-path-hash-part path)
"Return the hash part of PATH as a base32 string, or #f if PATH is not a
syntactically valid store path."
- (and (string-prefix? (%store-prefix) path)
- (let ((base (string-drop path (+ 1 (string-length (%store-prefix))))))
- (and (> (string-length base) 33)
- (let ((hash (string-take base 32)))
- (and (string-every %nix-base32-charset hash)
- hash))))))
+ (let* ((base (store-path-base path))
+ (hash (string-take base 32)))
+ (and (string-every %nix-base32-charset hash)
+ hash)))
(define (derivation-log-file drv)
"Return the build log file for DRV, a derivation file name, or #f if it
--
2.25.1
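To illustrate the regexp-free approach outside of Scheme, here is a rough C sketch of the same idea: plain prefix, length, and character checks that never consult the locale. It is not the Guix implementation. The function names merely echo the Scheme procedures in the patch, and two details are assumptions of this sketch rather than behavior of the patch: it checks for the "/" after the store prefix, and it returns NULL or 0 for invalid input instead of assuming a valid path.

```c
#include <stddef.h>
#include <string.h>

#define HASH_LENGTH 32

/* Same alphabet as the enumerated character class earlier in the
   thread: digits and lowercase letters without e, o, t and u.  */
static const char nix_base32_chars[] = "0123456789abcdfghijklmnpqrsvwxyz";

/* Return the part of PATH after "STORE/", or NULL if PATH is not
   directly inside STORE or is too short to hold a hash and a name.  */
const char *
store_path_base (const char *store, const char *path)
{
  size_t n = strlen (store);
  const char *base;

  if (strncmp (path, store, n) != 0 || path[n] != '/')
    return NULL;
  base = path + n + 1;
  if (strlen (base) <= HASH_LENGTH + 1 || strchr (base, '/') != NULL)
    return NULL;
  return base;
}

/* Return the package name part of PATH, or NULL for an invalid PATH.  */
const char *
store_path_package_name (const char *store, const char *path)
{
  const char *base = store_path_base (store, path);
  return base == NULL ? NULL : base + HASH_LENGTH + 1;
}

/* Copy the hash part of PATH into HASH (at least 33 bytes) and return
   1, or return 0 if PATH is not a syntactically valid store path.  */
int
store_path_hash_part (const char *store, const char *path, char *hash)
{
  const char *base = store_path_base (store, path);
  size_t i;

  if (base == NULL)
    return 0;
  for (i = 0; i < HASH_LENGTH; i++)
    if (strchr (nix_base32_chars, base[i]) == NULL)
      return 0;
  memcpy (hash, base, HASH_LENGTH);
  hash[HASH_LENGTH] = '\0';
  return 1;
}
```

Because `strchr` and `strncmp` compare bytes, the result is the same under tr_TR.utf8, az_AZ.utf8, or any other locale.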
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-17 9:44 ` pelzflorian (Florian Pelz)
@ 2020-03-17 21:20 ` Ludovic Courtès
2020-03-18 6:47 ` pelzflorian (Florian Pelz)
0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-03-17 21:20 UTC (permalink / raw)
To: pelzflorian (Florian Pelz); +Cc: 39970
Hi,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> On Thu, Mar 12, 2020 at 05:05:26PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > Why would not using a regexp be better?
>>
>> It reduces reliance on libc, reduces complexity, and performs better as
>> noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.
>
> Thank you for your wisdom. I hope the attached patch is OK.
>
> `LC_ALL=en_US.utf8 make check` is mostly fine (except tests/pack.scm,
> which also failed before).
>
> Manual testing of `./pre-inst-env guix environment` works.
Good!
> `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
> There are many failures. I will continue to investigate later today.
OK.
> From: Florian Pelz <pelzflorian@pelzflorian.de>
> Date: Thu, 12 Mar 2020 11:08:16 +0100
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> Subject: [PATCH] store: Fix many guix commands failing on some locales.
>
> Partly fixes bug #39970 (see: https://bugs.gnu.org/39970).
I’d just write:
Partly fixes <https://bugs.gnu.org/39970>.
Concise, clear, greppable. :-)
> At least 'guix environment', 'guix install' and 'guix pull'
> on 'az_AZ.utf8' and 'tr_TR.utf8' were affected.
>
> * guix/store.scm (store-path-hash-part): Move base path detection to ...
> (store-path-base): ... this new exported procedure.
> (store-path-package-name): Use it instead of locale-dependent regexps.
> (store-regexp*): Remove.
LGTM, thank you!
Ludo’.
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-17 21:20 ` Ludovic Courtès
@ 2020-03-18 6:47 ` pelzflorian (Florian Pelz)
2020-03-18 8:40 ` Ludovic Courtès
2021-05-05 4:47 ` Maxim Cournoyer
0 siblings, 2 replies; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-18 6:47 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: 39970
On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
> > There are many failures. I will continue to investigate later today.
>
> OK.
The tests fail due to many other uses of [a-z] in regexps. I will look;
for e.g. guix/import/cran.scm
(if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
…)
it would be easier and clearer to just list [a-z] explicitly:
> LGTM, thank you!
:) Pushed as 771c5e155d7862ed91a5d503eecc00c1db1150ad.
Regards,
Florian
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-18 6:47 ` pelzflorian (Florian Pelz)
@ 2020-03-18 8:40 ` Ludovic Courtès
2021-05-05 4:47 ` Maxim Cournoyer
1 sibling, 0 replies; 13+ messages in thread
From: Ludovic Courtès @ 2020-03-18 8:40 UTC (permalink / raw)
To: pelzflorian (Florian Pelz); +Cc: 39970
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
>> > There are many failures. I will continue to investigate later today.
>>
>> OK.
>
> The tests fail due to many other uses of [a-z] in regexps. I will look;
> for e.g. guix/import/cran.scm
>
> (if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
> …)
>
> it would be easier and clearer to just list [a-z] explicitly:
Yes, agreed.
It would be nice if ‘string-match’ & co. could take an optional locale
object (info "(guile) i18n Introduction") but that’s not the case
currently.
Thanks,
Ludo’.
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-18 6:47 ` pelzflorian (Florian Pelz)
2020-03-18 8:40 ` Ludovic Courtès
@ 2021-05-05 4:47 ` Maxim Cournoyer
2021-05-05 9:22 ` pelzflorian (Florian Pelz)
1 sibling, 1 reply; 13+ messages in thread
From: Maxim Cournoyer @ 2021-05-05 4:47 UTC (permalink / raw)
To: pelzflorian (Florian Pelz); +Cc: 39970-done
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:
> On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
>> > There are many failures. I will continue to investigate later today.
>>
>> OK.
>
> The tests fail due to many other uses of [a-z] in regexps. I will look;
> for e.g. guix/import/cran.scm
>
> (if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
> …)
>
> it would be easier and clearer to just list [a-z] explicitly:
>
>
>> LGTM, thank you!
>
> :) Pushed as 771c5e155d7862ed91a5d503eecc00c1db1150ad.
Closing.
Thank you,
Maxim
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2020-03-12 11:02 ` pelzflorian (Florian Pelz)
2020-03-12 16:05 ` Ludovic Courtès
@ 2021-05-05 7:04 ` Taylan Kammer
1 sibling, 0 replies; 13+ messages in thread
From: Taylan Kammer @ 2021-05-05 7:04 UTC (permalink / raw)
To: pelzflorian (Florian Pelz), Ludovic Courtès; +Cc: 39970
On 12.03.2020 12:02, pelzflorian (Florian Pelz) wrote:
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine. It is different from
> the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int argc,
> char** argv)
> {
> regex_t only_letters;
> int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
> if (r != 0)
> printf ("This error does not happen.\n");
> r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
> if (r == 0)
> printf ("The string " STR " matched!\n");
> else
> printf ("No match for " STR ".\n");
> }
> florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
> florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
> The string iyiyım matched!
>
> Apparently Guile uses a bundled regular expression library rather than
> glibc. I can try making Guile use a newer GNUlib for its regular
> expressions, maybe that helps. Shall I file a separate bug for Guile?
>
Also native Turkish speaker here, and yeah that seems like a clear bug.
By the way, Turkish doesn't have q, w, or x. So if [a-z] is interpreted
by locale, it would fail to match those letters. I suppose that doesn't
matter for the patch you guys used but it might have been part of the
original problem.
The dotless lowercase i / dotted uppercase I mostly bites programmers in
case conversion. The uppercase of i is İ and the lowercase of I is ı.
There was even an exploit in GitHub related to this:
https://eng.getwisdom.io/hacking-github-with-unicode-dotless-i/
- Taylan
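As a small illustration of the case-conversion pitfall described above, one common way to sidestep it for identifiers that are ASCII by definition is to avoid the locale-aware conversion functions entirely. The helpers below are a hypothetical sketch, not code from Guix or Guile, and they assume an ASCII-compatible execution character set.

```c
/* Hypothetical helpers: ASCII-only case mapping that never consults
   the process locale.  Values outside A-Z and a-z pass through.  */
int
ascii_tolower (int c)
{
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

int
ascii_toupper (int c)
{
  return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Return 1 if A and B are equal ignoring ASCII case, 0 otherwise.  */
int
ascii_case_equal (const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (ascii_tolower ((unsigned char) *a)
          != ascii_tolower ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
  /* Equal only if both strings ended at the same position.  */
  return *a == *b;
}
```

With these, 'I' always maps to 'i' and back, whatever LC_ALL says. Text that is genuinely Turkish still needs locale-aware handling, so this only suits protocol keywords, hashes, and similar fixed-alphabet data.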
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
2021-05-05 4:47 ` Maxim Cournoyer
@ 2021-05-05 9:22 ` pelzflorian (Florian Pelz)
0 siblings, 0 replies; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2021-05-05 9:22 UTC (permalink / raw)
To: Maxim Cournoyer; +Cc: 39970-done
On Wed, May 05, 2021 at 12:47:02AM -0400, Maxim Cournoyer wrote:
> Closing.
>
> Thank you,
>
> Maxim
Sorry for forgetting about this bug. The above
LC_ALL=tr_TR.utf8 make check TESTS=tests/cran.scm
is *not* fixed, but I won’t take the time to really understand and fix
the few remaining troubles, I think. Possibly libc bug
<https://sourceware.org/bugzilla/show_bug.cgi?id=23393> is the real
issue.
Regards,
Florian