* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: pelzflorian (Florian Pelz) @ 2020-03-07 12:00 UTC
To: 39970

After running

  export LC_ALL=tr_TR.utf8

many important Guix commands like 'guix environment', 'guix install'
and 'guix pull' fail.

$ guix environment --ad-hoc hello
Backtrace:
           1 (primitive-load "/home/florian/.config/guix/current/bin…")
In guix/ui.scm:
  1826:12  0 (run-guix-command _ . _)

guix/ui.scm:1826:12: In procedure run-guix-command:
In procedure string-length: Wrong type argument in position 1 (expecting string): #f

Running guix via ./pre-inst-env gives a more useful backtrace.  The
reason is that in guix/store.scm

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^/gnu/store/([0-9a-df-np-sv-z]{32})-([^/]+)$")
             "/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")

evaluates to #f in Turkish, possibly because of the presence of
dotless i (ı) in the range.

The attached patch fixes the issue by including i explicitly, but I
believe enumerating all of [0-9abcdfghijklmnpqrsvwxyz] explicitly
might be more future-proof.

Shall I push the patch modified to list all letters in
[0-9abcdfghijklmnpqrsvwxyz] explicitly?  Numbers too?  I suppose there
is no downside to listing all without ranges.

I wonder what else is affected; the installer maybe?  I have not
tested yet.

Regards,
Florian

[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]

From 4445284e9fd40b3e271fa7b511d2856c03c8ccfb Mon Sep 17 00:00:00 2001
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Sat, 7 Mar 2020 11:38:59 +0100
Subject: [PATCH] store: Fix many guix commands failing on some locales.

At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' are affected.

* guix/store.scm (store-regexp*): Avoid dependence on locale.
---
 guix/store.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..a1d9713c24 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1949,7 +1949,7 @@ valid inputs."
   (mlambda (store)
     "Return a regexp matching a file in STORE."
     (make-regexp (string-append "^" (regexp-quote store)
-                                "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+                                "/([0-9a-df-hij-np-sv-z]{32})-([^/]+)$"))))
 
 (define (store-path-package-name path)
   "Return the package name part of PATH, a file name in the store."
-- 
2.25.0

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: pelzflorian (Florian Pelz) @ 2020-03-07 15:20 UTC
To: 39970

On Sat, Mar 07, 2020 at 01:00:52PM +0100, pelzflorian (Florian Pelz) wrote:
> Running guix via ./pre-inst-env gives a more useful backtrace.  The
> reason is that in guix/store.scm
>
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^/gnu/store/([0-9a-df-np-sv-z]{32})-([^/]+)$")
>              "/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")
>
> evaluates to #f in Turkish, possibly because of the presence of
> dotless i (ı) in the range.
>

Actually it seems the issue is that i is missing from the range
[a-z].  ı and ğ are missing as well, as are non-Turkish letters like ä
that are included when using the en_US.utf8 locale, even though they
are not English letters either.

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^([a-z]+)$")
             "iyiyim")

fails.  But running a glibc C program

florian@florianmacbook ~$ cat iyiyim.c
#include <regex.h>
#include <stdio.h>
#define STR "iyiyim"
int main (int argc,
          char** argv)
{
  regex_t only_letters;
  int r = regcomp (&only_letters, "[a-z]", 0);
  if (r != 0)
    printf ("This error does not happen.\n");
  r = regexec (&only_letters, STR, 0, NULL, 0);
  if (r == 0)
    printf ("The string " STR " matched!\n");
  else
    printf ("No match for " STR ".\n");
}
florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
The string iyiyim matched!

succeeds on tr_TR.utf8 and en_US.utf8 locales (and a native Turkish
speaker confirmed to me that ı and i should be in the alphabet right
after h).

Maybe this is a bug in Guile, somehow?

> […]
> I wonder what else is affected; the installer maybe?  I have not
> tested yet.
>

I checked; the graphical installer appears unaffected, but the issue
appears on the installed system.

Regards,
Florian

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: pelzflorian (Florian Pelz) @ 2020-03-08 7:08 UTC
To: 39970

This seems similar to <https://bugs.gnu.org/35785>.

I think enumerating all characters explicitly is a similar fix,
whether or not there is a bug in Guile.

Regards,
Florian

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: Ludovic Courtès @ 2020-03-09 17:02 UTC
To: pelzflorian (Florian Pelz); +Cc: 39970

Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> This seems similar to <https://bugs.gnu.org/35785>.

Yes, same story.

> I think enumerating all characters explicitly is a similar fix,
> whether or not there is a bug in Guile.

To me it’s not a bug in Guile, but simply the fact that regexps, as
implemented by the C library, are locale-dependent.

The patch you proposed looks good to me, though perhaps we could
explicitly list all the alphabet in the regexp?

A better option is to reimplement ‘store-path-package-name’ in a way
similar to ‘store-path-hash-part’, as in commit
35eb77b09d957019b2437e7681bd88013d67d3cd.

Thoughts?

Ludo’.

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: pelzflorian (Florian Pelz) @ 2020-03-12 11:02 UTC
To: Ludovic Courtès; +Cc: 39970

On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
> To me it’s not a bug in Guile, but simply the fact that regexps, as
> implemented by the C library, are locale-dependent.
>

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^([a-z]+)$")
             "iyiyim")
⇒ #f

Guile’s behavior that i is not among [a-z] has been confirmed as
unexpected by a natively Turkish friend of mine.  It is different from
the behavior of current glibc:

florian@florianmacbook ~$ cat iyiyim.c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#define STR "iyiyım"
int main (int argc,
          char** argv)
{
  regex_t only_letters;
  int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
  if (r != 0)
    printf ("This error does not happen.\n");
  r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
  if (r == 0)
    printf ("The string " STR " matched!\n");
  else
    printf ("No match for " STR ".\n");
}
florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
The string iyiyım matched!

Apparently Guile uses a bundled regular expression library rather than
glibc.  I can try making Guile use a newer GNUlib for its regular
expressions, maybe that helps.  Shall I file a separate bug for Guile?

> The patch you proposed looks good to me, though perhaps we could
> explicitly list all the alphabet in the regexp?
>
> A better option is to reimplement ‘store-path-package-name’ in a way
> similar to ‘store-path-hash-part’, as in commit
> 35eb77b09d957019b2437e7681bd88013d67d3cd.

I suppose it would be better to cache the compiled regexp.  What is
this mcached syntax inside (guix store)?  Or do I use Scheme’s 'delay'
and 'force' for caching?

The attached patch fixes the regexp.  Shall I push the attached patch
and then try making it cache the compiled regexp or do you still
prefer an implementation without regexps?  Why would not using a
regexp be better?

Regards,
Florian

[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]

From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 12 Mar 2020 11:08:16 +0100
Subject: [PATCH] store: Fix many guix commands failing on some locales.

Fixes bug #39970 (see: https://bugs.gnu.org/39970).
At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' are affected.

* guix/store.scm (store-regexp*): Avoid dependence on locale.
---
 guix/store.scm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..82d7403bb6 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1949,7 +1949,8 @@ valid inputs."
   (mlambda (store)
     "Return a regexp matching a file in STORE."
     (make-regexp (string-append "^" (regexp-quote store)
-                                "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+                                "\
+/([0-9abcdfghijklmnpqrsvwxyz]{32})-([^/]+)$"))))
 
 (define (store-path-package-name path)
   "Return the package name part of PATH, a file name in the store."
-- 
2.25.1

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: Ludovic Courtès @ 2020-03-12 16:05 UTC
To: pelzflorian (Florian Pelz); +Cc: 39970

Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
>> To me it’s not a bug in Guile, but simply the fact that regexps, as
>> implemented by the C library, are locale-dependent.
>>
>
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^([a-z]+)$")
>              "iyiyim")
> ⇒ #f
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine.  It is different from
> the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int argc,
>           char** argv)
> {

You’re seeing a different behavior because you forgot a:

  setlocale (LC_ALL, "");

call here.

>> The patch you proposed looks good to me, though perhaps we could
>> explicitly list all the alphabet in the regexp?
>>
>> A better option is to reimplement ‘store-path-package-name’ in a way
>> similar to ‘store-path-hash-part’, as in commit
>> 35eb77b09d957019b2437e7681bd88013d67d3cd.
>
> I suppose it would be better to cache the compiled regexp.  What is
> this mcached syntax inside (guix store)?  Or do I use Scheme’s 'delay'
> and 'force' for caching?

I lean towards avoiding regexps altogether, as I wrote above.  WDYT?

> The attached patch fixes the regexp.  Shall I push the attached patch
> and then try making it cache the compiled regexp or do you still
> prefer an implementation without regexps?  Why would not using a
> regexp be better?

It reduces reliance on libc, reduces complexity, and performs better as
noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.

Thanks,
Ludo’.

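The setlocale point made in the message above can be sketched in C as
follows.  This is only an illustration and not code from the thread:
'match_in_locale' is a hypothetical helper name, and it assumes a POSIX
<regex.h>.  A C program starts in the "C" locale, so LANG and LC_ALL
have no effect on regcomp until setlocale is called.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): switch the process to
   LOCALE_NAME, then test STR against the POSIX extended regexp PATTERN.
   Returns 1 on a match, 0 on no match, and -1 if the locale is not
   available or the pattern does not compile.  Passing "" as LOCALE_NAME
   selects the locale named by the environment (LC_ALL, LANG, ...).  */
int
match_in_locale (const char *locale_name, const char *pattern,
                 const char *str)
{
  regex_t rx;
  int result;

  assert (locale_name != NULL && pattern != NULL && str != NULL);

  /* Without this call the process stays in the "C" locale, whatever
     the environment says.  */
  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  /* Ranges such as [a-z] are interpreted at compile time according to
     the locale that is current here.  */
  if (regcomp (&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  result = (regexec (&rx, str, 0, NULL, 0) == 0) ? 1 : 0;
  regfree (&rx);
  return result;
}
```

Calling match_in_locale ("C", "^[a-z]+$", "iyiyim") exercises the
locale-independent baseline.  The call of interest for this bug is
match_in_locale ("tr_TR.utf8", "^[a-z]+$", "iyiyim"); its result
depends on the libc version and on whether that locale is installed,
so no outcome is claimed here.
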
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: pelzflorian (Florian Pelz) @ 2020-03-17 9:44 UTC
To: Ludovic Courtès; +Cc: 39970

On Thu, Mar 12, 2020 at 05:05:26PM +0100, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > Why would not using a regexp be better?
>
> It reduces reliance on libc, reduces complexity, and performs better as
> noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.

Thank you for your wisdom.  I hope the attached patch is OK.

`LC_ALL=en_US.utf8 make check` is mostly fine (except tests/pack.scm,
which also failed before).

Manual testing of `./pre-inst-env guix environment` works.

`LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
There are many failures.  I will continue to investigate later today.

Regards,
Florian

[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]

From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 12 Mar 2020 11:08:16 +0100
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [PATCH] store: Fix many guix commands failing on some locales.

Partly fixes bug #39970 (see: https://bugs.gnu.org/39970).
At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' were affected.

* guix/store.scm (store-path-hash-part): Move base path detection to ...
(store-path-base): ... this new exported procedure.
(store-path-package-name): Use it instead of locale-dependent regexps.
(store-regexp*): Remove.
---
 guix/store.scm | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..5465204f5f 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -2,6 +2,7 @@
 ;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2018 Jan Nieuwenhuizen <janneke@gnu.org>
 ;;; Copyright © 2019 Mathieu Othacehe <m.othacehe@gmail.com>
+;;; Copyright © 2020 Florian Pelz <pelzflorian@pelzflorian.de>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -43,7 +44,6 @@
   #:use-module (srfi srfi-35)
   #:use-module (srfi srfi-39)
   #:use-module (ice-9 match)
-  #:use-module (ice-9 regex)
   #:use-module (ice-9 vlist)
   #:use-module (ice-9 popen)
   #:use-module (ice-9 threads)
@@ -172,6 +172,7 @@
             store-path?
             direct-store-path?
             derivation-path?
+            store-path-base
             store-path-package-name
             store-path-hash-part
             direct-store-path
@@ -1943,29 +1944,26 @@ valid inputs."
   "Return #t if PATH is a derivation path."
   (and (store-path? path) (string-suffix? ".drv" path)))
 
-(define store-regexp*
-  ;; The substituter makes repeated calls to 'store-path-hash-part', hence
-  ;; this optimization.
-  (mlambda (store)
-    "Return a regexp matching a file in STORE."
-    (make-regexp (string-append "^" (regexp-quote store)
-                                "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+(define (store-path-base path)
+  "Return the base path of a path in the store."
+  (and (string-prefix? (%store-prefix) path)
+       (let ((base (string-drop path (+ 1 (string-length (%store-prefix))))))
+         (and (> (string-length base) 33)
+              (not (string-index base #\/))
+              base))))
 
 (define (store-path-package-name path)
   "Return the package name part of PATH, a file name in the store."
-  (let ((path-rx (store-regexp* (%store-prefix))))
-    (and=> (regexp-exec path-rx path)
-           (cut match:substring <> 2))))
+  (let ((base (store-path-base path)))
+    (string-drop base (+ 32 1))))               ;32 hash part + 1 hyphen
 
 (define (store-path-hash-part path)
   "Return the hash part of PATH as a base32 string, or #f if PATH is not a
 syntactically valid store path."
-  (and (string-prefix? (%store-prefix) path)
-       (let ((base (string-drop path (+ 1 (string-length (%store-prefix))))))
-         (and (> (string-length base) 33)
-              (let ((hash (string-take base 32)))
-                (and (string-every %nix-base32-charset hash)
-                     hash))))))
+  (let* ((base (store-path-base path))
+         (hash (string-take base 32)))
+    (and (string-every %nix-base32-charset hash)
+         hash)))
 
 (define (derivation-log-file drv)
   "Return the build log file for DRV, a derivation file name, or #f if it
-- 
2.25.1

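For readers who think in C, the regexp-free approach of the patch above
can be sketched as follows.  This is an illustration under stated
assumptions, not a translation of the Guix code: the function names
mirror the Scheme procedures but are hypothetical, the store prefix is
passed as an argument instead of being read from a parameter, and the
sketch additionally checks the separator and the hash characters, which
the Scheme 'store-path-package-name' does not.  Everything is byte-wise
string handling, so the current locale plays no role.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 32 characters enumerated in the thread's regexp fix.  */
static const char base32_chars[] = "0123456789abcdfghijklmnpqrsvwxyz";

/* Hypothetical helper: return the "HASH-NAME" part of PATH if PATH is a
   direct entry of STORE (for example "/gnu/store"), else NULL.  */
const char *
store_path_base (const char *store, const char *path)
{
  size_t prefix_len;
  const char *base;

  assert (store != NULL && path != NULL);
  prefix_len = strlen (store);

  /* PATH must start with STORE followed by a slash.  */
  if (strncmp (path, store, prefix_len) != 0 || path[prefix_len] != '/')
    return NULL;

  /* The base must be longer than 32 hash characters plus one hyphen
     and must not contain a further slash.  */
  base = path + prefix_len + 1;
  if (strlen (base) <= 33 || strchr (base, '/') != NULL)
    return NULL;

  return base;
}

/* Hypothetical helper: return the package name part of PATH, or NULL if
   PATH is not a syntactically valid store file name.  */
const char *
store_path_package_name (const char *store, const char *path)
{
  const char *base = store_path_base (store, path);

  if (base == NULL)
    return NULL;

  /* Exactly 32 leading characters from the base32 alphabet, then a
     hyphen.  strspn compares bytes and never consults the locale.  */
  if (strspn (base, base32_chars) != 32 || base[32] != '-')
    return NULL;

  return base + 33;               /* 32 hash part + 1 hyphen */
}
```

With the path from the first message,
store_path_package_name ("/gnu/store",
"/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")
yields "fontconfig-2.13.1.drv" regardless of LC_ALL.
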
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: Ludovic Courtès @ 2020-03-17 21:20 UTC
To: pelzflorian (Florian Pelz); +Cc: 39970

Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Thu, Mar 12, 2020 at 05:05:26PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > Why would not using a regexp be better?
>>
>> It reduces reliance on libc, reduces complexity, and performs better as
>> noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.
>
> Thank you for your wisdom.  I hope the attached patch is OK.
>
> `LC_ALL=en_US.utf8 make check` is mostly fine (except tests/pack.scm,
> which also failed before).
>
> Manual testing of `./pre-inst-env guix environment` works.

Good!

> `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
> There are many failures.  I will continue to investigate later today.

OK.

> From: Florian Pelz <pelzflorian@pelzflorian.de>
> Date: Thu, 12 Mar 2020 11:08:16 +0100
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> Subject: [PATCH] store: Fix many guix commands failing on some locales.
>
> Partly fixes bug #39970 (see: https://bugs.gnu.org/39970).

I’d just write:

  Partly fixes <https://bugs.gnu.org/39970>.

Concise, clear, greppable.  :-)

> At least 'guix environment', 'guix install' and 'guix pull'
> on 'az_AZ.utf8' and 'tr_TR.utf8' were affected.
>
> * guix/store.scm (store-path-hash-part): Move base path detection to ...
> (store-path-base): ... this new exported procedure.
> (store-path-package-name): Use it instead of locale-dependent regexps.
> (store-regexp*): Remove.

LGTM, thank you!

Ludo’.

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: pelzflorian (Florian Pelz) @ 2020-03-18 6:47 UTC
To: Ludovic Courtès; +Cc: 39970

On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
> > There are many failures.  I will continue to investigate later today.
>
> OK.

The tests fail due to many other uses of [a-z] in regexps.  I will
look; e.g. for guix/import/cran.scm

  (if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
      …)

it would be easier and clearer to just list the letters of [a-z]
explicitly.

> LGTM, thank you!

:) Pushed as 771c5e155d7862ed91a5d503eecc00c1db1150ad.

Regards,
Florian

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: Ludovic Courtès @ 2020-03-18 8:40 UTC
To: pelzflorian (Florian Pelz); +Cc: 39970

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
>> > There are many failures.  I will continue to investigate later today.
>>
>> OK.
>
> The tests fail due to many other uses of [a-z] in regexps.  I will
> look; e.g. for guix/import/cran.scm
>
>   (if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
>       …)
>
> it would be easier and clearer to just list the letters of [a-z]
> explicitly.

Yes, agreed.

It would be nice if ‘string-match’ & co. could take an optional locale
object (info "(guile) i18n Introduction") but that’s not the case
currently.

Thanks,
Ludo’.

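The idea of listing the letters explicitly, agreed on in the message
above, can be sketched with the POSIX regexp API from C.  This is a
hedged illustration only: 'looks_like_field_line' is a hypothetical
name, and the pattern is the cran.scm regexp quoted in the thread with
[A-Za-z] spelled out.  A bracket expression without ranges does not
rely on locale collation order; whether that alone resolves every test
failure under tr_TR.utf8 is not established in this thread.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The quoted pattern "^[A-Za-z][^ :]+:( |\n|$)" with the two ranges
   replaced by an explicit enumeration of the 52 ASCII letters.  */
static const char field_pattern[] =
  "^[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]"
  "[^ :]+:( |\n|$)";

/* Hypothetical helper: return 1 if LINE starts like "Name: value",
   0 if it does not, and -1 if the pattern fails to compile.  */
int
looks_like_field_line (const char *line)
{
  regex_t rx;
  int result;

  assert (line != NULL);

  if (regcomp (&rx, field_pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  result = (regexec (&rx, line, 0, NULL, 0) == 0) ? 1 : 0;
  regfree (&rx);
  return result;
}
```

For example, looks_like_field_line ("Package: foo") matches, while a
continuation line that starts with a space does not.
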
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: Maxim Cournoyer @ 2021-05-05 4:47 UTC
To: pelzflorian (Florian Pelz); +Cc: 39970-done

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:

> On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
>> > There are many failures.  I will continue to investigate later today.
>>
>> OK.
>
> The tests fail due to many other uses of [a-z] in regexps.  I will
> look; e.g. for guix/import/cran.scm
>
>   (if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
>       …)
>
> it would be easier and clearer to just list the letters of [a-z]
> explicitly.
>
>> LGTM, thank you!
>
> :) Pushed as 771c5e155d7862ed91a5d503eecc00c1db1150ad.

Closing.

Thank you,

Maxim

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: pelzflorian (Florian Pelz) @ 2021-05-05 9:22 UTC
To: Maxim Cournoyer; +Cc: 39970-done

On Wed, May 05, 2021 at 12:47:02AM -0400, Maxim Cournoyer wrote:
> Closing.
>
> Thank you,
>
> Maxim

Sorry for forgetting about this bug.  The above

  LC_ALL=tr_TR.utf8 make check TESTS=tests/cran.scm

is *not* fixed, but I won’t take the time to really understand and fix
the few remaining troubles, I think.  Possibly libc bug
<https://sourceware.org/bugzilla/show_bug.cgi?id=23393> is the real
issue.

Regards,
Florian

* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
From: Taylan Kammer @ 2021-05-05 7:04 UTC
To: pelzflorian (Florian Pelz), Ludovic Courtès; +Cc: 39970

On 12.03.2020 12:02, pelzflorian (Florian Pelz) wrote:
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine.  It is different from
> the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int argc,
>           char** argv)
> {
>   regex_t only_letters;
>   int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
>   if (r != 0)
>     printf ("This error does not happen.\n");
>   r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
>   if (r == 0)
>     printf ("The string " STR " matched!\n");
>   else
>     printf ("No match for " STR ".\n");
> }
> florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
> florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
> The string iyiyım matched!
>
> Apparently Guile uses a bundled regular expression library rather than
> glibc.  I can try making Guile use a newer GNUlib for its regular
> expressions, maybe that helps.  Shall I file a separate bug for Guile?
>

Also native Turkish speaker here, and yeah that seems like a clear bug.

By the way, Turkish doesn't have q, w, or x.  So if [a-z] is interpreted
by locale, it would fail to match those letters.  I suppose that doesn't
matter for the patch you guys used but it might have been part of the
original problem.

The dotless lowercase i / dotted uppercase I mostly bites programmers in
case conversion.  The uppercase of i is İ and the lowercase of I is ı.

There was even an exploit in GitHub related to this:

  https://eng.getwisdom.io/hacking-github-with-unicode-dotless-i/

- Taylan
