unofficial mirror of bug-guix@gnu.org 
* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
@ 2020-03-07 12:00 pelzflorian (Florian Pelz)
  2020-03-07 15:20 ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-07 12:00 UTC (permalink / raw)
  To: 39970

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]

After running

export LC_ALL=tr_TR.utf8

many important Guix commands like 'guix environment', 'guix install'
and 'guix pull' fail.

$ guix environment --ad-hoc hello
Backtrace:
           1 (primitive-load "/home/florian/.config/guix/current/bin…")
In guix/ui.scm:
  1826:12  0 (run-guix-command _ . _)

guix/ui.scm:1826:12: In procedure run-guix-command:
In procedure string-length: Wrong type argument in position 1 (expecting string): #f


Running guix via ./pre-inst-env gives a more useful backtrace.  The
reason is that in guix/store.scm

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^/gnu/store/([0-9a-df-np-sv-z]{32})-([^/]+)$")
             "/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")

evaluates to #f in Turkish, possibly because of the presence of
dotless i (ı) in the range.

The attached patch fixes the issue by including i explicitly, but I
believe enumerating all of [0-9abcdfghijklmnpqrsvwxyz] explicitly
might be more future-proof.

Shall I push the patch modified to list all letters in
[0-9abcdfghijklmnpqrsvwxyz] explicitly?  Numbers too?  I suppose there
is no downside to listing all without ranges.
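
As a rough illustration of the fully enumerated character class, here
is a minimal C sketch using the POSIX regex API.  It is not Guix code:
the helper name store_path_matches and the hard-coded /gnu/store
prefix are assumptions made for the example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of Guix: return 1 if PATH looks like a
   direct entry of /gnu/store, 0 if it does not, and -1 if the pattern
   fails to compile. */
int store_path_matches(const char *path)
{
    /* Every allowed hash character is listed one by one, so the class
       contains no range (such as a-d) whose meaning could depend on the
       collation order of the current locale. */
    static const char pattern[] =
        "^/gnu/store/([0123456789abcdfghijklmnpqrsvwxyz]{32})-([^/]+)$";
    regex_t rx;
    int rc;

    assert(path != NULL);
    if (regcomp(&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, path, 0, NULL, 0);
    regfree(&rx);
    return rc == 0 ? 1 : 0;
}
```

Called on the fontconfig derivation path above, this helper is meant
to report a match regardless of how the locale orders letters.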

I wonder what else is affected; the installer maybe?  I have not
tested yet.

Regards,
Florian

[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]
[-- Type: text/plain, Size: 1034 bytes --]

From 4445284e9fd40b3e271fa7b511d2856c03c8ccfb Mon Sep 17 00:00:00 2001
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Sat, 7 Mar 2020 11:38:59 +0100
Subject: [PATCH] store: Fix many guix commands failing on some locales.

At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' are affected.

* guix/store.scm (store-regexp*): Avoid dependence on locale.
---
 guix/store.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..a1d9713c24 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1949,7 +1949,7 @@ valid inputs."
   (mlambda (store)
     "Return a regexp matching a file in STORE."
     (make-regexp (string-append "^" (regexp-quote store)
-                                "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+                                "/([0-9a-df-hij-np-sv-z]{32})-([^/]+)$"))))
 
 (define (store-path-package-name path)
   "Return the package name part of PATH, a file name in the store."
-- 
2.25.0



* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-07 12:00 bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales pelzflorian (Florian Pelz)
@ 2020-03-07 15:20 ` pelzflorian (Florian Pelz)
  2020-03-08  7:08   ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-07 15:20 UTC (permalink / raw)
  To: 39970

On Sat, Mar 07, 2020 at 01:00:52PM +0100, pelzflorian (Florian Pelz) wrote:
> Running guix via ./pre-inst-env gives a more useful backtrace.  The
> reason is that in guix/store.scm
> 
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^/gnu/store/([0-9a-df-np-sv-z]{32})-([^/]+)$")
>              "/gnu/store/bv9py3f2dsa5iw0aijqjv9zxwprcy1nb-fontconfig-2.13.1.drv")
> 
> evaluates to #f in Turkish, possibly because of the presence of
> dotless i (ı) in the range.
> 

Actually it seems the issue is that i is missing from the range [a-z].
ı and ğ are missing as well, as are non-Turkish letters like ä that
are included when using the en_US.utf8 locale, even though they are
not English letters either.

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^([a-z]+)$")
             "iyiyim")

fails.

But running a glibc C program

florian@florianmacbook ~$ cat iyiyim.c
#include <regex.h>
#include <stdio.h>
#define STR "iyiyim"
int main (int    argc,
          char** argv)
{
  regex_t only_letters;
  int r = regcomp (&only_letters, "[a-z]", 0);
  if (r != 0)
    printf ("This error does not happen.\n");
  r = regexec (&only_letters, STR, 0, NULL, 0);
  if (r == 0)
    printf ("The string " STR " matched!\n");
  else
    printf ("No match for " STR ".\n");
}
florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c 
florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim 
The string iyiyim matched!

succeeds on tr_TR.utf8 and en_US.utf8 locales (and a native Turkish
speaker confirmed to me that ı and i should be in the alphabet right
after h).  Maybe this is a bug in Guile, somehow?

> […]
> I wonder what else is affected; the installer maybe?  I have not
> tested yet.
>

I checked; the graphical installer appears unaffected, but the issue
appears on the installed system.

Regards,
Florian


* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-07 15:20 ` pelzflorian (Florian Pelz)
@ 2020-03-08  7:08   ` pelzflorian (Florian Pelz)
  2020-03-09 17:02     ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-08  7:08 UTC (permalink / raw)
  To: 39970

This seems similar to <https://bugs.gnu.org/35785>.  I think
enumerating all characters explicitly is a similar fix, whether or not
there is a bug in Guile.

Regards,
Florian


* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-08  7:08   ` pelzflorian (Florian Pelz)
@ 2020-03-09 17:02     ` Ludovic Courtès
  2020-03-12 11:02       ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-03-09 17:02 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: 39970

Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> This seems similar to <https://bugs.gnu.org/35785>.

Yes, same story.

> I think enumerating all characters explicitly is a similar fix,
> whether or not there is a bug in Guile.

To me it’s not a bug in Guile, but simply the fact that regexps, as
implemented by the C library, are locale-dependent.

The patch you proposed looks good to me, though perhaps we could
explicitly list all the alphabet in the regexp?

A better option is to reimplement ‘store-path-package-name’ in a way
similar to ‘store-path-hash-part’, as in commit
35eb77b09d957019b2437e7681bd88013d67d3cd.

Thoughts?

Ludo’.


* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-09 17:02     ` Ludovic Courtès
@ 2020-03-12 11:02       ` pelzflorian (Florian Pelz)
  2020-03-12 16:05         ` Ludovic Courtès
  2021-05-05  7:04         ` Taylan Kammer
  0 siblings, 2 replies; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-12 11:02 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39970

[-- Attachment #1: Type: text/plain, Size: 1992 bytes --]

On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
> To me it’s not a bug in Guile, but simply the fact that regexps, as
> implemented by the C library, are locale-dependent.
> 

(use-modules (ice-9 regex))
(regexp-exec (make-regexp "^([a-z]+)$")
             "iyiyim")
⇒ #f

Guile’s behavior that i is not among [a-z] has been confirmed as
unexpected by a natively Turkish friend of mine.  It is different from
the behavior of current glibc:

florian@florianmacbook ~$ cat iyiyim.c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#define STR "iyiyım"
int main (int    argc,
          char** argv)
{
  regex_t only_letters;
  int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
  if (r != 0)
    printf ("This error does not happen.\n");
  r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
  if (r == 0)
    printf ("The string " STR " matched!\n");
  else
    printf ("No match for " STR ".\n");
}
florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim 
The string iyiyım matched!

Apparently Guile uses a bundled regular expression library rather than
glibc.  I can try making Guile use a newer GNUlib for its regular
expressions, maybe that helps.  Shall I file a separate bug for Guile?

> The patch you proposed looks good to me, though perhaps we could
> explicitly list all the alphabet in the regexp?
> 
> A better option is to reimplement ‘store-path-package-name’ in a way
> similar to ‘store-path-hash-part’, as in commit
> 35eb77b09d957019b2437e7681bd88013d67d3cd.

I suppose it would be better to cache the compiled regexp.  What is
this mcached syntax inside (guix store)?  Or do I use Scheme’s 'delay'
and 'force' for caching?

The attached patch fixes the regexp.  Shall I push the attached patch
and then try making it cache the compiled regexp or do you still
prefer an implementation without regexps?  Why would not using a
regexp be better?

Regards,
Florian

[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]
[-- Type: text/plain, Size: 1028 bytes --]

From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 12 Mar 2020 11:08:16 +0100
Subject: [PATCH] store: Fix many guix commands failing on some locales.

Fixes bug #39970 (see: https://bugs.gnu.org/39970).

At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' are affected.

* guix/store.scm (store-regexp*): Avoid dependence on locale.
---
 guix/store.scm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..82d7403bb6 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1949,7 +1949,8 @@ valid inputs."
   (mlambda (store)
     "Return a regexp matching a file in STORE."
     (make-regexp (string-append "^" (regexp-quote store)
-                                "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+                                "\
+/([0-9abcdfghijklmnpqrsvwxyz]{32})-([^/]+)$"))))
 
 (define (store-path-package-name path)
   "Return the package name part of PATH, a file name in the store."
-- 
2.25.1



* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-12 11:02       ` pelzflorian (Florian Pelz)
@ 2020-03-12 16:05         ` Ludovic Courtès
  2020-03-17  9:44           ` pelzflorian (Florian Pelz)
  2021-05-05  7:04         ` Taylan Kammer
  1 sibling, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-03-12 16:05 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: 39970

Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
>> To me it’s not a bug in Guile, but simply the fact that regexps, as
>> implemented by the C library, are locale-dependent.
>> 
>
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^([a-z]+)$")
>              "iyiyim")
> ⇒ #f
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine.  It is different from
> the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int    argc,
>           char** argv)
> {

You’re seeing a different behavior because you forgot a:

  setlocale (LC_ALL, "");

call here.
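
For illustration, a minimal C sketch of the same kind of check with
the locale actually switched before the regexp is compiled.  The
helper name match_in_locale and its return convention are assumptions
made for this example; they do not come from the program quoted above.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match TEXT against the extended regexp PATTERN
   after switching the process to LOCALE_NAME ("" means "take the locale
   from the environment, e.g. LC_ALL or LANG").  Returns 1 on a match,
   0 on no match, -1 if the locale or the pattern cannot be used. */
int match_in_locale(const char *locale_name, const char *pattern,
                    const char *text)
{
    regex_t rx;
    int rc;

    assert(locale_name != NULL && pattern != NULL && text != NULL);

    /* Without a setlocale call, a C program stays in the "C" locale no
       matter what LANG or LC_ALL say. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    if (regcomp(&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, text, 0, NULL, 0);
    regfree(&rx);
    return rc == 0 ? 1 : 0;
}
```

Calling match_in_locale ("tr_TR.utf8", "^[a-z]+$", "iyiyim") exercises
the case discussed here; the result depends on the installed locale
data and libc version, so no output is claimed.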

>> The patch you proposed looks good to me, though perhaps we could
>> explicitly list all the alphabet in the regexp?
>> 
>> A better option is to reimplement ‘store-path-package-name’ in a way
>> similar to ‘store-path-hash-part’, as in commit
>> 35eb77b09d957019b2437e7681bd88013d67d3cd.
>
> I suppose it would be better to cache the compiled regexp.  What is
> this mcached syntax inside (guix store)?  Or do I use Scheme’s 'delay'
> and 'force' for caching?

I lean towards avoiding regexps altogether, as I wrote above.

WDYT?

> The attached patch fixes the regexp.  Shall I push the attached patch
> and then try making it cache the compiled regexp or do you still
> prefer an implementation without regexps?  Why would not using a
> regexp be better?

It reduces reliance on libc, reduces complexity, and performs better as
noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.
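
To make the regexp-free idea concrete, here is a minimal C sketch of
parsing a store file name with plain string operations.  It is only an
analogy to the Scheme procedures, not Guix code: the function name,
the fixed /gnu/store prefix and the extra check for the hyphen after
the hash are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STORE_PREFIX "/gnu/store"   /* assumed default store directory */
#define HASH_LENGTH 32

/* Return the package-name part of PATH (a pointer into PATH), or NULL
   if PATH is not of the form STORE_PREFIX/<32 base32 chars>-<name>. */
const char *store_path_package_name(const char *path)
{
    /* The base32 alphabet used for store hashes: no e, o, t or u. */
    static const char base32[] = "0123456789abcdfghijklmnpqrsvwxyz";
    size_t prefix_len = strlen(STORE_PREFIX);
    const char *base;

    assert(path != NULL);
    if (strncmp(path, STORE_PREFIX "/", prefix_len + 1) != 0)
        return NULL;
    base = path + prefix_len + 1;

    /* Require the hash, a hyphen and at least one character of name,
       with no further directory separator. */
    if (strlen(base) <= HASH_LENGTH + 1 || strchr(base, '/') != NULL)
        return NULL;

    /* strspn compares bytes, so the locale plays no role here. */
    if (strspn(base, base32) < HASH_LENGTH || base[HASH_LENGTH] != '-')
        return NULL;

    return base + HASH_LENGTH + 1;
}
```

Because every comparison is byte-wise, the outcome does not change
with LC_ALL, and nothing has to be compiled or cached.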

Thanks,
Ludo’.


* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-12 16:05         ` Ludovic Courtès
@ 2020-03-17  9:44           ` pelzflorian (Florian Pelz)
  2020-03-17 21:20             ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-17  9:44 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39970

[-- Attachment #1: Type: text/plain, Size: 696 bytes --]

On Thu, Mar 12, 2020 at 05:05:26PM +0100, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > Why would not using a regexp be better?
> 
> It reduces reliance on libc, reduces complexity, and performs better as
> noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.

Thank you for your wisdom.  I hope the attached patch is OK.

`LC_ALL=en_US.utf8 make check` is mostly fine (except tests/pack.scm,
which also failed before).

Manual testing of `./pre-inst-env guix environment` works.

`LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
There are many failures.  I will continue to investigate later today.

Regards,
Florian

[-- Attachment #2: 0001-store-Fix-many-guix-commands-failing-on-some-locales.patch --]
[-- Type: text/plain, Size: 3531 bytes --]

From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 12 Mar 2020 11:08:16 +0100
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [PATCH] store: Fix many guix commands failing on some locales.

Partly fixes bug #39970 (see: https://bugs.gnu.org/39970).

At least 'guix environment', 'guix install' and 'guix pull'
on 'az_AZ.utf8' and 'tr_TR.utf8' were affected.

* guix/store.scm (store-path-hash-part): Move base path detection to ...
(store-path-base): ... this new exported procedure.
(store-path-package-name): Use it instead of locale-dependent regexps.
(store-regexp*): Remove.
---
 guix/store.scm | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index f99fa581a8..5465204f5f 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -2,6 +2,7 @@
 ;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2018 Jan Nieuwenhuizen <janneke@gnu.org>
 ;;; Copyright © 2019 Mathieu Othacehe <m.othacehe@gmail.com>
+;;; Copyright © 2020 Florian Pelz <pelzflorian@pelzflorian.de>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -43,7 +44,6 @@
   #:use-module (srfi srfi-35)
   #:use-module (srfi srfi-39)
   #:use-module (ice-9 match)
-  #:use-module (ice-9 regex)
   #:use-module (ice-9 vlist)
   #:use-module (ice-9 popen)
   #:use-module (ice-9 threads)
@@ -172,6 +172,7 @@
             store-path?
             direct-store-path?
             derivation-path?
+            store-path-base
             store-path-package-name
             store-path-hash-part
             direct-store-path
@@ -1943,29 +1944,26 @@ valid inputs."
   "Return #t if PATH is a derivation path."
   (and (store-path? path) (string-suffix? ".drv" path)))
 
-(define store-regexp*
-  ;; The substituter makes repeated calls to 'store-path-hash-part', hence
-  ;; this optimization.
-  (mlambda (store)
-    "Return a regexp matching a file in STORE."
-    (make-regexp (string-append "^" (regexp-quote store)
-                                "/([0-9a-df-np-sv-z]{32})-([^/]+)$"))))
+(define (store-path-base path)
+  "Return the base path of a path in the store."
+  (and (string-prefix? (%store-prefix) path)
+       (let ((base (string-drop path (+ 1 (string-length (%store-prefix))))))
+         (and (> (string-length base) 33)
+              (not (string-index base #\/))
+              base))))
 
 (define (store-path-package-name path)
   "Return the package name part of PATH, a file name in the store."
-  (let ((path-rx (store-regexp* (%store-prefix))))
-    (and=> (regexp-exec path-rx path)
-           (cut match:substring <> 2))))
+  (let ((base (store-path-base path)))
+    (string-drop base (+ 32 1)))) ;32 hash part + 1 hyphen
 
 (define (store-path-hash-part path)
   "Return the hash part of PATH as a base32 string, or #f if PATH is not a
 syntactically valid store path."
-  (and (string-prefix? (%store-prefix) path)
-       (let ((base (string-drop path (+ 1 (string-length (%store-prefix))))))
-         (and (> (string-length base) 33)
-              (let ((hash (string-take base 32)))
-                (and (string-every %nix-base32-charset hash)
-                     hash))))))
+  (let* ((base (store-path-base path))
+         (hash (string-take base 32)))
+    (and (string-every %nix-base32-charset hash)
+         hash)))
 
 (define (derivation-log-file drv)
   "Return the build log file for DRV, a derivation file name, or #f if it
-- 
2.25.1



* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-17  9:44           ` pelzflorian (Florian Pelz)
@ 2020-03-17 21:20             ` Ludovic Courtès
  2020-03-18  6:47               ` pelzflorian (Florian Pelz)
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2020-03-17 21:20 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: 39970

Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Thu, Mar 12, 2020 at 05:05:26PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > Why would not using a regexp be better?
>> 
>> It reduces reliance on libc, reduces complexity, and performs better as
>> noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.
>
> Thank you for your wisdom.  I hope the attached patch is OK.
>
> `LC_ALL=en_US.utf8 make check` is mostly fine (except tests/pack.scm,
> which also failed before).
>
> Manual testing of `./pre-inst-env guix environment` works.

Good!

> `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
> There are many failures.  I will continue to investigate later today.

OK.

> From: Florian Pelz <pelzflorian@pelzflorian.de>
> Date: Thu, 12 Mar 2020 11:08:16 +0100
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> Subject: [PATCH] store: Fix many guix commands failing on some locales.
>
> Partly fixes bug #39970 (see: https://bugs.gnu.org/39970).

I’d just write:

  Partly fixes <https://bugs.gnu.org/39970>.

Concise, clear, greppable.  :-)

> At least 'guix environment', 'guix install' and 'guix pull'
> on 'az_AZ.utf8' and 'tr_TR.utf8' were affected.
>
> * guix/store.scm (store-path-hash-part): Move base path detection to ...
> (store-path-base): ... this new exported procedure.
> (store-path-package-name): Use it instead of locale-dependent regexps.
> (store-regexp*): Remove.

LGTM, thank you!

Ludo’.


* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-17 21:20             ` Ludovic Courtès
@ 2020-03-18  6:47               ` pelzflorian (Florian Pelz)
  2020-03-18  8:40                 ` Ludovic Courtès
  2021-05-05  4:47                 ` Maxim Cournoyer
  0 siblings, 2 replies; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2020-03-18  6:47 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39970

On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
> > There are many failures.  I will continue to investigate later today.
> 
> OK.

The tests fail due to many other uses of [a-z] in regexps.  I will look;
e.g. for guix/import/cran.scm

(if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
    …)

it would be easier and clearer to just list [a-z] explicitly.


> LGTM, thank you!

:) Pushed as 771c5e155d7862ed91a5d503eecc00c1db1150ad.

Regards,
Florian


* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-18  6:47               ` pelzflorian (Florian Pelz)
@ 2020-03-18  8:40                 ` Ludovic Courtès
  2021-05-05  4:47                 ` Maxim Cournoyer
  1 sibling, 0 replies; 13+ messages in thread
From: Ludovic Courtès @ 2020-03-18  8:40 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: 39970

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

> On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
>> > There are many failures.  I will continue to investigate later today.
>> 
>> OK.
>
> The tests fail due to many other uses of [a-z] in regexps.  I will look;
> e.g. for guix/import/cran.scm
>
> (if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
>     …)
>
> it would be easier and clearer to just list [a-z] explicitly.

Yes, agreed.

It would be nice if ‘string-match’ & co. could take an optional locale
object (info "(guile) i18n Introduction") but that’s not the case
currently.
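
At the C level, something close to "a regexp with an explicit locale"
can be approximated with the POSIX per-thread locale functions.  The
sketch below is only an assumption-laden illustration: the helper name
match_in_c_locale is made up, it assumes a libc that provides
newlocale and uselocale and whose regcomp consults the calling
thread's locale, and it says nothing about how Guile could expose
this.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for newlocale and uselocale */
#endif
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile and run PATTERN against TEXT with the
   calling thread temporarily switched to the "C" locale.  Returns 1 on
   a match, 0 on no match, -1 on error. */
int match_in_c_locale(const char *pattern, const char *text)
{
    locale_t c_locale, previous;
    regex_t rx;
    int rc = -1;

    assert(pattern != NULL && text != NULL);

    c_locale = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    if (c_locale == (locale_t) 0)
        return -1;
    previous = uselocale(c_locale);   /* affects this thread only */

    if (regcomp(&rx, pattern, REG_EXTENDED | REG_NOSUB) == 0) {
        rc = regexec(&rx, text, 0, NULL, 0) == 0 ? 1 : 0;
        regfree(&rx);
    }

    uselocale(previous);              /* restore the caller's locale */
    freelocale(c_locale);
    return rc;
}
```

The process-wide locale set through setlocale is left untouched, so
messages and other output would still follow the user's settings.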

Thanks,
Ludo’.


* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-18  6:47               ` pelzflorian (Florian Pelz)
  2020-03-18  8:40                 ` Ludovic Courtès
@ 2021-05-05  4:47                 ` Maxim Cournoyer
  2021-05-05  9:22                   ` pelzflorian (Florian Pelz)
  1 sibling, 1 reply; 13+ messages in thread
From: Maxim Cournoyer @ 2021-05-05  4:47 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz); +Cc: 39970-done

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:

> On Tue, Mar 17, 2020 at 10:20:01PM +0100, Ludovic Courtès wrote:
>> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>> > `LC_ALL=tr_TR.utf8 make check` is still very unhappy though.
>> > There are many failures.  I will continue to investigate later today.
>> 
>> OK.
>
> The tests fail due to many other uses of [a-z] in regexps.  I will look;
> e.g. for guix/import/cran.scm
>
> (if (string-match "^[A-Za-z][^ :]+:( |\n|$)" line)
>     …)
>
> it would be easier and clearer to just list [a-z] explicitly.
>
>
>> LGTM, thank you!
>
> :) Pushed as 771c5e155d7862ed91a5d503eecc00c1db1150ad.

Closing.

Thank you,

Maxim





* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2020-03-12 11:02       ` pelzflorian (Florian Pelz)
  2020-03-12 16:05         ` Ludovic Courtès
@ 2021-05-05  7:04         ` Taylan Kammer
  1 sibling, 0 replies; 13+ messages in thread
From: Taylan Kammer @ 2021-05-05  7:04 UTC (permalink / raw)
  To: pelzflorian (Florian Pelz), Ludovic Courtès; +Cc: 39970

On 12.03.2020 12:02, pelzflorian (Florian Pelz) wrote:
> 
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine.  It is different from
> the behavior of current glibc:
> 
> florian@florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int    argc,
>           char** argv)
> {
>   regex_t only_letters;
>   int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
>   if (r != 0)
>     printf ("This error does not happen.\n");
>   r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
>   if (r == 0)
>     printf ("The string " STR " matched!\n");
>   else
>     printf ("No match for " STR ".\n");
> }
> florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
> florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim 
> The string iyiyım matched!
> 
> Apparently Guile uses a bundled regular expression library rather than
> glibc.  I can try making Guile use a newer GNUlib for its regular
> expressions, maybe that helps.  Shall I file a separate bug for Guile?
> 
Also native Turkish speaker here, and yeah that seems like a clear bug.

By the way, Turkish doesn't have q, w, or x.  So if [a-z] is interpreted
by locale, it would fail to match those letters.  I suppose that doesn't
matter for the patch you guys used but it might have been part of the
original problem.

The dotless lowercase i / dotted uppercase I mostly bites programmers in
case conversion.  The uppercase of i is İ and the lowercase of I is ı.
There was even an exploit in GitHub related to this:

  https://eng.getwisdom.io/hacking-github-with-unicode-dotless-i/


- Taylan





* bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
  2021-05-05  4:47                 ` Maxim Cournoyer
@ 2021-05-05  9:22                   ` pelzflorian (Florian Pelz)
  0 siblings, 0 replies; 13+ messages in thread
From: pelzflorian (Florian Pelz) @ 2021-05-05  9:22 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 39970-done

On Wed, May 05, 2021 at 12:47:02AM -0400, Maxim Cournoyer wrote:
> Closing.
> 
> Thank you,
> 
> Maxim

Sorry for forgetting about this bug.  The above

LC_ALL=tr_TR.utf8 make check TESTS=tests/cran.scm

is *not* fixed, but I won’t take the time to really understand and fix
the few remaining troubles, I think.  Possibly libc bug
<https://sourceware.org/bugzilla/show_bug.cgi?id=23393> is the real
issue.

Regards,
Florian





Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
