From: Taylan Kammer <taylan.kammer@gmail.com>
To: "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de>,
"Ludovic Courtès" <ludo@gnu.org>
Cc: 39970@debbugs.gnu.org
Subject: bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
Date: Wed, 5 May 2021 09:04:32 +0200 [thread overview]
Message-ID: <e3929738-588c-e27a-69b4-bb405f536e75@gmail.com> (raw)
In-Reply-To: <20200312110206.2hsinzejnmcefmot@pelzflorian.localdomain>
On 12.03.2020 12:02, pelzflorian (Florian Pelz) wrote:
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine. It is different from
> the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include <regex.h>
> #include <stdio.h>
> #include <stdlib.h>
> #define STR "iyiyım"
> int main (int argc,
> char** argv)
> {
> regex_t only_letters;
> int r = regcomp (&only_letters, "[a-z]+", REG_EXTENDED);
> if (r != 0)
> printf ("This error does not happen.\n");
> r = regexec (&only_letters, STR, 1, malloc (sizeof (regmatch_t)), 0);
> if (r == 0)
> printf ("The string " STR " matched!\n");
> else
> printf ("No match for " STR ".\n");
> }
> florian@florianmacbook ~$ gcc -o iyiyim iyiyim.c
> florian@florianmacbook ~$ LANG=tr_TR.utf8 ./iyiyim
> The string iyiyım matched!
>
> Apparently Guile uses a bundled regular expression library rather than
> glibc. I can try making Guile use a newer GNUlib for its regular
> expressions, maybe that helps. Shall I file a separate bug for Guile?
>
Also native Turkish speaker here, and yeah that seems like a clear bug.
By the way, Turkish doesn't have q, w, or x. So if [a-z] is interpreted
by locale, it would fail to match those letters. I suppose that doesn't
matter for the patch you guys used but it might have been part of the
original problem.
The dotless lowercase i / dotted uppercase I mostly bites programmers in
case conversion. The uppercase of i is İ and the lowercase of I is ı.
There was even an exploit in GitHub related to this:
https://eng.getwisdom.io/hacking-github-with-unicode-dotless-i/
- Taylan
prev parent reply other threads:[~2021-05-05 7:07 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-07 12:00 bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales pelzflorian (Florian Pelz)
2020-03-07 15:20 ` pelzflorian (Florian Pelz)
2020-03-08 7:08 ` pelzflorian (Florian Pelz)
2020-03-09 17:02 ` Ludovic Courtès
2020-03-12 11:02 ` pelzflorian (Florian Pelz)
2020-03-12 16:05 ` Ludovic Courtès
2020-03-17 9:44 ` pelzflorian (Florian Pelz)
2020-03-17 21:20 ` Ludovic Courtès
2020-03-18 6:47 ` pelzflorian (Florian Pelz)
2020-03-18 8:40 ` Ludovic Courtès
2021-05-05 4:47 ` Maxim Cournoyer
2021-05-05 9:22 ` pelzflorian (Florian Pelz)
2021-05-05 7:04 ` Taylan Kammer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e3929738-588c-e27a-69b4-bb405f536e75@gmail.com \
--to=taylan.kammer@gmail.com \
--cc=39970@debbugs.gnu.org \
--cc=ludo@gnu.org \
--cc=pelzflorian@pelzflorian.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.