From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ludovic Courtès
Subject: bug#39970: guix commands broken on Azerbaijani 'az_AZ' and Turkish 'tr_TR' locales
Date: Thu, 12 Mar 2020 17:05:26 +0100
Message-ID: <874kutsgmx.fsf@gnu.org>
References: <20200307120052.ocwzphlvemvmb2ts@pelzflorian.localdomain>
 <20200307152003.myj7jkjthokbmark@pelzflorian.localdomain>
 <20200308070804.ylpb5yrwpgbc3p3w@pelzflorian.localdomain>
 <8736ah1mxb.fsf@gnu.org>
 <20200312110206.2hsinzejnmcefmot@pelzflorian.localdomain>
In-Reply-To: <20200312110206.2hsinzejnmcefmot@pelzflorian.localdomain>
 (pelzflorian@pelzflorian.de's message of "Thu, 12 Mar 2020 12:02:06 +0100")
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
List-Id: Bug reports for GNU Guix
To: "pelzflorian (Florian Pelz)"
Cc: 39970@debbugs.gnu.org

Hi Florian,

"pelzflorian (Florian Pelz)" skribis:

> On Mon, Mar 09, 2020 at 06:02:40PM +0100, Ludovic Courtès wrote:
>> To me it’s not a bug in Guile, but simply the fact that regexps, as
>> implemented by the C library, are locale-dependent.
>>
>
> (use-modules (ice-9 regex))
> (regexp-exec (make-regexp "^([a-z]+)$")
>              "iyiyim")
> ⇒ #f
>
> Guile’s behavior that i is not among [a-z] has been confirmed as
> unexpected by a natively Turkish friend of mine.  It is different from
> the behavior of current glibc:
>
> florian@florianmacbook ~$ cat iyiyim.c
> #include
> #include
> #include
> #define STR "iyiyım"
> int main (int argc,
>           char** argv)
> {

You’re seeing a different behavior because you forgot a:

  setlocale (LC_ALL, "");

call here.

>> The patch you proposed looks good to me, though perhaps we could
>> explicitly list all the alphabet in the regexp?
>>
>> A better option is to reimplement ‘store-path-package-name’ in a way
>> similar to ‘store-path-hash-part’, as in commit
>> 35eb77b09d957019b2437e7681bd88013d67d3cd.
>
> I suppose it would be better to cache the compiled regexp.  What is
> this mcached syntax inside (guix store)?  Or do I use Scheme’s 'delay'
> and 'force' for caching?

I lean towards avoiding regexps altogether, as I wrote above.  WDYT?

> The attached patch fixes the regexp.  Shall I push the attached patch
> and then try making it cache the compiled regexp or do you still
> prefer an implementation without regexps?  Why would not using a
> regexp be better?

It reduces reliance on libc, reduces complexity, and performs better, as
noted in the commit log of 35eb77b09d957019b2437e7681bd88013d67d3cd.

Thanks,
Ludo’.