From: Ludovic Courtès
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 20 May 2019 11:14:04 +0200
Message-ID: <87blzxwkrn.fsf_-_@gnu.org>
In-Reply-To: <87tvdqgwyg.fsf@gmail.com> (Einar Largenius's message of "Sun, 19 May 2019 19:45:11 +0200")
References: <878sv4j1au.fsf@gmail.com> <87d0kgvuxj.fsf@gnu.org> <87tvdqgwyg.fsf@gmail.com>
To: Einar Largenius
Cc: 35785@debbugs.gnu.org

Hi!

So the guts of the problem is that Guile’s ‘string->uri’ procedure
behaves incorrectly under the sv_SE.utf8 locale:

--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 ./pre-inst-env guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> ,use(web uri)
scheme@(guile-user)> (string->uri "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz")
$1 = #f
--8<---------------cut here---------------end--------------->8---

More specifically, ‘parse-authority’ is failing under that locale,
because of the “w”:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourceware.org" (const 'fail))
$5 = fail
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourcevare.org" (const 'fail))
$6 = #f
$7 = "sourcevare.org"
$8 = #f
--8<---------------cut here---------------end--------------->8---

We can boil it down to this example:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (string-match "[a-z]" "a")
$10 = #("a" (0 . 1))
scheme@(guile-user)> (string-match "[a-z]" "w")
$11 = #f
--8<---------------cut here---------------end--------------->8---

In short, under the sv_SE.utf8 locale of glibc 2.28, “w” is not
considered part of the ‘a-z’ interval.  Indeed,
‘localedata/locales/sv_SE’ in glibc reads this:

  % The letter w is normally not present in the Swedish alphabet.
  % It exists in some names in Swedish and foreign words, but is accounted
  % for as a variant of 'v'. Words and names with 'w' are in Swedish
  % ordered alphabetically among the words and names with 'v'. If two
  % words or names are only to be distinguished by 'v' or % 'w', 'v' is
  % placed before 'w'.

Using the “lower” regexp class instead of “[a-z]” works:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-match "[[:lower:]]" "w")
$12 = #("w" (0 . 1))
--8<---------------cut here---------------end--------------->8---

However, it’s not clear to me whether the “lower” class is supposed to
be the same for all locales or if we’re just lucky:

  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

Thoughts?

The workaround until we’ve fixed it is to use another locale, though you
can still set “LC_MESSAGES=sv_SE.utf8” or “LANGUAGE=sv”.

Ludo’.
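
For illustration, a check that sidesteps the locale could compare code
points instead of going through a regexp range.  The following is only a
minimal sketch: it assumes Guile’s SRFI-14 character sets, which are
defined over code points, and the names ‘ascii-lower’ and
‘ascii-lower-string?’ are hypothetical.  It is not necessarily the fix
that (web uri) should adopt.

```scheme
(use-modules (srfi srfi-14))

;; Sketch only: 'ascii-lower' and 'ascii-lower-string?' are hypothetical
;; names, not part of (web uri).

;; Character set holding the code points of "a" through "z".
;; 'ucs-range->char-set' takes a half-open range, hence the "+ 1".
(define ascii-lower
  (ucs-range->char-set (char->integer #\a)
                       (+ 1 (char->integer #\z))))

;; Return true if every character of STR is an ASCII lowercase letter.
(define (ascii-lower-string? str)
  (string-every ascii-lower str))

;; Membership is decided by code point, so "w" is treated like any
;; other letter between "a" and "z".
(display (char-set-contains? ascii-lower #\w))
(newline)
(display (ascii-lower-string? "sourceware"))
(newline)
```

Since the set is built from code points rather than from a range inside
a bracket expression, the result should not depend on the current
locale.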