From: Ricardo Wurmus
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 27 May 2019 13:05:29 +0200
Message-ID: <87ftp017k6.fsf@elephly.net>
In-reply-to: <87blzxwkrn.fsf_-_@gnu.org>
References: <878sv4j1au.fsf@gmail.com> <87d0kgvuxj.fsf@gnu.org> <87tvdqgwyg.fsf@gmail.com> <87blzxwkrn.fsf_-_@gnu.org>
To: Ludovic Courtès
Cc: 35785@debbugs.gnu.org, Einar Largenius

Ludovic Courtès writes:

> Using the “lower” regexp class instead of “[a-z]” works:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (string-match "[[:lower:]]" "w")
> $12 = #("w" (0 . 1))
> --8<---------------cut here---------------end--------------->8---
>
> However, it’s not clear to me whether the “lower” class is supposed to
> be the same for all locales or if we’re just lucky:
>
>   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>
> Thoughts?

The lower class is much larger than [a-z].  If we only wanted to work
around this particular problem we could explicitly spell out the range,
which would be the same in all locales.  (Obviously, that wouldn’t be
pretty.)

But can’t URI parts contain more than those characters?  To circumvent
the question whether the lower class is locale dependent we could
generate an explicit range from a charset.

-- 
Ricardo
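
A minimal sketch of the "explicit range from a charset" idea, not a
proposed patch.  The names char-set->bracket-expression, %ascii-lower
and %lower-rx are hypothetical and exist only for this illustration.
The assumption is that a bracket expression listing every member
individually involves no range and no named class, so the locale's
collation order should not affect it.

```scheme
(use-modules (srfi srfi-14))

;; Hypothetical helper: build a bracket expression that lists every
;; member of CS one by one instead of using a range such as "a-z".
;; Assumes CS contains no characters that are special inside brackets
;; (such as ], ^ or -), which holds for the ASCII lower-case letters.
(define (char-set->bracket-expression cs)
  (string-append "["
                 (list->string (sort (char-set->list cs) char<?))
                 "]"))

;; The ASCII lower-case letters, derived from standard SRFI-14 sets
;; rather than typed out by hand.
(define %ascii-lower
  (char-set-intersection char-set:lower-case char-set:ascii))

(define %lower-rx
  (make-regexp (char-set->bracket-expression %ascii-lower)))

;; Returns a match structure for "w" and #f for "W".
(regexp-exec %lower-rx "w")
(regexp-exec %lower-rx "W")
```

A charset for other URI components could be assembled the same way
with char-set-union, provided any character that is special inside a
bracket expression gets handled separately.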