From: Ludovic Courtès
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 03 Jun 2019 15:01:45 +0200
Message-ID: <871s0ahlfq.fsf@gnu.org>
In-Reply-To: <87imtnsdsb.fsf@ngyro.com> (Timothy Sample's message of "Sun, 02 Jun 2019 20:39:16 -0400")
To: Timothy Sample
Cc: 35785@debbugs.gnu.org, Einar Largenius

Hi Timothy,

Timothy Sample skribis:

> Here’s a patch for Guile that uses explicit lists of characters in the
> ‘(web uri)’ module instead of character ranges.
> It includes two tests that are pretty verbose, but seem to do the trick.
>
> I have a bit more background on the problem, mostly coming from a Glibc
> bug report: .
>
> It turns out that it is well-known upstream, and avoiding character
> ranges is the recommended approach for now. Some other GNU tools have
> adopted what is being called the “Rational Range Interpretation” .
> AIUI, this means they use the underlying encoding numbers for ranges (I
> checked the source, but I’m only mostly sure I read it right). It looks
> like the Glibc folks are unsure how to proceed on this (but are maybe
> slightly leaning towards the “rational” approach).

Great that you gleaned good references on this topic!

> It’s all a pretty big mess, really. I was hoping there would be some
> obvious thing that would fix the problem more generally. Short of
> pulling in the Gnulib regex code or writing something in Scheme, it
> looks like Guile is stuck where it is now.

Yeah. The alternative would be to not use regexps in this context, I
guess.

> I’m unsure if the changes are considered “trivial” from a copyright
> perspective. It’s pretty close, but I think programmers tend to
> underestimate here. I’ve started the FSF copyright assignment process
> either way, since this is likely not my last Guile patch. :)

If the process is already underway, I think it’s fine to commit this
patch (I would rather wait if it were longer and/or if we didn’t know
each other already).

> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
> From: Timothy Sample
> Date: Sun, 2 Jun 2019 14:41:20 -0400
> Subject: [PATCH] Make URI handling locale independent.
>
> Fixes .
>
> * module/web/uri.scm (digits, hex-digits, letters): New variables.
> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
> list each character instead of using character ranges.
> * test-suite/tests/web-uri.test: Add corresponding tests.

[...]

> +    (pass-if "http://www.example.com (sv_SE)"
> +      (dynamic-wind
> +        (lambda () #t)
> +        (lambda ()
> +          (with-locale "sv_SE.utf8"
> +            (reload-module (resolve-module '(web uri)))
> +            (uri=? (string->uri "http://www.example.com")
> +                   #:scheme 'http #:host "www.example.com" #:path "")))

Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
the other test)?

For the sv_SE test, what about taking a host name with a ‘w’, since
that’s the use case that allowed us to uncover this bug?

Apart from that it LGTM, thank you!

Ludo’.
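
For readers following the thread, here is a rough sketch of the technique the commit message describes: spelling out every character of a bracket expression instead of writing a range such as ‘a-z’. It is not the code from the patch. The names ‘digits’ and ‘letters’ are taken from the commit message, but their values here are guesses; ‘char-class’, ‘label-rx’ and ‘label?’ are hypothetical helpers, and the label pattern is deliberately simpler than whatever ‘(web uri)’ really uses.

```scheme
;; Illustrative sketch only; not the contents of module/web/uri.scm.
(use-modules (ice-9 regex))

;; Explicit character lists.  No "a-z" or "0-9" ranges, whose meaning
;; can depend on the collation order of the current locale.
;; (Values are assumptions; only the variable names come from the patch.)
(define digits "0123456789")
(define letters
  (string-append "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 "abcdefghijklmnopqrstuvwxyz"))

;; Hypothetical helper: build a bracket expression that matches exactly
;; one character out of CHARS.
(define (char-class chars)
  (string-append "[" chars "]"))

;; Simplified host-label pattern (an assumption for illustration): one or
;; more letters, digits, or hyphens.  The hyphen is placed last inside
;; the brackets so that it is taken literally.
(define label-rx
  (make-regexp
   (string-append "^"
                  (char-class (string-append letters digits "-"))
                  "+$")))

;; Return #t if STR consists only of the listed characters.
(define (label? str)
  (and (regexp-exec label-rx str) #t))
```

With this, ‘(label? "www")’ returns #t and ‘(label? "exa_mple")’ returns #f, and since the bracket expression contains no ranges, the result should not depend on how the current locale orders ‘w’.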