From: "Ludovic Courtès" <ludo@gnu.org>
To: Timothy Sample <samplet@ngyro.com>
Cc: 35785@debbugs.gnu.org, Einar Largenius <einar.largenius@gmail.com>
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 03 Jun 2019 15:01:45 +0200
Message-ID: <871s0ahlfq.fsf@gnu.org>
In-Reply-To: <87imtnsdsb.fsf@ngyro.com> (Timothy Sample's message of "Sun, 02 Jun 2019 20:39:16 -0400")
Hi Timothy,
Timothy Sample <samplet@ngyro.com> writes:
> Here’s a patch for Guile that uses explicit lists of characters in the
> ‘(web uri)’ module instead of character ranges. It includes two tests
> that are pretty verbose, but seem to do the trick.
>
> I have a bit more background on the problem, mostly coming from a Glibc
> bug report: <https://sourceware.org/bugzilla/show_bug.cgi?id=23393>.
>
> It turns out that it is well-known upstream, and avoiding character
> ranges is the recommended approach for now. Some other GNU tools have
> adopted what is being called the “Rational Range Interpretation”
> <https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html>.
> AIUI, this means they use the underlying encoding numbers for ranges (I
> checked the source, but I’m only mostly sure I read it right). It looks
> like the Glibc folks are unsure how to proceed on this (but are maybe
> slightly leaning towards the “rational” approach).
Great that you gleaned good references on this topic!
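For readers following along, here is a minimal sketch of the idea behind the patch: build bracket expressions from explicit character lists rather than from ranges such as ‘[a-z]’, whose meaning can depend on the locale's collation order. The names ‘ascii-letters’, ‘ascii-digits’, and ‘ascii-label?’ are hypothetical and chosen for illustration only; the patch itself defines ‘digits’, ‘hex-digits’, and ‘letters’ in ‘(web uri)’, and its actual code is not reproduced here.

```scheme
;; Hypothetical names, for illustration only (not the patch's code).
(define ascii-letters
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
(define ascii-digits "0123456789")

;; A bracket expression that lists every accepted character, so no
;; range has to be interpreted according to the current locale.
(define label-rx
  (make-regexp (string-append "^[" ascii-letters ascii-digits "]+$")))

;; Return #t if STR is non-empty and made only of the listed characters.
(define (ascii-label? str)
  (and (regexp-exec label-rx str) #t))

;; Example use:
;; (ascii-label? "www")          ; only listed characters
;; (ascii-label? "www.example")  ; contains ‘.’, which is not listed
```

Since each character in the bracket expression only matches itself, a host name containing ‘w’ should be accepted whatever the locale, which is the case that failed with ranges under ‘sv_SE’.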
> It’s all a pretty big mess, really. I was hoping there would be some
> obvious thing that would fix the problem more generally. Short of
> pulling in the Gnulib regex code or writing something in Scheme, it
> looks like Guile is stuck where it is now.
Yeah. The alternative would be to not use regexps in this context, I
guess.
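One possible shape of that alternative, sketched under assumptions: validate a component with a character set instead of a regexp. The helper ‘label-chars-only?’ and the set ‘label-chars’ are hypothetical and not part of ‘(web uri)’; the set of accepted characters is chosen for illustration and is not meant to be the exact grammar for host labels.

```scheme
;; Hypothetical helper, not part of (web uri): check a string against
;; an explicit character set, without going through a regexp.
(define label-chars
  (string->char-set
   "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-"))

;; Return #t if STR is non-empty and every character is in LABEL-CHARS.
(define (label-chars-only? str)
  (and (not (string-null? str))
       (string-every label-chars str)))
```

Character-set membership compares characters directly, so this check should not involve locale collation at all; the cost is that the parsing now done by the regexps would have to be rewritten by hand.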
> I’m unsure if the changes are considered “trivial” from a copyright
> perspective. It’s pretty close, but I think programmers tend to
> underestimate here. I’ve started the FSF copyright assignment process
> either way, since this is likely not my last Guile patch. :)
If the process is already underway, I think it’s fine to commit this
patch (I would rather wait if it were longer and/or if we didn’t know
each other already).
> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
> From: Timothy Sample <samplet@ngyro.com>
> Date: Sun, 2 Jun 2019 14:41:20 -0400
> Subject: [PATCH] Make URI handling locale independent.
>
> Fixes <https://bugs.gnu.org/35785>.
>
> * module/web/uri.scm (digits, hex-digits, letters): New variables.
> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
> list each character instead of using character ranges.
> * test-suite/tests/web-uri.test: Add corresponding tests.
[...]
> +    (pass-if "http://www.example.com (sv_SE)"
> +      (dynamic-wind
> +        (lambda () #t)
> +        (lambda ()
> +          (with-locale "sv_SE.utf8"
> +            (reload-module (resolve-module '(web uri)))
> +            (uri=? (string->uri "http://www.example.com")
> +                   #:scheme 'http #:host "www.example.com" #:path "")))
Aren’t the ‘reload-module’ calls a leftover that can now be removed (also in
the other test)?
For the sv_SE test, what about taking a host name with a ‘w’, since
that’s the use case that allowed us to uncover this bug?
Apart from that it LGTM, thank you!
Ludo’.
Thread overview: 13+ messages
2019-05-17 20:03 bug#35785: guix won't download if locale is set to swedish Einar Largenius
2019-05-18 11:55 ` Ludovic Courtès
2019-05-19 17:45 ` Einar Largenius
2019-05-20 8:20 ` Ludovic Courtès
2019-05-20 9:14 ` bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’ Ludovic Courtès
2019-05-27 11:05 ` Ricardo Wurmus
2019-05-27 13:39 ` Timothy Sample
2019-05-28 11:17 ` Ludovic Courtès
2019-06-03 0:39 ` Timothy Sample
2019-06-03 13:01 ` Ludovic Courtès [this message]
2019-06-03 14:24 ` Timothy Sample
2019-06-04 7:42 ` Ludovic Courtès
2019-06-04 13:56 ` Timothy Sample