all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Timothy Sample <samplet@ngyro.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 35785@debbugs.gnu.org, Einar Largenius <einar.largenius@gmail.com>
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 03 Jun 2019 10:24:40 -0400	[thread overview]
Message-ID: <87ef4asq53.fsf@ngyro.com> (raw)
In-Reply-To: <871s0ahlfq.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Mon, 03 Jun 2019 15:01:45 +0200")

Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Timothy,
>
> Timothy Sample <samplet@ngyro.com> skribis:
>
>> Here’s a patch for Guile that uses explicit lists of characters in the
>> ‘(web uri)’ module instead of character ranges.  It includes two tests
>> that are pretty verbose, but seem to do the trick.
>>
>> I have a bit more background on the problem, mostly coming from a Glibc
>> bug report: <https://sourceware.org/bugzilla/show_bug.cgi?id=23393>.
>>
>> It turns out that it is well-known upstream, and avoiding character
>> ranges is the recommended approach for know.  Some other GNU tools have
>> adopted what is being called the “Rational Range Interpretation”
>> <https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html>.
>> AIUI, this means they use the underlying encoding numbers for ranges (I
>> checked the source, but I’m only mostly sure I read it right).  It looks
>> like the Glibc folks are unsure how to proceed on this (but are maybe
>> slightly leaning towards the “rational” approach).
>
> Great that you gleaned good references on this topic!
>
>> It’s all a pretty big mess, really.  I was hoping there would be some
>> obvious thing that would fix the problem more generally.  Short of
>> pulling in the Gnulib regex code or writing something in Scheme, it
>> looks like Guile is stuck where it is now.
>
> Yeah.  The alternative would be to not use regexps in this context, I
> guess.

I meant fixing regexes in other contexts, since I’m sure the URI module
is not the only Guile code ever that assumed “[a-z]” would only match
ASCII lowercase letters.

>> I’m unsure if the changes are considered “trivial” from a copyright
>> perspective.  It’s pretty close, but I think programmers tend to
>> underestimate here.  I’ve started the FSF copyright assignment process
>> either way, since is likely not my last Guile patch.  :)
>
> If the process is already underway, I think it’s fine to commit this
> patch (I would rather wait if it were longer and/or if we didn’t know
> each other already).

Sounds good!

>> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
>> From: Timothy Sample <samplet@ngyro.com>
>> Date: Sun, 2 Jun 2019 14:41:20 -0400
>> Subject: [PATCH] Make URI handling locale independent.
>>
>> Fixes <https://bugs.gnu.org/35785>.
>>
>> * module/web/uri.scm (digits, hex-digits, letters): New variables.
>> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
>> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
>> list each character instead of using character ranges.
>> * test-suite/tests/web-uri.test: Add corresponding tests.
>
> [...]
>
>> +  (pass-if "http://www.example.com (sv_SE)"
>> +    (dynamic-wind
>> +      (lambda () #t)
>> +      (lambda ()
>> +        (with-locale "sv_SE.utf8"
>> +          (reload-module (resolve-module '(web uri)))
>> +          (uri=? (string->uri "http://www.example.com")
>> +                 #:scheme 'http #:host "www.example.com" #:path "")))
>
> Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
> the other test)?

I needed to reload the modules like that to make the tests fail without
the patch and pass with it.  My understanding is that the bug happens
at regex compile time, which happens when the module is loaded.  If I
don’t reload the module, the old URI code passes the tests, since the
regexes were compiled with a locale that does not trigger the bug.  It’s
a little wacky, sure, but it was the best idea I could come up with.

> For the sv_SE test, what about taking a host name with a ‘w’, since
> that’s the use case that allowed us to uncover this bug?

I thought I was being clever by using a “www” hostname, but apparently
it’s so normalized as to be invisible!  Feel free to change it to
something more obvious like “w.com” or whatever.


-- Tim

  reply	other threads:[~2019-06-03 14:25 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-17 20:03 bug#35785: guix won't download if locale is set to swedish Einar Largenius
2019-05-18 11:55 ` Ludovic Courtès
2019-05-19 17:45   ` Einar Largenius
2019-05-20  8:20     ` Ludovic Courtès
2019-05-20  9:14     ` bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’ Ludovic Courtès
2019-05-27 11:05       ` Ricardo Wurmus
2019-05-27 13:39         ` Timothy Sample
2019-05-28 11:17           ` Ludovic Courtès
2019-06-03  0:39             ` Timothy Sample
2019-06-03 13:01               ` Ludovic Courtès
2019-06-03 14:24                 ` Timothy Sample [this message]
2019-06-04  7:42                   ` Ludovic Courtès
2019-06-04 13:56                     ` Timothy Sample

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ef4asq53.fsf@ngyro.com \
    --to=samplet@ngyro.com \
    --cc=35785@debbugs.gnu.org \
    --cc=einar.largenius@gmail.com \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.