From: "Ludovic Courtès" <ludo@gnu.org>
To: Einar Largenius <einar.largenius@gmail.com>
Cc: 35785@debbugs.gnu.org
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 20 May 2019 11:14:04 +0200
Message-ID: <87blzxwkrn.fsf_-_@gnu.org>
In-Reply-To: <87tvdqgwyg.fsf@gmail.com> (Einar Largenius's message of "Sun, 19 May 2019 19:45:11 +0200")
Hi!
So the crux of the problem is that Guile’s ‘string->uri’ procedure
returns #f for a perfectly valid URI under the sv_SE.utf8 locale:
--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 ./pre-inst-env guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guile-user)> ,use(web uri)
scheme@(guile-user)> (string->uri "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz")
$1 = #f
--8<---------------cut here---------------end--------------->8---
More specifically, ‘parse-authority’ fails under that locale because of
the “w” in “sourceware”; replacing it with a “v” makes it succeed:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourceware.org" (const 'fail))
$5 = fail
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourcevare.org" (const 'fail))
$6 = #f
$7 = "sourcevare.org"
$8 = #f
--8<---------------cut here---------------end--------------->8---
We can boil it down to this example:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (string-match "[a-z]" "a")
$10 = #("a" (0 . 1))
scheme@(guile-user)> (string-match "[a-z]" "w")
$11 = #f
--8<---------------cut here---------------end--------------->8---
In short, under the sv_SE.utf8 locale of glibc 2.28, “w” is not
considered part of the ‘a-z’ interval.
Indeed, ‘localedata/locales/sv_SE’ in glibc reads this:
% The letter w is normally not present in the Swedish alphabet. It
% exists in some names in Swedish and foreign words, but is accounted
% for as a variant of 'v'. Words and names with 'w' are in Swedish
% ordered alphabetically among the words and names with 'v'. If two
% words or names are only to be distinguished by 'v' or 'w', 'v' is
% placed before 'w'.
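Since the meaning of a range like ‘[a-z]’ depends on the locale’s
collation order, one way around it is to avoid ranges altogether and
list the letters one by one: a listed character matches itself no
matter where the locale collates it.  Below is a minimal sketch,
assuming we only care about ASCII letters; ‘ascii-lowercase’,
‘ascii-uppercase’, ‘ascii-word-regexp’ and ‘ascii-word?’ are
hypothetical names, not part of (web uri):

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helpers (not part of (web uri)).  Each letter is listed
;; individually, so the bracket expression contains no range and its
;; meaning does not depend on the collation order of the current locale.
(define ascii-lowercase "abcdefghijklmnopqrstuvwxyz")
(define ascii-uppercase "ABCDEFGHIJKLMNOPQRSTUVWXYZ")

;; Matches a non-empty string made only of ASCII letters.
(define ascii-word-regexp
  (make-regexp (string-append "^[" ascii-lowercase ascii-uppercase "]+$")))

(define (ascii-word? str)
  ;; regexp-exec returns a match structure on success and #f otherwise.
  (if (regexp-exec ascii-word-regexp str) #t #f))
```

With this, (ascii-word? "sourceware") should return #t under
sv_SE.utf8 as well, since no range is involved, while
(ascii-word? "sourceware.org") returns #f because “.” is not listed.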
Using the “lower” regexp class instead of “[a-z]” works:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-match "[[:lower:]]" "w")
$12 = #("w" (0 . 1))
--8<---------------cut here---------------end--------------->8---
However, it’s not clear to me whether the “lower” class is supposed to
be the same for all locales or whether we’re just lucky:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
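Another option, which relies on neither collation ranges nor
locale-defined classes, is to skip regexps for this kind of check and
test characters against an explicit SRFI-14 character set.  Guile
strings are sequences of Unicode code points, so the membership test
below does not consult the locale.  Again, this is only a minimal
sketch; ‘ascii-lowercase-set’, ‘ascii-lowercase?’ and
‘all-ascii-lowercase?’ are hypothetical names:

```scheme
;; SRFI-14 character sets and SRFI-13 string procedures such as
;; ‘string-every’ are available in Guile's default environment.

;; All characters from #\a to #\z inclusive; ‘ucs-range->char-set’
;; takes a half-open range [lower, upper) of code points.
(define ascii-lowercase-set
  (ucs-range->char-set (char->integer #\a)
                       (+ 1 (char->integer #\z))))

(define (ascii-lowercase? ch)
  ;; Pure code-point membership test; no locale data involved.
  (char-set-contains? ascii-lowercase-set ch))

(define (all-ascii-lowercase? str)
  ;; #t if every character of STR is an ASCII lowercase letter.
  ;; Note that this returns #t for the empty string.
  (string-every ascii-lowercase? str))
```

The trade-off is that the check lives in Scheme code rather than inside
a regexp, so a regexp-based parser would need some restructuring to
use it.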
Thoughts?
Until we’ve fixed it, the workaround is to use another locale; you can
still set “LC_MESSAGES=sv_SE.utf8” or “LANGUAGE=sv” to get messages in
Swedish.
Ludo’.