unofficial mirror of bug-guix@gnu.org 
From: "Ludovic Courtès" <ludo@gnu.org>
To: Einar Largenius <einar.largenius@gmail.com>
Cc: 35785@debbugs.gnu.org
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 20 May 2019 11:14:04 +0200
Message-ID: <87blzxwkrn.fsf_-_@gnu.org>
In-Reply-To: <87tvdqgwyg.fsf@gmail.com> (Einar Largenius's message of "Sun, 19 May 2019 19:45:11 +0200")

Hi!

So the guts of the problem is that Guile’s ‘string->uri’ procedure
behaves incorrectly under that locale:

--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 ./pre-inst-env guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> ,use(web uri)
scheme@(guile-user)> (string->uri "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz")
$1 = #f
--8<---------------cut here---------------end--------------->8---

More specifically, ‘parse-authority’ is failing under that locale,
because of the “w”:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourceware.org" (const 'fail))
$5 = fail
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourcevare.org" (const 'fail))
$6 = #f
$7 = "sourcevare.org"
$8 = #f
--8<---------------cut here---------------end--------------->8---

We can boil it down to this example:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (string-match "[a-z]" "a")
$10 = #("a" (0 . 1))
scheme@(guile-user)> (string-match "[a-z]" "w")
$11 = #f
--8<---------------cut here---------------end--------------->8---

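In short, under the sv_SE.utf8 locale of glibc 2.28, “w” is not
considered part of the ‘a-z’ interval.  Outside the POSIX (“C”) locale,
POSIX leaves the meaning of a bracket range such as ‘[a-z]’ unspecified,
and glibc derives it from the locale’s collation rules.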

Indeed, ‘localedata/locales/sv_SE’ in glibc reads this:

  % The letter w is normally not present in the Swedish alphabet. It
  % exists in some names in Swedish and foreign words, but is accounted
  % for as a variant of 'v'.  Words and names with 'w' are in Swedish
  % ordered alphabetically among the words and names with 'v'. If two
  % words or names are only to be distinguished by 'v' or % 'w', 'v' is
  % placed before 'w'.

Using the “lower” regexp class instead of “[a-z]” works:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-match "[[:lower:]]" "w")
$12 = #("w" (0 . 1))
--8<---------------cut here---------------end--------------->8---

However, it’s not clear to me whether the “lower” class is supposed to
be the same for all locales or if we’re just lucky:

  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
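One way to sidestep both the locale-dependent range and the open
question about “lower” (which, in a UTF-8 locale, may also match
non-ASCII letters such as “å” that don’t belong in a host name) would be
to test characters by code point instead of with regexps.  Below is a
minimal sketch using the SRFI-14 character sets that Guile provides;
‘label-chars’ and ‘label-chars-only?’ are hypothetical names for
illustration, not part of (web uri):

```scheme
;; Sketch: classify characters by code point with SRFI-14 character
;; sets, so the result does not depend on LC_COLLATE or LC_CTYPE.
(use-modules (srfi srfi-14))

;; ucs-range->char-set takes a half-open range [lower, upper).
(define ascii-lowercase (ucs-range->char-set 97 123))  ; #\a .. #\z
(define ascii-uppercase (ucs-range->char-set 65 91))   ; #\A .. #\Z
(define ascii-digits    (ucs-range->char-set 48 58))   ; #\0 .. #\9

;; Hypothetical: characters allowed inside a DNS label.
(define label-chars
  (char-set-union ascii-lowercase ascii-uppercase ascii-digits
                  (char-set #\-)))

(define (label-chars-only? str)
  "Return #t if every character of STR is an ASCII letter, digit, or hyphen."
  (string-every (lambda (c) (char-set-contains? label-chars c)) str))

(label-chars-only? "sourceware")      ; => #t, in sv_SE.utf8 or any locale
(label-chars-only? "sourceware.org")  ; => #f, #\. is not a label character
```

Because ‘ucs-range->char-set’ works on code points, “w” (U+0077) is in
the a-z set no matter which collation order is in effect.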

Thoughts?

Until this is fixed, the workaround is to use another locale; you can
still set “LC_MESSAGES=sv_SE.utf8” or “LANGUAGE=sv” to get Swedish
messages (note that LC_ALL, if set, overrides LC_MESSAGES).

Ludo’.


Thread overview: 13+ messages
2019-05-17 20:03 bug#35785: guix won't download if locale is set to swedish Einar Largenius
2019-05-18 11:55 ` Ludovic Courtès
2019-05-19 17:45   ` Einar Largenius
2019-05-20  8:20     ` Ludovic Courtès
2019-05-20  9:14     ` Ludovic Courtès [this message]
2019-05-27 11:05       ` bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’ Ricardo Wurmus
2019-05-27 13:39         ` Timothy Sample
2019-05-28 11:17           ` Ludovic Courtès
2019-06-03  0:39             ` Timothy Sample
2019-06-03 13:01               ` Ludovic Courtès
2019-06-03 14:24                 ` Timothy Sample
2019-06-04  7:42                   ` Ludovic Courtès
2019-06-04 13:56                     ` Timothy Sample

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
