unofficial mirror of bug-guix@gnu.org 
* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
@ 2022-04-12 19:47 Attila Lendvai
  2022-04-12 20:40 ` Maxime Devos
  2022-04-19 18:09 ` bug#54893: [PATCH] guix: git-download: Set locale to deal with Unicode in git metadata Attila Lendvai
  0 siblings, 2 replies; 15+ messages in thread
From: Attila Lendvai @ 2022-04-12 19:47 UTC (permalink / raw)
  To: 54893


i'm trying to build a golang package that i have just imported. its repo has a tag with unicode in it, namely v½.2.0, as observable at https://github.com/klauspost/pgzip/tags

(define-public the-pkg
  (package
    (name "go-github-com-klauspost-pgzip")
    (version "1.0.2-0.20170402124221-0bf5dcad4ada")
    (source
      (origin
        (method git-fetch)
        (uri (git-reference
               (url "https://github.com/klauspost/pgzip")
               (commit "0bf5dcad4ada")))
        (file-name (git-file-name name version))
        (sha256
          (base32 "0dgp2iljvhibzxia1g3lsfg4bjmfh4kf0bfrmfi7sd49hwhrvk7s"))))
    (build-system go-build-system)
    (arguments '(#:skip-build? #t #:import-path "github.com/klauspost/pgzip"))
    (home-page "https://github.com/klauspost/pgzip")
    (synopsis "pgzip")
    (description
      "Package pgzip implements reading and writing of gzip format compressed files, as
specified in @url{https://rfc-editor.org/rfc/rfc1952.html,RFC 1952}.")
    (license license:expat)))

i have attached the build log, but the essence is this:

guile: warning: failed to install locale

and i can't get rid of this^ warning. i installed glibc-locales to root and my user, reconfigured, restarted the guix-daemon.

which is probably the cause of the ultimate error:

warning: failed to delete .git/refs/tags/v??.2.0: No such file or directory
r:sha256 hash mismatch for...

the daemon starts from an empty env:

https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/build.cc#n1590

and then copies the env from the derivation, but it doesn't seem to contain any LANG value. i assume guile is also launched then without a LANG env. BTW, guile could be more informative in its warning, too.
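
A minimal sketch of how the locale-related state seen by the build-side Guile could be inspected, for instance from inside the git-fetch gexp. The helper name `locale-report` is hypothetical and not part of Guix; only `getenv` and `locale-encoding` from (ice-9 i18n) are used.

```scheme
(use-modules (ice-9 i18n))

;; Hypothetical helper: collect what the current process sees.
;; `getenv' returns #f for variables that are not set.
(define (locale-report)
  `(("LANG" . ,(getenv "LANG"))
    ("GUIX_LOCPATH" . ,(getenv "GUIX_LOCPATH"))
    ("encoding" . ,(locale-encoding))))

;; Print one entry per line to the error port so it shows up in the build log.
(for-each (lambda (entry)
            (format (current-error-port) "~a: ~s~%" (car entry) (cdr entry)))
          (locale-report))
```

An unset variable is printed as #f, which makes it easy to tell "unset" apart from "set to the empty string".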

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“The unexamined life is not worth living for a human being.”
	— Socrates (c. 470–399 BC, tried and executed), 'Apology' (399 BC)

[-- Attachment #2: build-log --]
[-- Type: application/octet-stream, Size: 3204 bytes --]

$ ./pre-inst-env guix build go-github-com-klauspost-pgzip
The following derivations will be built:
  /gnu/store/sgcqcsxvwm1r7l8m7ihjjcz85pfbm6dk-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada.drv
  /gnu/store/56cs56nbiw4dvln7lxbsqlhqrpwy884q-go-std-1.17.8.drv
  /gnu/store/8dlj9p8m1m3brp63kvqirz4dc444zsfq-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada-checkout.drv
building /gnu/store/8dlj9p8m1m3brp63kvqirz4dc444zsfq-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada-checkout.drv...
guile: warning: failed to install locale
environment variable `PATH' set to `/gnu/store/0c1yfbxyv877mlgychfgvmk5ha2jqh52-gzip-1.10/bin:/gnu/store/8z1q7vjxylm6l4ibsywd4h6m6jv6zqm4-tar-1.34/bin'
Initialized empty Git repository in /gnu/store/6wsdfh6mmxzdr3rlclmnl947qqdx7bfl-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada-checkout/.git/
fatal: couldn't find remote ref 0bf5dcad4ada
Failed to do a shallow fetch; retrying a full fetch...
From https://github.com/klauspost/pgzip
 * [new branch]      master                 -> origin/master
 * [new branch]      rewrite-error-handling -> origin/rewrite-error-handling
 * [new branch]      support-writerto       -> origin/support-writerto
 * [new tag]         v1.0                   -> v1.0
 * [new tag]         v1.0.1                 -> v1.0.1
 * [new tag]         v1.1                   -> v1.1
 * [new tag]         v1.2.0                 -> v1.2.0
 * [new tag]         v1.2.1                 -> v1.2.1
 * [new tag]         v1.2.2                 -> v1.2.2
 * [new tag]         v1.2.3                 -> v1.2.3
 * [new tag]         v1.2.4                 -> v1.2.4
 * [new tag]         v1.2.5                 -> v1.2.5
 * [new tag]         v½.2.0                -> v½.2.0
Note: switching to '0bf5dcad4ada'.
HEAD is now at 0bf5dca Merge pull request #14 from klauspost/fix-truncated-gzip-input
warning: failed to delete .git/refs/tags/v??.2.0: No such file or directory
warning: failed to delete .git/refs/tags: Directory not empty
warning: failed to delete .git/refs: Directory not empty
warning: failed to delete .git: Directory not empty
r:sha256 hash mismatch for /gnu/store/6wsdfh6mmxzdr3rlclmnl947qqdx7bfl-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada-checkout:
  expected hash: 0dgp2iljvhibzxia1g3lsfg4bjmfh4kf0bfrmfi7sd49hwhrvk7s
  actual hash:   1vxn09q5mxkdi1vshpb2d1b8rdqqbvjcsd3bgdg77sxgas7dlmpg
hash mismatch for store item '/gnu/store/6wsdfh6mmxzdr3rlclmnl947qqdx7bfl-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada-checkout'
build of /gnu/store/8dlj9p8m1m3brp63kvqirz4dc444zsfq-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada-checkout.drv failed
View build log at '/var/log/guix/drvs/8d/lj9p8m1m3brp63kvqirz4dc444zsfq-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada-checkout.drv.gz'.
cannot build derivation `/gnu/store/sgcqcsxvwm1r7l8m7ihjjcz85pfbm6dk-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada.drv': 1 dependencies couldn't be built
guix build: error: build of `/gnu/store/sgcqcsxvwm1r7l8m7ihjjcz85pfbm6dk-go-github-com-klauspost-pgzip-1.0.2-0.20170402124221-0bf5dcad4ada.drv' failed


* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-12 19:47 bug#54893: guix-daemon, locale, LANG, and unicode in git tag names Attila Lendvai
@ 2022-04-12 20:40 ` Maxime Devos
  2022-04-13  7:51   ` Attila Lendvai
  2022-04-19 18:09 ` bug#54893: [PATCH] guix: git-download: Set locale to deal with Unicode in git metadata Attila Lendvai
  1 sibling, 1 reply; 15+ messages in thread
From: Maxime Devos @ 2022-04-12 20:40 UTC (permalink / raw)
  To: Attila Lendvai, 54893

Attila Lendvai wrote on Tue 12-04-2022 at 19:47 [+0000]:
> and i can't get rid of this^ warning. i installed glibc-locales to
> root and my user, reconfigured, restarted the guix-daemon.
> 
> which is probably the cause of the ultimate error:
> 
> warning: failed to delete .git/refs/tags/v??.2.0: No such file or
> directory
> r:sha256 hash mismatch for...
> 
> the daemon starts from an empty env:
> 
> https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/build.cc#n1590
> 
> and then copies the env from the derivation, but it doesn't seem to
> contain any LANG value. i assume guile is also launched then without
> a LANG env. BTW, guile could be more informative in its warning, too.

Some remarks:

  * LANG should be set, because it is in #:leaked-env-vars (see
    guix/git-download.scm).  I don't know whose LANG it is though
    -- the user's, or the daemon's?

  * To install a UTF-8 locale, you need glibc-locales (or possibly
    glibc-utf8-locales).  (At least, for now.  Upstream has some plans
    for including a C.UTF-8 locale so maybe eventually we can fall back
    to C.UTF-8.)

  * This locale data needs to be in $GUIX_LOCPATH.

  * GUIX_LOCPATH is not leaked.

  * Even if it was, I don't think that /gnu/store/...glibc-locales
    would be accessible from the build container (though you could give
    it a try?).

  * So perhaps GUIX_LOCPATH needs to be set in the gexp in
    guix/git-download.scm, + some setlocale as done by
    gnu-build-system.

  * Long-term, it could be interesting to remove the
    ‘file name = string encoded in current locale's encoding’
    assumption from Guile.

  * svn-download, hg-download, bzr-download and cvs-download
    probably have the same issue.

Greetings,
Maxime.


* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-12 20:40 ` Maxime Devos
@ 2022-04-13  7:51   ` Attila Lendvai
  2022-04-13  8:03     ` Maxime Devos
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Attila Lendvai @ 2022-04-13  7:51 UTC (permalink / raw)
  To: Maxime Devos; +Cc: 54893

> * LANG should be set, because it is in #:leaked-env-vars (see
> guix/git-download.scm). I don't know whose LANG it is though
> -- the user's, or the daemon's?


if i add this to the gexp:

(simple-format (current-error-port)
               "LANG is '~A'~%"
               (getenv "LANG"))
(setenv "LANG" "en_US.utf8")
(setenv "GUIX_LOCPATH" "/run/current-system/locale")
(setlocale LC_ALL (getenv "LANG"))

i see:

LANG is ''
Backtrace:
           2 (primitive-load "/gnu/store/z4bis94jg0s0y0xj1xbmliv7xs8?")
In ice-9/eval.scm:
    619:8  1 (_ #f)
In unknown file:
           0 (setlocale 6 "en_US.utf8")

ERROR: In procedure setlocale:
In procedure setlocale: Invalid argument


> * GUIX_LOCPATH is not leaked.


it's the same if i add GUIX_LOCPATH to the #:leaked-env-vars and don't setenv it explicitly.


> * Even if it was, I don't think that /gnu/store/...glibc-locales
> would be accessible from the build container (though you could give
> it a try?).


i didn't check this specifically, but i'm afraid you are right, and this is why my kludge doesn't work.


> * So perhaps GUIX_LOCPATH needs to be set in the gexp in
> guix/git-download.scm, + some setlocale as done by
> gnu-build-system.


i don't understand why the setlocale call in gnu-build-system's install-locale works, but my setlocale kludge in git-download doesn't.

i even tried to add glibc-locales as native-inputs to the package in question, but it didn't help.


> * Long-term, it could be interesting to remove the
> ‘file name = string encoded in current locale's encoding’
> assumption from Guile.


i'm not sure why the wrong locale breaks file-system walking and deleting, though.

i assume if every function in guile uses/assumes the same locale (character encoding), then both directions through the guile FFI should be idempotent, no? and i think both ASCII and UTF-8 are idempotent wrt C bytes <-> scheme string conversions. IOW, it's only the displaying of the chars that should be broken, not file operations.

or am i wrong to assume this?

or maybe the character encoding algo used in guile's FFI silently emits actual question marks in place of bytes that are outside the valid range of the encoding used? if so, that's not a very defensive way of coding, and it's eating up hours of my life...

hrm... this is not relevant here, only a related thought: things can go wrong in the GEXP serialization, too: if the writing side and the reading side don't use the same character encoding. locale should be set explicitly at the relevant entry points.

i'd appreciate if someone could help me come up with at least a kludge, so that i could make progress until it's fixed properly.

thanks for your insights Maxime,

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
If you never heal from what hurt you, you'll bleed on people who didn't cut you.






* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-13  7:51   ` Attila Lendvai
@ 2022-04-13  8:03     ` Maxime Devos
  2022-04-13  8:45       ` Attila Lendvai
  2022-04-13  8:22     ` Maxime Devos
  2022-04-13  8:29     ` Maxime Devos
  2 siblings, 1 reply; 15+ messages in thread
From: Maxime Devos @ 2022-04-13  8:03 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: 54893

Attila Lendvai wrote on Wed 13-04-2022 at 07:51 [+0000]:
> i don't understand why the setlocale call in gnu-build-system's
> install-locale works, but my setlocale kludge in git-download doesn't.

I don't expect /run/current-system/locale to exist inside the build
container.  Maybe try

   (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
   ;; for testing
   ((@ (guix build utils) invoke)
    #+(file-append coreutils "/bin/ls") (getenv "GUIX_LOCPATH"))

instead?

gnu-build-system has a (variant of) glibc-locales in its (implicit)
inputs, so there GUIX_LOCPATH can be set to the /gnu/store/.../locales
file name, in the 'set-paths' procedure.

> i even tried to add glibc-locales as native-inputs to the package in question, but it didn't help.

Building the package and downloading the source code are separate steps
(derivations), they don't automatically have access to each other's
inputs.

Greetings,
Maxime


* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-13  7:51   ` Attila Lendvai
  2022-04-13  8:03     ` Maxime Devos
@ 2022-04-13  8:22     ` Maxime Devos
  2022-04-13 10:40       ` Liliana Marie Prikler
  2022-04-13  8:29     ` Maxime Devos
  2 siblings, 1 reply; 15+ messages in thread
From: Maxime Devos @ 2022-04-13  8:22 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: 54893

Attila Lendvai wrote on Wed 13-04-2022 at 07:51 [+0000]:
> i'm not sure why the wrong locale breaks file-system walking and deleting, though.
> 
> i assume if every function in guile uses/assumes the same locale (character
> encoding), then both directions through the guile FFI should be idempotent, no?
> and i think both ASCII and UTF-8 are idempotent wrt C bytes <-> scheme string
> conversions.

The problem is that the default character encoding is ANSI_X3.4-1968
(US-ASCII), and any byte above 127 is not valid ASCII.

Also, the string procedures internally always use UTF-8 (or possibly
ISO-8859-1 as an optimisation?); strings are not raw bytes, instead
they can be considered a vector of characters (string-ref returns
characters, not bytes, and doesn't use byte positions).
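
The difference between characters and bytes can be sketched as follows. This is only an illustration; the variable names are made up for the example, and "é" is built from its code point so that the result does not depend on the encoding of the source file.

```scheme
(use-modules (rnrs bytevectors))

;; "é" is U+00E9; construct it from the code point.
(define e-acute (string (integer->char #xE9)))

;; One character as far as the string procedures are concerned ...
(define char-count (string-length e-acute))

;; ... but two bytes once the string is encoded as UTF-8.
(define byte-count (bytevector-length (string->utf8 e-acute)))

(format #t "characters: ~a, UTF-8 bytes: ~a~%" char-count byte-count)
```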

>  IOW, it's only the displaying of the chars that should be broken,
>  not file operations.

LANG=bogus guile
(guile-user)> (setlocale LC_ALL)
(guile-user)> (use-modules (ice-9 i18n))
(guile-user)> (locale-encoding)
$2 = "ANSI_X3.4-1968"

Apparently the fallback encoding is ‘ANSI_X3.4-1968’.  Let's take a
look at this encoding.  According to IANA
(https://www.iana.org/assignments/character-sets/character-sets.xhtml),
this character encoding can also be named ‘US-ASCII’ and is specified
in RFC2046.  Some excerpts:

   "US-ASCII" does not indicate an arbitrary 7-bit
   character set[sic], but specifies that all octets in the body must
   be interpreted as characters according to the US-ASCII character
   set.

so it looks like, say, é cannot be encoded as US-ASCII; it does not
belong to the character set of the encoding.  More generally, any
(Unicode) code point above 127 cannot be encoded in ANSI_X3.4-1968.

Let's test this (in a new REPL with a UTF-8 locale):

((@ (ice-9 iconv) string->bytevector) "é" "ANSI_X3.4-1968")
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
Throw to key `encoding-error' with args `("put-char" "conversion to port encoding failed" 84 #<output: string 7fd5bbc23ee0> #\é)'.

((@ (ice-9 iconv) string->bytevector) "é" "ANSI_X3.4-1968" 'substitute)
$2 = #vu8(63)
((@ (rnrs bytevectors) utf8->string) #vu8(63))
$3 = "?"

and the other direction:

((@ (ice-9 iconv) bytevector->string) #vu8(128) "ANSI_X3.4-1968" 'substitute)
$5 = "�" ;; why #\� and not #\?? I don't know, I guess Guile is inconsistent

(FWIW, I would throw a decoding-error here instead of silently corrupting the
file names.)
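
A sketch of what a strict variant could look like at the user level, assuming the 'error conversion strategy of (ice-9 iconv). The helper `decode-strictly` is hypothetical and exists in neither Guile nor Guix; it returns #f instead of a silently altered string.

```scheme
(use-modules (ice-9 iconv))

;; Hypothetical helper: decode BV as ENCODING, refusing invalid input.
;; With the 'error strategy, `bytevector->string' raises an exception on
;; bytes that are invalid in ENCODING; any exception is turned into #f here.
(define (decode-strictly bv encoding)
  (catch #t
    (lambda () (bytevector->string bv encoding 'error))
    (lambda _ #f)))

;; Bytes 118 and 49 are plain ASCII ("v1"); byte 128 is outside ASCII.
(define ok  (decode-strictly #vu8(118 49)  "ANSI_X3.4-1968"))
(define bad (decode-strictly #vu8(118 128) "ANSI_X3.4-1968"))
```

A caller can then decide what to do with #f, instead of continuing with a file name that no longer matches what is on disk.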

Greetings,
Maxime.


* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-13  7:51   ` Attila Lendvai
  2022-04-13  8:03     ` Maxime Devos
  2022-04-13  8:22     ` Maxime Devos
@ 2022-04-13  8:29     ` Maxime Devos
  2 siblings, 0 replies; 15+ messages in thread
From: Maxime Devos @ 2022-04-13  8:29 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: 54893

Attila Lendvai wrote on Wed 13-04-2022 at 07:51 [+0000]:
> hrm... this is not relevant here, only a related thought: things
> can go wrong in the GEXP serialization, too: if the writing side
> and the reading side don't use the same character encoding.
> locale should be set explicitly at the relevant entry points.

Serialisation is always done in UTF-8, search for UTF-8 in (guix gexp).
I don't know if deserialisation of the script is done in UTF-8, though
it should be done that way.

Also, using the same character encoding is not sufficient, the
character encoding must also encode all characters (in practice, that
means UTF-8).
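
To illustrate that last point: even when writer and reader agree on ISO-8859-1, a character outside that set does not survive, whereas it does with UTF-8. This is a sketch only; `survives-round-trip?` and the variable names are hypothetical, and "λ" is built from its code point.

```scheme
(use-modules (ice-9 iconv))

;; Hypothetical helper: does STR come back unchanged after being encoded
;; to ENCODING and decoded again with the same ENCODING?
(define (survives-round-trip? str encoding)
  (string=? str
            (bytevector->string
             (string->bytevector str encoding 'substitute)
             encoding 'substitute)))

;; "λ" is U+03BB, which is not part of ISO-8859-1.
(define lambda-char (string (integer->char #x3BB)))

(define latin1-result (survives-round-trip? lambda-char "ISO-8859-1"))
(define utf8-result   (survives-round-trip? lambda-char "UTF-8"))
```

Here latin1-result is #f even though both sides used the same encoding, and utf8-result is #t.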

Greetings,
Maxime.


* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-13  8:03     ` Maxime Devos
@ 2022-04-13  8:45       ` Attila Lendvai
  2022-04-19 11:38         ` Attila Lendvai
  0 siblings, 1 reply; 15+ messages in thread
From: Attila Lendvai @ 2022-04-13  8:45 UTC (permalink / raw)
  To: Maxime Devos; +Cc: 54893

> I don't expect /run/current-system/locale to exist inside the build
> container. Maybe try
>
> (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
> ;; for testing
> ((@ (guix build utils) invoke)
> #+(file-append coreutils "/bin/ls") (getenv "GUIX_LOCPATH"))
>
> instead?


thank you, this works indeed as a band aid:

(setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
(setlocale LC_ALL "en_US.utf8")

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“If a nation expects to be ignorant and free, in a state of civilization, it expects what never was and never will be.”
	— Thomas Jefferson (1743–1826)






* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-13  8:22     ` Maxime Devos
@ 2022-04-13 10:40       ` Liliana Marie Prikler
  2022-04-13 10:57         ` Maxime Devos
  0 siblings, 1 reply; 15+ messages in thread
From: Liliana Marie Prikler @ 2022-04-13 10:40 UTC (permalink / raw)
  To: Maxime Devos, Attila Lendvai; +Cc: 54893

On Wednesday, 13.04.2022 at 10:22 +0200, Maxime Devos wrote:
> [...]
> Let's test this (in a new REPL with a UTF-8 locale):
> 
> ((@ (ice-9 iconv) string->bytevector) "é" "ANSI_X3.4-1968")
> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> Throw to key `encoding-error' with args `("put-char" "conversion to
> port encoding failed" 84 #<output: string 7fd5bbc23ee0> #\é)'.
> 
> ((@ (ice-9 iconv) string->bytevector) "é" "ANSI_X3.4-1968" 'substitute)
> $2 = #vu8(63)
> ((@ (rnrs bytevectors) utf8->string) #vu8(63))
> $3 = "?"
> 
> and the other direction:
> 
> ((@ (ice-9 iconv) bytevector->string) #vu8(128) "ANSI_X3.4-1968"
> 'substitute)
> $5 = "�" ;; why #\� and not #\?? I don't know, I guess Guile is
> inconsistent
You are first encoding a non-ASCII byte to ASCII, which has no glyph
for "I have no idea what this is", so a question mark (#\?) is used. 
When converting from invalid ASCII to UTF-8 on the other hand, you do
have #\� as the WTF character, so that is used instead.  This is
entirely consistent :)

Cheers





* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-13 10:40       ` Liliana Marie Prikler
@ 2022-04-13 10:57         ` Maxime Devos
  0 siblings, 0 replies; 15+ messages in thread
From: Maxime Devos @ 2022-04-13 10:57 UTC (permalink / raw)
  To: Liliana Marie Prikler, Attila Lendvai; +Cc: 54893

Liliana Marie Prikler wrote on Wed 13-04-2022 at 12:40 [+0200]:
> You are first encoding a non-ASCII byte to ASCII, which has no glyph
> for "I have no idea what this is", so a question mark (#\?) is used. 
> When converting from invalid ASCII to UTF-8 on the other hand, you do
> have #\� as the WTF character, so that is used instead.  This is
> entirely consistent :)

Right, makes sense.


* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-13  8:45       ` Attila Lendvai
@ 2022-04-19 11:38         ` Attila Lendvai
  2022-04-19 15:45           ` Maxime Devos
  2022-04-19 16:07           ` Maxime Devos
  0 siblings, 2 replies; 15+ messages in thread
From: Attila Lendvai @ 2022-04-19 11:38 UTC (permalink / raw)
  To: Maxime Devos; +Cc: 54893

> thank you, this works indeed as a band aid:
>
> (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
> (setlocale LC_ALL "en_US.utf8")


i spoke too early. this works in a git checkout of guix, but it fails
to compile when i try to guix pull it.

even if i declare the dependency like this:

#:autoload   (gnu packages base) (glibc-locales)

IIUC, this is due to a circular dependency: glibc-locales (and its
variants) depend on git-fetch, therefore i cannot refer to them from
the implementation of git-fetch.

i tried to set the locale to "C" or "POSIX", but it results in ASCII encoding.

i tried to set the locale to "en_US.iso-8859-1", hoping that it's
available, but it isn't.

all that is needed here is an encoding that is idempotent wrt a cycle
through bytes->string, string->bytes. i think the iso-8859-n encodings
are like that.

to verify that hypothesis:

$ mkdir -p /tmp/delme/v½.2.0
$ LANG=C guix repl
scheme@(guix-user)> (use-modules (guix build utils))
scheme@(guix-user)> (delete-file-recursively "/tmp/delme")
warning: failed to delete /tmp/delme/v??.2.0: No such file or directory
warning: failed to delete /tmp/delme: Directory not empty
$1 = #t
$2 = #<vhash 7fd60aef5540 1 pairs>
scheme@(guix-user)>

$ LANG=en_US.iso-8859-1 guix repl
scheme@(guix-user)> (use-modules (guix build utils))
scheme@(guix-user)> (delete-file-recursively "/tmp/delme")
$1 = #<vhash 7f7d7acc2040 2 pairs>
scheme@(guix-user)>
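
The same property can be sketched without touching the file system, using only (ice-9 iconv). This illustrates the encoding round trip itself, not Guile's internal file name handling; `round-trip` and the variable names are hypothetical.

```scheme
(use-modules (ice-9 iconv) (rnrs bytevectors))

;; UTF-8 bytes of the tag name "v½.2.0"; "½" is the byte pair 194 189.
(define tag-bytes #vu8(118 194 189 46 50 46 48))

;; Hypothetical helper: decode BV as ENCODING, then encode the resulting
;; string back to bytes with the same ENCODING.
(define (round-trip bv encoding)
  (string->bytevector (bytevector->string bv encoding 'substitute)
                      encoding 'substitute))

;; ISO-8859-1 maps every byte to a character, so the bytes come back intact.
(define latin1-ok?
  (bytevector=? tag-bytes (round-trip tag-bytes "ISO-8859-1")))

;; ASCII has no characters for bytes 194 and 189, so they get substituted.
(define ascii-ok?
  (bytevector=? tag-bytes (round-trip tag-bytes "ANSI_X3.4-1968")))
```

With ISO-8859-1 the decoded string looks wrong when displayed, but the bytes handed back to the C side are the original ones, which is all that deleting a file needs.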


so, is such an idempotent locale available/embedded in glibc without any external dependencies? searching the web suggests that there isn't.

if not, then what would be a bird's eye view plan to make one
available for git-fetch?

should we create a new, ASCII-only git-fetch variant used in the bootstrap process?

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“The world is changed by your example, not by your opinion.”
	— Paulo Coelho (1947–)






* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-19 11:38         ` Attila Lendvai
@ 2022-04-19 15:45           ` Maxime Devos
  2022-04-19 16:07           ` Maxime Devos
  1 sibling, 0 replies; 15+ messages in thread
From: Maxime Devos @ 2022-04-19 15:45 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: 54893


Attila Lendvai wrote on Tue 19-04-2022 at 11:38 [+0000]:
> > thank you, this works indeed as a band aid:
> > 
> > (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
> > (setlocale LC_ALL "en_US.utf8")
> 
> 
> i spoke too early. this works in a git checkout of guix, but it fails
> to compile when i try to guix pull it.
> 
> even if i declare the dependency like this:
> 
> #:autoload   (gnu packages base) (glibc-locales)
> 
> IIUC, this is due to a circular dependency: glibc-locales (and its
> variants) depend on git-fetch, therefore i cannot refer to them from
> the implementation of git-fetch.

The module of the glibc-locales package depends on git-fetch, but I
don't think the package glibc-locales does.  Anyway, circular imports
are messy and (guix build-system ...) and (guix git-download) use an
extra-lazy variant of #:autoload that doesn't load the module even when
compiling (*).

(*) Limitation: this method cannot be used to use macros.

Maybe the attached variant works?

Greetings,
Maxime.

[-- Attachment #1.2: diff.diff --]
[-- Type: text/x-patch, Size: 769 bytes --]

diff --git a/guix/git-download.scm b/guix/git-download.scm
index 5e624b9ae9..a74ba5f592 100644
--- a/guix/git-download.scm
+++ b/guix/git-download.scm
@@ -104,6 +104,9 @@ (define guile-zlib
   (define gnutls
     (module-ref (resolve-interface '(gnu packages tls)) 'gnutls))
 
+  (define glibc-locales
+    (module-ref (resolve-interface '(gnu packages base)) 'glibc-locales))
+
   (define modules
     (delete '(guix config)
             (source-module-closure '((guix build git)
@@ -122,6 +125,8 @@ (define build
                          (guix swh)
                          (ice-9 match))
 
+            (pk #+glibc-locales)
+            (error "see the pk")
             (define recursive?
               (call-with-input-string (getenv "git recursive?") read))
 


* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-19 11:38         ` Attila Lendvai
  2022-04-19 15:45           ` Maxime Devos
@ 2022-04-19 16:07           ` Maxime Devos
  1 sibling, 0 replies; 15+ messages in thread
From: Maxime Devos @ 2022-04-19 16:07 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: 54893

Attila Lendvai wrote on Tue 19-04-2022 at 11:38 [+0000]:
> so, is such an idempotent locale available/embedded in glibc without
> any external dependencies? searching the web suggests that there isn't.

Try:

$ LC_CTYPE=anything.ISO-8859-2  guix repl é
hint: Consider installing the `glibc-locales' package and defining `GUIX_LOCPATH', along these lines:

     guix install glibc-locales
     export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale"

See the "Application Setup" section in the manual, for more info.

;;; Stat of /home/[...]/ĂŠ failed:
;;; In procedure stat: No such file or directory: "/home/[...]/ĂŠ"
guix repl: error: open-file: No such file or directory: "/home/regulator/source-code/rw/ĂŠ"

IIUC, this causes lib/localcharset.c in Guile to run 'environ_locale_charset',
which just uses environment variables and ignores glibc's locale data.

I don't know if this requires 'setlocale' or requires the absence of 'setlocale'.

Greetings,
Maxime.


* bug#54893: [PATCH] guix: git-download: Set locale to deal with Unicode in git metadata.
  2022-04-12 19:47 bug#54893: guix-daemon, locale, LANG, and unicode in git tag names Attila Lendvai
  2022-04-12 20:40 ` Maxime Devos
@ 2022-04-19 18:09 ` Attila Lendvai
  2022-04-20 20:12   ` bug#54893: guix-daemon, locale, LANG, and unicode in git tag names Ludovic Courtès
  2022-04-20 22:15   ` Ludovic Courtès
  1 sibling, 2 replies; 15+ messages in thread
From: Attila Lendvai @ 2022-04-19 18:09 UTC (permalink / raw)
  To: 54893; +Cc: Attila Lendvai

Without this the git-fetch GEXP is run in an environment that uses ASCII
character encoding when strings are crossing the Guile - C boundary.  It means
that e.g. tag names that have Unicode chars in them will cause problems,
e.g. when walking and deleting the .git directory.

An example in the wild: https://github.com/klauspost/pgzip/tags

For more details see: https://issues.guix.gnu.org/54893

* guix/git-download.scm (git-fetch): Call setlocale to set it to en_US.utf8.
---

thanks Maxime, this indeed seems to work! and i have successfully
guix pull'ed it, too.

 guix/git-download.scm | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/guix/git-download.scm b/guix/git-download.scm
index 5e624b9ae9..2fc5a06490 100644
--- a/guix/git-download.scm
+++ b/guix/git-download.scm
@@ -104,6 +104,9 @@ (define guile-zlib
   (define gnutls
     (module-ref (resolve-interface '(gnu packages tls)) 'gnutls))
 
+  (define glibc-locales
+    (module-ref (resolve-interface '(gnu packages base)) 'glibc-locales))
+
   (define modules
     (delete '(guix config)
             (source-module-closure '((guix build git)
@@ -121,6 +124,13 @@ (define build
                          (guix build download-nar)
                          (guix swh)
                          (ice-9 match))
+            ;; We must set the locale to something/anything that will make the
+            ;; Guile FFI use a character encoding that is idempotent through a
+            ;; bytes->string string->bytes roundtrip.  Otherwise e.g. git tags
+            ;; with Unicode characters would break things.  For more details
+            ;; and an example see https://issues.guix.gnu.org/54893
+            (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
+            (setlocale LC_ALL "en_US.utf8")
 
             (define recursive?
               (call-with-input-string (getenv "git recursive?") read))
-- 
2.35.1






* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-19 18:09 ` bug#54893: [PATCH] guix: git-download: Set locale to deal with Unicode in git metadata Attila Lendvai
@ 2022-04-20 20:12   ` Ludovic Courtès
  2022-04-20 22:15   ` Ludovic Courtès
  1 sibling, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2022-04-20 20:12 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: 54893-done, Liliana Marie Prikler

Hi,

Attila Lendvai <attila@lendvai.name> writes:

> Without this the git-fetch GEXP is run in an environment that uses ASCII
> character encoding when strings are crossing the Guile - C boundary.  It means
> that e.g. tag names that have Unicode chars in them will cause problems,
> e.g. when walking and deleting the .git directory.
>
> An example in the wild: https://github.com/klauspost/pgzip/tags
>
> For more details see: https://issues.guix.gnu.org/54893
>
> * guix/git-download.scm (git-fetch): Call setlocale to set it to en_US.utf8.

[...]

> +  (define glibc-locales
> +    (module-ref (resolve-interface '(gnu packages base)) 'glibc-locales))

I changed this to ‘glibc-utf8-locales’, which is sufficient here, and
committed.

Thanks everyone for the investigation and fix!

Ludo’.





* bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
  2022-04-19 18:09 ` bug#54893: [PATCH] guix: git-download: Set locale to deal with Unicode in git metadata Attila Lendvai
  2022-04-20 20:12   ` bug#54893: guix-daemon, locale, LANG, and unicode in git tag names Ludovic Courtès
@ 2022-04-20 22:15   ` Ludovic Courtès
  1 sibling, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2022-04-20 22:15 UTC (permalink / raw)
  To: Attila Lendvai; +Cc: 54893-done

Attila Lendvai <attila@lendvai.name> writes:

> Without this the git-fetch GEXP is run in an environment that uses ASCII
> character encoding when strings are crossing the Guile - C boundary.  It means
> that e.g. tag names that have Unicode chars in them will cause problems,
> e.g. when walking and deleting the .git directory.
>
> An example in the wild: https://github.com/klauspost/pgzip/tags
>
> For more details see: https://issues.guix.gnu.org/54893
>
> * guix/git-download.scm (git-fetch): Call setlocale to set it to en_US.utf8.

I spoke a bit too fast and realized some adjustments were needed to
avoid a circular dependency on i586-gnu.  Pushed as
8852f911ff506dd50b714274ba0e2143f0285f78!

Ludo’.




