unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: 26948@debbugs.gnu.org
Subject: bug#26948: ‘write-file’ output should not be locale-dependent
Date: Tue, 30 May 2017 13:57:24 +0200	[thread overview]
Message-ID: <878tle7e1n.fsf@gnu.org> (raw)
In-Reply-To: <87h903s9mf.fsf@gmail.com> (Maxim Cournoyer's message of "Mon, 29 May 2017 13:15:04 -0700")

[-- Attachment #1: Type: text/plain, Size: 4010 bytes --]

Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:

[...]

>> But wait!  “guix build nss-certs --check -K” fails, and the diff is:
>>
>> $ LANGUAGE= diff -ur /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2{,-check}
>> Only in /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs: AC_Raíz_Certicámara_S.A.:2.15.7.126.82.147.123.224.21.227.87.240.105.140.203.236.12.pem
>> Only in /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs: AC_Ra?z_Certic?mara_S.A.:2.15.7.126.82.147.123.224.21.227.87.240.105.140.203.236.12.pem
>> diff -ur /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs/ae8153b9.0 /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs/ae8153b9.0
>> --- /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs/ae8153b9.0	1970-01-01 01:00:01.000000000 +0100
>> +++ /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs/ae8153b9.0	1970-01-01 01:00:01.000000000 +0100
>> @@ -3,10 +3,10 @@
>>  # distrust=
>>  # openssl-trust=codeSigning emailProtection serverAuth
>>  -----BEGIN CERTIFICATE-----
>> -MIIHyTCCBbGgAwIBAgIBATANBgkqhkiG9w0BAQUFADB9MQswCQYDVQQGEwJJTDEW
>> +MIIHhzCCBW+gAwIBAgIBLTANBgkqhkiG9w0BAQsFADB9MQswCQYDVQQGEwJJTDEW
>
> Can this be explained by locale alone? That is troubling.

Yes it’s troubling, it deserves more investigation.

>> There are two ways to create nars.  One is via the ‘export-paths’ RPC
>> (implemented in the daemon in C++), which does not interpret file names
>> and thus leaves them untouched.  The other one is via ‘write-file’ from
>> (guix serialization), which is written in Scheme and thus converts file
>> names from locale encoding (specifically, ‘scandir’ does that.)
>>
>> ‘guix publish’ uses the latter, so ‘guix publish’ is sensitive to locale
>> settings, which is pretty bad.
>>
>> Guile currently does not allow us to specify whether/how file names
>> should be decoded, but possible solutions have been discussed for 2.2.
>>
>> In the meantime, solutions are:
>>
>>   1. To run ‘guix publish’ in a UTF-8 locale, which apparently was not
>>      the case.
>
> I'm surprised by that. Wouldn't a utf8 locale be the default?

Users are free to choose their favorite locale.  Also, on a foreign
distro where locales are not properly set up, you end up in the C locale
with US-ASCII encoding (as was the case here).

>>   2. Add to (guix build syscalls) a separate locale-independent
>>      ‘scandir’ implementation and use that.
>
> If the general solution is to fix it in Guile, the workaround proposed
> in 1. seems preferable.

I implemented ‘scandir/utf-8’ and used that in ‘write-file’ (patches
attached).  Unfortunately that’s not enough since libguile procedures
like ‘open-file’ still do locale-dependent conversion, so we’d need to
duplicate those as well, which is not great.

But on second thought, I think the problem is not in the ‘write-file’
call that ‘guix publish’ makes: if it were, ‘scandir’ would return bogus
file names (with question marks), but then trying to open them would
fail, so ‘write-file’ wouldn’t produce a bogus nar.

So I think the culprit is ‘restore-file-set’ (used in ‘guix offload’
when retrieving store items from a build machine): this one reads file
names as UTF-8, via ‘read-store-path’, but then when it tries to create
those files using Guile’s primitives, their names can be be converted,
possibly with those question marks popping up.  Here ‘restore-file-set’
can’t notice that Guile changed the file names behind its back.

The short-term fix is to ensure guix-daemon itself runs in a UTF-8
locale.  :-/

I’ve restarted guix-daemon and ‘guix publish’ in a UTF-8 locale on
hydra.gnu.org.

Thanks,
Ludo’.



[-- Attachment #2: the patch --]
[-- Type: text/x-patch, Size: 5662 bytes --]

From e7f464bac58e1f09de5ceb194c4a30f6d899b29a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Tue, 30 May 2017 12:03:54 +0200
Subject: [PATCH] syscalls: Add 'scandir/utf-8'.

* guix/build/syscalls.scm (%struct-dirent-header): New C struct.
(opendir/utf-8, closedir/utf-8, readdir/utf-8, scandir/utf-8): New
procedures.
* tests/syscalls.scm ("scandir/utf-8, ENOENT")
("scandir/utf-8, ASCII file names")
("scandir/utf8, UTF-8 file names"): New tests.
---
 guix/build/syscalls.scm | 73 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/syscalls.scm      | 39 ++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)

diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 52439afd4..cfb43e93b 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -67,6 +67,7 @@
             mkdtemp!
             fdatasync
             pivot-root
+            scandir/utf-8
             fcntl-flock
 
             set-thread-name
@@ -812,6 +813,78 @@ system to PUT-OLD."
 
 \f
 ;;;
+;;; Opendir & co.
+;;;
+
+(define-c-struct %struct-dirent-header
+  sizeof-dirent-header
+  list
+  read-dirent-header
+  write-dirent-header!
+  (inode  int64)
+  (offset int64)
+  (length unsigned-short)
+  (type   uint8)
+  (name   uint8))                                 ;first byte of 'd_name'
+
+(define opendir/utf-8
+  (let ((proc (syscall->procedure '* "opendir" '(*))))
+    (lambda (name)
+      (let-values (((ptr err)
+                    (proc (string->pointer name "UTF-8"))))
+        (if (null-pointer? ptr)
+            (throw 'system-error "opendir/utf-8"
+                   "opendir/utf-8: ~A"
+                   (list (strerror err))
+                   (list err))
+            ptr)))))
+
+(define closedir/utf-8
+  (let ((proc (syscall->procedure int "closedir" '(*))))
+    (lambda (directory)
+      (let-values (((ret err)
+                    (proc directory)))
+        (unless (zero? ret)
+          (throw 'system-error "closedir"
+                 "closedir: ~A" (list (strerror err))
+                 (list err)))))))
+
+(define readdir/utf-8
+  (let ((proc (syscall->procedure '* "readdir64" '(*))))
+    (lambda (directory)
+      (let ((ptr (proc directory)))
+        (and (not (null-pointer? ptr))
+             (pointer->string
+              (make-pointer (+ (pointer-address ptr)
+                               (c-struct-field-offset %struct-dirent-header name)))
+              -1
+              "UTF-8"))))))
+
+(define* (scandir/utf-8 name #:optional
+                        (select? (const #t))
+                        (string<? string<?))
+  "Like 'scandir', but (1) systematically decode file names as UTF-8, and (2)
+raise to 'system-error' when NAME cannot be opened.
+
+This works around a defect in Guile 2.0/2.2 where 'scandir' decodes file names
+according to the current locale, which is not always desirable."
+  (let ((directory (opendir/utf-8 name)))
+    (dynamic-wind
+      (const #t)
+      (lambda ()
+        (let loop ((result '()))
+          (match (readdir/utf-8 directory)
+            (#f
+             (sort result string<?))
+            (entry
+             (loop (if (select? entry)
+                       (cons entry result)
+                       result))))))
+      (lambda ()
+        (closedir/utf-8 directory)))))
+
+\f
+;;;
 ;;; Advisory file locking.
 ;;;
 
diff --git a/tests/syscalls.scm b/tests/syscalls.scm
index e20f0600b..02062ec9d 100644
--- a/tests/syscalls.scm
+++ b/tests/syscalls.scm
@@ -24,6 +24,8 @@
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:use-module (srfi srfi-64)
+  #:use-module (system foreign)
+  #:use-module ((ice-9 ftw) #:select (scandir))
   #:use-module (ice-9 match))
 
 ;; Test the (guix build syscalls) module, although there's not much that can
@@ -184,6 +186,43 @@
                          (status:exit-val status))))
                (eq? #t result))))))))
 
+(test-equal "scandir/utf-8, ENOENT"
+  ENOENT
+  (catch 'system-error
+    (lambda ()
+      (scandir/utf-8 "/does/not/exist"))
+    (lambda args
+      (system-error-errno args))))
+
+(test-equal "scandir/utf-8, ASCII file names"
+  (scandir (dirname (search-path %load-path "guix/base32.scm")))
+  (scandir/utf-8 (dirname (search-path %load-path "guix/base32.scm"))))
+
+(test-equal "scandir/utf8, UTF-8 file names"
+  '("." ".." "α" "λ")
+  (call-with-temporary-directory
+   (lambda (directory)
+     ;; Wrap 'creat' to make sure that we really pass a UTF-8-encoded file
+     ;; name to the system call.
+     (let ((creat (pointer->procedure int
+                                      (dynamic-func "creat" (dynamic-link))
+                                      (list '* int))))
+       (creat (string->pointer (string-append directory "/α")
+                               "UTF-8")
+              #o644)
+       (creat (string->pointer (string-append directory "/λ")
+                               "UTF-8")
+              #o644)
+       (let ((locale (setlocale LC_ALL)))
+         (dynamic-wind
+           (lambda ()
+             ;; Make sure that even in a C locale we get the right result.
+             (setlocale LC_ALL "C"))
+           (lambda ()
+             (scandir/utf-8 directory))
+           (lambda ()
+             (setlocale LC_ALL locale))))))))
+
 (false-if-exception (delete-file temp-file))
 (test-equal "fcntl-flock wait"
   42                                              ; the child's exit status
-- 
2.13.0


[-- Attachment #3: Type: text/x-patch, Size: 3079 bytes --]

diff --git a/guix/serialization.scm b/guix/serialization.scm
index e6ae2fc30..77a54f904 100644
--- a/guix/serialization.scm
+++ b/guix/serialization.scm
@@ -18,6 +18,8 @@
 
 (define-module (guix serialization)
   #:use-module (guix combinators)
+  #:use-module ((guix build syscalls)
+                #:select (scandir/utf-8))
   #:use-module (rnrs bytevectors)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
@@ -285,8 +287,11 @@ result of 'lstat'; exclude entries for which SELECT? does not return true."
               ;; 'scandir' defaults to 'string-locale<?' to sort files, but
               ;; this happens to be case-insensitive (at least in 'en_US'
               ;; locale on libc 2.18.)  Conversely, we want files to be
-              ;; sorted in a case-sensitive fashion.
-              (scandir f (negate (cut member <> '("." ".."))) string<?)))
+              ;; sorted in a case-sensitive fashion.  Also, always decode file
+              ;; names as UTF-8.
+              (scandir/utf-8
+               f (negate (cut member <> '("." "..")))
+                             string<?)))
          (for-each (lambda (e)
                      (let* ((f (string-append f "/" e))
                             (s (lstat f)))
diff --git a/tests/nar.scm b/tests/nar.scm
index 61646db96..d2eae65c4 100644
--- a/tests/nar.scm
+++ b/tests/nar.scm
@@ -25,6 +25,8 @@
                 #:select (open-sha256-port open-sha256-input-port))
   #:use-module ((guix packages)
                 #:select (base32))
+  #:use-module ((guix utils)
+                #:select (call-with-temporary-directory))
   #:use-module (rnrs bytevectors)
   #:use-module (rnrs io ports)
   #:use-module (srfi srfi-1)
@@ -272,6 +274,36 @@
       (lambda ()
         (rmdir input)))))
 
+(unless (equal? "UTF-8" (fluid-ref %default-port-encoding))
+  (test-skip 1))
+(test-assert "write-file + restore-file, UTF-8 file names"
+  (let* ((output %test-dir)
+         (nar    (string-append output ".nar"))
+         (locale (setlocale LC_ALL)))
+    (dynamic-wind
+      (lambda () #t)
+      (lambda ()
+        (call-with-temporary-directory
+         (lambda (input)
+           (call-with-output-file (string-append input "/α")
+             (const #t))
+           (call-with-output-file (string-append input "/λ")
+             (const #t))
+           (dynamic-wind
+             (const #f)
+             (lambda ()
+               (setlocale LC_ALL "C")
+               (call-with-output-file nar
+                 (cut write-file input <>)))
+             (lambda ()
+               (setlocale LC_ALL locale)))
+           (call-with-input-file nar
+             (cut restore-file <> output))
+           (file-tree-equal? input output))))
+      (lambda ()
+        (false-if-exception (delete-file nar))
+        (false-if-exception (rm-rf output))))))
+
 ;; 'restore-file-set' depends on 'open-sha256-input-port', which in turn
 ;; relies on a Guile 2.0.10+ feature.
 (test-skip (if (false-if-exception

  reply	other threads:[~2017-05-30 11:58 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-16  5:19 bug#26948: gnutls errors on multiple guix commands Maxim Cournoyer
2017-05-17 12:56 ` Ludovic Courtès
2017-05-25  7:26   ` Maxim Cournoyer
2017-05-26  8:56     ` Ludovic Courtès
2017-05-28 18:38       ` Mark H Weaver
2017-05-29  4:36         ` Maxim Cournoyer
2017-05-29  9:31         ` Ludovic Courtès
2017-05-29 21:26           ` Mark H Weaver
2017-05-30 11:25             ` Ludovic Courtès
2017-05-28 21:00       ` Maxim Cournoyer
2017-05-29  9:12         ` bug#26948: ‘write-file’ output should not be locale-dependent Ludovic Courtès
2017-05-29 20:15           ` Maxim Cournoyer
2017-05-30 11:57             ` Ludovic Courtès [this message]
2017-06-16 15:09               ` Ludovic Courtès
2017-07-27 12:55           ` Ludovic Courtès
2021-01-08 22:04             ` bug#26948: 'guix publish' file name decoding is locale-dependent Maxim Cournoyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878tle7e1n.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=26948@debbugs.gnu.org \
    --cc=maxim.cournoyer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).