From mboxrd@z Thu Jan 1 00:00:00 1970 From: ludo@gnu.org (Ludovic =?UTF-8?Q?Court=C3=A8s?=) Subject: bug#26948: =?UTF-8?Q?=E2=80=98write-file=E2=80=99?= output should not be locale-dependent Date: Tue, 30 May 2017 13:57:24 +0200 Message-ID: <878tle7e1n.fsf@gnu.org> References: <8737c51e6r.fsf@gmail.com> <87shk3y74g.fsf@gnu.org> <8737btieie.fsf@gmail.com> <87vaoovvvz.fsf@gnu.org> <87o9ucu1t3.fsf@gmail.com> <87mv9wc9gp.fsf_-_@gnu.org> <87h903s9mf.fsf@gmail.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:47053) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dFfmn-000198-F0 for bug-guix@gnu.org; Tue, 30 May 2017 07:58:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dFfmk-00077z-Bu for bug-guix@gnu.org; Tue, 30 May 2017 07:58:05 -0400 Received: from debbugs.gnu.org ([208.118.235.43]:41561) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1dFfmk-00077l-7m for bug-guix@gnu.org; Tue, 30 May 2017 07:58:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1dFfmj-0000E9-Tj for bug-guix@gnu.org; Tue, 30 May 2017 07:58:01 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87h903s9mf.fsf@gmail.com> (Maxim Cournoyer's message of "Mon, 29 May 2017 13:15:04 -0700") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Maxim Cournoyer Cc: 26948@debbugs.gnu.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi Maxim, Maxim Cournoyer skribis: > ludo@gnu.org (Ludovic Court=C3=A8s) writes: [...] >> But wait! =E2=80=9Cguix build nss-certs --check -K=E2=80=9D fails, and = the diff is: >> >> $ LANGUAGE=3D diff -ur /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-c= erts-3.30.2{,-check} >> Only in /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-che= ck/etc/ssl/certs: AC_Ra=C3=ADz_Certic=C3=A1mara_S.A.:2.15.7.126.82.147.123.= 224.21.227.87.240.105.140.203.236.12.pem >> Only in /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc= /ssl/certs: AC_Ra?z_Certic?mara_S.A.:2.15.7.126.82.147.123.224.21.227.87.24= 0.105.140.203.236.12.pem >> diff -ur /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/et= c/ssl/certs/ae8153b9.0 /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-cert= s-3.30.2-check/etc/ssl/certs/ae8153b9.0 >> --- /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl= /certs/ae8153b9.0 1970-01-01 01:00:01.000000000 +0100 >> +++ /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/e= tc/ssl/certs/ae8153b9.0 1970-01-01 01:00:01.000000000 +0100 >> @@ -3,10 +3,10 @@ >> # distrust=3D >> # openssl-trust=3DcodeSigning emailProtection serverAuth >> -----BEGIN CERTIFICATE----- >> -MIIHyTCCBbGgAwIBAgIBATANBgkqhkiG9w0BAQUFADB9MQswCQYDVQQGEwJJTDEW >> +MIIHhzCCBW+gAwIBAgIBLTANBgkqhkiG9w0BAQsFADB9MQswCQYDVQQGEwJJTDEW > > Can this be explained by locale alone? That is troubling. Yes it=E2=80=99s troubling, it deserves more investigation. >> There are two ways to create nars. One is via the =E2=80=98export-paths= =E2=80=99 RPC >> (implemented in the daemon in C++), which does not interpret file names >> and thus leaves them untouched. The other one is via =E2=80=98write-fil= e=E2=80=99 from >> (guix serialization), which is written in Scheme and thus converts file >> names from locale encoding (specifically, =E2=80=98scandir=E2=80=99 does= that.) >> >> =E2=80=98guix publish=E2=80=99 uses the latter, so =E2=80=98guix publish= =E2=80=99 is sensitive to locale >> settings, which is pretty bad. >> >> Guile currently does not allow us to specify whether/how file names >> should be decoded, but possible solutions have been discussed for 2.2. >> >> In the meantime, solutions are: >> >> 1. To run =E2=80=98guix publish=E2=80=99 in a UTF-8 locale, which appa= rently was not >> the case. > > I'm surprised by that. Wouldn't a utf8 locale be the default? Users are free to choose their favorite locale. Also, on a foreign distro where locales are not properly set up, you end up in the C locale with US-ASCII encoding (as was the case here). >> 2. Add to (guix build syscalls) a separate locale-independent >> =E2=80=98scandir=E2=80=99 implementation and use that. > > If the general solution is to fix it in Guile, the workaround proposed > in 1. seems preferable. I implemented =E2=80=98scandir/utf-8=E2=80=99 and used that in =E2=80=98wri= te-file=E2=80=99 (patches attached). Unfortunately that=E2=80=99s not enough since libguile procedur= es like =E2=80=98open-file=E2=80=99 still do locale-dependent conversion, so w= e=E2=80=99d need to duplicate those as well, which is not great. But on second thought, I think the problem is not in the =E2=80=98write-fil= e=E2=80=99 call that =E2=80=98guix publish=E2=80=99 makes: if it were, =E2=80=98scandi= r=E2=80=99 would return bogus file names (with question marks), but then trying to open them would fail, so =E2=80=98write-file=E2=80=99 wouldn=E2=80=99t produce a bogus nar. So I think the culprit is =E2=80=98restore-file-set=E2=80=99 (used in =E2= =80=98guix offload=E2=80=99 when retrieving store items from a build machine): this one reads file names as UTF-8, via =E2=80=98read-store-path=E2=80=99, but then when it tri= es to create those files using Guile=E2=80=99s primitives, their names can be be convert= ed, possibly with those question marks popping up. Here =E2=80=98restore-file-= set=E2=80=99 can=E2=80=99t notice that Guile changed the file names behind its back. The short-term fix is to ensure guix-daemon itself runs in a UTF-8 locale. :-/ I=E2=80=99ve restarted guix-daemon and =E2=80=98guix publish=E2=80=99 in a = UTF-8 locale on hydra.gnu.org. Thanks, Ludo=E2=80=99. --=-=-= Content-Type: text/x-patch; charset=utf-8 Content-Disposition: inline; filename=0001-syscalls-Add-scandir-utf-8.patch Content-Transfer-Encoding: quoted-printable Content-Description: the patch >From e7f464bac58e1f09de5ceb194c4a30f6d899b29a Mon Sep 17 00:00:00 2001 From: =3D?UTF-8?q?Ludovic=3D20Court=3DC3=3DA8s?=3D Date: Tue, 30 May 2017 12:03:54 +0200 Subject: [PATCH] syscalls: Add 'scandir/utf-8'. * guix/build/syscalls.scm (%struct-dirent-header): New C struct. (opendir/utf-8, closedir/utf-8, readdir/utf-8, scandir/utf-8): New procedures. * tests/syscalls.scm ("scandir/utf-8, ENOENT") ("scandir/utf-8, ASCII file names") ("scandir/utf8, UTF-8 file names"): New tests. --- guix/build/syscalls.scm | 73 +++++++++++++++++++++++++++++++++++++++++++++= ++++ tests/syscalls.scm | 39 ++++++++++++++++++++++++++ 2 files changed, 112 insertions(+) diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm index 52439afd4..cfb43e93b 100644 --- a/guix/build/syscalls.scm +++ b/guix/build/syscalls.scm @@ -67,6 +67,7 @@ mkdtemp! fdatasync pivot-root + scandir/utf-8 fcntl-flock =20 set-thread-name @@ -812,6 +813,78 @@ system to PUT-OLD." =20 ;;; +;;; Opendir & co. +;;; + +(define-c-struct %struct-dirent-header + sizeof-dirent-header + list + read-dirent-header + write-dirent-header! + (inode int64) + (offset int64) + (length unsigned-short) + (type uint8) + (name uint8)) ;first byte of 'd_name' + +(define opendir/utf-8 + (let ((proc (syscall->procedure '* "opendir" '(*)))) + (lambda (name) + (let-values (((ptr err) + (proc (string->pointer name "UTF-8")))) + (if (null-pointer? ptr) + (throw 'system-error "opendir/utf-8" + "opendir/utf-8: ~A" + (list (strerror err)) + (list err)) + ptr))))) + +(define closedir/utf-8 + (let ((proc (syscall->procedure int "closedir" '(*)))) + (lambda (directory) + (let-values (((ret err) + (proc directory))) + (unless (zero? ret) + (throw 'system-error "closedir" + "closedir: ~A" (list (strerror err)) + (list err))))))) + +(define readdir/utf-8 + (let ((proc (syscall->procedure '* "readdir64" '(*)))) + (lambda (directory) + (let ((ptr (proc directory))) + (and (not (null-pointer? ptr)) + (pointer->string + (make-pointer (+ (pointer-address ptr) + (c-struct-field-offset %struct-dirent-heade= r name))) + -1 + "UTF-8")))))) + +(define* (scandir/utf-8 name #:optional + (select? (const #t)) + (stringprocedure int + (dynamic-func "creat" (dynamic-link)) + (list '* int)))) + (creat (string->pointer (string-append directory "/=CE=B1") + "UTF-8") + #o644) + (creat (string->pointer (string-append directory "/=CE=BB") + "UTF-8") + #o644) + (let ((locale (setlocale LC_ALL))) + (dynamic-wind + (lambda () + ;; Make sure that even in a C locale we get the right result. + (setlocale LC_ALL "C")) + (lambda () + (scandir/utf-8 directory)) + (lambda () + (setlocale LC_ALL locale)))))))) + (false-if-exception (delete-file temp-file)) (test-equal "fcntl-flock wait" 42 ; the child's exit status --=20 2.13.0 --=-=-= Content-Type: text/x-patch; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable diff --git a/guix/serialization.scm b/guix/serialization.scm index e6ae2fc30..77a54f904 100644 --- a/guix/serialization.scm +++ b/guix/serialization.scm @@ -18,6 +18,8 @@ =20 (define-module (guix serialization) #:use-module (guix combinators) + #:use-module ((guix build syscalls) + #:select (scandir/utf-8)) #:use-module (rnrs bytevectors) #:use-module (srfi srfi-1) #:use-module (srfi srfi-26) @@ -285,8 +287,11 @@ result of 'lstat'; exclude entries for which SELECT? d= oes not return true." ;; 'scandir' defaults to 'string-locale '("." ".."))) string '("." ".."))) + string))) + (lambda () + (setlocale LC_ALL locale))) + (call-with-input-file nar + (cut restore-file <> output)) + (file-tree-equal? input output)))) + (lambda () + (false-if-exception (delete-file nar)) + (false-if-exception (rm-rf output)))))) + ;; 'restore-file-set' depends on 'open-sha256-input-port', which in turn ;; relies on a Guile 2.0.10+ feature. (test-skip (if (false-if-exception --=-=-=--