unofficial mirror of bug-guix@gnu.org 
* bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames
@ 2023-02-23  3:14 Maxim Cournoyer
  2023-02-24  4:54 ` bug#61722: [PATCH] cpio: Properly handle Unicode characters in file names Maxim Cournoyer
  2023-02-24 13:26 ` bug#61722: [PATCH v2] " Maxim Cournoyer
  0 siblings, 2 replies; 5+ messages in thread
From: Maxim Cournoyer @ 2023-02-23  3:14 UTC (permalink / raw)
  To: 61722

Hi,

It appears that the code we use to generate CPIO archives mishandles
non-ASCII characters in the names of the files being archived.

First, to make rpm usable on a Guix System:

--8<---------------cut here---------------start------------->8---
# mkdir /var/lib/rpm
# chown root:users /var/lib/rpm
# chmod g+rw /var/lib/rpm
--8<---------------cut here---------------end--------------->8---

Then, produce a problematic CPIO via 'guix pack -f rpm', which uses
(guix cpio):

--8<---------------cut here---------------start------------->8---
$ rpm_archive=$(guix pack -R -C none -f rpm nss-certs)
--8<---------------cut here---------------end--------------->8---

Notice that it cannot be installed:
--8<---------------cut here---------------start------------->8---
$ mkdir /tmp/nss-certs
# rpm --prefix=/tmp/nss-certs -i $rpm_archive
error: unpacking of archive failed: cpio: Bad magic
error: nss-certs-3.81-0.x86_64: install failed
--8<---------------cut here---------------end--------------->8---

Let's now inspect the cpio archive itself.

--8<---------------cut here---------------start------------->8---
$ guix shell rpm cpio
[env]$ rpm2cpio $rpm_archive > nss-certs.cpio
[env]$ cpio -t < nss-certs.cpio |& grep -B3 junk
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9482e63a.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9846683b.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/988a38cb.0
cpio: warning: skipped 248 bytes of junk
--
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/Microsoft_RSA_Root_Certificate_Authority_2017.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NAVER_Global_Root_Certification_Authority.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NetLock_Arany_=Class_Gold=_Főtanúsítvány.
cpio: warning: skipped 4 bytes of junk
--8<---------------cut here---------------end--------------->8---

I haven't yet pinpointed what the problem is.

I could do with extra eyes :-).

-- 
Thanks,
Maxim





* bug#61722: [PATCH] cpio: Properly handle Unicode characters in file names.
  2023-02-23  3:14 bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames Maxim Cournoyer
@ 2023-02-24  4:54 ` Maxim Cournoyer
  2023-02-24 11:46   ` Mark H Weaver
  2023-02-24 13:26 ` bug#61722: [PATCH v2] " Maxim Cournoyer
  1 sibling, 1 reply; 5+ messages in thread
From: Maxim Cournoyer @ 2023-02-24  4:54 UTC (permalink / raw)
  To: 61722
  Cc: Josselin Poiret, Tobias Geerinckx-Rice, Maxim Cournoyer,
	Simon Tournier, Mathieu Othacehe, Ludovic Courtès,
	Christopher Baines, Ricardo Wurmus

Fixes <https://issues.guix.gnu.org/61722>.

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

 guix/cpio.scm | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..8fd7552450 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
                       #:size (stat:size st)
                       #:dev (stat:dev st)
                       #:rdev (stat:rdev st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (bytevector-length
+                                   (string->utf8 file-name)))))
 
 (define* (file->cpio-header* file
                              #:optional (file-name file)
@@ -182,7 +183,8 @@ (define* (file->cpio-header* file
     (make-cpio-header #:mode (stat:mode st)
                       #:nlink (stat:nlink st)
                       #:size (stat:size st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (bytevector-length
+                                   (string->utf8 file-name)))))
 
 (define* (special-file->cpio-header* file
                                      device-type
@@ -201,7 +203,8 @@ (define* (special-file->cpio-header* file
                                     permission-bits)
                     #:nlink 1
                     #:rdev (device-number device-major device-minor)
-                    #:name-size (string-length file-name)))
+                    #:name-size (bytevector-length
+                                 (string->utf8 file-name))))
 
 (define %trailer
   "TRAILER!!!")
@@ -237,7 +240,7 @@ (define (dump-file file)
 
       ;; We're padding the header + following file name + trailing zero, and
       ;; the header is 110 byte long.
-      (write-padding (+ 110 1 (string-length file)) port)
+      (write-padding (+ 110 (bytevector-length (string->utf8 file)) 1) port)
 
       (case (mode->type (cpio-header-mode header))
         ((regular)
@@ -246,7 +249,7 @@ (define (dump-file file)
              (dump-port input port))))
         ((symlink)
          (let ((target (readlink file)))
-           (put-string port target)))
+           (put-bytevector port (string->utf8 target))))
         ((directory)
          #t)
         ((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
-- 
2.39.1
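
Editorial illustration (not part of the patch): the difference between
writing a name "textually" and writing it as UTF-8 bytes can be shown
outside Guile.  The Python sketch below is only an analogy.  It assumes a
writer that substitutes '?' for characters the target encoding cannot
represent, which may not match exactly what the Guile port did here.  The
sample string is the non-ASCII part of the certificate name shown in the
report.

```python
# Analogy only: compare the bytes produced by two encodings of one name.
name = "F\u0151tan\u00fas\u00edtv\u00e1ny"  # "Főtanúsítvány", 13 characters

# What the patch writes: the UTF-8 encoding of the name.
utf8_bytes = name.encode("utf-8")

# errors="replace" is an assumption standing in for the port's behaviour
# when a character (here U+0151) does not exist in ISO-8859-1.
latin1_bytes = name.encode("iso-8859-1", errors="replace")

# Character count, UTF-8 byte count, ISO-8859-1 byte count.
print(len(name), len(utf8_bytes), len(latin1_bytes))  # 13 17 13
```

Only the UTF-8 bytes decode back to the original name; the ISO-8859-1
variant has already lost the 'ő'.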






* bug#61722: [PATCH] cpio: Properly handle Unicode characters in file names.
  2023-02-24  4:54 ` bug#61722: [PATCH] cpio: Properly handle Unicode characters in file names Maxim Cournoyer
@ 2023-02-24 11:46   ` Mark H Weaver
  0 siblings, 0 replies; 5+ messages in thread
From: Mark H Weaver @ 2023-02-24 11:46 UTC (permalink / raw)
  To: Maxim Cournoyer, 61722
  Cc: Josselin Poiret, Christopher Baines, Maxim Cournoyer,
	Simon Tournier, Mathieu Othacehe, Ludovic Courtès,
	Tobias Geerinckx-Rice, Ricardo Wurmus

Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Fixes <https://issues.guix.gnu.org/61722>.
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.
>
> ---
>
>  guix/cpio.scm | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/guix/cpio.scm b/guix/cpio.scm
> index d4a7d5f1e0..8fd7552450 100644
> --- a/guix/cpio.scm
> +++ b/guix/cpio.scm
> @@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
>                        #:size (stat:size st)
>                        #:dev (stat:dev st)
>                        #:rdev (stat:rdev st)
> -                      #:name-size (string-length file-name))))
> +                      #:name-size (bytevector-length
> +                                   (string->utf8 file-name)))))

(string-utf8-length file-name) would produce the same result more
efficiently.

      Regards,
        Mark
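
Editorial illustration (not part of the message): the suggestion above
relies on the fact that the UTF-8 length of a string can be computed from
its code points alone, without building the encoded byte sequence first.
The Python sketch below shows that idea with a hypothetical helper,
utf8_length; it is not Guile's implementation of string-utf8-length, only
a demonstration that both approaches give the same number.

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes TEXT occupies in UTF-8, without encoding it."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:        # ASCII: 1 byte
            total += 1
        elif cp < 0x800:     # e.g. Latin letters with diacritics: 2 bytes
            total += 2
        elif cp < 0x10000:   # rest of the Basic Multilingual Plane: 3 bytes
            total += 3
        else:                # supplementary planes: 4 bytes
            total += 4
    return total


name = "F\u0151tan\u00fas\u00edtv\u00e1ny"  # "Főtanúsítvány"
# Same result as encoding first and measuring the bytes afterwards.
print(utf8_length(name), len(name.encode("utf-8")))  # 17 17
```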





* bug#61722: [PATCH v2] cpio: Properly handle Unicode characters in file names.
  2023-02-23  3:14 bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames Maxim Cournoyer
  2023-02-24  4:54 ` bug#61722: [PATCH] cpio: Properly handle Unicode characters in file names Maxim Cournoyer
@ 2023-02-24 13:26 ` Maxim Cournoyer
  2023-02-25 19:52   ` bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames Maxim Cournoyer
  1 sibling, 1 reply; 5+ messages in thread
From: Maxim Cournoyer @ 2023-02-24 13:26 UTC (permalink / raw)
  To: 61722
  Cc: Josselin Poiret, Tobias Geerinckx-Rice, Maxim Cournoyer,
	Simon Tournier, mhw, Ludovic Courtès, Christopher Baines,
	Ricardo Wurmus, Mathieu Othacehe

Fixes <https://issues.guix.gnu.org/61722>.

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

Changes in v2:
- Use string-utf8-length

 guix/cpio.scm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..876f61ea3c 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,7 @@ (define* (file->cpio-header file #:optional (file-name file)
                       #:size (stat:size st)
                       #:dev (stat:dev st)
                       #:rdev (stat:rdev st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (string-utf8-length file-name))))
 
 (define* (file->cpio-header* file
                              #:optional (file-name file)
@@ -182,7 +182,7 @@ (define* (file->cpio-header* file
     (make-cpio-header #:mode (stat:mode st)
                       #:nlink (stat:nlink st)
                       #:size (stat:size st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (string-utf8-length file-name))))
 
 (define* (special-file->cpio-header* file
                                      device-type
@@ -201,7 +201,7 @@ (define* (special-file->cpio-header* file
                                     permission-bits)
                     #:nlink 1
                     #:rdev (device-number device-major device-minor)
-                    #:name-size (string-length file-name)))
+                    #:name-size (string-utf8-length file-name)))
 
 (define %trailer
   "TRAILER!!!")
@@ -237,7 +237,7 @@ (define (dump-file file)
 
       ;; We're padding the header + following file name + trailing zero, and
       ;; the header is 110 byte long.
-      (write-padding (+ 110 1 (string-length file)) port)
+      (write-padding (+ 110 (string-utf8-length file) 1) port)
 
       (case (mode->type (cpio-header-mode header))
         ((regular)
@@ -246,7 +246,7 @@ (define (dump-file file)
              (dump-port input port))))
         ((symlink)
          (let ((target (readlink file)))
-           (put-string port target)))
+           (put-bytevector port (string->utf8 target))))
         ((directory)
          #t)
         ((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
-- 
2.39.1
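
Editorial illustration (not part of the patch): the padding fix matters
because, in the "newc" cpio format, the 110-byte header plus the
NUL-terminated file name is padded to a multiple of 4 bytes.  The Python
sketch below uses hypothetical helper names and is not a translation of
write-padding.  It only shows that counting characters instead of bytes
can yield the wrong amount of padding.

```python
HEADER_SIZE = 110  # size in bytes of a "newc" header, as noted in the patch


def padding_after_name(file_name: str) -> int:
    """Zero bytes needed after the header, the UTF-8 name and its NUL."""
    total = HEADER_SIZE + len(file_name.encode("utf-8")) + 1
    return (-total) % 4


def padding_by_characters(file_name: str) -> int:
    """Same computation, but counting characters (the pre-patch behaviour)."""
    total = HEADER_SIZE + len(file_name) + 1
    return (-total) % 4


name = "\u00e9"  # "é": one character, two bytes in UTF-8
print(padding_after_name(name), padding_by_characters(name))  # 3 0
```

For a pure-ASCII name the two functions agree, which is presumably why
the problem only shows up with non-ASCII file names.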






* bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames
  2023-02-24 13:26 ` bug#61722: [PATCH v2] " Maxim Cournoyer
@ 2023-02-25 19:52   ` Maxim Cournoyer
  0 siblings, 0 replies; 5+ messages in thread
From: Maxim Cournoyer @ 2023-02-25 19:52 UTC (permalink / raw)
  To: 61722-done
  Cc: Josselin Poiret, Christopher Baines, Simon Tournier, mhw,
	Ludovic Courtès, Tobias Geerinckx-Rice, Ricardo Wurmus,
	Mathieu Othacehe

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Fixes <https://issues.guix.gnu.org/61722>.
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.

Pushed to master.

Closing.

-- 
Thanks,
Maxim






Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
