* bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames
From: Maxim Cournoyer @ 2023-02-23  3:14 UTC
To: 61722

Hi,

It appears that the code we use to generate CPIO archives doesn't handle
non-ASCII characters in the names of the archived files well.

First, to make rpm usable on a Guix System:

--8<---------------cut here---------------start------------->8---
# mkdir /var/lib/rpm
# chown root:users /var/lib/rpm
# chmod g+rw /var/lib/rpm
--8<---------------cut here---------------end--------------->8---

Then, produce a problematic CPIO via 'guix pack -f rpm', which uses (guix cpio):

--8<---------------cut here---------------start------------->8---
$ rpm_archive=$(guix pack -R -C none -f rpm nss-certs)
--8<---------------cut here---------------end--------------->8---

Notice that it cannot be installed:

--8<---------------cut here---------------start------------->8---
$ mkdir /tmp/nss-certs
# rpm --prefix=/tmp/nss-certs -i $rpm_archive
error: unpacking of archive failed: cpio: Bad magic
error: nss-certs-3.81-0.x86_64: install failed
--8<---------------cut here---------------end--------------->8---

Let's now inspect the cpio itself.
--8<---------------cut here---------------start------------->8---
$ guix shell rpm cpio
[env]$ rpm2cpio $rpm_archive > nss-certs.cpio
[env]$ cpio -t < nss-certs.cpio |& grep -B3 junk
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9482e63a.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9846683b.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/988a38cb.0
cpio: warning: skipped 248 bytes of junk
--
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/Microsoft_RSA_Root_Certificate_Authority_2017.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NAVER_Global_Root_Certification_Authority.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NetLock_Arany_=Class_Gold=_Főtanúsítvány.
cpio: warning: skipped 4 bytes of junk
--8<---------------cut here---------------end--------------->8---

I haven't yet pinpointed what the problem is.  I could do with extra eyes :-).

--
Thanks,
Maxim
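One likely explanation, and the one the follow-up patch in this thread acts on, is a unit mismatch. The newc cpio header records the name size in bytes (the name plus its trailing NUL), whereas `string-length` on a Guile string counts characters. The minimal sketch below is not code from (guix cpio). It uses only `string->utf8` and `bytevector-length` from `(rnrs bytevectors)` to show the gap for the name component on which cpio stumbled:

--8<---------------cut here---------------start------------->8---
;; Sketch: character count vs. UTF-8 byte count for a non-ASCII name.
;; Not part of (guix cpio); the variable names are illustrative.
(use-modules (rnrs bytevectors))

;; Component of the certificate file name from the report.  It contains
;; four non-ASCII letters (ő, ú, í, á), each encoded as 2 bytes in UTF-8.
(define component "Főtanúsítvány")

;; What a character-based computation records as the name length.
(define char-count (string-length component))

;; What a cpio reader actually has to consume from the archive.
(define byte-count (bytevector-length (string->utf8 component)))

;; How many bytes a character-based header under-reports here.
(define missing-bytes (- byte-count char-count))

(display (list char-count byte-count missing-bytes))
(newline)
;; prints (13 17 4)
--8<---------------cut here---------------end--------------->8---

A 4-byte shortfall is consistent with the "skipped 4 bytes of junk" warning printed right after the truncated "NetLock_Arany_=Class_Gold=_Főtanúsítvány." entry. This sketch does not rebuild the archive itself, though.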
* bug#61722: [PATCH] cpio: Properly handle Unicode characters in file names.
From: Maxim Cournoyer @ 2023-02-24  4:54 UTC
To: 61722
Cc: Josselin Poiret, Tobias Geerinckx-Rice, Maxim Cournoyer, Simon Tournier,
    Mathieu Othacehe, Ludovic Courtès, Christopher Baines, Ricardo Wurmus

Fixes <https://issues.guix.gnu.org/61722>.

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---
 guix/cpio.scm | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..8fd7552450 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
                       #:size (stat:size st)
                       #:dev (stat:dev st)
                       #:rdev (stat:rdev st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (bytevector-length
+                                   (string->utf8 file-name)))))
 
 (define* (file->cpio-header* file
                              #:optional (file-name file)
@@ -182,7 +183,8 @@ (define* (file->cpio-header* file
     (make-cpio-header #:mode (stat:mode st)
                       #:nlink (stat:nlink st)
                       #:size (stat:size st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (bytevector-length
+                                   (string->utf8 file-name)))))
 
 (define* (special-file->cpio-header* file
                                      device-type
@@ -201,7 +203,8 @@ (define* (special-file->cpio-header* file
                                    permission-bits)
                     #:nlink 1
                     #:rdev (device-number device-major device-minor)
-                    #:name-size (string-length file-name)))
+                    #:name-size (bytevector-length
+                                 (string->utf8 file-name))))
 
 (define %trailer
   "TRAILER!!!")
@@ -237,7 +240,7 @@ (define (dump-file file)
 
       ;; We're padding the header + following file name + trailing zero, and
       ;; the header is 110 byte long.
-      (write-padding (+ 110 1 (string-length file)) port)
+      (write-padding (+ 110 (bytevector-length (string->utf8 file)) 1) port)
 
       (case (mode->type (cpio-header-mode header))
         ((regular)
@@ -246,7 +249,7 @@ (define (dump-file file)
              (dump-port input port))))
         ((symlink)
          (let ((target (readlink file)))
-           (put-string port target)))
+           (put-bytevector port (string->utf8 target))))
         ((directory)
          #t)
         ((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
-- 
2.39.1
* bug#61722: [PATCH] cpio: Properly handle Unicode characters in file names.
From: Mark H Weaver @ 2023-02-24 11:46 UTC
To: Maxim Cournoyer, 61722
Cc: Josselin Poiret, Christopher Baines, Maxim Cournoyer, Simon Tournier,
    Mathieu Othacehe, Ludovic Courtès, Tobias Geerinckx-Rice, Ricardo Wurmus

Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Fixes <https://issues.guix.gnu.org/61722>.
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.
>
> ---
>
>  guix/cpio.scm | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/guix/cpio.scm b/guix/cpio.scm
> index d4a7d5f1e0..8fd7552450 100644
> --- a/guix/cpio.scm
> +++ b/guix/cpio.scm
> @@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
>                        #:size (stat:size st)
>                        #:dev (stat:dev st)
>                        #:rdev (stat:rdev st)
> -                      #:name-size (string-length file-name))))
> +                      #:name-size (bytevector-length
> +                                   (string->utf8 file-name)))))

(string-utf8-length file-name) would produce the same result more
efficiently.

Regards,
  Mark
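Mark's suggestion can be checked directly. `string-utf8-length` returns the length of a string's UTF-8 encoding without first building the bytevector that `string->utf8` allocates. The sketch below assumes a Guile that provides `string-utf8-length` as a core procedure (the v2 patch relies on it) and compares both forms on a few sample strings. The CJK sample and the empty string are illustrative additions; the other two names come from the report:

--8<---------------cut here---------------start------------->8---
;; Sketch: `string-utf8-length' vs. measuring a freshly encoded bytevector.
(use-modules (rnrs bytevectors)
             (srfi srfi-1))                ; for `every'

(define sample-names
  '("9482e63a.0"      ; plain ASCII, from the report's listing
    "Főtanúsítvány"   ; 2-byte UTF-8 sequences, from the report
    "証明書"           ; 3-byte UTF-8 sequences (illustrative)
    ""))              ; edge case: empty name

;; The form used in the first patch: it allocates a bytevector only to
;; measure its length.
(define (utf8-length/allocating s)
  (bytevector-length (string->utf8 s)))

;; #t when both procedures agree on every sample.
(define all-agree?
  (every (lambda (s)
           (= (string-utf8-length s) (utf8-length/allocating s)))
         sample-names))

(display all-agree?)
(newline)
(display (map string-utf8-length sample-names))
(newline)
;; prints #t
;; then   (10 17 9 0)
--8<---------------cut here---------------end--------------->8---

Both forms yield the same byte counts. The difference is that the allocating form creates garbage for every archived file name.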
* bug#61722: [PATCH v2] cpio: Properly handle Unicode characters in file names.
From: Maxim Cournoyer @ 2023-02-24 13:26 UTC
To: 61722
Cc: Josselin Poiret, Tobias Geerinckx-Rice, Maxim Cournoyer, Simon Tournier,
    mhw, Ludovic Courtès, Christopher Baines, Ricardo Wurmus, Mathieu Othacehe

Fixes <https://issues.guix.gnu.org/61722>.

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.
---

Changes in v2:
 - Use string-utf8-length

 guix/cpio.scm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..876f61ea3c 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,7 @@ (define* (file->cpio-header file #:optional (file-name file)
                       #:size (stat:size st)
                       #:dev (stat:dev st)
                       #:rdev (stat:rdev st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (string-utf8-length file-name))))
 
 (define* (file->cpio-header* file
                              #:optional (file-name file)
@@ -182,7 +182,7 @@ (define* (file->cpio-header* file
     (make-cpio-header #:mode (stat:mode st)
                       #:nlink (stat:nlink st)
                       #:size (stat:size st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (string-utf8-length file-name))))
 
 (define* (special-file->cpio-header* file
                                      device-type
@@ -201,7 +201,7 @@ (define* (special-file->cpio-header* file
                                    permission-bits)
                     #:nlink 1
                     #:rdev (device-number device-major device-minor)
-                    #:name-size (string-length file-name)))
+                    #:name-size (string-utf8-length file-name)))
 
 (define %trailer
   "TRAILER!!!")
@@ -237,7 +237,7 @@ (define (dump-file file)
 
       ;; We're padding the header + following file name + trailing zero, and
       ;; the header is 110 byte long.
-      (write-padding (+ 110 1 (string-length file)) port)
+      (write-padding (+ 110 (string-utf8-length file) 1) port)
 
       (case (mode->type (cpio-header-mode header))
         ((regular)
@@ -246,7 +246,7 @@ (define (dump-file file)
              (dump-port input port))))
         ((symlink)
          (let ((target (readlink file)))
-           (put-string port target)))
+           (put-bytevector port (string->utf8 target))))
         ((directory)
          #t)
         ((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
-- 
2.39.1
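The padding hunk in `dump-file` matters because the newc format pads the 110-byte header, the file name and its trailing NUL to a multiple of 4 bytes. A name length counted in characters can therefore also shift the alignment of everything that follows. The minimal sketch below shows that arithmetic. It uses hypothetical helpers (`padding-for`, `encode-name-entry`) rather than the actual `write-padding` from (guix cpio), and an invented file name, `ő.pem`:

--8<---------------cut here---------------start------------->8---
;; Sketch with hypothetical helpers; not the (guix cpio) implementation.
(use-modules (rnrs bytevectors)
             (rnrs io ports))

;; newc header: 6-byte magic "070701" + 13 fields of 8 hex digits.
(define %newc-header-size 110)

;; Zero bytes needed after the name so that header + name + NUL
;; ends on a 4-byte boundary.
(define (padding-for name-length)
  (modulo (- 4 (modulo (+ %newc-header-size name-length 1) 4)) 4))

;; Bytes that follow the header: the name as UTF-8 (not `put-string'),
;; its trailing NUL, then padding computed from the byte length.
(define (encode-name-entry name)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (put-bytevector port (string->utf8 name))
      (put-u8 port 0)
      (put-bytevector port
                      (make-bytevector
                       (padding-for (string-utf8-length name)) 0))
      (get-bytes))))

(define name "ő.pem")   ; invented example: 5 characters, 6 bytes

(display (list (padding-for (string-length name))       ; character-based
               (padding-for (string-utf8-length name))  ; byte-based (v2)
               (bytevector-length (encode-name-entry name))))
(newline)
;; prints (0 3 10)
--8<---------------cut here---------------end--------------->8---

With the character-based count, the writer would emit no padding here, while 3 bytes are needed once the name is written as 6 UTF-8 bytes. The writer and a conforming reader would then disagree about where the file data starts. The symlink hunk follows the same byte-versus-character reasoning: the header's size field comes from `stat:size`, which is a byte count, so the target is written as bytes too.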
* bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames
From: Maxim Cournoyer @ 2023-02-25 19:52 UTC
To: 61722-done
Cc: Josselin Poiret, Christopher Baines, Simon Tournier, mhw,
    Ludovic Courtès, Tobias Geerinckx-Rice, Ricardo Wurmus, Mathieu Othacehe

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Fixes <https://issues.guix.gnu.org/61722>.
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.

Pushed to master.  Closing.

--
Thanks,
Maxim
Code repositories for project(s) associated with this public inbox: https://git.savannah.gnu.org/cgit/guix.git