unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 30116@debbugs.gnu.org
Subject: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
Date: Mon, 22 Jan 2018 23:27:04 -0500	[thread overview]
Message-ID: <87k1w9ryhz.fsf@gmail.com> (raw)
In-Reply-To: <87o9lmp3c2.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Mon, 22 Jan 2018 11:58:37 +0100")

[-- Attachment #1: Type: text/plain, Size: 10550 bytes --]

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>>
>>> ludo@gnu.org (Ludovic Courtès) writes:
>>>
>>>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>>>
>>>>> I've encountered the following crash when trying to use substitute on a
>>>>> file which contains NUL characters:
>>>>
>>>> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
>>>> which does not handle NULs.
>>>>
>>>> We should consider switching to the pure-Scheme SRFI-115:
>>>>
>>>>   https://srfi.schemers.org/srfi-115/srfi-115.html
>>>
>>> This looks good, and I started looking into porting `substitute' to it,
>>> but quickly noticed it doesn't seem to be implemented in Guile yet?
>
> ISTR that the reference implementation works fine on Guile.
>
>> Indeed.  SRFI-115 for Guile is on my TODO list, although it might be
>> better to wait until after we switch to using UTF-8 encoding internally
>> for strings, since that will drastically affect the implementation of
>> any efficient regexp matcher on Scheme strings.
>
> Indeed, though I suppose it doesn’t matter much for the cases where
> ‘substitute*’ is used?
>
>> Anyway, 'substitute*' is to be used only on text files, and NUL bytes
>> are not a valid textual character.  So, I think that this case is
>> outside of what 'substitute*' is meant to do, and therefore not a bug in
>> 'substitute*', although of course a more graceful error would surely be
>> preferable.
>
> Yes, that’s also a good point.
>
> So yeah, I think it may be good “eventually” to switch to SRFI-115, but
> that’s not urgent.

Sorry for taking some time to answer; I was puzzled by the fact that my
repro didn't work when ran from the REPL. It seems the problem only
occurs when run inside Guix's build environment, maybe a side effect
which depends on the locale used?

In the `patch-el-files' phase of the emacs-build-system, we find the
following snippet:

    (with-directory-excursion el-dir
      ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
      ;; with the "ISO-8859-1" locale.
      (unless (false-if-exception (substitute-cmd))
        (with-fluids ((%default-port-encoding "ISO-8859-1"))
          (substitute-cmd))))

In case an exception is returned while processing the file, it is
retried being opened with the "ISO-8859-1" encoding. Or, this resolves
to a call to `open-file', which documentation says:

‘b’
          Use binary mode, ensuring that each byte in the file will be
          read as one Scheme character.

          To provide this property, the file will be opened with the
          8-bit character encoding "ISO-8859-1", ignoring the default
          port encoding.  *Note Ports::, for more information on port
          encodings.

So, by opening an file whose encoding is unknown as a ISO-8859-1 file,
we are doing the same as if we had passed the 'binary option. Could this
explain why we end up with NUL characters where we were expecting text?

To validate this hypothesis, I've added the following test message to
the patch-el-files phase:

      (unless (false-if-exception (substitute-cmd))
        (format (current-error-port) ">>> IS THIS IT? <<<")
        (with-fluids ((%default-port-encoding "ISO-8859-1"))
          (substitute-cmd))))

And re-ran the emacs-realgud build (minus the patch working around this
issue), and this is what I got:

--8<---------------cut here---------------start------------->8---
starting phase `patch-el-files'
>>> IS THIS IT? <<<Backtrace:
          15 (primitive-load "/gnu/store/xdmirz24yxpxzqh8xj83z9h0axm…")
In ice-9/eval.scm:
   191:35 14 (_ _)
In srfi/srfi-1.scm:
   863:16 13 (every1 #<procedure 9a9540 at /gnu/store/mz8vs1cxv1z7y…> …)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/gnu-build-system.scm:
   684:27 12 (_ _)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/emacs-build-system.scm:
   117:10 11 (patch-el-files #:outputs _)
In srfi/srfi-1.scm:
    640:9 10 (for-each #<procedure substitute-one-file (file-name)> _)
In ice-9/boot-9.scm:
    849:4  9 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  8 (call-with-input-file _ _ #:binary _ #:encoding _ # _)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/utils.scm:
   609:26  7 (_ _)
   635:26  6 (_ #<input: ./realgud/common/bp-image-data.el 17> #<inp…>)
In srfi/srfi-1.scm:
   466:18  5 (fold #<procedure 7ffff53d1520 at /gnu/store/mz8vs1cxv…> …)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/utils.scm:
   638:37  4 (_ _ "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\…")
In ice-9/regex.scm:
   189:12  3 (list-matches _ _ _)
   177:19  2 (fold-matches _ "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿ…" …)
In unknown file:
           1 (regexp-exec #<regexp 81e400> "\"II*\x00(\x03\x00\x00ÿ…" …)
In ice-9/boot-9.scm:
   760:25  0 (dispatch-exception _ _ _)

ice-9/boot-9.scm:760:25: In procedure dispatch-exception:
ice-9/boot-9.scm:760:25: string contains #\nul character: "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿÿÿÿÿÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿþÿ<<<<ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ<<<<ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ<<<<ÿÿÿÿ\x0f\x0f\x0f\x0fÿÿÿÿ\x0f\x0f\x0f\x0fÿÿÿÿ\x0f\x0f\x0f\x0fÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ\x14\x00\x00\x01\x03\x00\x01\x00\x00\x00\n"
builder for `/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' failed with exit code 1
@ build-failed /gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv - 1 builder for `/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' failed with exit code 1
guix build: error: build failed: build of
`/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv'
failed
--8<---------------cut here---------------end--------------->8---

So it is indeed triggered by switching to the "ISO-8859-1" encoding
(although I still cannot reproduce this from the REPL?).

If I remove the exception guard like this:

--8<---------------cut here---------------start------------->8---
     (with-directory-excursion el-dir
       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
       ;; with the "ISO-8859-1" locale.
-      (unless (false-if-exception (substitute-cmd))
-        (with-fluids ((%default-port-encoding "ISO-8859-1"))
-          (substitute-cmd))))
+      (substitute-cmd))
     #t))
--8<---------------cut here---------------end--------------->8---

the exception thrown on the first substitute* call is this:

--8<---------------cut here---------------start------------->8---
starting phase `patch-el-files'
Backtrace:
          12 (primitive-load "/gnu/store/dvyyqxfr08fsr18k2f43gakh23d…")
In ice-9/eval.scm:
   191:35 11 (_ _)
In srfi/srfi-1.scm:
   863:16 10 (every1 #<procedure 9a96a0 at /gnu/store/xn6p33hhfyz6l…> …)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/gnu-build-system.scm:
   684:27  9 (_ _)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/emacs-build-system.scm:
   104:27  8 (patch-el-files #:outputs _)
In srfi/srfi-1.scm:
    640:9  7 (for-each #<procedure substitute-one-file (file-name)> _)
In ice-9/boot-9.scm:
    849:4  6 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  5 (call-with-input-file _ _ #:binary _ #:encoding _ # _)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/utils.scm:
   609:26  4 (_ _)
   645:22  3 (_ #<input: ./realgud/common/bp-image-data.el 15> #<inp…>)
In ice-9/rdelim.scm:
   195:24  2 (read-line _ _)
In unknown file:
           1 (%read-line #<input: ./realgud/common/bp-image-data.el …>)
In ice-9/boot-9.scm:
   760:25  0 (dispatch-exception _ _ _)

ice-9/boot-9.scm:760:25: In procedure dispatch-exception:
ice-9/boot-9.scm:760:25: Throw to key `decoding-error' with args
`("peek-char" "input decoding error" 84 #<input:
./realgud/common/bp-image-data.el 15>)'.
--8<---------------cut here---------------end--------------->8---

Should we keep my workaround for now? It seems there are valid cases to
have the file opened as "ISO-8859-1", but this can mean introducing
binary symbols such as NUL in the data (thus regexp crashes). When we
finally move to srfi-115, we should remove this workaround.

WDYT?

Here's an updated patch with Ludovic's suggestion:


[-- Attachment #2: 0001-utils-Prevent-substitute-from-crashing-on-files-cont.patch --]
[-- Type: text/x-patch, Size: 3708 bytes --]

From 573ecb3570355c47aa18c091c0a193e7d90a6949 Mon Sep 17 00:00:00 2001
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Date: Sun, 14 Jan 2018 20:31:33 -0500
Subject: [PATCH] utils: Prevent substitute from crashing on files containing
 NUL chars.

Fixes issue #30116.

* guix/build/utils.scm (substitute): Add condition to skip lines containing
the NUL character.
---
 guix/build/utils.scm | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index 7391307c8..2a37dba06 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -3,6 +3,7 @@
 ;;; Copyright © 2013 Andreas Enge <andreas@enge.fr>
 ;;; Copyright © 2013 Nikita Karetnikov <nikita@karetnikov.org>
 ;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
+;;; Copyright © 2018 Maxim Cournoyer <maxim.cournoyer@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -621,28 +622,37 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as
 a substitution of the original line.  Be careful about using '$' to match the
 end of a line; by itself it won't match the terminating newline of a line."
   (let ((rx+proc  (map (match-lambda
-                        (((? regexp? pattern) . proc)
-                         (cons pattern proc))
-                        ((pattern . proc)
-                         (cons (make-regexp pattern regexp/extended)
-                               proc)))
+                         (((? regexp? pattern) . proc)
+                          (cons pattern proc))
+                         ((pattern . proc)
+                          (cons (make-regexp pattern regexp/extended)
+                                proc)))
                        pattern+procs)))
     (with-atomic-file-replacement file
       (lambda (in out)
         (let loop ((line (read-line in 'concat)))
-          (if (eof-object? line)
-              #t
-              (let ((line (fold (lambda (r+p line)
-                                  (match r+p
-                                    ((regexp . proc)
-                                     (match (list-matches regexp line)
-                                       ((and m+ (_ _ ...))
-                                        (proc line m+))
-                                       (_ line)))))
-                                line
-                                rx+proc)))
-                (display line out)
-                (loop (read-line in 'concat)))))))))
+          (cond
+           ((eof-object? line)
+            #t)
+           ((string-index line #\nul)
+            ;; The regexp functions of the GNU C library (which Guile uses)
+            ;; cannot deal with NUL characters, so skip to the next line.
+            ;; TODO: Port to srfi-115, once we have it implemented in Guile.
+            (format (current-error-port)
+                    "skipping line with NUL characters: ~s\n" line)
+            (loop (read-line in 'concat)))
+           (else
+            (let ((line (fold (lambda (r+p line)
+                                (match r+p
+                                  ((regexp . proc)
+                                   (match (list-matches regexp line)
+                                     ((and m+ (_ _ ...))
+                                      (proc line m+))
+                                     (_ line)))))
+                              line
+                              rx+proc)))
+              (display line out)
+              (loop (read-line in 'concat))))))))))
 
 
 (define-syntax let-matches
-- 
2.16.0


[-- Attachment #3: Type: text/plain, Size: 7 bytes --]


Maxim

  reply	other threads:[~2018-01-23  4:28 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-15  1:27 bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) Maxim Cournoyer
     [not found] ` <handler.30116.B.15159796942311.ack@debbugs.gnu.org>
2018-01-15  1:38   ` bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)) Maxim Cournoyer
2018-01-17 14:37     ` Ludovic Courtès
2018-06-14  1:40       ` Maxim Cournoyer
2018-06-14  8:01         ` Ludovic Courtès
2018-06-14  7:02     ` Mark H Weaver
2018-06-14  8:02       ` Ludovic Courtès
2018-06-16 16:47       ` Maxim Cournoyer
2018-06-17  4:36         ` Mark H Weaver
2021-01-08 19:14           ` bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) Maxim Cournoyer
2021-01-08 21:42             ` Mark H Weaver
2018-01-16 11:23 ` Ludovic Courtès
2018-01-21  4:24   ` Maxim Cournoyer
2018-01-21 18:17     ` Mark H Weaver
2018-01-22 10:58       ` Ludovic Courtès
2018-01-23  4:27         ` Maxim Cournoyer [this message]
2018-01-23 14:11           ` Ludovic Courtès
2018-01-25  5:11             ` Maxim Cournoyer
2018-01-25 11:11               ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k1w9ryhz.fsf@gmail.com \
    --to=maxim.cournoyer@gmail.com \
    --cc=30116@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).