unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
@ 2018-01-15  1:27 Maxim Cournoyer
       [not found] ` <handler.30116.B.15159796942311.ack@debbugs.gnu.org>
  2018-01-16 11:23 ` Ludovic Courtès
  0 siblings, 2 replies; 19+ messages in thread
From: Maxim Cournoyer @ 2018-01-15  1:27 UTC (permalink / raw)
  To: 30116

Hello,

I've encountered the following crash when trying to use substitute on a
file which contains NUL characters:

--8<---------------cut here---------------start------------->8---
(define problematic-file "/tmp/bp-image-data.el")
scheme@(guix build utils)> ,m (guix build utils)
scheme@(guix build utils)> (substitute* problematic-file
			     (("toto") "tata"))
ice-9/boot-9.scm:752:25: In procedure dispatch-exception:
string contains #\nul character: "\"II*\x00(\x03\x00\x00����������@@@@����\x04\x04\x04\x04����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x04\x04\x04\x04����BBBB��������������������@@@@����\x04\x04\x04\x04����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x04\x04\x04\x04����BBBB������������\x04\x04\x04\x04����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x04\x04\x04\x04����BBBB����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x10\x10\x10\x10����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x10\x10\x10\x10����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x10\x10\x10\x10����\x04\x04\x04\x04����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x04\x04\x04\x04����>>>>����<<<<����\x04\x04\x04\x04����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x04\x04\x04\x04����>>>>��������������������<<<<����\x04\x04\x04\x04����\x01\x01\x01\x01����\x01\x01\x01\x01����\x01\x01\x01\x01����\x04\x04\x04\x04����>>>>������������������������������������<<<<����\x0f\x0f\x0f\x0f����\x0f\x0f\x0f\x0f����\x0f\x0f\x0f\x0f����>>>>��������������������������\x14\x00\x00\x01\x03\x00\x01\x00\x00\x00\n"

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
scheme@(guix build utils) [1]> ,bt
In ice-9/boot-9.scm:
    841:4  9 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  8 (call-with-input-file _ _ #:binary _ #:encoding _ #:guess-encoding _)
In guix/build/utils.scm:
   609:26  7 (_ _)
   635:26  6 (_ #<input: /tmp/bp-image-data.el 14> #<input-output: /tmp/bp-image-data.el.qVytzo 13>)
In srfi/srfi-1.scm:
   466:18  5 (fold #<procedure 7f29b8929520 at guix/build/utils.scm:635:32 (r+p line)> "\"II*\x00(\x03\x00\x00���…" …)
In guix/build/utils.scm:
   638:37  4 (_ _ "\"II*\x00(\x03\x00\x00����������@@@@����\x04\x04\x04\x04����\x01\x01\x01\x01����\x01\x01\x01\x0…")
In ice-9/regex.scm:
   189:12  3 (list-matches _ _ _)
   177:19  2 (fold-matches _ "\"II*\x00(\x03\x00\x00����������@@@@����\x04\x04\x04\x04����\x01\x01\x01\x01����\x0…" …)
In unknown file:
           1 (regexp-exec #<regexp 51f3bc0> "\"II*\x00(\x03\x00\x00����������@@@@����\x04\x04\x04\x04����\x01\x01…" …)
In ice-9/boot-9.scm:
   752:25  0 (dispatch-exception _ _ _)
--8<---------------cut here---------------end--------------->8---

That file comes from emacs-realgud, which I'm attempting to package:
https://github.com/realgud/realgud/blob/master/realgud/common/bp-image-data.el.

This was discovered when the patch-el-files phase of the
emacs-build-system crashed as above when it called substitute*.

Patch to follow.

Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
       [not found] ` <handler.30116.B.15159796942311.ack@debbugs.gnu.org>
@ 2018-01-15  1:38   ` Maxim Cournoyer
  2018-01-17 14:37     ` Ludovic Courtès
  2018-06-14  7:02     ` Mark H Weaver
  0 siblings, 2 replies; 19+ messages in thread
From: Maxim Cournoyer @ 2018-01-15  1:38 UTC (permalink / raw)
  To: 30116

[-- Attachment #1: 0001-utils-Prevent-substitute-from-crashing-on-files-cont.patch --]
[-- Type: text/x-patch, Size: 3604 bytes --]

From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Date: Sun, 14 Jan 2018 20:31:33 -0500
Subject: [PATCH] utils: Prevent substitute from crashing on files containing
 NUL chars.

Fixes issue #30116.

* guix/build/utils.scm (substitute): Add condition to skip lines containing
the NUL character.
---
 guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index 7391307c8..975f4e70a 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -3,6 +3,7 @@
 ;;; Copyright © 2013 Andreas Enge <andreas@enge.fr>
 ;;; Copyright © 2013 Nikita Karetnikov <nikita@karetnikov.org>
 ;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
+;;; Copyright © 2018 Maxim Cournoyer <maxim.cournoyer@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as
 a substitution of the original line.  Be careful about using '$' to match the
 end of a line; by itself it won't match the terminating newline of a line."
   (let ((rx+proc  (map (match-lambda
-                        (((? regexp? pattern) . proc)
-                         (cons pattern proc))
-                        ((pattern . proc)
-                         (cons (make-regexp pattern regexp/extended)
-                               proc)))
+                         (((? regexp? pattern) . proc)
+                          (cons pattern proc))
+                         ((pattern . proc)
+                          (cons (make-regexp pattern regexp/extended)
+                                proc)))
                        pattern+procs)))
     (with-atomic-file-replacement file
       (lambda (in out)
         (let loop ((line (read-line in 'concat)))
-          (if (eof-object? line)
-              #t
-              (let ((line (fold (lambda (r+p line)
-                                  (match r+p
-                                    ((regexp . proc)
-                                     (match (list-matches regexp line)
-                                       ((and m+ (_ _ ...))
-                                        (proc line m+))
-                                       (_ line)))))
-                                line
-                                rx+proc)))
-                (display line out)
-                (loop (read-line in 'concat)))))))))
+          (cond
+           ((eof-object? line)
+            #t)
+           ((string-contains line (make-string 1 #\nul))
+            ;; The regexp functions of the GNU C library (which Guile uses)
+            ;; cannot deal with NUL characters, so skip to the next line.
+            (format #t "skipping line with NUL characters: ~s\n" line)
+            (loop (read-line in 'concat)))
+           (else
+            (let ((line (fold (lambda (r+p line)
+                                (match r+p
+                                  ((regexp . proc)
+                                   (match (list-matches regexp line)
+                                     ((and m+ (_ _ ...))
+                                      (proc line m+))
+                                     (_ line)))))
+                              line
+                              rx+proc)))
+              (display line out)
+              (loop (read-line in 'concat))))))))))
 
 
 (define-syntax let-matches
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-01-15  1:27 bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) Maxim Cournoyer
       [not found] ` <handler.30116.B.15159796942311.ack@debbugs.gnu.org>
@ 2018-01-16 11:23 ` Ludovic Courtès
  2018-01-21  4:24   ` Maxim Cournoyer
  1 sibling, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2018-01-16 11:23 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I've encountered the following crash when trying to use substitute on a
> file which contains NUL characters:

Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
which does not handle NULs.

We should consider switching to the pure-Scheme SRFI-115:

  https://srfi.schemers.org/srfi-115/srfi-115.html

Ludo’.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
  2018-01-15  1:38   ` bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)) Maxim Cournoyer
@ 2018-01-17 14:37     ` Ludovic Courtès
  2018-06-14  1:40       ` Maxim Cournoyer
  2018-06-14  7:02     ` Mark H Weaver
  1 sibling, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2018-01-17 14:37 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
> Date: Sun, 14 Jan 2018 20:31:33 -0500
> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>  NUL chars.
>
> Fixes issue #30116.
>
> * guix/build/utils.scm (substitute): Add condition to skip lines containing
> the NUL character.

[...]

> +           ((string-contains line (make-string 1 #\nul))

Rather (string-index line #\nul).

> +            ;; The regexp functions of the GNU C library (which Guile uses)
> +            ;; cannot deal with NUL characters, so skip to the next line.
> +            (format #t "skipping line with NUL characters: ~s\n" line)
> +            (loop (read-line in 'concat)))

Rather (format (current-error-port) …).

It’s strange semantics, but it’s probably better than crashing in the
contexts where we use it.

Otherwise LGTM.  This would have to go to the next ‘core-updates’ (or
‘core-updates-next’ in the meantime.)

Thanks!

Ludo’.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-01-16 11:23 ` Ludovic Courtès
@ 2018-01-21  4:24   ` Maxim Cournoyer
  2018-01-21 18:17     ` Mark H Weaver
  0 siblings, 1 reply; 19+ messages in thread
From: Maxim Cournoyer @ 2018-01-21  4:24 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 30116

ludo@gnu.org (Ludovic Courtès) writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> I've encountered the following crash when trying to use substitute on a
>> file which contains NUL characters:
>
> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
> which does not handle NULs.
>
> We should consider switching to the pure-Scheme SRFI-115:
>
>   https://srfi.schemers.org/srfi-115/srfi-115.html

This looks good, and I started looking into porting `substitute' to it,
but quickly noticed it doesn't seem to be implemented in Guile yet?

Thanks,

Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-01-21  4:24   ` Maxim Cournoyer
@ 2018-01-21 18:17     ` Mark H Weaver
  2018-01-22 10:58       ` Ludovic Courtès
  0 siblings, 1 reply; 19+ messages in thread
From: Mark H Weaver @ 2018-01-21 18:17 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>
>>> I've encountered the following crash when trying to use substitute on a
>>> file which contains NUL characters:
>>
>> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
>> which does not handle NULs.
>>
>> We should consider switching to the pure-Scheme SRFI-115:
>>
>>   https://srfi.schemers.org/srfi-115/srfi-115.html
>
> This looks good, and I started looking into porting `substitute' to it,
> but quickly noticed it doesn't seem to be implemented in Guile yet?

Indeed.  SRFI-115 for Guile is on my TODO list, although it might be
better to wait until after we switch to using UTF-8 encoding internally
for strings, since that will drastically affect the implementation of
any efficient regexp matcher on Scheme strings.

Anyway, 'substitute*' is to be used only on text files, and NUL bytes
are not a valid textual character.  So, I think that this case is
outside of what 'substitute*' is meant to do, and therefore not a bug in
'substitute*', although of course a more graceful error would surely be
preferable.

What do you think?

      Mark

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-01-21 18:17     ` Mark H Weaver
@ 2018-01-22 10:58       ` Ludovic Courtès
  2018-01-23  4:27         ` Maxim Cournoyer
  0 siblings, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2018-01-22 10:58 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 30116, Maxim Cournoyer

Mark H Weaver <mhw@netris.org> skribis:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> ludo@gnu.org (Ludovic Courtès) writes:
>>
>>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>>
>>>> I've encountered the following crash when trying to use substitute on a
>>>> file which contains NUL characters:
>>>
>>> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
>>> which does not handle NULs.
>>>
>>> We should consider switching to the pure-Scheme SRFI-115:
>>>
>>>   https://srfi.schemers.org/srfi-115/srfi-115.html
>>
>> This looks good, and I started looking into porting `substitute' to it,
>> but quickly noticed it doesn't seem to be implemented in Guile yet?

ISTR that the reference implementation works fine on Guile.

> Indeed.  SRFI-115 for Guile is on my TODO list, although it might be
> better to wait until after we switch to using UTF-8 encoding internally
> for strings, since that will drastically affect the implementation of
> any efficient regexp matcher on Scheme strings.

Indeed, though I suppose it doesn’t matter much for the cases where
‘substitute*’ is used?

> Anyway, 'substitute*' is to be used only on text files, and NUL bytes
> are not a valid textual character.  So, I think that this case is
> outside of what 'substitute*' is meant to do, and therefore not a bug in
> 'substitute*', although of course a more graceful error would surely be
> preferable.

Yes, that’s also a good point.

So yeah, I think it may be good “eventually” to switch to SRFI-115, but
that’s not urgent.

Thoughts?

Ludo’.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-01-22 10:58       ` Ludovic Courtès
@ 2018-01-23  4:27         ` Maxim Cournoyer
  2018-01-23 14:11           ` Ludovic Courtès
  0 siblings, 1 reply; 19+ messages in thread
From: Maxim Cournoyer @ 2018-01-23  4:27 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 30116

[-- Attachment #1: Type: text/plain, Size: 10550 bytes --]

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>>
>>> ludo@gnu.org (Ludovic Courtès) writes:
>>>
>>>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>>>
>>>>> I've encountered the following crash when trying to use substitute on a
>>>>> file which contains NUL characters:
>>>>
>>>> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
>>>> which does not handle NULs.
>>>>
>>>> We should consider switching to the pure-Scheme SRFI-115:
>>>>
>>>>   https://srfi.schemers.org/srfi-115/srfi-115.html
>>>
>>> This looks good, and I started looking into porting `substitute' to it,
>>> but quickly noticed it doesn't seem to be implemented in Guile yet?
>
> ISTR that the reference implementation works fine on Guile.
>
>> Indeed.  SRFI-115 for Guile is on my TODO list, although it might be
>> better to wait until after we switch to using UTF-8 encoding internally
>> for strings, since that will drastically affect the implementation of
>> any efficient regexp matcher on Scheme strings.
>
> Indeed, though I suppose it doesn’t matter much for the cases where
> ‘substitute*’ is used?
>
>> Anyway, 'substitute*' is to be used only on text files, and NUL bytes
>> are not a valid textual character.  So, I think that this case is
>> outside of what 'substitute*' is meant to do, and therefore not a bug in
>> 'substitute*', although of course a more graceful error would surely be
>> preferable.
>
> Yes, that’s also a good point.
>
> So yeah, I think it may be good “eventually” to switch to SRFI-115, but
> that’s not urgent.

Sorry for taking some time to answer; I was puzzled by the fact that my
repro didn't work when ran from the REPL. It seems the problem only
occurs when run inside Guix's build environment, maybe a side effect
which depends on the locale used?

In the `patch-el-files' phase of the emacs-build-system, we find the
following snippet:

    (with-directory-excursion el-dir
      ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
      ;; with the "ISO-8859-1" locale.
      (unless (false-if-exception (substitute-cmd))
        (with-fluids ((%default-port-encoding "ISO-8859-1"))
          (substitute-cmd))))

In case an exception is returned while processing the file, it is
retried being opened with the "ISO-8859-1" encoding. Or, this resolves
to a call to `open-file', which documentation says:

‘b’
          Use binary mode, ensuring that each byte in the file will be
          read as one Scheme character.

          To provide this property, the file will be opened with the
          8-bit character encoding "ISO-8859-1", ignoring the default
          port encoding.  *Note Ports::, for more information on port
          encodings.

So, by opening an file whose encoding is unknown as a ISO-8859-1 file,
we are doing the same as if we had passed the 'binary option. Could this
explain why we end up with NUL characters where we were expecting text?

To validate this hypothesis, I've added the following test message to
the patch-el-files phase:

      (unless (false-if-exception (substitute-cmd))
        (format (current-error-port) ">>> IS THIS IT? <<<")
        (with-fluids ((%default-port-encoding "ISO-8859-1"))
          (substitute-cmd))))

And re-ran the emacs-realgud build (minus the patch working around this
issue), and this is what I got:

--8<---------------cut here---------------start------------->8---
starting phase `patch-el-files'
>>> IS THIS IT? <<<Backtrace:
          15 (primitive-load "/gnu/store/xdmirz24yxpxzqh8xj83z9h0axm…")
In ice-9/eval.scm:
   191:35 14 (_ _)
In srfi/srfi-1.scm:
   863:16 13 (every1 #<procedure 9a9540 at /gnu/store/mz8vs1cxv1z7y…> …)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/gnu-build-system.scm:
   684:27 12 (_ _)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/emacs-build-system.scm:
   117:10 11 (patch-el-files #:outputs _)
In srfi/srfi-1.scm:
    640:9 10 (for-each #<procedure substitute-one-file (file-name)> _)
In ice-9/boot-9.scm:
    849:4  9 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  8 (call-with-input-file _ _ #:binary _ #:encoding _ # _)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/utils.scm:
   609:26  7 (_ _)
   635:26  6 (_ #<input: ./realgud/common/bp-image-data.el 17> #<inp…>)
In srfi/srfi-1.scm:
   466:18  5 (fold #<procedure 7ffff53d1520 at /gnu/store/mz8vs1cxv…> …)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/utils.scm:
   638:37  4 (_ _ "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\…")
In ice-9/regex.scm:
   189:12  3 (list-matches _ _ _)
   177:19  2 (fold-matches _ "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿ…" …)
In unknown file:
           1 (regexp-exec #<regexp 81e400> "\"II*\x00(\x03\x00\x00ÿ…" …)
In ice-9/boot-9.scm:
   760:25  0 (dispatch-exception _ _ _)

ice-9/boot-9.scm:760:25: In procedure dispatch-exception:
ice-9/boot-9.scm:760:25: string contains #\nul character: "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿÿÿÿÿÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿþÿ<<<<ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ<<<<ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ<<<<ÿÿÿÿ\x0f\x0f\x0f\x0fÿÿÿÿ\x0f\x0f\x0f\x0fÿÿÿÿ\x0f\x0f\x0f\x0fÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ\x14\x00\x00\x01\x03\x00\x01\x00\x00\x00\n"
builder for `/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' failed with exit code 1
@ build-failed /gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv - 1 builder for `/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' failed with exit code 1
guix build: error: build failed: build of
`/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv'
failed
--8<---------------cut here---------------end--------------->8---

So it is indeed triggered by switching to the "ISO-8859-1" encoding
(although I still cannot reproduce this from the REPL?).

If I remove the exception guard like this:

--8<---------------cut here---------------start------------->8---
     (with-directory-excursion el-dir
       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
       ;; with the "ISO-8859-1" locale.
-      (unless (false-if-exception (substitute-cmd))
-        (with-fluids ((%default-port-encoding "ISO-8859-1"))
-          (substitute-cmd))))
+      (substitute-cmd))
     #t))
--8<---------------cut here---------------end--------------->8---

the exception thrown on the first substitute* call is this:

--8<---------------cut here---------------start------------->8---
starting phase `patch-el-files'
Backtrace:
          12 (primitive-load "/gnu/store/dvyyqxfr08fsr18k2f43gakh23d…")
In ice-9/eval.scm:
   191:35 11 (_ _)
In srfi/srfi-1.scm:
   863:16 10 (every1 #<procedure 9a96a0 at /gnu/store/xn6p33hhfyz6l…> …)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/gnu-build-system.scm:
   684:27  9 (_ _)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/emacs-build-system.scm:
   104:27  8 (patch-el-files #:outputs _)
In srfi/srfi-1.scm:
    640:9  7 (for-each #<procedure substitute-one-file (file-name)> _)
In ice-9/boot-9.scm:
    849:4  6 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  5 (call-with-input-file _ _ #:binary _ #:encoding _ # _)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/utils.scm:
   609:26  4 (_ _)
   645:22  3 (_ #<input: ./realgud/common/bp-image-data.el 15> #<inp…>)
In ice-9/rdelim.scm:
   195:24  2 (read-line _ _)
In unknown file:
           1 (%read-line #<input: ./realgud/common/bp-image-data.el …>)
In ice-9/boot-9.scm:
   760:25  0 (dispatch-exception _ _ _)

ice-9/boot-9.scm:760:25: In procedure dispatch-exception:
ice-9/boot-9.scm:760:25: Throw to key `decoding-error' with args
`("peek-char" "input decoding error" 84 #<input:
./realgud/common/bp-image-data.el 15>)'.
--8<---------------cut here---------------end--------------->8---

Should we keep my workaround for now? It seems there are valid cases to
have the file opened as "ISO-8859-1", but this can mean introducing
binary symbols such as NUL in the data (thus regexp crashes). When we
finally move to srfi-115, we should remove this workaround.

WDYT?

Here's an updated patch with Ludovic's suggestion:


[-- Attachment #2: 0001-utils-Prevent-substitute-from-crashing-on-files-cont.patch --]
[-- Type: text/x-patch, Size: 3708 bytes --]

From 573ecb3570355c47aa18c091c0a193e7d90a6949 Mon Sep 17 00:00:00 2001
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Date: Sun, 14 Jan 2018 20:31:33 -0500
Subject: [PATCH] utils: Prevent substitute from crashing on files containing
 NUL chars.

Fixes issue #30116.

* guix/build/utils.scm (substitute): Add condition to skip lines containing
the NUL character.
---
 guix/build/utils.scm | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index 7391307c8..2a37dba06 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -3,6 +3,7 @@
 ;;; Copyright © 2013 Andreas Enge <andreas@enge.fr>
 ;;; Copyright © 2013 Nikita Karetnikov <nikita@karetnikov.org>
 ;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
+;;; Copyright © 2018 Maxim Cournoyer <maxim.cournoyer@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -621,28 +622,37 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as
 a substitution of the original line.  Be careful about using '$' to match the
 end of a line; by itself it won't match the terminating newline of a line."
   (let ((rx+proc  (map (match-lambda
-                        (((? regexp? pattern) . proc)
-                         (cons pattern proc))
-                        ((pattern . proc)
-                         (cons (make-regexp pattern regexp/extended)
-                               proc)))
+                         (((? regexp? pattern) . proc)
+                          (cons pattern proc))
+                         ((pattern . proc)
+                          (cons (make-regexp pattern regexp/extended)
+                                proc)))
                        pattern+procs)))
     (with-atomic-file-replacement file
       (lambda (in out)
         (let loop ((line (read-line in 'concat)))
-          (if (eof-object? line)
-              #t
-              (let ((line (fold (lambda (r+p line)
-                                  (match r+p
-                                    ((regexp . proc)
-                                     (match (list-matches regexp line)
-                                       ((and m+ (_ _ ...))
-                                        (proc line m+))
-                                       (_ line)))))
-                                line
-                                rx+proc)))
-                (display line out)
-                (loop (read-line in 'concat)))))))))
+          (cond
+           ((eof-object? line)
+            #t)
+           ((string-index line #\nul)
+            ;; The regexp functions of the GNU C library (which Guile uses)
+            ;; cannot deal with NUL characters, so skip to the next line.
+            ;; TODO: Port to srfi-115, once we have it implemented in Guile.
+            (format (current-error-port)
+                    "skipping line with NUL characters: ~s\n" line)
+            (loop (read-line in 'concat)))
+           (else
+            (let ((line (fold (lambda (r+p line)
+                                (match r+p
+                                  ((regexp . proc)
+                                   (match (list-matches regexp line)
+                                     ((and m+ (_ _ ...))
+                                      (proc line m+))
+                                     (_ line)))))
+                              line
+                              rx+proc)))
+              (display line out)
+              (loop (read-line in 'concat))))))))))
 
 
 (define-syntax let-matches
-- 
2.16.0


[-- Attachment #3: Type: text/plain, Size: 7 bytes --]


Maxim

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-01-23  4:27         ` Maxim Cournoyer
@ 2018-01-23 14:11           ` Ludovic Courtès
  2018-01-25  5:11             ` Maxim Cournoyer
  0 siblings, 1 reply; 19+ messages in thread
From: Ludovic Courtès @ 2018-01-23 14:11 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> In the `patch-el-files' phase of the emacs-build-system, we find the
> following snippet:
>
>     (with-directory-excursion el-dir
>       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
>       ;; with the "ISO-8859-1" locale.
>       (unless (false-if-exception (substitute-cmd))
>         (with-fluids ((%default-port-encoding "ISO-8859-1"))
>           (substitute-cmd))))
>
> In case an exception is returned while processing the file, it is
> retried being opened with the "ISO-8859-1" encoding. Or, this resolves
> to a call to `open-file', which documentation says:
>
> ‘b’
>           Use binary mode, ensuring that each byte in the file will be
>           read as one Scheme character.
>
>           To provide this property, the file will be opened with the
>           8-bit character encoding "ISO-8859-1", ignoring the default
>           port encoding.  *Note Ports::, for more information on port
>           encodings.
>
> So, by opening an file whose encoding is unknown as a ISO-8859-1 file,
> we are doing the same as if we had passed the 'binary option. Could this
> explain why we end up with NUL characters where we were expecting text?

That could be the reason.  Guile provides a way to honor Emacs-style
‘encoding’ declarations, and ‘call-with-input-file’ does that if we pass
#:guess-encoding #t (info "(guile) Character Encoding of Source Files").

Did the faulty file have such a declaration?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-01-23 14:11           ` Ludovic Courtès
@ 2018-01-25  5:11             ` Maxim Cournoyer
  2018-01-25 11:11               ` Ludovic Courtès
  0 siblings, 1 reply; 19+ messages in thread
From: Maxim Cournoyer @ 2018-01-25  5:11 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 30116

ludo@gnu.org (Ludovic Courtès) writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> In the `patch-el-files' phase of the emacs-build-system, we find the
>> following snippet:
>>
>>     (with-directory-excursion el-dir
>>       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
>>       ;; with the "ISO-8859-1" locale.
>>       (unless (false-if-exception (substitute-cmd))
>>         (with-fluids ((%default-port-encoding "ISO-8859-1"))
>>           (substitute-cmd))))
>>
>> In case an exception is returned while processing the file, it is
>> retried being opened with the "ISO-8859-1" encoding. Or, this resolves
>> to a call to `open-file', which documentation says:
>>
>> ‘b’
>>           Use binary mode, ensuring that each byte in the file will be
>>           read as one Scheme character.
>>
>>           To provide this property, the file will be opened with the
>>           8-bit character encoding "ISO-8859-1", ignoring the default
>>           port encoding.  *Note Ports::, for more information on port
>>           encodings.
>>
>> So, by opening an file whose encoding is unknown as a ISO-8859-1 file,
>> we are doing the same as if we had passed the 'binary option. Could this
>> explain why we end up with NUL characters where we were expecting text?
>
> That could be the reason.  Guile provides a way to honor Emacs-style
> ‘encoding’ declarations, and ‘call-with-input-file’ does that if we pass
> #:guess-encoding #t (info "(guile) Character Encoding of Source Files").
>
> Did the faulty file have such a declaration?

Sadly, it doesn't. Although even if it did, I don't think it would be
very robust to expect every misbehaving files we might encounter to
include one!

So I think we should apply my v2 patch to core-updates for now (see my
previous reply on this thread), until we have our substitute routine
implemented using srfi-115!

Thanks,

Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-01-25  5:11             ` Maxim Cournoyer
@ 2018-01-25 11:11               ` Ludovic Courtès
  0 siblings, 0 replies; 19+ messages in thread
From: Ludovic Courtès @ 2018-01-25 11:11 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>
>>> In the `patch-el-files' phase of the emacs-build-system, we find the
>>> following snippet:
>>>
>>>     (with-directory-excursion el-dir
>>>       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
>>>       ;; with the "ISO-8859-1" locale.
>>>       (unless (false-if-exception (substitute-cmd))
>>>         (with-fluids ((%default-port-encoding "ISO-8859-1"))
>>>           (substitute-cmd))))
>>>
>>> In case an exception is returned while processing the file, it is
>>> retried being opened with the "ISO-8859-1" encoding. Or, this resolves
>>> to a call to `open-file', which documentation says:
>>>
>>> ‘b’
>>>           Use binary mode, ensuring that each byte in the file will be
>>>           read as one Scheme character.
>>>
>>>           To provide this property, the file will be opened with the
>>>           8-bit character encoding "ISO-8859-1", ignoring the default
>>>           port encoding.  *Note Ports::, for more information on port
>>>           encodings.
>>>
>>> So, by opening an file whose encoding is unknown as a ISO-8859-1 file,
>>> we are doing the same as if we had passed the 'binary option. Could this
>>> explain why we end up with NUL characters where we were expecting text?
>>
>> That could be the reason.  Guile provides a way to honor Emacs-style
>> ‘encoding’ declarations, and ‘call-with-input-file’ does that if we pass
>> #:guess-encoding #t (info "(guile) Character Encoding of Source Files").
>>
>> Did the faulty file have such a declaration?
>
> Sadly, it doesn't. Although even if it did, I don't think it would be
> very robust to expect every misbehaving files we might encounter to
> include one!

Sure, I was asking just because it’s an Emacs-related package.

> So I think we should apply my v2 patch to core-updates for now (see my
> previous reply on this thread), until we have our substitute routine
> implemented using srfi-115!

Sounds good!  Note that I’ll wait until after the current ‘core-updates’
has been merged.  Please do ping me if you think I’ve forgotten!

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
  2018-01-17 14:37     ` Ludovic Courtès
@ 2018-06-14  1:40       ` Maxim Cournoyer
  2018-06-14  8:01         ` Ludovic Courtès
  0 siblings, 1 reply; 19+ messages in thread
From: Maxim Cournoyer @ 2018-06-14  1:40 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 30116

ludo@gnu.org (Ludovic Courtès) writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
>> Date: Sun, 14 Jan 2018 20:31:33 -0500
>> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>>  NUL chars.
>>
>> Fixes issue #30116.
>>
>> * guix/build/utils.scm (substitute): Add condition to skip lines containing
>> the NUL character.
>
> [...]
>
>> +           ((string-contains line (make-string 1 #\nul))
>
> Rather (string-index line #\nul).
>
>> +            ;; The regexp functions of the GNU C library (which Guile uses)
>> +            ;; cannot deal with NUL characters, so skip to the next line.
>> +            (format #t "skipping line with NUL characters: ~s\n" line)
>> +            (loop (read-line in 'concat)))
>
> Rather (format (current-error-port) …).
>
> It’s strange semantics, but it’s probably better than crashing in the
> contexts where we use it.
>
> Otherwise LGTM.  This would have to go to the next ‘core-updates’ (or
> ‘core-updates-next’ in the meantime.)
>
> Thanks!
>
> Ludo’.

Ping. Is it the right time to merge this?

Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
  2018-01-15  1:38   ` bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)) Maxim Cournoyer
  2018-01-17 14:37     ` Ludovic Courtès
@ 2018-06-14  7:02     ` Mark H Weaver
  2018-06-14  8:02       ` Ludovic Courtès
  2018-06-16 16:47       ` Maxim Cournoyer
  1 sibling, 2 replies; 19+ messages in thread
From: Mark H Weaver @ 2018-06-14  7:02 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116

Hi Maxim,

Thanks for working on this.  I found a problem with this patch,
and I also have a suggestion.  Please see below.

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
> Date: Sun, 14 Jan 2018 20:31:33 -0500
> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>  NUL chars.
>
> Fixes issue #30116.
>
> * guix/build/utils.scm (substitute): Add condition to skip lines containing
> the NUL character.
> ---
>  guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------
>  1 file changed, 26 insertions(+), 18 deletions(-)
>
> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
> index 7391307c8..975f4e70a 100644
> --- a/guix/build/utils.scm
> +++ b/guix/build/utils.scm
> @@ -3,6 +3,7 @@
>  ;;; Copyright © 2013 Andreas Enge <andreas@enge.fr>
>  ;;; Copyright © 2013 Nikita Karetnikov <nikita@karetnikov.org>
>  ;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
> +;;; Copyright © 2018 Maxim Cournoyer <maxim.cournoyer@gmail.com>
>  ;;;
>  ;;; This file is part of GNU Guix.
>  ;;;
> @@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as
>  a substitution of the original line.  Be careful about using '$' to match the
>  end of a line; by itself it won't match the terminating newline of a line."
>    (let ((rx+proc  (map (match-lambda
> -                        (((? regexp? pattern) . proc)
> -                         (cons pattern proc))
> -                        ((pattern . proc)
> -                         (cons (make-regexp pattern regexp/extended)
> -                               proc)))
> +                         (((? regexp? pattern) . proc)
> +                          (cons pattern proc))
> +                         ((pattern . proc)
> +                          (cons (make-regexp pattern regexp/extended)
> +                                proc)))
>                         pattern+procs)))
>      (with-atomic-file-replacement file
>        (lambda (in out)
>          (let loop ((line (read-line in 'concat)))
> -          (if (eof-object? line)
> -              #t
> -              (let ((line (fold (lambda (r+p line)
> -                                  (match r+p
> -                                    ((regexp . proc)
> -                                     (match (list-matches regexp line)
> -                                       ((and m+ (_ _ ...))
> -                                        (proc line m+))
> -                                       (_ line)))))
> -                                line
> -                                rx+proc)))
> -                (display line out)
> -                (loop (read-line in 'concat)))))))))
> +          (cond
> +           ((eof-object? line)
> +            #t)
> +           ((string-contains line (make-string 1 #\nul))
> +            ;; The regexp functions of the GNU C library (which Guile uses)
> +            ;; cannot deal with NUL characters, so skip to the next line.
> +            (format #t "skipping line with NUL characters: ~s\n" line)
> +            (loop (read-line in 'concat)))

This code will unconditionally *delete* all lines that contain NULs.

This will happen because the lines with NULs are not being written to
the output file, which will replace the original file when this loop
reaches EOF.  So, any lines that are not copied to the output will be
lost.

To preserve the lines with NULs, you should call (display line out)
before calling 'loop'.

Also, please use (string-index line #\nul) to check for NULs instead of
'string-contains'.  It should be more efficient.

> +           (else
> +            (let ((line (fold (lambda (r+p line)
> +                                (match r+p
> +                                  ((regexp . proc)
> +                                   (match (list-matches regexp line)
> +                                     ((and m+ (_ _ ...))
> +                                      (proc line m+))
> +                                     (_ line)))))
> +                              line
> +                              rx+proc)))
> +              (display line out)
> +              (loop (read-line in 'concat))))))))))

With the changes suggested above, I would have no objection to pushing
this to core-updates.  However, it occurs to me that we could handle the
NUL case in a better way:

Since the C regex functions that we use cannot handle NUL bytes, we
could use a different code point to represent NUL during those
operations.  We could choose a code point from one of the Unicode
Private Use Areas <https://en.wikipedia.org/wiki/Private_Use_Areas> that
does not occur in the string.

Let NUL* be the code point which will represent NUL bytes.  First
replace all NULs with NUL*s, then perform the substitutions, and finally
replace all ALT*s with NULs before writing to the output.

As an important optimization, we should avoid performing these extra
operations unless (string-index line #\nul) finds a NUL.

We could then perform these extra substitutions with simple, inefficient
code.  Maybe something like this (untested):

--8<---------------cut here---------------start------------->8---
    (with-atomic-file-replacement file
      (lambda (in out)
        (let loop ((line (read-line in 'concat)))
          (if (eof-object? line)
              #t
              (let* ((nul* (or (and (string-index line #\nul)
                                    (unused-private-use-code-point line))
                               #\nul))
                     (line* (replace-char #\nul nul* line))
                     (line1* (fold (lambda (r+p line)
                                     (match r+p
                                       ((regexp . proc)
                                        (match (list-matches regexp line)
                                          ((and m+ (_ _ ...))
                                           (proc line m+))
                                          (_ line)))))
                                   line*
                                   rx+proc))
                     (line1 (replace-char nul* #\nul line1*)))
                (display line1 out)
                (loop (read-line in 'concat)))))))))
--8<---------------cut here---------------end--------------->8---

Where the following additional private procedures would be added to
(guix build utils) above the definition for 'substitute':

--8<---------------cut here---------------start------------->8---
(define (unused-private-use-code-point s)
  "Find a code point within a Unicode Private Use Area that is not
present in S, and return the corresponding character object.  If one
cannot be found, return false."
  (define (scan lo hi)
    (and (<= lo hi)
         (let ((c (integer->char lo)))
           (if (string-index s c)
               (scan (+ lo 1) hi)
               c))))
  (or (scan   #xE000   #xF8FF)
      (scan  #xF0000  #xFFFFD)
      (scan #x100000 #x10FFFD)))

(define (replace-char c1 c2 s)
  "Return a string which is equal to S except with all instances of C1
replaced by C2.  If C1 and C2 are equal, return S."
  (if (char=? c1 c2)
      s
      (string-map (lambda (c)
                    (if (char=? c c1)
                        c2
                        c))
                  s)))
--8<---------------cut here---------------end--------------->8---

What do you think?

      Mark

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
  2018-06-14  1:40       ` Maxim Cournoyer
@ 2018-06-14  8:01         ` Ludovic Courtès
  0 siblings, 0 replies; 19+ messages in thread
From: Ludovic Courtès @ 2018-06-14  8:01 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116

Hello Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:

[...]

>> Otherwise LGTM.  This would have to go to the next ‘core-updates’ (or
>> ‘core-updates-next’ in the meantime.)
>>
>> Thanks!
>>
>> Ludo’.
>
> Ping. Is it the right time to merge this?

Yes you can push it to ‘core-updates’ now.  Thank you!

Ludo’.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
  2018-06-14  7:02     ` Mark H Weaver
@ 2018-06-14  8:02       ` Ludovic Courtès
  2018-06-16 16:47       ` Maxim Cournoyer
  1 sibling, 0 replies; 19+ messages in thread
From: Ludovic Courtès @ 2018-06-14  8:02 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 30116, Maxim Cournoyer

Mark H Weaver <mhw@netris.org> skribis:

> Thanks for working on this.  I found a problem with this patch,
> and I also have a suggestion.  Please see below.

I hadn’t seen Mark’s reply, which raises valid concerns.  Please dismiss
the message I just sent, Maxim.

Ludo’.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
  2018-06-14  7:02     ` Mark H Weaver
  2018-06-14  8:02       ` Ludovic Courtès
@ 2018-06-16 16:47       ` Maxim Cournoyer
  2018-06-17  4:36         ` Mark H Weaver
  1 sibling, 1 reply; 19+ messages in thread
From: Maxim Cournoyer @ 2018-06-16 16:47 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 30116

Hi Mark,

Mark H Weaver <mhw@netris.org> writes:

> Hi Maxim,
>
> Thanks for working on this.  I found a problem with this patch,
> and I also have a suggestion.  Please see below.
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
>> Date: Sun, 14 Jan 2018 20:31:33 -0500
>> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>>  NUL chars.
>>
>> Fixes issue #30116.
>>
>> * guix/build/utils.scm (substitute): Add condition to skip lines containing
>> the NUL character.
>> ---
>>  guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------
>>  1 file changed, 26 insertions(+), 18 deletions(-)
>>
>> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
>> index 7391307c8..975f4e70a 100644
>> --- a/guix/build/utils.scm
>> +++ b/guix/build/utils.scm
>> @@ -3,6 +3,7 @@
>>  ;;; Copyright © 2013 Andreas Enge <andreas@enge.fr>
>>  ;;; Copyright © 2013 Nikita Karetnikov <nikita@karetnikov.org>
>>  ;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
>> +;;; Copyright © 2018 Maxim Cournoyer <maxim.cournoyer@gmail.com>
>>  ;;;
>>  ;;; This file is part of GNU Guix.
>>  ;;;
>> @@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as
>>  a substitution of the original line.  Be careful about using '$' to match the
>>  end of a line; by itself it won't match the terminating newline of a line."
>>    (let ((rx+proc  (map (match-lambda
>> -                        (((? regexp? pattern) . proc)
>> -                         (cons pattern proc))
>> -                        ((pattern . proc)
>> -                         (cons (make-regexp pattern regexp/extended)
>> -                               proc)))
>> +                         (((? regexp? pattern) . proc)
>> +                          (cons pattern proc))
>> +                         ((pattern . proc)
>> +                          (cons (make-regexp pattern regexp/extended)
>> +                                proc)))
>>                         pattern+procs)))
>>      (with-atomic-file-replacement file
>>        (lambda (in out)
>>          (let loop ((line (read-line in 'concat)))
>> -          (if (eof-object? line)
>> -              #t
>> -              (let ((line (fold (lambda (r+p line)
>> -                                  (match r+p
>> -                                    ((regexp . proc)
>> -                                     (match (list-matches regexp line)
>> -                                       ((and m+ (_ _ ...))
>> -                                        (proc line m+))
>> -                                       (_ line)))))
>> -                                line
>> -                                rx+proc)))
>> -                (display line out)
>> -                (loop (read-line in 'concat)))))))))
>> +          (cond
>> +           ((eof-object? line)
>> +            #t)
>> +           ((string-contains line (make-string 1 #\nul))
>> +            ;; The regexp functions of the GNU C library (which Guile uses)
>> +            ;; cannot deal with NUL characters, so skip to the next line.
>> +            (format #t "skipping line with NUL characters: ~s\n" line)
>> +            (loop (read-line in 'concat)))
>
> This code will unconditionally *delete* all lines that contain NULs.
>
> This will happen because the lines with NULs are not being written to
> the output file, which will replace the original file when this loop
> reaches EOF.  So, any lines that are not copied to the output will be
> lost.
>
> To preserve the lines with NULs, you should call (display line out)
> before calling 'loop'.

Good observation! I agree that we should keep limit the effect of
ignoring NULs only to the substitution.

> Also, please use (string-index line #\nul) to check for NULs instead of
> 'string-contains'.  It should be more efficient.

OK!

>
>> +           (else
>> +            (let ((line (fold (lambda (r+p line)
>> +                                (match r+p
>> +                                  ((regexp . proc)
>> +                                   (match (list-matches regexp line)
>> +                                     ((and m+ (_ _ ...))
>> +                                      (proc line m+))
>> +                                     (_ line)))))
>> +                              line
>> +                              rx+proc)))
>> +              (display line out)
>> +              (loop (read-line in 'concat))))))))))
>
> With the changes suggested above, I would have no objection to pushing
> this to core-updates.  However, it occurs to me that we could handle the
> NUL case in a better way:
>
> Since the C regex functions that we use cannot handle NUL bytes, we
> could use a different code point to represent NUL during those
> operations.  We could choose a code point from one of the Unicode
> Private Use Areas <https://en.wikipedia.org/wiki/Private_Use_Areas> that
> does not occur in the string.
>
> Let NUL* be the code point which will represent NUL bytes.  First
> replace all NULs with NUL*s, then perform the substitutions, and finally
> replace all ALT*s with NULs before writing to the output.

Do I understand this transformation as NULs -> NUL*s and back from NUL*s
-> NULs correctly? I'm not sure how NUL*s became ALT*s in your explanation.

> As an important optimization, we should avoid performing these extra
> operations unless (string-index line #\nul) finds a NUL.

OK.

> We could then perform these extra substitutions with simple, inefficient
> code.  Maybe something like this (untested):
>
>     (with-atomic-file-replacement file
>       (lambda (in out)
>         (let loop ((line (read-line in 'concat)))
>           (if (eof-object? line)
>               #t
>               (let* ((nul* (or (and (string-index line #\nul)
>                                     (unused-private-use-code-point line))
>                                #\nul))
>                      (line* (replace-char #\nul nul* line))
>                      (line1* (fold (lambda (r+p line)
>                                      (match r+p
>                                        ((regexp . proc)
>                                         (match (list-matches regexp line)
>                                           ((and m+ (_ _ ...))
>                                            (proc line m+))
>                                           (_ line)))))
>                                    line*
>                                    rx+proc))
>                      (line1 (replace-char nul* #\nul line1*)))
>                 (display line1 out)
>                 (loop (read-line in 'concat)))))))))
>
>
> Where the following additional private procedures would be added to
> (guix build utils) above the definition for 'substitute':
>
> (define (unused-private-use-code-point s)
>   "Find a code point within a Unicode Private Use Area that is not
> present in S, and return the corresponding character object.  If one
> cannot be found, return false."
>   (define (scan lo hi)
>     (and (<= lo hi)
>          (let ((c (integer->char lo)))
>            (if (string-index s c)
>                (scan (+ lo 1) hi)
>                c))))
>   (or (scan   #xE000   #xF8FF)
>       (scan  #xF0000  #xFFFFD)
>       (scan #x100000 #x10FFFD)))
>
> (define (replace-char c1 c2 s)
>   "Return a string which is equal to S except with all instances of C1
> replaced by C2.  If C1 and C2 are equal, return S."
>   (if (char=? c1 c2)
>       s
>       (string-map (lambda (c)
>                     (if (char=? c c1)
>                         c2
>                         c))
>                   s)))
>
> What do you think?

It raises the complexity level a bit for something which doesn't seem to
be a very common scenario, but otherwise seems a very elegant
workaround. It seems to me that your implementation is already pretty
complete. I'll try write a test for validating it and report back.

Thank you for sharing your ideas!

Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
  2018-06-16 16:47       ` Maxim Cournoyer
@ 2018-06-17  4:36         ` Mark H Weaver
  2021-01-08 19:14           ` bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) Maxim Cournoyer
  0 siblings, 1 reply; 19+ messages in thread
From: Mark H Weaver @ 2018-06-17  4:36 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116

Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>> With the changes suggested above, I would have no objection to pushing
>> this to core-updates.  However, it occurs to me that we could handle the
>> NUL case in a better way:
>>
>> Since the C regex functions that we use cannot handle NUL bytes, we
>> could use a different code point to represent NUL during those
>> operations.  We could choose a code point from one of the Unicode
>> Private Use Areas <https://en.wikipedia.org/wiki/Private_Use_Areas> that
>> does not occur in the string.
>>
>> Let NUL* be the code point which will represent NUL bytes.  First
>> replace all NULs with NUL*s, then perform the substitutions, and finally
>> replace all ALT*s with NULs before writing to the output.
>
> Do I understand this transformation as NULs -> NUL*s and back from NUL*s
> -> NULs correctly? I'm not sure how NUL*s became ALT*s in your explanation.

Sorry, it's a typo.  Where I wrote "ALT*s", I meant to write "NUL*s".

>> What do you think?
>
> It raises the complexity level a bit for something which doesn't seem to
> be a very common scenario,

FWIW, I agree that it's not a common scenario, and it's not entirely
clear that it was worth the time I spent on it, or the added complexity.
On the other hand, I would dislike having a basic API like 'substitute*'
be subtly broken in this way.

> but otherwise seems a very elegant
> workaround. It seems to me that your implementation is already pretty
> complete. I'll try write a test for validating it and report back.

Sounds good.  Thank you!

      Mark

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2018-06-17  4:36         ` Mark H Weaver
@ 2021-01-08 19:14           ` Maxim Cournoyer
  2021-01-08 21:42             ` Mark H Weaver
  0 siblings, 1 reply; 19+ messages in thread
From: Maxim Cournoyer @ 2021-01-08 19:14 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: 30116-done

Hello Mark,

I was recently reminded of this bug by a new encounter; at last wrote a
test for your proposed fix, and it appear to work as intended!  I've
committed it on your behalf in commit 485ac28235 on the core-updates
branch.

Closing!

Thank you for the clever hack :-)

Maxim




^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
  2021-01-08 19:14           ` bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) Maxim Cournoyer
@ 2021-01-08 21:42             ` Mark H Weaver
  0 siblings, 0 replies; 19+ messages in thread
From: Mark H Weaver @ 2021-01-08 21:42 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 30116-done

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
> I was recently reminded of this bug by a new encounter; at last wrote a
> test for your proposed fix, and it appear to work as intended!  I've
> committed it on your behalf in commit 485ac28235 on the core-updates
> branch.

Thanks for taking care of this Maxim, and for adding the test case.

    Regards,
      Mark




^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-01-08 21:44 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-15  1:27 bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) Maxim Cournoyer
     [not found] ` <handler.30116.B.15159796942311.ack@debbugs.gnu.org>
2018-01-15  1:38   ` bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)) Maxim Cournoyer
2018-01-17 14:37     ` Ludovic Courtès
2018-06-14  1:40       ` Maxim Cournoyer
2018-06-14  8:01         ` Ludovic Courtès
2018-06-14  7:02     ` Mark H Weaver
2018-06-14  8:02       ` Ludovic Courtès
2018-06-16 16:47       ` Maxim Cournoyer
2018-06-17  4:36         ` Mark H Weaver
2021-01-08 19:14           ` bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) Maxim Cournoyer
2021-01-08 21:42             ` Mark H Weaver
2018-01-16 11:23 ` Ludovic Courtès
2018-01-21  4:24   ` Maxim Cournoyer
2018-01-21 18:17     ` Mark H Weaver
2018-01-22 10:58       ` Ludovic Courtès
2018-01-23  4:27         ` Maxim Cournoyer
2018-01-23 14:11           ` Ludovic Courtès
2018-01-25  5:11             ` Maxim Cournoyer
2018-01-25 11:11               ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).