From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark H Weaver Subject: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)) Date: Thu, 14 Jun 2018 03:02:47 -0400 Message-ID: <87r2l9q294.fsf@netris.org> References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:50602) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fTMJd-0003Wm-CZ for bug-guix@gnu.org; Thu, 14 Jun 2018 03:05:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fTMJa-0005gC-4J for bug-guix@gnu.org; Thu, 14 Jun 2018 03:05:05 -0400 Received: from debbugs.gnu.org ([208.118.235.43]:39370) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fTMJa-0005g6-0Y for bug-guix@gnu.org; Thu, 14 Jun 2018 03:05:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1fTMJZ-0005SV-Ih for bug-guix@gnu.org; Thu, 14 Jun 2018 03:05:01 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87k1wjc35d.fsf_-_@gmail.com> (Maxim Cournoyer's message of "Sun, 14 Jan 2018 20:38:22 -0500") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Maxim Cournoyer Cc: 30116@debbugs.gnu.org Hi Maxim, Thanks for working on this. I found a problem with this patch, and I also have a suggestion. Please see below. Maxim Cournoyer writes: > From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001 > From: Maxim Cournoyer > Date: Sun, 14 Jan 2018 20:31:33 -0500 > Subject: [PATCH] utils: Prevent substitute from crashing on files contain= ing > NUL chars. > > Fixes issue #30116. > > * guix/build/utils.scm (substitute): Add condition to skip lines containi= ng > the NUL character. > --- > guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------ > 1 file changed, 26 insertions(+), 18 deletions(-) > > diff --git a/guix/build/utils.scm b/guix/build/utils.scm > index 7391307c8..975f4e70a 100644 > --- a/guix/build/utils.scm > +++ b/guix/build/utils.scm > @@ -3,6 +3,7 @@ > ;;; Copyright =C2=A9 2013 Andreas Enge > ;;; Copyright =C2=A9 2013 Nikita Karetnikov > ;;; Copyright =C2=A9 2015 Mark H Weaver > +;;; Copyright =C2=A9 2018 Maxim Cournoyer > ;;; > ;;; This file is part of GNU Guix. > ;;; > @@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the l= ine that will be written as > a substitution of the original line. Be careful about using '$' to matc= h the > end of a line; by itself it won't match the terminating newline of a lin= e." > (let ((rx+proc (map (match-lambda > - (((? regexp? pattern) . proc) > - (cons pattern proc)) > - ((pattern . proc) > - (cons (make-regexp pattern regexp/extended) > - proc))) > + (((? regexp? pattern) . proc) > + (cons pattern proc)) > + ((pattern . proc) > + (cons (make-regexp pattern regexp/extended) > + proc))) > pattern+procs))) > (with-atomic-file-replacement file > (lambda (in out) > (let loop ((line (read-line in 'concat))) > - (if (eof-object? line) > - #t > - (let ((line (fold (lambda (r+p line) > - (match r+p > - ((regexp . proc) > - (match (list-matches regexp line) > - ((and m+ (_ _ ...)) > - (proc line m+)) > - (_ line))))) > - line > - rx+proc))) > - (display line out) > - (loop (read-line in 'concat))))))))) > + (cond > + ((eof-object? line) > + #t) > + ((string-contains line (make-string 1 #\nul)) > + ;; The regexp functions of the GNU C library (which Guile us= es) > + ;; cannot deal with NUL characters, so skip to the next line. > + (format #t "skipping line with NUL characters: ~s\n" line) > + (loop (read-line in 'concat))) This code will unconditionally *delete* all lines that contain NULs. This will happen because the lines with NULs are not being written to the output file, which will replace the original file when this loop reaches EOF. So, any lines that are not copied to the output will be lost. To preserve the lines with NULs, you should call (display line out) before calling 'loop'. Also, please use (string-index line #\nul) to check for NULs instead of 'string-contains'. It should be more efficient. > + (else > + (let ((line (fold (lambda (r+p line) > + (match r+p > + ((regexp . proc) > + (match (list-matches regexp line) > + ((and m+ (_ _ ...)) > + (proc line m+)) > + (_ line))))) > + line > + rx+proc))) > + (display line out) > + (loop (read-line in 'concat)))))))))) With the changes suggested above, I would have no objection to pushing this to core-updates. However, it occurs to me that we could handle the NUL case in a better way: Since the C regex functions that we use cannot handle NUL bytes, we could use a different code point to represent NUL during those operations. We could choose a code point from one of the Unicode Private Use Areas that does not occur in the string. Let NUL* be the code point which will represent NUL bytes. First replace all NULs with NUL*s, then perform the substitutions, and finally replace all ALT*s with NULs before writing to the output. As an important optimization, we should avoid performing these extra operations unless (string-index line #\nul) finds a NUL. We could then perform these extra substitutions with simple, inefficient code. Maybe something like this (untested): --8<---------------cut here---------------start------------->8--- (with-atomic-file-replacement file (lambda (in out) (let loop ((line (read-line in 'concat))) (if (eof-object? line) #t (let* ((nul* (or (and (string-index line #\nul) (unused-private-use-code-point line)) #\nul)) (line* (replace-char #\nul nul* line)) (line1* (fold (lambda (r+p line) (match r+p ((regexp . proc) (match (list-matches regexp line) ((and m+ (_ _ ...)) (proc line m+)) (_ line))))) line* rx+proc)) (line1 (replace-char nul* #\nul line1*))) (display line1 out) (loop (read-line in 'concat))))))))) --8<---------------cut here---------------end--------------->8--- Where the following additional private procedures would be added to (guix build utils) above the definition for 'substitute': --8<---------------cut here---------------start------------->8--- (define (unused-private-use-code-point s) "Find a code point within a Unicode Private Use Area that is not present in S, and return the corresponding character object. If one cannot be found, return false." (define (scan lo hi) (and (<=3D lo hi) (let ((c (integer->char lo))) (if (string-index s c) (scan (+ lo 1) hi) c)))) (or (scan #xE000 #xF8FF) (scan #xF0000 #xFFFFD) (scan #x100000 #x10FFFD))) (define (replace-char c1 c2 s) "Return a string which is equal to S except with all instances of C1 replaced by C2. If C1 and C2 are equal, return S." (if (char=3D? c1 c2) s (string-map (lambda (c) (if (char=3D? c c1) c2 c)) s))) --8<---------------cut here---------------end--------------->8--- What do you think? Mark