From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
Date: Mon, 22 Jan 2018 11:58:37 +0100
Message-ID: <87o9lmp3c2.fsf@gnu.org>
References: <87r2qrc3mq.fsf@gmail.com> <87o9lu6o9m.fsf@gnu.org>
 <87607vu9dp.fsf@gmail.com> <877esb84ae.fsf@netris.org>
In-Reply-To: <877esb84ae.fsf@netris.org> (Mark H. Weaver's message of
 "Sun, 21 Jan 2018 13:17:45 -0500")
To: Mark H Weaver
Cc: 30116@debbugs.gnu.org, Maxim Cournoyer

Mark H Weaver skribis:

> Maxim Cournoyer writes:
>
>> ludo@gnu.org (Ludovic Courtès) writes:
>>
>>> Maxim Cournoyer skribis:
>>>
>>>> I've encountered the following crash when trying to use substitute on a
>>>> file which contains NUL characters:
>>>
>>> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
>>> which does not handle NULs.
>>>
>>> We should consider switching to the pure-Scheme SRFI-115:
>>>
>>> https://srfi.schemers.org/srfi-115/srfi-115.html
>>
>> This looks good, and I started looking into porting `substitute' to it,
>> but quickly noticed it doesn't seem to be implemented in Guile yet?

ISTR that the reference implementation works fine on Guile.

> Indeed. SRFI-115 for Guile is on my TODO list, although it might be
> better to wait until after we switch to using UTF-8 encoding internally
> for strings, since that will drastically affect the implementation of
> any efficient regexp matcher on Scheme strings.

Indeed, though I suppose it doesn’t matter much for the cases where
‘substitute*’ is used?

> Anyway, 'substitute*' is to be used only on text files, and NUL bytes
> are not a valid textual character. So, I think that this case is
> outside of what 'substitute*' is meant to do, and therefore not a bug in
> 'substitute*', although of course a more graceful error would surely be
> preferable.

Yes, that’s also a good point.

So yeah, I think it may be good “eventually” to switch to SRFI-115, but
that’s not urgent.

Thoughts?

Ludo’.
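
For illustration only, here is a rough sketch of the two points above: a
NUL-terminated C interface such as libc's regexec only sees the bytes before
the first NUL, and a caller can check for NULs up front to give a clearer
error. It is written in Python rather than Guile, purely as a demonstration.
The helper names `c_string_view` and `check_text` are hypothetical and are not
part of Guix or Guile, and the guard is just one way a "more graceful error"
might look, not how `substitute*` behaves or is planned to behave.

```python
import ctypes


def c_string_view(data: bytes) -> bytes:
    """Return the part of DATA that a NUL-terminated C API would see."""
    # c_char_p.value stops at the first NUL byte, just as strlen() would.
    return ctypes.c_char_p(data).value


def check_text(data: bytes, name: str = "<input>") -> bytes:
    """Return DATA unchanged, or raise a descriptive error if it has a NUL."""
    offset = data.find(b"\x00")
    if offset != -1:
        raise ValueError(
            f"{name}: NUL byte at offset {offset}; not a text file")
    return data


if __name__ == "__main__":
    sample = b"foo\x00bar"
    # The C side only sees the bytes before the NUL.
    print(c_string_view(sample))
    try:
        check_text(sample, "sample.bin")
    except ValueError as err:
        print(err)
```

A check of this kind would run before any regexp matching, so that a binary
file is reported by name instead of failing somewhere inside the regexp call.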