From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark H Weaver
Subject: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
Date: Sun, 17 Jun 2018 00:36:00 -0400
Message-ID: <87d0wq3u8f.fsf@netris.org>
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> <87r2l9q294.fsf@netris.org> <87tvq2smpm.fsf@gmail.com>
In-Reply-To: <87tvq2smpm.fsf@gmail.com> (Maxim Cournoyer's message of "Sat, 16 Jun 2018 12:47:01 -0400")
Mime-Version: 1.0
Content-Type: text/plain
List-Id: Bug reports for GNU Guix
To: Maxim Cournoyer
Cc: 30116@debbugs.gnu.org

Hi Maxim,

Maxim Cournoyer writes:

> Mark H Weaver writes:
>
>> With the changes suggested above, I would have no objection to pushing
>> this to core-updates.  However, it occurs to me that we could handle the
>> NUL case in a better way:
>>
>> Since the C regex functions that we use cannot handle NUL bytes, we
>> could use a different code point to represent NUL during those
>> operations.
>> We could choose a code point from one of the Unicode
>> Private Use Areas that does not occur in the string.
>>
>> Let NUL* be the code point which will represent NUL bytes.  First
>> replace all NULs with NUL*s, then perform the substitutions, and finally
>> replace all ALT*s with NULs before writing to the output.
>
> Do I understand this transformation as NULs -> NUL*s and back from NUL*s
> -> NULs correctly? I'm not sure how NUL*s became ALT*s in your explanation.

Sorry, it's a typo.  Where I wrote "ALT*s", I meant to write "NUL*s".

>> What do you think?
>
> It raises the complexity level a bit for something which doesn't seem to
> be a very common scenario,

FWIW, I agree that it's not a common scenario, and it's not entirely
clear that it was worth the time I spent on it, or the added complexity.
On the other hand, I would dislike having a basic API like 'substitute*'
be subtly broken in this way.

> but otherwise seems a very elegant
> workaround.  It seems to me that your implementation is already pretty
> complete.  I'll try write a test for validating it and report back.

Sounds good.  Thank you!

      Mark
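
For illustration, the NUL* scheme discussed in this message (NULs -> NUL*s,
substitute, NUL*s -> NULs) can be sketched roughly as follows.  This is not
the patch under review: it is written in Python rather than Guile, Python's
`re' module does not share the NUL limitation of the C regex functions, and
the names `pick_placeholder' and `substitute_with_nul' are hypothetical.  It
only shows the shape of the round trip, assuming the placeholder is drawn
from the Basic Multilingual Plane Private Use Area (U+E000..U+F8FF).

```python
import re

# Basic Multilingual Plane Private Use Area (assumed search range).
PUA_START, PUA_END = 0xE000, 0xF8FF


def pick_placeholder(text):
    """Return a Private Use Area character that does not occur in text."""
    used = set(text)
    for code_point in range(PUA_START, PUA_END + 1):
        candidate = chr(code_point)
        if candidate not in used:
            return candidate
    raise ValueError("no unused Private Use Area code point available")


def substitute_with_nul(text, pattern, replacement):
    """Apply a regex substitution to text that may contain NUL characters."""
    nul_star = pick_placeholder(text)            # the "NUL*" of the discussion
    protected = text.replace("\0", nul_star)     # NULs -> NUL*s
    substituted = re.sub(pattern, replacement, protected)
    return substituted.replace(nul_star, "\0")   # NUL*s -> NULs


if __name__ == "__main__":
    result = substitute_with_nul("foo\0bar foo", r"foo", "baz")
    print(repr(result))
```

One caveat of this sketch: the placeholder is only checked against the input
text, so a replacement string that itself contains the chosen code point
would be turned into a NUL on the way out.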