From: Mark H Weaver <mhw@netris.org>
Subject: bug#30116: [PATCH] `substitute' crashes when file contains NUL
	characters (core-updates)
Date: Sun, 17 Jun 2018 00:36:00 -0400
Message-ID: <87d0wq3u8f.fsf@netris.org>
References: <87r2qrc3mq.fsf@gmail.com>
	<handler.30116.B.15159796942311.ack@debbugs.gnu.org>
	<87k1wjc35d.fsf_-_@gmail.com> <87r2l9q294.fsf@netris.org>
	<87tvq2smpm.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain
In-Reply-To: <87tvq2smpm.fsf@gmail.com> (Maxim Cournoyer's message of "Sat, 16
	Jun 2018 12:47:01 -0400")
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: 30116@debbugs.gnu.org

Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>> With the changes suggested above, I would have no objection to pushing
>> this to core-updates.  However, it occurs to me that we could handle the
>> NUL case in a better way:
>>
>> Since the C regex functions that we use cannot handle NUL bytes, we
>> could use a different code point to represent NUL during those
>> operations.  We could choose a code point from one of the Unicode
>> Private Use Areas <https://en.wikipedia.org/wiki/Private_Use_Areas> that
>> does not occur in the string.
>>
>> Let NUL* be the code point which will represent NUL bytes.  First
>> replace all NULs with NUL*s, then perform the substitutions, and finally
>> replace all ALT*s with NULs before writing to the output.
>
> Do I understand correctly that the transformation is NULs -> NUL*s and
> then back, NUL*s -> NULs? I'm not sure how NUL*s became ALT*s in your
> explanation.

Sorry, it's a typo.  Where I wrote "ALT*s", I meant to write "NUL*s".
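
For illustration only, here is a minimal sketch of the NUL -> NUL* -> NUL
round trip, written in Python rather than Guile Scheme.  It assumes the
substitutions are given as (pattern, replacement) pairs; the helper names
`pick_placeholder` and `substitute_nul_safe` are hypothetical, and this is
not the actual `substitute*' implementation.  Python's `re` module can
handle NUL itself, so here the mapping merely demonstrates the technique
that the C regex functions would need.

```python
import re

# Unicode Private Use Area ranges: the BMP block plus planes 15 and 16.
PRIVATE_USE_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def pick_placeholder(*texts):
    """Return a Private Use Area character that occurs in none of TEXTS."""
    used = set("".join(texts))
    for lo, hi in PRIVATE_USE_RANGES:
        for cp in range(lo, hi + 1):
            ch = chr(cp)
            if ch not in used:
                return ch
    raise ValueError("no unused Private Use Area code point")


def substitute_nul_safe(text, substitutions):
    """Apply (pattern, replacement) pairs to TEXT without exposing NULs
    to the regex step (hypothetical helper)."""
    # NUL* must not already occur in the text, the patterns, or the
    # replacements, so that step 3 only touches positions that were NUL.
    pieces = [text] + [s for pair in substitutions for s in pair]
    nul_star = pick_placeholder(*pieces)
    # 1. Replace every NUL with the placeholder NUL*.
    work = text.replace("\0", nul_star)
    # 2. Perform the substitutions on the NUL-free working copy.
    for pattern, replacement in substitutions:
        work = re.sub(pattern, replacement, work)
    # 3. Map every NUL* back to NUL before writing the output.
    return work.replace(nul_star, "\0")


if __name__ == "__main__":
    data = "prefix=/usr\0/usr/lib\0"
    print(repr(substitute_nul_safe(data, [(r"/usr", "/gnu/store/xyz")])))
    # prints 'prefix=/gnu/store/xyz\x00/gnu/store/xyz/lib\x00'
```

Choosing a code point that is absent from the input (and from the
patterns and replacements) is what makes the final NUL* -> NUL step
lossless: any NUL* left in the result must have started out as a NUL.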

>> What do you think?
>
> It raises the complexity level a bit for something which doesn't seem to
> be a very common scenario,

FWIW, I agree that it's not a common scenario, and it's not entirely
clear that it was worth the time I spent on it, or the added complexity.
On the other hand, I would dislike having a basic API like 'substitute*'
be subtly broken in this way.

> but otherwise seems a very elegant
> workaround. It seems to me that your implementation is already pretty
> complete. I'll try to write a test to validate it and report back.

Sounds good.  Thank you!

      Mark