Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Please add a “Fixes” line in the commit log.

Done, thanks.

> I’m not reviewing the code in depth and I trust your judgment.
>
> The risks of bugs I can think of are: missed ASCII references (a
> regression, whereby some ASCII references would not get rewritten),

This is indeed a risk whenever we modify the grafting code, which is
written for efficiency, not clarity.  I've tried to be careful, and have
checked that my newly grafted system and user profiles do not retain
references to ungrafted code, modulo the following pre-existing issues:

  <https://bugs.gnu.org/47576>
  (ibus-daemon launches ungrafted subprocesses)

  <https://bugs.gnu.org/47614>
  (Chunked store references in .zo files in Racket 8)

> and unrelated UTF-{16,32}-base32-looking references getting rewritten.
>
> I guess the latter is very unlikely because only sequences found in the
> replacement table may be rewritten, right?

Yes, that's correct.

> The former should be caught by ‘tests/grafts.scm’ but we could also
> check the closure of real-world systems, for instance, to make sure it
> doesn’t refer to ungrafted things.

As I mention above, I've already done so for my own GNOME-based Guix
system.

> Do you know how this affects performance?

I have not yet measured it, but subjectively, I do not notice any
obvious difference in speed.

I expect that the main performance impact is that large blocks of NULs
will be processed more slowly by the current draft version of the new
grafting code.  That's because NULs can now be part of a nix hash, and
therefore the new grafting code can only advance 1 byte position when
seeing a NUL, whereas previously it would skip ahead 33 positions in
that case.

If desired, the handling of NULs could be made more efficient, at the
cost of a bit more complexity.  When seeing a NUL, we could check the
adjacent bytes to see if this is part of a nix-base32 character in
UTF-16 or UTF-32 encoding.  If not, we could skip ahead.

> Perhaps add short docstrings for clarity.

Done.

>> +(for-each
>> + (lambda (char-size1)
>> +   (for-each
>> +    (lambda (char-size2)
>> +      (for-each
>> +       (lambda (gap)
>> +        (for-each
>> +         (lambda (offset)
>> +           (test-equal (format #f "replace-store-references, char-sizes ~a ~a, gap ~s, offset ~a"
>> +                               char-size1 char-size2 gap offset)
[...]
>> +         ;; offsets to test
>> +         (map (lambda (i) (- buffer-size (* 40 char-size1) i))
>> +              (iota 30))))
>> +       ;; gaps
>> +       '("" "-" " " "a")))
>> +    ;; char-size2 values to test
>> +    '(1 2)))
>> + ;; char-size1 values to test
>> + '(1 2 4))
>
> For clarity, perhaps you can define a top-level procedure for the test
> and call it from ‘for-each’.

Good idea.  I'd like to optimize the tests a bit before pushing this.
They take a fairly long time to run, and lead to a huge 1.5G grafts.log
file.  It might be sufficient to avoid 'test-equal' here.

> Modulo these very minor issues, it looks like it’s ready to go!

Sounds good.  Thanks for the review!

Below, I've attached my current revision of this draft patch, which
incorporates your suggestions regarding the main code.  What remains is
to improve the tests.

      Regards,
        Mark