Vijay Marupudi wrote on Fri 21-01-2022 at 15:20 [-0500]:
+ (pass-if-exception "utf8->string range: end < start"
+ exception:out-of-range
+ (let* ((utf8 (string->utf8 "gnu guile")))
+ (utf8->string utf8 1 0)))
+ [other tests]
It would be nice to check multibyte characters as well,
to verify that byte indices and not character indices are used.
E.g., (utf8->string #vu8(195 169) 0 2) should return "é".
Another nice test: (utf8->string #vu8(195 169) 0 1) should raise
a 'decoding-error', even though #vu8(195 169) is valid UTF-8.
And (utf8->string #vu8(0 32 196) 0 2) should return "\x00 " even
though #vu8(0 32 196) is invalid UTF-8. As a bonus, it checks that
the nul character is supported, which is easily forgotten because
Guile is implemented in C, which usually terminates strings with a
zero byte instead of using a length field.
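For illustration, the three checks above could be sketched as a
standalone script. This is untested and assumes the patched
utf8->string from this thread, taking optional START and END byte
indices. 'raises-decoding-error?' is a hypothetical helper for this
sketch, not part of Guile.

```scheme
;; Sketch only: assumes the patched utf8->string that accepts optional
;; START and END byte indices, as proposed in this thread.
(use-modules (rnrs bytevectors))

;; Hypothetical helper (not part of Guile): return #t if THUNK throws
;; 'decoding-error, #f if it returns normally.
(define (raises-decoding-error? thunk)
  (catch 'decoding-error
    (lambda () (thunk) #f)
    (lambda (key . args) #t)))

;; Byte indices, not character indices: both bytes of "é" are needed.
(unless (string=? (utf8->string #vu8(195 169) 0 2) "é")
  (error "range should be counted in bytes"))

;; A range that cuts a multibyte sequence in half is invalid UTF-8,
;; even though the whole bytevector is valid.
(unless (raises-decoding-error?
         (lambda () (utf8->string #vu8(195 169) 0 1)))
  (error "expected decoding-error for a truncated sequence"))

;; Bytes outside the range must not be decoded, and nul must survive.
(unless (string=? (utf8->string #vu8(0 32 196) 0 2)
                  (string #\nul #\space))
  (error "expected nul followed by space"))
```

In the test suite itself these would presumably be written with
pass-if-equal and pass-if-exception, like the tests quoted above.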
Overall, the patch you sent seems a reasonable approach to me, though
I didn't verify the details. I find myself at times copying a part
of a bytevector to a new bytevector because some procedure doesn't
allow specifying byte ranges ...
Greetings,
Maxime