> It would be nice to check multibyte characters as well,
> to verify that byte indices and not character indices are used.
>
> E.g., (utf8->string #vu8(195 169) 0 2) should return "é".
>
> Another nice test: (utf8->string #vu8(195 169) 0 1) should raise
> a 'decoding-error', even though #vu8(195 169) is valid UTF-8.
>
> And (utf8->string #vu8(0 32 196) 0 2) should return "\x00 " even
> though #vu8(0 32 196) is invalid UTF-8 -- and as a bonus, it checks
> that the nul character is supported -- which can be easily forgotten
> because Guile is implemented in C which usually terminates strings
> by zero instead of using a length field.

Thank you for the suggestions. I have added all the tests you suggested
to the test suite, and they all pass.
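
For anyone following along, here is a minimal standalone C sketch of why
those three cases are useful. It is not the patch and not Guile's
decoder: the helper name utf8_range_is_valid is made up for
illustration, and it only validates a byte range instead of building a
string. It takes byte offsets plus an explicit end, so a range can cut a
multibyte character in half, can stop before a bad trailing byte, and
can contain a nul byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Guile.  Returns 1 when the bytes in
   [start, end) of BUF are well-formed UTF-8, and 0 otherwise.  START and
   END are byte offsets, never character offsets.  */
int
utf8_range_is_valid (const uint8_t *buf, size_t start, size_t end)
{
  size_t i = start;

  while (i < end)
    {
      uint8_t b = buf[i];
      size_t len;
      uint32_t cp;

      /* Work out the sequence length from the lead byte.  */
      if (b < 0x80)
        { len = 1; cp = b; }
      else if ((b & 0xE0) == 0xC0)
        { len = 2; cp = b & 0x1F; }
      else if ((b & 0xF0) == 0xE0)
        { len = 3; cp = b & 0x0F; }
      else if ((b & 0xF8) == 0xF0)
        { len = 4; cp = b & 0x07; }
      else
        return 0;               /* stray continuation or invalid lead byte */

      if (len > end - i)
        return 0;               /* sequence is cut off by the end of the range */

      for (size_t k = 1; k < len; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return 0;           /* expected a continuation byte */
          cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

      /* Reject overlong forms, surrogates, and values above U+10FFFF.  */
      if ((len == 2 && cp < 0x80)
          || (len == 3 && cp < 0x800)
          || (len == 4 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF)
          || cp > 0x10FFFF)
        return 0;

      i += len;
    }

  return 1;
}

/* The three suggested cases, expressed against the sketch above.  */
void
check_examples (void)
{
  const uint8_t e_acute[] = { 195, 169 };     /* "é" encoded as UTF-8 */
  const uint8_t nul_space[] = { 0, 32, 196 }; /* last byte is a dangling lead */

  assert (utf8_range_is_valid (e_acute, 0, 2));    /* whole character */
  assert (!utf8_range_is_valid (e_acute, 0, 1));   /* half a character */
  assert (utf8_range_is_valid (nul_space, 0, 2));  /* nul is ordinary data */
  assert (!utf8_range_is_valid (nul_space, 0, 3)); /* full vector is invalid */
}
```

Calling check_examples () from a main function exercises all four
assertions. Because the length is passed explicitly, the zero byte in the
second vector does not end the input early.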

> Overall, the patch you sent seems a reasonable approach to me, though
> I didn't verify the details. I find myself at times copying a part of
> a bytevector to a new bytevector because some procedure doesn't allow
> specifying byte ranges ...

I'm glad it will be useful for you!

In addition to those tests, I have added the range functionality to both
utf16->string and utf32->string. I have updated the documentation, and
the tests pass. I have also renamed the functions to emphasize that the
range applies to the bytevector (not the string). The new C functions
are the following.

SCM scm_utf8_range_to_string (SCM, SCM, SCM);
SCM scm_utf16_range_to_string (SCM, SCM, SCM, SCM);
SCM scm_utf32_range_to_string (SCM, SCM, SCM, SCM);

In a separate patch, I have removed the wrapper function for R7RS
compatibility and exported the updated utf8->string function. In the
process, I also removed a function that was not being used anywhere.

I have attached the edited patch and the new R7RS patch.

~ Vijay