> It would be nice to check multibyte characters as well,
> to verify that byte indices and not character indices are used.
>
> E.g., (utf8->string #vu8(195 169) 0 2) should return "é".
>
> Another nice test: (utf8->string #vu8(195 169) 0 1) should raise
> a 'decoding-error', even though #vu8(195 169) is valid UTF-8.
>
> And (utf8->string #vu8(0 32 196) 0 2) should return "\x00 " even
> though #vu8(0 32 195) is invalid UTF-8 -- and as a bonus, it checks
> that the nul character is supported -- which can be easily forgotten
> because Guile is implemented in C which usually terminates strings
> by zero instead of using a length field.

Thank you for the suggestions. I have added all the tests you suggested
to the test suite, and they all pass.

> Overall, the patch you sent seems a reasonable approach to me, though
> I didn't verify the details. I find myself at times copying a part of
> a bytevector to a new bytevector because some procedure doesn't allow
> specifying byte ranges ...

I'm glad it will be useful for you!

In addition to those tests, I have added the range functionality to both
utf16->string and utf32->string. I have updated the documentation, and
the tests pass.

I have also changed the names of the functions to emphasize that the
range applies to the bytevector (not the string). The new C functions
are the following.

SCM scm_utf8_range_to_string (SCM, SCM, SCM);
SCM scm_utf16_range_to_string (SCM, SCM, SCM, SCM);
SCM scm_utf32_range_to_string (SCM, SCM, SCM, SCM);

In a separate patch, I have removed the wrapper function for R7RS
compatibility and have exported the updated utf8->string function
instead. In the process, I also removed a function that was not being
used anywhere.

I have attached the edited patch and the new R7RS patch.

~ Vijay