It can definitely do it, but I guess emacs-module-rs doesn't do it by default because of the performance implications: checking every string might be quite costly in some cases, and it wasn't really clear whether Emacs can pass an invalid string. So currently this case causes undefined behavior there, which results in an Emacs crash.
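
To illustrate what such a check could look like on the Rust side, here is a minimal sketch. It assumes the bytes have already been copied out of Emacs with copy_string_contents; the name `checked_string` is hypothetical and is not part of emacs-module-rs.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper (not an emacs-module-rs API): validate the bytes
/// copied out of Emacs before treating them as a Rust `String`.
/// `String::from_utf8` scans the whole buffer once, which is the
/// per-string cost mentioned above.
fn checked_string(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    // Returns Err instead of invoking undefined behavior when the
    // buffer is not valid UTF-8 (e.g. the contents of a binary file).
    String::from_utf8(bytes)
}
```

With something like this the module could report an error to Lisp for invalid input rather than crash, at the price of one validation pass over each string.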
On Tue, Dec 17, 2024 at 2:24 PM Eli Zaretskii <eliz@gnu.org> wrote:
> From: Evgeny Kurnevsky <kurnevsky@gmail.com>
> Date: Tue, 17 Dec 2024 13:31:57 +0000
>
> Yes, that's a binary file that is not a UTF-8 string. From the comment in the module_copy_string_contents
> implementation I guessed that in such cases Emacs should signal an error, but instead it just passes this
> invalid string to the dynamic library, which caused this bug in emacs-module-rs (see
> https://ubolonton.github.io/emacs-module-rs/latest/type-conversions.html#strings ). So if it's expected then
> maybe it should be explicitly said in the docs of copy_string_contents here
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Module-Values.html ? It just says that it stores
> the UTF-8 encoded text, which gives the impression that it is always a valid UTF-8 string.
I could look into the internals, but I actually wonder why the module
doesn't check the text before relying on such subtle behaviors. We
didn't document the fact that it signals an error for a reason.
So: why cannot the module code or the application which uses it test
up front that the string it copies is human-readable text, not some
binary junk?
--
Best regards, Evgeny Kurnevsky.