The comment in emacs-module.h says:

/* Copy the content of the Lisp string VALUE to BUFFER as an utf8 NUL-terminated string. SIZE must point to the total size of the buffer. If BUFFER is NULL or if SIZE is not big enough, write the required buffer size to SIZE and return true. Note that SIZE must include the last NUL byte (e.g. "abc" needs a buffer of size 4). Return true if the string was successfully copied. */

However, the "Text Representations" chapter of the Elisp manual says that Emacs extends the Unicode codepoint range so that strings can also hold raw bytes:

To support this multitude of characters and scripts, Emacs closely follows the “Unicode Standard”. The Unicode Standard assigns a unique number, called a “codepoint”, to each and every character. The range of codepoints defined by Unicode, or the Unicode “codespace”, is ‘0..#x10FFFF’ (in hexadecimal notation), inclusive. Emacs extends this range with codepoints in the range ‘#x110000..#x3FFFFF’, which it uses for representing characters that are not unified with Unicode and “raw 8-bit bytes” that cannot be interpreted as characters. Thus, a character codepoint in Emacs is a 22-bit integer.

Will "copy_string_contents" always give us a proper UTF-8 string? Or can it give us a mix of raw bytes and UTF-8?
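
To make the concern concrete, here is a minimal sketch of the check I would otherwise have to run myself on the buffer that copy_string_contents fills. It does not use the module API at all. The name is_strict_utf8 is hypothetical (it is not part of emacs-module.h), and "proper UTF-8" is assumed to mean: no overlong forms, no surrogates, and nothing above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of emacs-module.h.
   Returns true if the LEN bytes at BUF are strict UTF-8:
   no overlong encodings, no surrogates, nothing above U+10FFFF.  */
bool
is_strict_utf8 (const unsigned char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  size_t i = 0;
  while (i < len)
    {
      unsigned char b = buf[i];
      size_t extra;       /* continuation bytes expected after B */
      unsigned long cp;   /* codepoint decoded so far */
      unsigned long min;  /* smallest codepoint valid for this length */

      if (b < 0x80)
        { extra = 0; cp = b; min = 0; }
      else if ((b & 0xE0) == 0xC0)
        { extra = 1; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { extra = 2; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { extra = 3; cp = b & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation or invalid lead byte */

      if (extra > len - i - 1)
        return false;     /* sequence is cut off by the end of BUF */

      for (size_t k = 1; k <= extra; k++)
        {
          unsigned char c = buf[i + k];
          if ((c & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (c & 0x3F);
        }

      if (cp < min)
        return false;     /* overlong encoding */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;     /* UTF-16 surrogate */
      if (cp > 0x10FFFF)
        return false;     /* outside the Unicode codespace */

      i += extra + 1;
    }
  return true;
}
```

I would call this on the copied bytes, excluding the trailing NUL. If copy_string_contents already guarantees valid UTF-8, a check like this is redundant, and that is what I am trying to find out.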