Does copy_string_contents in emacs-module.h give us a proper UTF-8 string?

unofficial mirror of emacs-devel@gnu.org 

* Is copy_string_contents in emacs-module.h give us a proper UTF-8 string?
@ 2020-10-08  6:09 Zhu Zihao
  2020-10-08  7:38 ` Eli Zaretskii
  2020-10-08  7:40 ` Robert Pluim
  0 siblings, 2 replies; 3+ messages in thread
From: Zhu Zihao @ 2020-10-08  6:09 UTC
  To: emacs-devel@gnu.org

The comment in emacs-module.h says:

/* Copy the content of the Lisp string VALUE to BUFFER as an utf8
     NUL-terminated string.

     SIZE must point to the total size of the buffer.  If BUFFER is
     NULL or if SIZE is not big enough, write the required buffer size
     to SIZE and return true.

     Note that SIZE must include the last NUL byte (e.g. "abc" needs
     a buffer of size 4).

     Return true if the string was successfully copied.  */

However, the "Text Representations" section of the Elisp manual says that Emacs extends its UTF-8-based internal encoding so that it can also store raw bytes:

   To support this multitude of characters and scripts, Emacs closely
follows the “Unicode Standard”.  The Unicode Standard assigns a unique
number, called a “codepoint”, to each and every character.  The range of
codepoints defined by Unicode, or the Unicode “codespace”, is
‘0..#x10FFFF’ (in hexadecimal notation), inclusive.  Emacs extends this
range with codepoints in the range ‘#x110000..#x3FFFFF’, which it uses
for representing characters that are not unified with Unicode and “raw
8-bit bytes” that cannot be interpreted as characters.  Thus, a
character codepoint in Emacs is a 22-bit integer.

Will "copy_string_contents" always give us a proper UTF-8 string, or can it give us a mix of raw bytes and UTF-8?
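
For illustration, the ranges in the manual excerpt can be written down as a small classifier. This is a minimal C sketch, not code from Emacs: the names classify_codepoint and raw_byte_value are hypothetical, and the placement of raw 8-bit bytes at the very top of the extended range (#x3FFF80..#x3FFFFF, that is, byte + #x3FFF00) is an assumption taken from Emacs's character.h rather than from the passage quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Classes of Emacs character codepoints, following the manual excerpt.
   All names here are hypothetical and exist only for illustration.  */
enum codepoint_class
{
  CP_UNICODE,         /* 0..#x10FFFF: the Unicode codespace */
  CP_EMACS_EXTENDED,  /* #x110000..#x3FFF7F: characters not unified with Unicode */
  CP_RAW_BYTE,        /* #x3FFF80..#x3FFFFF: raw 8-bit bytes (assumed boundary) */
  CP_INVALID          /* does not fit in 22 bits */
};

static enum codepoint_class
classify_codepoint (uint32_t c)
{
  if (c <= 0x10FFFF)
    return CP_UNICODE;
  if (c < 0x3FFF80)
    return CP_EMACS_EXTENDED;
  if (c <= 0x3FFFFF)
    return CP_RAW_BYTE;
  return CP_INVALID;
}

/* Recover the byte (0x80..0xFF) that a raw-byte codepoint stands for,
   assuming the byte + 0x3FFF00 mapping described in the lead-in.  */
static unsigned char
raw_byte_value (uint32_t c)
{
  assert (classify_codepoint (c) == CP_RAW_BYTE);
  return (unsigned char) (c - 0x3FFF00);
}
```

A Lisp string is representable as standard UTF-8 only if every character falls in the first class; the other two classes are what the question is worried about.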


* Re: Is copy_string_contents in emacs-module.h give us a proper UTF-8 string?
  2020-10-08  6:09 Is copy_string_contents in emacs-module.h give us a proper UTF-8 string? Zhu Zihao
@ 2020-10-08  7:38 ` Eli Zaretskii
  2020-10-08  7:40 ` Robert Pluim
  1 sibling, 0 replies; 3+ messages in thread
From: Eli Zaretskii @ 2020-10-08  7:38 UTC
  To: Zhu Zihao; +Cc: emacs-devel

> Date: Thu, 8 Oct 2020 14:09:53 +0800 (CST)
> From: "Zhu Zihao" <all_but_last@163.com>
> 
>    To support this multitude of characters and scripts, Emacs closely
> follows the “Unicode Standard”.  The Unicode Standard assigns a unique
> number, called a “codepoint”, to each and every character.  The range of
> codepoints defined by Unicode, or the Unicode “codespace”, is
> ‘0..#x10FFFF’ (in hexadecimal notation), inclusive.  Emacs extends this
> range with codepoints in the range ‘#x110000..#x3FFFFF’, which it uses
> for representing characters that are not unified with Unicode and “raw
> 8-bit bytes” that cannot be interpreted as characters.  Thus, a
> character codepoint in Emacs is a 22-bit integer.
> 
> Will "copy_string_contents" always give us a proper UTF-8 string, or can it give us a mix of raw bytes and
> UTF-8?

If the original string includes raw bytes, copy_string_contents will
signal an error.


* Re: Is copy_string_contents in emacs-module.h give us a proper UTF-8 string?
  2020-10-08  6:09 Is copy_string_contents in emacs-module.h give us a proper UTF-8 string? Zhu Zihao
  2020-10-08  7:38 ` Eli Zaretskii
@ 2020-10-08  7:40 ` Robert Pluim
  1 sibling, 0 replies; 3+ messages in thread
From: Robert Pluim @ 2020-10-08  7:40 UTC
  To: Zhu Zihao; +Cc: emacs-devel@gnu.org

>>>>> On Thu, 8 Oct 2020 14:09:53 +0800 (CST), "Zhu Zihao" <all_but_last@163.com> said:

    Zhu> I see the comment in emacs-module.h says
    Zhu> /* Copy the content of the Lisp string VALUE to BUFFER as an utf8
    Zhu>      NUL-terminated string.

    Zhu>      SIZE must point to the total size of the buffer.  If BUFFER is
    Zhu>      NULL or if SIZE is not big enough, write the required buffer size
    Zhu>      to SIZE and return true.

    Zhu>      Note that SIZE must include the last NUL byte (e.g. "abc" needs
    Zhu>      a buffer of size 4).

    Zhu>      Return true if the string was successfully copied.  */

From emacs-module.c:module_copy_string_contents:

     We set HANDLE-8-BIT and HANDLE-OVER-UNI to nil to signal an error
     if the argument is not a valid Unicode string.  While it isn't
     documented how copy_string_contents behaves in this case,
     signaling an error is the most defensive and obvious reaction. */

    Zhu> Will "copy_string_contents" always give us a proper UTF-8 string, or can it give us a mix of raw bytes and UTF-8?

It will either give a UTF-8 string or signal an error.
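
The contract quoted from the header, together with the answer above, suggests a two-call pattern: ask for the required size, allocate, copy, and treat a false return as failure. What follows is a minimal, self-contained C sketch of that pattern. It does not use the real module API: mock_copy_string_contents and copy_to_c_string are hypothetical names, the has_raw_bytes flag merely stands in for "the Lisp string is not valid Unicode", and treating a too-small buffer as a failure is an assumption of this sketch (the caller avoids that case by always querying the size first).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for env->copy_string_contents.  It models only
   the size contract from the header comment plus "fails on raw bytes".
   The real function takes an emacs_env * and an emacs_value instead of
   a C string and a flag.  */
static bool
mock_copy_string_contents (const char *lisp_string, bool has_raw_bytes,
                           char *buf, ptrdiff_t *size)
{
  assert (size != NULL);
  if (has_raw_bytes)
    return false;                       /* models "signal an error" */

  /* Required size includes the trailing NUL: "abc" needs 4 bytes.  */
  ptrdiff_t required = (ptrdiff_t) strlen (lisp_string) + 1;

  if (buf == NULL)
    {
      *size = required;                 /* size query */
      return true;
    }
  if (*size < required)
    {
      *size = required;                 /* assumption: treated as failure */
      return false;
    }
  memcpy (buf, lisp_string, (size_t) required);
  *size = required;
  return true;
}

/* Two-call pattern: query the size, allocate, then copy.  Returns a
   malloc'ed NUL-terminated string, or NULL on any failure.  */
static char *
copy_to_c_string (const char *lisp_string, bool has_raw_bytes)
{
  ptrdiff_t size = 0;
  if (!mock_copy_string_contents (lisp_string, has_raw_bytes, NULL, &size))
    return NULL;

  char *buf = malloc ((size_t) size);
  if (buf == NULL)
    return NULL;

  if (!mock_copy_string_contents (lisp_string, has_raw_bytes, buf, &size))
    {
      free (buf);
      return NULL;
    }
  return buf;
}
```

In a real module the failure branch would typically also call env->non_local_exit_check (env), because the error is left pending as a non-local exit rather than being described by the boolean return value alone.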

Robert
-- 
