unofficial mirror of emacs-devel@gnu.org 
From: "Zhu Zihao" <all_but_last@163.com>
To: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
Subject: Is copy_string_contents in emacs-module.h give us a proper UTF-8 string?
Date: Thu, 8 Oct 2020 14:09:53 +0800 (CST)
Message-ID: <79bc0bc2.1982.17506d48b4d.Coremail.all_but_last@163.com>

The comment for copy_string_contents in emacs-module.h says:


/* Copy the content of the Lisp string VALUE to BUFFER as an utf8
     NUL-terminated string.

     SIZE must point to the total size of the buffer.  If BUFFER is
     NULL or if SIZE is not big enough, write the required buffer size
     to SIZE and return true.

     Note that SIZE must include the last NUL byte (e.g. "abc" needs
     a buffer of size 4).

     Return true if the string was successfully copied.  */
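
The contract in that comment suggests a two-call idiom: ask for the required size with a NULL buffer, allocate, then copy. Below is a minimal sketch of that idiom. It assumes a hypothetical stand-in, mock_copy_string_contents, so that it compiles without emacs-module.h; the helper name copy_lisp_string is also made up for illustration. The mock treats a too-small buffer as a failure, which is an assumption rather than something the comment spells out, so the caller relies only on the NULL-buffer query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for env->copy_string_contents, so the sketch
   compiles without emacs-module.h.  SRC plays the role of the Lisp
   string VALUE.  */
static bool
mock_copy_string_contents (const char *src, char *buffer, ptrdiff_t *size)
{
  assert (src != NULL && size != NULL);
  ptrdiff_t required = (ptrdiff_t) strlen (src) + 1; /* include the NUL */

  if (buffer == NULL)
    {
      /* Size query: report what the caller has to allocate.  */
      *size = required;
      return true;
    }
  if (*size < required)
    {
      /* Assumption: report the needed size and treat this as failure.  */
      *size = required;
      return false;
    }
  memcpy (buffer, src, (size_t) required);
  *size = required;
  return true;
}

/* Two-call idiom: query the size, allocate, then copy.  Returns a
   malloc'ed NUL-terminated string, or NULL on failure.  If OUT_SIZE is
   non-NULL it receives the buffer size, including the NUL byte.  */
char *
copy_lisp_string (const char *src, ptrdiff_t *out_size)
{
  ptrdiff_t size = 0;
  if (!mock_copy_string_contents (src, NULL, &size))
    return NULL;

  char *buffer = malloc ((size_t) size);
  if (buffer == NULL)
    return NULL;

  if (!mock_copy_string_contents (src, buffer, &size))
    {
      free (buffer);
      return NULL;
    }
  if (out_size != NULL)
    *out_size = size;
  return buffer;
}
```

In a real module the mock would be replaced by a call through the environment pointer, and the caller would still have to free the returned buffer.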


However, the Text Representations chapter of the Elisp manual says that the character range in Emacs is extended beyond Unicode so that text can also hold raw bytes:


   To support this multitude of characters and scripts, Emacs closely
follows the “Unicode Standard”.  The Unicode Standard assigns a unique
number, called a “codepoint”, to each and every character.  The range of
codepoints defined by Unicode, or the Unicode “codespace”, is
‘0..#x10FFFF’ (in hexadecimal notation), inclusive.  Emacs extends this
range with codepoints in the range ‘#x110000..#x3FFFFF’, which it uses
for representing characters that are not unified with Unicode and “raw
8-bit bytes” that cannot be interpreted as characters.  Thus, a
character codepoint in Emacs is a 22-bit integer.
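
The ranges in that passage can be written down as a small sketch. The constant and function names below are hypothetical and chosen for illustration only; they are not taken from the Emacs sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two upper bounds quoted from the manual.  */
enum
{
  UNICODE_MAX = 0x10FFFF,    /* end of the Unicode codespace */
  EMACS_CHAR_MAX = 0x3FFFFF  /* end of the Emacs extension range */
};

/* The manual calls an Emacs character codepoint a 22-bit integer.  */
static_assert (EMACS_CHAR_MAX == (1 << 22) - 1,
               "Emacs character codepoints fit in 22 bits");

/* True if C lies in the Unicode codespace 0..#x10FFFF.  */
bool
is_unicode_codepoint (uint32_t c)
{
  return c <= UNICODE_MAX;
}

/* True if C lies in the Emacs-only range #x110000..#x3FFFFF.  */
bool
is_emacs_extended (uint32_t c)
{
  return c > UNICODE_MAX && c <= EMACS_CHAR_MAX;
}
```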


Will "copy_string_contents" always give us a proper UTF-8 string, or can it give us a mix of raw bytes and UTF-8?
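
Until that is settled, a module could check the copied bytes itself. The following is a minimal sketch of a strict UTF-8 check, assuming "proper" means well-formed UTF-8 in the Unicode sense (no overlong forms, no surrogates, nothing above U+10FFFF). The function name is_strict_utf8 is hypothetical, and the sketch says nothing about what Emacs actually writes into the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the LEN bytes at S are well-formed UTF-8: no overlong
   forms, no surrogates, nothing above U+10FFFF.  */
bool
is_strict_utf8 (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t i = 0;
  while (i < len)
    {
      unsigned char b = s[i];
      size_t extra;      /* number of continuation bytes expected */
      unsigned long cp;  /* decoded code point */
      unsigned long min; /* smallest code point allowed for this length */

      if (b < 0x80)
        { extra = 0; cp = b; min = 0; }
      else if ((b & 0xE0) == 0xC0)
        { extra = 1; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { extra = 2; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { extra = 3; cp = b & 0x07; min = 0x10000; }
      else
        return false; /* stray continuation byte or invalid lead byte */

      if (len - i - 1 < extra)
        return false; /* sequence is cut off at the end of the buffer */

      for (size_t k = 1; k <= extra; k++)
        {
          unsigned char c = s[i + k];
          if ((c & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (c & 0x3F);
        }

      /* Reject overlong forms, out-of-range values, and surrogates.  */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += extra + 1;
    }
  return true;
}
```

A caller would pass the buffer filled by copy_string_contents with a length of SIZE minus one, so that the trailing NUL byte is excluded.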


Thread overview: 3+ messages
2020-10-08  6:09 Zhu Zihao [this message]
2020-10-08  7:38 ` Is copy_string_contents in emacs-module.h give us a proper UTF-8 string? Eli Zaretskii
2020-10-08  7:40 ` Robert Pluim
