From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Philipp Stephani
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 21 Nov 2015 09:01:12 +0000
> Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
>
> Let me summarize the issues I see: The internal Emacs encoding can change
> between versions (command in mule-conf.el), therefore we shouldn't use it in
> the module API. IIUC this rules out make_multibyte_string: it only accepts the
> internal encoding. Therefore I proposed to always have users specify the
> encoding explicitly and then use code_convert_string_norecord to create the
> Lisp string objects. Would that work? (We probably then need another set of
> functions for unibyte strings.)
I'm not sure I'm following, so let's take a step back, okay?
My comments were about using build_string and make_string in 2
functions defined in emacs-module.c: module_make_function and
module_make_string.  Both of these emacs-module.c functions produce
strings for consumption by Emacs, AFAIU: the former produces a doc
string of a function defined by a module, which will be used by
various documentation-related functions and commands within Emacs, the
latter produces a string to be passed to Emacs Lisp code for use as
any other Lisp string.  Do you agree so far?
If you agree, then in both cases the strings these functions return
should be in the internal representation of strings used by Emacs, not
in some encoding like UTF-8 or ISO-8859-1.  (We could also use encoded
strings, but that would require Lisp programs using module functions
to always decode any strings they receive, which is less efficient and
more error-prone.)
(Btw, I don't think we should worry about changing the internal
representation of characters in Emacs, because make_multibyte_string
will be updated as needed.)
This is what my comments were about.  I think that you, by contrast,
are talking about the encoding of the _input_ strings, in this case
the 'documentation' argument to module_make_function and 'str'
argument to module_make_string.  My assumption was that these
arguments will always have to be in UTF-8 encoding; if that assumption
is true, then no decoding via code_convert_string_norecord is
necessary, since make_multibyte_string will DTRT.  We can (and
probably should) document the fact that all non-ASCII strings must be
UTF-8 encoded as a requirement of the emacs-module interface.

Or rather, an extension to UTF-8 capable of encoding surrogate code
points and numbers that are not code points, as described in
https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html.

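To make the UTF-8 requirement concrete, here is a minimal sketch of a check a module could run on a C string before passing it to module_make_string. The helper name is_strict_utf8 is hypothetical and is not part of emacs-module.c. It accepts strict UTF-8 only, so it rejects surrogates and values above U+10FFFF, and therefore does not cover the extended representation described in the Text Representations manual page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the LEN bytes at S are well-formed strict UTF-8:
   no overlong forms, no surrogates (U+D800..U+DFFF), and nothing
   above U+10FFFF.  Hypothetical helper, for illustration only.  */
bool
is_strict_utf8 (const char *s, size_t len)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = p[i];
      size_t need;           /* number of continuation bytes expected */
      unsigned long cp, min; /* code point; smallest value for this length */

      if (c < 0x80)
        {
          i++;               /* plain ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;        /* stray continuation byte or invalid lead byte */

      if (len - i <= need)
        return false;        /* sequence is truncated */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = p[i + k];
          if ((cc & 0xC0) != 0x80)
            return false;    /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;        /* overlong, out of range, or surrogate */

      i += need + 1;
    }
  return true;
}
```

A Latin-1 encoded "é" (the single byte 0xE9) fails this check, while its UTF-8 form (0xC3 0xA9) passes.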
If you are thinking about accepting strings encoded in other
encodings, I'd consider this an extension, to be added later if
needed.  After all, a module can easily convert to UTF-8 by itself,
using facilities such as iconv.

Yes, provided the internal Emacs encoding is stable.

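As a rough illustration of the iconv route, a module could convert its own strings to UTF-8 before calling into Emacs. The sketch below uses only the POSIX iconv_open, iconv and iconv_close calls. The helper name to_utf8 is hypothetical, and the output buffer size (4 bytes per input byte) is an assumption that should hold for common source encodings but is not guaranteed for all of them.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert the NUL-terminated string SRC from encoding FROM to UTF-8.
   Return a malloc'd NUL-terminated string, or NULL on failure.
   Hypothetical helper, for illustration only.  */
char *
to_utf8 (const char *from, const char *src)
{
  iconv_t cd = iconv_open ("UTF-8", from);
  if (cd == (iconv_t) -1)
    return NULL;                /* unknown or unsupported encoding */

  size_t inleft = strlen (src);
  /* Assumption: 4 output bytes per input byte is enough.  */
  size_t outsize = 4 * inleft + 1;
  char *out = malloc (outsize);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) src;     /* iconv takes char **; SRC is not modified */
  char *outp = out;
  size_t outleft = outsize - 1; /* keep one byte for the terminator */

  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  iconv_close (cd);
  if (rc == (size_t) -1)
    {
      free (out);               /* invalid input or buffer too small */
      return NULL;
    }

  *outp = '\0';
  return out;
}
```

The caller owns the returned buffer and must free it once the string has been handed to Emacs.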
In any case, code_convert_string_norecord cannot be the complete
solution, because it accepts Lisp string objects, not C strings.  You
still need to create a Lisp string (but this time using
make_unibyte_string).  The point is to always use either
make_unibyte_string or make_multibyte_string, and never build_string
or make_string; the latter 2 should only be used for fixed ASCII-only
strings.