On Sat, 21 Nov 2015 at 10:30, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 21 Nov 2015 09:01:12 +0000
> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> > Let me summarize the issues I see: The internal Emacs encoding can change
> > between versions (comment in mule-conf.el), therefore we shouldn't use it
> > in the module API. IIUC this rules out make_multibyte_string: it only
> > accepts the internal encoding. Therefore I proposed to always have users
> > specify the encoding explicitly and then use code_convert_string_norecord
> > to create the Lisp string objects. Would that work? (We probably then need
> > another set of functions for unibyte strings.)
>
> I'm not sure I'm following, so let's take a step back, okay?
>
> My comments were about using build_string and make_string in 2
> functions defined in emacs-module.c: module_make_function and
> module_make_string.  Both of these emacs-module.c functions produce
> strings for consumption by Emacs, AFAIU: the former produces a doc
> string of a function defined by a module, which will be used by
> various documentation-related functions and commands within Emacs, the
> latter produces a string to be passed to Emacs Lisp code for use as
> any other Lisp string.  Do you agree so far?
>

Yes.


>
> If you agree, then in both cases the strings these functions return
> should be in the internal representation of strings used by Emacs, not
> in some encoding like UTF-8 or ISO-8859-1.  (We could also use encoded
> strings, but that would require Lisp programs using module functions
> to always decode any strings they receive, which is less efficient and
> more error-prone.)
>

Yes. Just to check my understanding: there are two types of strings, unibyte
(just a sequence of chars) and multibyte (a sequence of chars interpreted in
the internal Emacs encoding), right?


>
> (Btw, I don't think we should worry about changing the internal
> representation of characters in Emacs, because make_multibyte_string
> will be updated as needed.)
>

This is a crucial point. If the internal encoding never changes, then we
can declare that those string parameters are expected to be in the internal
encoding. But see the discussion in
https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in
mule-conf.el seems to indicate that the internal encoding is not stable.


>
> This is what my comments were about.  I think that you, by contrast,
> are talking about the encoding of the _input_ strings, in this case
> the 'documentation' argument to module_make_function and 'str'
> argument to module_make_string.  My assumption was that these
> arguments will always have to be in UTF-8 encoding; if that assumption
> is true, then no decoding via code_convert_string_norecord is
> necessary, since make_multibyte_string will DTRT.  We can (and
> probably should) document the fact that all non-ASCII strings must be
> UTF-8 encoded as a requirement of the emacs-module interface.
>

Or rather, an extension to UTF-8 capable of encoding surrogate code points
and numbers that are not code points, as described in
<https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html>.
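
To make the difference concrete: strict UTF-8 rejects exactly the things
mentioned above (surrogate code points and values beyond U+10FFFF), so a
module could check its input against the strict subset before handing it
over. A rough sketch in C follows; is_strict_utf8 is a hypothetical helper
name, not something in emacs-module.c, and I'm assuming that anything
passing this check is also valid in the extended encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the LEN bytes at S are strict
   UTF-8, i.e. no overlong forms, no surrogates (U+D800..U+DFFF) and
   nothing above U+10FFFF.  */
bool
is_strict_utf8 (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;             /* number of continuation bytes */
      unsigned long cp, min;   /* decoded value, smallest legal value */

      if (c < 0x80)
        {
          i++;                 /* plain ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;          /* stray continuation or invalid lead byte */

      if (len - i <= need)
        return false;          /* sequence truncated */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = s[i + k];
          if ((cc & 0xC0) != 0x80)
            return false;      /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min)
        return false;          /* overlong encoding */
      if (cp > 0x10FFFF)
        return false;          /* beyond the Unicode range */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;          /* surrogate code point */

      i += need + 1;
    }
  return true;
}
```

Input that fails this check would need either the extended encoding or an
explicit unibyte path.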


>
> If you are thinking about accepting strings encoded in other
> encodings, I'd consider this an extension, to be added later if
> needed.  After all, a module can easily convert to UTF-8 by itself,
> using facilities such as iconv.
>

Yes, provided the internal Emacs encoding is stable.
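
For illustration, the simplest case of a module converting to UTF-8 by
itself is ISO-8859-1 input, where every byte is the code point of the same
value, so it doesn't even need iconv. A minimal sketch in C; the name
latin1_to_utf8 is made up for this example and is not part of any proposed
API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert LEN bytes of ISO-8859-1 text at SRC to
   UTF-8 in DST and NUL-terminate the result.  DST must have room for
   at least 2 * LEN + 1 bytes.  Returns the number of bytes written,
   not counting the terminating NUL.  */
size_t
latin1_to_utf8 (const unsigned char *src, size_t len, unsigned char *dst)
{
  assert (src != NULL && dst != NULL);
  size_t out = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];
      if (c < 0x80)
        dst[out++] = c;        /* ASCII is unchanged */
      else
        {
          /* U+0080..U+00FF become a two-byte sequence.  */
          dst[out++] = (unsigned char) (0xC0 | (c >> 6));
          dst[out++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
  dst[out] = '\0';
  return out;
}
```

The result could then be passed as the 'str' argument, assuming the API
ends up requiring UTF-8 there.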


>
> In any case, code_convert_string_norecord cannot be the complete
> solution, because it accepts Lisp string objects, not C strings.  You
> still need to create a Lisp string (but this time using
> make_unibyte_string).  The point is to always use either
> make_unibyte_string or make_multibyte_string, and never build_string
> or make_string; the latter 2 should only be used for fixed ASCII-only
> strings.
>

Yes, that's fine; the question is whether the internal encoding is stable.
If it's stable, we can use make_multibyte_string; if not, we can only use
make_unibyte_string.