Eli Zaretskii wrote on Sat, 21 Nov 2015 at 10:30:

> > From: Philipp Stephani
> > Date: Sat, 21 Nov 2015 09:01:12 +0000
> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com,
> >     emacs-devel@gnu.org
> >
> > Let me summarize the issues I see: The internal Emacs encoding can
> > change between versions (comment in mule-conf.el), therefore we
> > shouldn't use it in the module API. IIUC this rules out
> > make_multibyte_string: it only accepts the internal encoding.
> > Therefore I proposed to always have users specify the encoding
> > explicitly and then use code_convert_string_norecord to create the
> > Lisp string objects. Would that work? (We probably then need
> > another set of functions for unibyte strings.)
>
> I'm not sure I'm following, so let's take a step back, okay?
>
> My comments were about using build_string and make_string in 2
> functions defined in emacs-module.c: module_make_function and
> module_make_string. Both of these emacs-module.c functions produce
> strings for consumption by Emacs, AFAIU: the former produces a doc
> string of a function defined by a module, which will be used by
> various documentation-related functions and commands within Emacs, the
> latter produces a string to be passed to Emacs Lisp code for use as
> any other Lisp string. Do you agree so far?

Yes.

> If you agree, then in both cases the strings these functions return
> should be in the internal representation of strings used by Emacs, not
> in some encoding like UTF-8 or ISO-8859-1. (We could also use encoded
> strings, but that would require Lisp programs using module functions
> to always decode any strings they receive, which is less efficient and
> more error-prone.)

Yes. Just to check my understanding: there are two types of strings,
unibyte (just a sequence of chars) and multibyte (a sequence of chars
interpreted in the internal Emacs encoding), right?

> (Btw, I don't think we should worry about changing the internal
> representation of characters in Emacs, because make_multibyte_string
> will be updated as needed.)

This is a crucial point. If the internal encoding never changes, then we
can declare that those string parameters are expected to be in the
internal encoding. But see the discussion in
https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in
mule-conf.el seems to indicate that the internal encoding is not stable.

> This is what my comments were about. I think that you, by contrast,
> are talking about the encoding of the _input_ strings, in this case
> the 'documentation' argument to module_make_function and 'str'
> argument to module_make_string. My assumption was that these
> arguments will always have to be in UTF-8 encoding; if that assumption
> is true, then no decoding via code_convert_string_norecord is
> necessary, since make_multibyte_string will DTRT. We can (and
> probably should) document the fact that all non-ASCII strings must be
> UTF-8 encoded as a requirement of the emacs-module interface.

Or rather, encoded in an extension of UTF-8 that can also represent
surrogate code points and numbers that are not code points, as described
in
https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html .

> If you are thinking about accepting strings encoded in other
> encodings, I'd consider this an extension, to be added later if
> needed. After all, a module can easily convert to UTF-8 by itself,
> using facilities such as iconv.

Yes, provided the internal Emacs encoding is stable.

> In any case, code_convert_string_norecord cannot be the complete
> solution, because it accepts Lisp string objects, not C strings. You
> still need to create a Lisp string (but this time using
> make_unibyte_string). The point is to always use either
> make_unibyte_string or make_multibyte_string, and never build_string
> or make_string; the latter 2 should only be used for fixed ASCII-only
> strings.

Yes, that's fine. The question is whether the internal encoding is
stable. If it is stable, we can use make_multibyte_string; if not, we
can only use make_unibyte_string.
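
To make the "a module can convert to UTF-8 by itself using iconv" option concrete, here is a minimal sketch of what module-side code might look like. The helper name to_utf8 is hypothetical and not part of emacs-module.c; it only uses the POSIX iconv interface and assumes the source encoding has no embedded NUL bytes (so not UTF-16 or UTF-32).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of emacs-module.c: convert the
   NUL-terminated string SRC from the encoding named FROM to UTF-8.
   Returns a malloc'ed NUL-terminated string that the caller must
   free, or NULL on failure.  Assumes FROM never produces embedded
   NUL bytes, since the input length is taken with strlen.  */
static char *
to_utf8 (const char *from, const char *src)
{
  assert (from != NULL && src != NULL);

  iconv_t cd = iconv_open ("UTF-8", from);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t in_left = strlen (src);
  /* Every input character occupies at least one byte and needs at
     most 4 bytes in UTF-8, so 4 bytes per input byte suffices.  */
  size_t out_size = 4 * in_left + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  /* iconv takes a non-const input pointer but does not modify
     the input bytes.  */
  char *in_ptr = (char *) src;
  char *out_ptr = out;
  size_t out_left = out_size - 1;
  size_t rc = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      /* Invalid or incomplete input sequence.  */
      free (out);
      return NULL;
    }

  *out_ptr = '\0';
  return out;
}
```

A module would call something like to_utf8 ("ISO-8859-1", text) and hand the result to the module API's string constructor. This only covers the module side; it says nothing about whether the string constructor itself should decode via make_multibyte_string or via make_unibyte_string plus code_convert_string_norecord, which is the open question above.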