From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Philipp Stephani
>     (Btw, I don't think we should worry about changing the internal
>     representation of characters in Emacs, because make_multibyte_string
>     will be updated as needed.)
>
> This is a crucial point. If the internal encoding never changes, then we can
> declare that those string parameters are expected to be in the internal
> encoding.

No, we cannot, or rather should not.  It is unreasonable to expect
external modules to know the intricacies of the internal
representation.  Most Emacs hackers don't.

> But see the discussion in
> https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in
> mule-conf.el seems to indicate that the internal encoding is not stable.

That discussion is about zero-copy access to Emacs buffer text and
Emacs strings inside module code.  Such access is indeed impossible
without either knowing _something_ about the internal representation,
or having additional APIs in emacs-module.c that allow modules such
access while hiding the details of the internal representation.  We
could discuss extending the module functionality to include this.

But that is a separate issue from what module_make_function and
module_make_string do.  These two functions are basic, and don't need
to know about the internal representation or use it.  While direct
access to Emacs buffer text will be needed by only some modules,
module_make_function will be used by all of them, and
module_make_string by many.

So I think we shouldn't conflate these two issues; they are separate.

>     This is what my comments were about. I think that you, by contrast,
>     are talking about the encoding of the _input_ strings, in this case
>     the 'documentation' argument to module_make_function and 'str'
>     argument to module_make_string. My assumption was that these
>     arguments will always have to be in UTF-8 encoding; if that assumption
>     is true, then no decoding via code_convert_string_norecord is
>     necessary, since make_multibyte_string will DTRT. We can (and
>     probably should) document the fact that all non-ASCII strings must be
>     UTF-8 encoded as a requirement of the emacs-module interface.
>
> Or rather, an extension to UTF-8 capable of encoding surrogate code points and
> numbers that are not code points, as described in
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html.

No, I meant strict UTF-8, not its Emacs extension.

> If it's stable, we can use make_multibyte_string; if not, we can
> only use make_unibyte_string.

If the argument strings are in strict UTF-8, then
make_multibyte_string will DTRT automagically, no matter what the
internal representation is.  That is their contract.
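
To make "strict UTF-8" concrete, here is a minimal sketch of a check a module author could run on a string before handing it to module_make_string or module_make_function.  The function name strict_utf8_p is hypothetical and is not part of emacs-module.c.  The sketch assumes "strict" means the RFC 3629 definition: no overlong forms, no surrogate code points (U+D800..U+DFFF), and nothing above U+10FFFF, which are the cases where the Emacs extension differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Emacs module API.
   Return true if the LEN bytes at S are strict UTF-8 (RFC 3629):
   no overlong forms, no surrogates, nothing above U+10FFFF.  */
static bool
strict_utf8_p (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);

  size_t i = 0;
  while (i < len)
    {
      unsigned char b = s[i];
      size_t need;       /* Number of continuation bytes expected.  */
      uint32_t cp;       /* Code point decoded so far.  */
      uint32_t min;      /* Smallest code point allowed at this length.  */

      if (b < 0x80)
        {
          i++;           /* ASCII byte, always valid.  */
          continue;
        }
      else if ((b & 0xE0) == 0xC0)
        { need = 1; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { need = 2; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { need = 3; cp = b & 0x07; min = 0x10000; }
      else
        return false;    /* Stray continuation byte or 5/6-byte lead.  */

      if (len - i - 1 < need)
        return false;    /* Sequence truncated at end of input.  */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char c = s[i + k];
          if ((c & 0xC0) != 0x80)
            return false;
          cp = (cp << 6) | (c & 0x3F);
        }

      if (cp < min)
        return false;    /* Overlong encoding.  */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;    /* Surrogate code point.  */
      if (cp > 0x10FFFF)
        return false;    /* Beyond the Unicode range.  */

      i += need + 1;
    }
  return true;
}
```

With this check, the bytes ED A0 80 (the surrogate U+D800) and F4 90 80 80 (U+110000) are rejected, while ordinary text such as "café" encoded as 63 61 66 C3 A9 is accepted.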