From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sat, 21 Nov 2015 13:10:13 +0200
Message-ID: <83mvu74nhm.fsf@gnu.org>
References: <83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com> <877flswse5.fsf@lifelogs.com> <8737wgw7kf.fsf@lifelogs.com> <87io5bv1it.fsf@lifelogs.com> <87egfzuwca.fsf@lifelogs.com> <876118u6f2.fsf@lifelogs.com> <8737w3qero.fsf@lifelogs.com> <831tbn9g9j.fsf@gnu.org> <878u5upw7o.fsf@lifelogs.com> <83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org> <837fld6lps.fsf@gnu.org> <83si3z4s5n.fsf@gnu.org>
Reply-To: Eli Zaretskii
NNTP-Posting-Host: plane.gmane.org
X-Trace: ger.gmane.org 1448104261 20848 80.91.229.3 (21 Nov 2015 11:11:01 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 21 Nov 2015 11:11:01 +0000 (UTC)
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
To: Philipp Stephani
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Nov 21 12:10:49 2015
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1a0648-00077c-Op for ged-emacs-devel@m.gmane.org; Sat, 21 Nov 2015 12:10:49 +0100
Original-Received: from localhost ([::1]:51880 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0647-00040s-Oq for ged-emacs-devel@m.gmane.org; Sat, 21 Nov 2015 06:10:47 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:45297) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a063q-00040n-I9 for emacs-devel@gnu.org; Sat, 21 Nov 2015 06:10:31 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a063l-0007LT-IN for emacs-devel@gnu.org; Sat, 21 Nov 2015 06:10:30 -0500
Original-Received: from mtaout28.012.net.il ([80.179.55.184]:44331) by eggs.gnu.org with esmtp
 (Exim 4.71) (envelope-from ) id 1a063l-0007LA-5e for emacs-devel@gnu.org; Sat, 21 Nov 2015 06:10:25 -0500
Original-Received: from conversion-daemon.mtaout28.012.net.il by mtaout28.012.net.il (HyperSendmail v2007.08) id <0NY500B00W71DQ00@mtaout28.012.net.il> for emacs-devel@gnu.org; Sat, 21 Nov 2015 13:09:17 +0200 (IST)
Original-Received: from HOME-C4E4A596F7 ([84.94.185.246]) by mtaout28.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NY5000WCWBHLG80@mtaout28.012.net.il>; Sat, 21 Nov 2015 13:09:17 +0200 (IST)
In-reply-to:
X-012-Sender: halo1@inter.net.il
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 80.179.55.184
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:194943
Archived-At:

> From: Philipp Stephani
> Date: Sat, 21 Nov 2015 10:31:24 +0000
> Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
>
> If you agree, then in both cases the strings these functions return
> should be in the internal representation of strings used by Emacs, not
> in some encoding like UTF-8 or ISO-8859-1. (We could also use encoded
> strings, but that would require Lisp programs using module functions
> to always decode any strings they receive, which is less efficient and
> more error-prone.)
>
> Yes. Just for understanding: there are two types of strings: unibyte (just a
> sequence of chars), and multibyte (sequence of chars interpreted in the
> internal Emacs encoding), right?

Yes. However, unibyte strings are just streams of bytes; Emacs cannot
interpret them, and they generally appear on display as octal escapes.
They should never be presented to the user, except if the user
explicitly requested that, e.g.
by a command such as find-file-literally.

> (Btw, I don't think we should worry about changing the internal
> representation of characters in Emacs, because make_multibyte_string
> will be updated as needed.)
>
> This is a crucial point. If the internal encoding never changes, then we can
> declare that those string parameters are expected to be in the internal
> encoding.

No, we cannot, or rather should not. It is unreasonable to expect
external modules to know the intricacies of the internal
representation. Most Emacs hackers don't.

> But see the discussion in
> https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in
> mule-conf.el seems to indicate that the internal encoding is not stable.

That discussion is about zero-copy access to Emacs buffer text and
Emacs strings inside module code. Such access is indeed impossible
without either knowing _something_ about the internal representation,
or having additional APIs in emacs-module.c that allow modules such
access while hiding the details of the internal representation. We
could discuss extending the module functionality to include this.

But that is a separate issue from what module_make_function and
module_make_string do. These two functions are basic, and don't need
to know about the internal representation or use it. While direct
access to Emacs buffer text will be needed by only some modules,
module_make_function will be used by all of them, and
module_make_string by many. So I think we shouldn't conflate these
two issues; they are separate.

> This is what my comments were about. I think that you, by contrast,
> are talking about the encoding of the _input_ strings, in this case
> the 'documentation' argument to module_make_function and 'str'
> argument to module_make_string. My assumption was that these
> arguments will always have to be in UTF-8 encoding; if that assumption
> is true, then no decoding via code_convert_string_norecord is
> necessary, since make_multibyte_string will DTRT.
> We can (and probably should) document the fact that all non-ASCII
> strings must be UTF-8 encoded as a requirement of the emacs-module
> interface.
>
> Or rather, an extension to UTF-8 capable of encoding surrogate code points and
> numbers that are not code points, as described in
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html.

No, I meant strict UTF-8, not its Emacs extension.

> If you are thinking about accepting strings encoded in other
> encodings, I'd consider this an extension, to be added later if
> needed. After all, a module can easily convert to UTF-8 by itself,
> using facilities such as iconv.
>
> Yes, provided the internal Emacs encoding is stable.

That's not what I meant. (AFAIK, iconv doesn't know about the Emacs
internal representation.) I meant that a module could convert from
any encoding to UTF-8, and then pass the resulting UTF-8 string to the
emacs-module API.

> In any case, code_convert_string_norecord cannot be the complete
> solution, because it accepts Lisp string objects, not C strings. You
> still need to create a Lisp string (but this time using
> make_unibyte_string). The point is to always use either
> make_unibyte_string or make_multibyte_string, and never build_string
> or make_string; the latter 2 should only be used for fixed ASCII-only
> strings.
>
> Yes, that's fine, the question is about whether the internal encoding is
> stable.

With my suggestion, the stability of the internal representation is
not an issue.

> If it's stable, we can use make_multibyte_string; if not, we can
> only use make_unibyte_string.

If the argument strings are in strict UTF-8, then
make_multibyte_string will DTRT automagically, no matter what the
internal representation is. That is their contract.
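
To make the "strict UTF-8" requirement above concrete, here is a minimal sketch in C of a check a module could run on its own strings before handing them to the emacs-module API. It is only an illustration under stated assumptions: the name is_strict_utf8 is hypothetical and is not part of emacs-module.c, and "strict" is taken to mean UTF-8 as defined by RFC 3629 (no overlong forms, no surrogate code points, nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the emacs-module interface.
   Return true if the LEN bytes at S are strict UTF-8 in the sense of
   RFC 3629: shortest-form sequences only, no surrogates
   (U+D800..U+DFFF), and no code point above U+10FFFF.  */
bool
is_strict_utf8 (const char *s, size_t len)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = p[i];
      size_t need;        /* continuation bytes expected after C */
      unsigned long cp;   /* code point being assembled */
      unsigned long min;  /* smallest code point valid at this length */

      if (c < 0x80)
        {
          i++;            /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte or invalid lead */

      if (len - i - 1 < need)
        return false;     /* sequence truncated at end of input */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = p[i + k];
          if ((cc & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min)
        return false;     /* overlong encoding */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;     /* surrogate code point */
      if (cp > 0x10FFFF)
        return false;     /* beyond the Unicode range */

      i += need + 1;
    }

  return true;
}
```

A module that receives text in some other encoding would first convert it (for instance with iconv, as suggested above) and could then use a check like this as a guard before creating the Lisp string. Sequences that are legal only in an extended UTF-8 variant, such as encoded surrogates, are rejected by this sketch.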