From mboxrd@z Thu Jan 1 00:00:00 1970
From: Philipp Stephani <p.stephani2@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sat, 21 Nov 2015 10:31:24 +0000
References: <83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
 <877flswse5.fsf@lifelogs.com> <8737wgw7kf.fsf@lifelogs.com>
 <87io5bv1it.fsf@lifelogs.com> <87egfzuwca.fsf@lifelogs.com>
 <876118u6f2.fsf@lifelogs.com> <8737w3qero.fsf@lifelogs.com>
 <831tbn9g9j.fsf@gnu.org> <878u5upw7o.fsf@lifelogs.com>
 <83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
 <837fld6lps.fsf@gnu.org> <83si3z4s5n.fsf@gnu.org>
In-Reply-To: <83si3z4s5n.fsf@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:194939
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a114b6af604c30305250a7f9a

--001a114b6af604c30305250a7f9a
Content-Type: text/plain; charset=UTF-8

Eli Zaretskii <eliz@gnu.org> wrote on Sat, 21 Nov 2015 at 10:30:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 21 Nov 2015 09:01:12 +0000
> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> > Let me summarize the issues I see: The internal Emacs encoding can change
> > between versions (comment in mule-conf.el), therefore we shouldn't use it in
> > the module API.
> > IIUC this rules out make_multibyte_string: it only accepts the
> > internal encoding. Therefore I proposed to always have users specify the
> > encoding explicitly and then use code_convert_string_norecord to create the
> > Lisp string objects. Would that work? (We probably then need another set of
> > functions for unibyte strings.)
>
> I'm not sure I'm following, so let's take a step back, okay?
>
> My comments were about using build_string and make_string in 2
> functions defined in emacs-module.c: module_make_function and
> module_make_string.  Both of these emacs-module.c functions produce
> strings for consumption by Emacs, AFAIU: the former produces a doc
> string of a function defined by a module, which will be used by
> various documentation-related functions and commands within Emacs, the
> latter produces a string to be passed to Emacs Lisp code for use as
> any other Lisp string.  Do you agree so far?

Yes.

> If you agree, then in both cases the strings these functions return
> should be in the internal representation of strings used by Emacs, not
> in some encoding like UTF-8 or ISO-8859-1.  (We could also use encoded
> strings, but that would require Lisp programs using module functions
> to always decode any strings they receive, which is less efficient and
> more error-prone.)

Yes. Just to check my understanding: there are two types of strings: unibyte (just a sequence of chars) and multibyte (a sequence of chars interpreted in the internal Emacs encoding), right?

> (Btw, I don't think we should worry about changing the internal
> representation of characters in Emacs, because make_multibyte_string
> will be updated as needed.)

This is a crucial point. If the internal encoding never changes, then we can declare that those string parameters are expected to be in the internal encoding.
But see the discussion in https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in mule-conf.el seems to indicate that the internal encoding is not stable.

> This is what my comments were about.  I think that you, by contrast,
> are talking about the encoding of the _input_ strings, in this case
> the 'documentation' argument to module_make_function and 'str'
> argument to module_make_string.  My assumption was that these
> arguments will always have to be in UTF-8 encoding; if that assumption
> is true, then no decoding via code_convert_string_norecord is
> necessary, since make_multibyte_string will DTRT.  We can (and
> probably should) document the fact that all non-ASCII strings must be
> UTF-8 encoded as a requirement of the emacs-module interface.

Or rather, an extension to UTF-8 capable of encoding surrogate code points and numbers that are not code points, as described in https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html.

> If you are thinking about accepting strings encoded in other
> encodings, I'd consider this an extension, to be added later if
> needed.  After all, a module can easily convert to UTF-8 by itself,
> using facilities such as iconv.

Yes, provided the internal Emacs encoding is stable.

> In any case, code_convert_string_norecord cannot be the complete
> solution, because it accepts Lisp string objects, not C strings.  You
> still need to create a Lisp string (but this time using
> make_unibyte_string).  The point is to always use either
> make_unibyte_string or make_multibyte_string, and never build_string
> or make_string; the latter 2 should only be used for fixed ASCII-only
> strings.

Yes, that's fine; the question is whether the internal encoding is stable. If it's stable, we can use make_multibyte_string; if not, we can only use make_unibyte_string.

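As a rough illustration of the suggestion above that a module can convert to UTF-8 by itself with iconv, here is a minimal sketch in C. The helper name to_utf8 and its calling convention are assumptions made for this example and are not part of the emacs-module interface; only iconv_open, iconv and iconv_close are real POSIX calls. It also assumes the input is a NUL-terminated string and that a buffer of four output bytes per input byte is enough; anything that does not fit is simply reported as a failure.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of emacs-module.h): convert SRC, a
   NUL-terminated string in the encoding named FROM, into a freshly
   allocated NUL-terminated UTF-8 string.  Returns NULL on any
   failure; the caller must free the result.  */
char *
to_utf8 (const char *from, const char *src)
{
  assert (from != NULL && src != NULL);

  /* Ask iconv for a converter from FROM to UTF-8.  */
  iconv_t cd = iconv_open ("UTF-8", from);
  if (cd == (iconv_t) -1)
    return NULL;                /* unknown or unsupported encoding */

  size_t in_left = strlen (src);
  /* Assumption: 4 output bytes per input byte is generous enough;
     one extra byte is reserved for the terminating NUL.  */
  size_t out_size = in_left * 4 + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  /* iconv takes non-const pointers but does not modify the input.  */
  char *in_ptr = (char *) src;
  char *out_ptr = out;
  size_t out_left = out_size - 1;

  size_t rc = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      /* Invalid input sequence or output buffer too small.  */
      free (out);
      return NULL;
    }

  *out_ptr = '\0';
  return out;
}
```

A module would call something like to_utf8 ("ISO-8859-1", text) and hand the resulting bytes to the module API, so that Emacs only ever has to deal with UTF-8 input on that boundary.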
--001a114b6af604c30305250a7f9a--