From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Philipp Stephani <p.stephani2@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sat, 21 Nov 2015 10:31:24 +0000
Message-ID: <CAArVCkQ0qUTUr5GZ+xmCub2tEWc0YzFKRsHEN-FFv3ioAc2n0w@mail.gmail.com>
References: <CA+5B0FOuWbpBUTsrE4tzzoLxACPQ-mgxx7zJKyW2LR77QRM=Ug@mail.gmail.com>
	<83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
	<CA+5B0FPp9nYEmoyDLrutJpcOZBtpV9kxx7LdPqrsj26rnj11qA@mail.gmail.com>
	<CAArVCkS515CVbS1UfavFGAq0dGO=e_mGftMbhF_eBw3SSu3Xjg@mail.gmail.com>
	<877flswse5.fsf@lifelogs.com>
	<CAArVCkT0M8o4MDP1RaP-r9JqumoQaMbhANRrycSEyyCj+mqUcA@mail.gmail.com>
	<8737wgw7kf.fsf@lifelogs.com>
	<CA+5B0FOGrn01XZzKJvXdWLPL62ONUzoEBfQRwLiKqLmd6Ta3RA@mail.gmail.com>
	<87io5bv1it.fsf@lifelogs.com>
	<CA+5B0FOp8Ub1+V_2G4CC1r2aG1hLKmZdSic59MfOy=9QoovSRQ@mail.gmail.com>
	<87egfzuwca.fsf@lifelogs.com>
	<CAArVCkSEHxSd3X2PnEvRJk5n1wOR0y9neU7AxGYEHSqKRG+y3Q@mail.gmail.com>
	<876118u6f2.fsf@lifelogs.com>
	<CA+5B0FPz-vo+Y=38=21jRQuEHANzFG_cf3tPDiwEbK2TO4+JdA@mail.gmail.com>
	<CA+5B0FNW48d3S5CJfxHK9HHVHPmuYqaT3K9tn5MVTgv_qas5Rw@mail.gmail.com>
	<ryhmvud820v.fsf@dod.no>
	<CA+5B0FMU1Ry6mRSinyV5Ar8DaL4VciEUEbTe1NcXZUQ2-4y4TA@mail.gmail.com>
	<8737w3qero.fsf@lifelogs.com> <831tbn9g9j.fsf@gnu.org>
	<878u5upw7o.fsf@lifelogs.com>
	<83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
	<CAArVCkTwVbA58_wfj7O-Et83M8YJ9jfpCKhYn466BYO8T2cG0A@mail.gmail.com>
	<837fld6lps.fsf@gnu.org>
	<CAArVCkSTdg=EjSiN69TqLoH_ufkz_vzV6qLKNae2QbEXadYomg@mail.gmail.com>
	<83si3z4s5n.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a114b6af604c30305250a7f9a
X-Trace: ger.gmane.org 1448101916 21114 80.91.229.3 (21 Nov 2015 10:31:56 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 21 Nov 2015 10:31:56 +0000 (UTC)
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Nov 21 11:31:55 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a05SU-0004qW-GN
	for ged-emacs-devel@m.gmane.org; Sat, 21 Nov 2015 11:31:54 +0100
Original-Received: from localhost ([::1]:51794 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a05ST-0003DG-8N
	for ged-emacs-devel@m.gmane.org; Sat, 21 Nov 2015 05:31:53 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:39405)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1a05SF-0003DB-8d
	for emacs-devel@gnu.org; Sat, 21 Nov 2015 05:31:40 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1a05SD-0006Na-He
	for emacs-devel@gnu.org; Sat, 21 Nov 2015 05:31:39 -0500
Original-Received: from mail-wm0-x234.google.com ([2a00:1450:400c:c09::234]:36455)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>)
	id 1a05SB-0006NB-31; Sat, 21 Nov 2015 05:31:35 -0500
Original-Received: by wmww144 with SMTP id w144so45762817wmw.1;
	Sat, 21 Nov 2015 02:31:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to
	:cc:content-type;
	bh=qndabqJt3vT9HoxfQZ0veOJ9nZ5kiv61/9W7zVBUGac=;
	b=uwcTPLlnMlw7V8ArNJu5b373Oam6B7dOXWC4ZsXj3n/To+rSph8WpogeY+ujwSkpsy
	an5SnUFfv1Ld/6peiwID2qZWBV9nKFiLInCCj4c6vNJJpkfRbb8FAZkCDLnN8Kt4cMox
	cDLawE+mWk012wC6ORc9b+biue73qKBHqVorAS5gDefQSs5uS8cj6gFp2xCYUmw7xtfk
	WEQoyvEBLEFqChFHvAm+OrDvlxg/whBCInumQV3XvrLccCfkq+jIHtnKK49hCi/zvuEp
	ceXDbqez87r4p2SjaJkZ2Z7AIR49hCA7XbJ4CnA4bYwrX7tJerq2AUPFFQKtYUm1RkQe
	lV9Q==
X-Received: by 10.28.72.137 with SMTP id v131mr5293997wma.63.1448101894470;
	Sat, 21 Nov 2015 02:31:34 -0800 (PST)
In-Reply-To: <83si3z4s5n.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2a00:1450:400c:c09::234
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:194939
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/194939>

--001a114b6af604c30305250a7f9a
Content-Type: text/plain; charset=UTF-8

Eli Zaretskii <eliz@gnu.org> wrote on Sat, 21 Nov 2015 at 10:30:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 21 Nov 2015 09:01:12 +0000
> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> > Let me summarize the issues I see: The internal Emacs encoding can change
> > between versions (comment in mule-conf.el), therefore we shouldn't use it
> > in the module API. IIUC this rules out make_multibyte_string: it only
> > accepts the internal encoding. Therefore I proposed to always have users
> > specify the encoding explicitly and then use code_convert_string_norecord
> > to create the Lisp string objects. Would that work? (We probably then need
> > another set of functions for unibyte strings.)
>
> I'm not sure I'm following, so let's take a step back, okay?
>
> My comments were about using build_string and make_string in 2
> functions defined in emacs-module.c: module_make_function and
> module_make_string.  Both of these emacs-module.c functions produce
> strings for consumption by Emacs, AFAIU: the former produces a doc
> string of a function defined by a module, which will be used by
> various documentation-related functions and commands within Emacs, the
> latter produces a string to be passed to Emacs Lisp code for use as
> any other Lisp string.  Do you agree so far?
>

Yes.


>
> If you agree, then in both cases the strings these functions return
> should be in the internal representation of strings used by Emacs, not
> in some encoding like UTF-8 or ISO-8859-1.  (We could also use encoded
> strings, but that would require Lisp programs using module functions
> to always decode any strings they receive, which is less efficient and
> more error-prone.)
>

Yes. Just to check my understanding: there are two types of strings,
unibyte (just a sequence of chars) and multibyte (a sequence of chars
interpreted in the internal Emacs encoding), right?
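
To make the distinction concrete, here is a minimal C sketch. It uses plain
UTF-8 as a stand-in for the internal encoding (an assumption for
illustration only), and the function names are hypothetical, not part of
Emacs. The same byte sequence has one length when viewed as raw chars and
another when interpreted as encoded characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length as a unibyte view sees it: every char counts. */
size_t byte_length(const char *s)
{
  assert(s != NULL);
  return strlen(s);
}

/* Length when the same chars are interpreted as UTF-8: every byte
   that is not a continuation byte (10xxxxxx) starts a character.
   Assumes the input is well-formed UTF-8. */
size_t utf8_char_length(const char *s)
{
  size_t n = 0;
  assert(s != NULL);
  for (const unsigned char *p = (const unsigned char *) s; *p != 0; p++)
    if ((*p & 0xC0) != 0x80)
      n++;
  return n;
}
```

For the UTF-8 bytes of "naïve" the first function counts 6 chars and the
second counts 5 characters, since "ï" occupies two bytes.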


>
> (Btw, I don't think we should worry about changing the internal
> representation of characters in Emacs, because make_multibyte_string
> will be updated as needed.)
>

This is a crucial point. If the internal encoding never changes, then we
can declare that those string parameters are expected to be in the internal
encoding. But see the discussion in
https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in
mule-conf.el seems to indicate that the internal encoding is not stable.


>
> This is what my comments were about.  I think that you, by contrast,
> are talking about the encoding of the _input_ strings, in this case
> the 'documentation' argument to module_make_function and 'str'
> argument to module_make_string.  My assumption was that these
> arguments will always have to be in UTF-8 encoding; if that assumption
> is true, then no decoding via code_convert_string_norecord is
> necessary, since make_multibyte_string will DTRT.  We can (and
> probably should) document the fact that all non-ASCII strings must be
> UTF-8 encoded as a requirement of the emacs-module interface.
>

Or rather, an extension to UTF-8 capable of encoding surrogate code points
and numbers that are not code points, as described in the Elisp manual:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html
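
A small C sketch of why strict UTF-8 is not enough here. The helper names
are hypothetical and this is not the encoder Emacs uses; it only applies the
ordinary UTF-8 bit layout without the check that rejects surrogates and
values above U+10FFFF, which is the kind of relaxation I mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if CP is a Unicode scalar value, i.e. something strict UTF-8
   is allowed to encode. */
int is_unicode_scalar(uint32_t cp)
{
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Encode CP using the UTF-8 bit layout, but without the scalar-value
   check.  Handles values up to 0x1FFFFF (at most 4 bytes).  Returns
   the number of bytes written to OUT. */
size_t encode_relaxed(uint32_t cp, unsigned char out[4])
{
  assert(cp <= 0x1FFFFF);
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}
```

With this, the surrogate 0xD800 becomes the bytes ED A0 80 and the value
0x110000 becomes F4 90 80 80. A strict UTF-8 decoder must reject both, so a
module API that promises "UTF-8" would have to say which variant it means.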


>
> If you are thinking about accepting strings encoded in other
> encodings, I'd consider this an extension, to be added later if
> needed.  After all, a module can easily convert to UTF-8 by itself,
> using facilities such as iconv.
>

Yes, provided the internal Emacs encoding is stable.
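
For reference, the module-side conversion could look roughly like the
sketch below. It uses the POSIX iconv interface; the wrapper name to_utf8
and its error convention are assumptions made for this example, and a real
module would want to handle partial conversions and buffer growth more
carefully.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert SRC_LEN bytes at SRC from the encoding named FROM to UTF-8,
   writing at most DST_SIZE bytes to DST.  Returns the number of bytes
   written, or (size_t) -1 if the conversion could not be completed.
   Hypothetical helper, not part of any Emacs API. */
size_t to_utf8(const char *from, const char *src, size_t src_len,
               char *dst, size_t dst_size)
{
  assert(from != NULL && src != NULL && dst != NULL);

  iconv_t cd = iconv_open("UTF-8", from);
  if (cd == (iconv_t) -1)
    return (size_t) -1;		/* unknown or unsupported encoding */

  /* iconv takes char **, but it does not modify the input bytes.  */
  char *in = (char *) src;
  size_t in_left = src_len;
  char *out = dst;
  size_t out_left = dst_size;

  size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
  iconv_close(cd);

  if (rc == (size_t) -1)
    return (size_t) -1;		/* invalid input or DST too small */
  return dst_size - out_left;
}
```

For example, the single ISO-8859-1 byte E9 ("é") would come out as the two
UTF-8 bytes C3 A9, which the module could then hand to Emacs.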


>
> In any case, code_convert_string_norecord cannot be the complete
> solution, because it accepts Lisp string objects, not C strings.  You
> still need to create a Lisp string (but this time using
> make_unibyte_string).  The point is to always use either
> make_unibyte_string or make_multibyte_string, and never build_string
> or make_string; the latter 2 should only be used for fixed ASCII-only
> strings.
>

Yes, that's fine; the question is whether the internal encoding is
stable. If it's stable, we can use make_multibyte_string; if not, we can
only use make_unibyte_string.

--001a114b6af604c30305250a7f9a--