From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Philipp Stephani <p.stephani2@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sat, 21 Nov 2015 12:11:45 +0000
Message-ID: <CAArVCkR+LqXPbHnWKW+2FQ61z+AyWR6ThBAb5ens=mwN+rS_mQ@mail.gmail.com>
References: <CA+5B0FOuWbpBUTsrE4tzzoLxACPQ-mgxx7zJKyW2LR77QRM=Ug@mail.gmail.com>
	<83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
	<CA+5B0FPp9nYEmoyDLrutJpcOZBtpV9kxx7LdPqrsj26rnj11qA@mail.gmail.com>
	<CAArVCkS515CVbS1UfavFGAq0dGO=e_mGftMbhF_eBw3SSu3Xjg@mail.gmail.com>
	<877flswse5.fsf@lifelogs.com>
	<CAArVCkT0M8o4MDP1RaP-r9JqumoQaMbhANRrycSEyyCj+mqUcA@mail.gmail.com>
	<8737wgw7kf.fsf@lifelogs.com>
	<CA+5B0FOGrn01XZzKJvXdWLPL62ONUzoEBfQRwLiKqLmd6Ta3RA@mail.gmail.com>
	<87io5bv1it.fsf@lifelogs.com>
	<CA+5B0FOp8Ub1+V_2G4CC1r2aG1hLKmZdSic59MfOy=9QoovSRQ@mail.gmail.com>
	<87egfzuwca.fsf@lifelogs.com>
	<CAArVCkSEHxSd3X2PnEvRJk5n1wOR0y9neU7AxGYEHSqKRG+y3Q@mail.gmail.com>
	<876118u6f2.fsf@lifelogs.com>
	<CA+5B0FPz-vo+Y=38=21jRQuEHANzFG_cf3tPDiwEbK2TO4+JdA@mail.gmail.com>
	<CA+5B0FNW48d3S5CJfxHK9HHVHPmuYqaT3K9tn5MVTgv_qas5Rw@mail.gmail.com>
	<ryhmvud820v.fsf@dod.no>
	<CA+5B0FMU1Ry6mRSinyV5Ar8DaL4VciEUEbTe1NcXZUQ2-4y4TA@mail.gmail.com>
	<8737w3qero.fsf@lifelogs.com> <831tbn9g9j.fsf@gnu.org>
	<878u5upw7o.fsf@lifelogs.com>
	<83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
	<CAArVCkTwVbA58_wfj7O-Et83M8YJ9jfpCKhYn466BYO8T2cG0A@mail.gmail.com>
	<837fld6lps.fsf@gnu.org>
	<CAArVCkSTdg=EjSiN69TqLoH_ufkz_vzV6qLKNae2QbEXadYomg@mail.gmail.com>
	<83si3z4s5n.fsf@gnu.org>
	<CAArVCkQ0qUTUr5GZ+xmCub2tEWc0YzFKRsHEN-FFv3ioAc2n0w@mail.gmail.com>
	<83mvu74nhm.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=089e0102def4df6bf505250be57d
X-Trace: ger.gmane.org 1448107931 6884 80.91.229.3 (21 Nov 2015 12:12:11 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 21 Nov 2015 12:12:11 +0000 (UTC)
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Nov 21 13:12:06 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a071R-0002kY-UP
	for ged-emacs-devel@m.gmane.org; Sat, 21 Nov 2015 13:12:06 +0100
Original-Received: from localhost ([::1]:52033 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a071R-0005eV-CC
	for ged-emacs-devel@m.gmane.org; Sat, 21 Nov 2015 07:12:05 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:56684)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1a071M-0005cQ-3i
	for emacs-devel@gnu.org; Sat, 21 Nov 2015 07:12:01 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1a071K-0004fM-3c
	for emacs-devel@gnu.org; Sat, 21 Nov 2015 07:11:59 -0500
Original-Received: from mail-wm0-x235.google.com ([2a00:1450:400c:c09::235]:33881)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>)
	id 1a071H-0004f7-Mz; Sat, 21 Nov 2015 07:11:55 -0500
Original-Received: by wmvv187 with SMTP id v187so104565024wmv.1;
	Sat, 21 Nov 2015 04:11:55 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to
	:cc:content-type;
	bh=X/C8EPfCj7UHhYdf8Jbmn4fkgNYoQCLjdC99CvNugM0=;
	b=ISwXSas4ONWHd6QdxDt0+m+T8i3w/44INWAbMTgXmdkNDhPTc1Klb9JCAuXMkXyrDU
	OKpVnwj3Z8EbEyLQz/EMGXTGm2ljRl/JVTCquuvhSN+Elf00V2kbyVwCqkY+08a7GRIk
	vMLNyLxgLW2dHSKqNbOAZrhGLWPbds492A4r42DUAMochOwD7ee1IdUEEAC4lsqv8zMc
	TjeC9D2vUGI3g4fYJ7Dky+tKxcFCYv/u1Gxwtgdf5ot6BVWJvEtrISTFZG0Bj5VG7VQq
	y8Hf0qTrsa0eyJULVKSCjXlPD61rTwGtZLo/OsNK3FiX5JqQtbEeSbW8tHiNG6kUlRB/
	h1YQ==
X-Received: by 10.194.87.39 with SMTP id u7mr20234164wjz.11.1448107915043;
	Sat, 21 Nov 2015 04:11:55 -0800 (PST)
In-Reply-To: <83mvu74nhm.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2a00:1450:400c:c09::235
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:194953
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/194953>

--089e0102def4df6bf505250be57d
Content-Type: text/plain; charset=UTF-8

Eli Zaretskii <eliz@gnu.org> wrote on Sat, 21 Nov 2015 at 12:10:

> >     (Btw, I don't think we should worry about changing the internal
> >     representation of characters in Emacs, because make_multibyte_string
> >     will be updated as needed.)
> >
> > This is a crucial point. If the internal encoding never changes, then we can
> > declare that those string parameters are expected to be in the internal
> > encoding.
>
> No, we cannot, or rather should not.  It is unreasonable to expect
> external modules to know the intricacies of the internal
> representation.  Most Emacs hackers don't.
>

Fine with me, but how would we then represent Emacs strings that are not
valid Unicode strings? Just raise an error?


>
> > But see the discussion in
> > https://github.com/aaptel/emacs-dynamic-module/issues/37: the comment in
> > mule-conf.el seems to indicate that the internal encoding is not stable.
>
> That discussion is about zero-copy access to Emacs buffer text and
> Emacs strings inside module code.


Only partially. The encoding discussion is part of it too, because the
encoding has to be specified before zero-copy access is even possible.


>   Such access is indeed impossible
> without either knowing _something_ about the internal representation,
> or having additional APIs in emacs-module.c that allow modules such
> access while hiding the details of the internal representation.  We
> could discuss extending the module functionality to include this.
>
>
Yes, though there's no need for that in this subthread.


> But that is a separate issue from what module_make_function and
> module_make_string do.  These two functions are basic, and don't need
> to know about the internal representation or use it.  While direct
> access to Emacs buffer text will be needed by only some modules,
> module_make_function will be used by all of them, and
> module_make_string by many.
>
> So I think we shouldn't conflate these two issues; they are separate.
>

OK.


>
> >     This is what my comments were about. I think that you, by contrast,
> >     are talking about the encoding of the _input_ strings, in this case
> >     the 'documentation' argument to module_make_function and 'str'
> >     argument to module_make_string. My assumption was that these
> >     arguments will always have to be in UTF-8 encoding; if that assumption
> >     is true, then no decoding via code_convert_string_norecord is
> >     necessary, since make_multibyte_string will DTRT. We can (and
> >     probably should) document the fact that all non-ASCII strings must be
> >     UTF-8 encoded as a requirement of the emacs-module interface.
> >
> > Or rather, an extension to UTF-8 capable of encoding surrogate code points and
> > numbers that are not code points, as described in
> > https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html.
>
> No, I meant strict UTF-8, not its Emacs extension.
>

That would be possible and provide a clean interface. However, Emacs
strings are extended, so we'd need to specify how they interact with UTF-8
strings.

   - If a module passes a char sequence that's not a valid UTF-8 string,
   but is a valid Emacs multibyte string, what should happen? An error,
   undefined behavior, or silent acceptance?
   - If copy_string_contents is passed an Emacs string that is not a valid
   Unicode string, what should happen? An error, or should the internal
   representation be silently leaked?
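
To make the "error" option for the first case concrete, here is a minimal
sketch of a strict UTF-8 check that the module layer could run before handing
bytes to make_multibyte_string. The name strict_utf8_p is hypothetical; nothing
like it exists in emacs-module.c as far as I know. It only shows which byte
sequences would have to be rejected; how the error is reported to the module
is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the LEN bytes at S are strict UTF-8 (RFC 3629): no
   overlong forms, no surrogates (U+D800..U+DFFF), nothing above U+10FFFF.
   Hypothetical helper for illustration, not an existing Emacs function.  */
static bool
strict_utf8_p (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t i = 0;
  while (i < len)
    {
      unsigned char b = s[i];
      size_t need;        /* continuation bytes expected after B */
      unsigned long min;  /* smallest code point allowed at this length */
      unsigned long cp;   /* decoded code point */

      if (b < 0x80)
        {
          i++;            /* ASCII */
          continue;
        }
      else if (b >= 0xC2 && b <= 0xDF)
        { need = 1; min = 0x80; cp = b & 0x1F; }
      else if (b >= 0xE0 && b <= 0xEF)
        { need = 2; min = 0x800; cp = b & 0x0F; }
      else if (b >= 0xF0 && b <= 0xF4)
        { need = 3; min = 0x10000; cp = b & 0x07; }
      else
        /* 0x80..0xC1 and 0xF5..0xFF never start a strict sequence.  This
           also rejects the 0xC0/0xC1 lead bytes that, as far as I
           understand, Emacs uses internally for raw bytes.  */
        return false;

      if (len - i - 1 < need)
        return false;     /* sequence truncated at end of input */
      for (size_t k = 1; k <= need; k++)
        {
          unsigned char c = s[i + k];
          if ((c & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (c & 0x3F);
        }
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;     /* overlong, out of range, or surrogate */
      i += need + 1;
    }
  return true;
}
```

With a check like this, "\xC3\xA4" (U+00E4) passes, while an encoded
surrogate such as "\xED\xA0\x80" or a sequence starting with 0xC1 is
rejected. The cost is one extra pass over every string a module creates.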


> > If it's stable, we can use make_multibyte_string; if not, we can
> > only use make_unibyte_string.
>
> If the argument strings are in strict UTF-8, then
> make_multibyte_string will DTRT automagically, no matter what the
> internal representation is.  That is their contract.
>

OK, then we can use that, of course. The question of how to handle invalid
UTF-8 strings is still open, though, as make_multibyte_string doesn't enforce
valid UTF-8.

If it's the contract of make_multibyte_string that it will always accept
UTF-8, then that should be added as a comment to that function. Currently I
don't see it documented anywhere.


--089e0102def4df6bf505250be57d--