From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Philipp Stephani <p.stephani2@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sun, 22 Nov 2015 09:25:08 +0000
Message-ID: <CAArVCkRBF7+yJcFiYA6KmZzKp5EGP6iauQ=0hkH5KJZbMRH7LA@mail.gmail.com>
References: <CA+5B0FOuWbpBUTsrE4tzzoLxACPQ-mgxx7zJKyW2LR77QRM=Ug@mail.gmail.com>
	<83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
	<CA+5B0FPp9nYEmoyDLrutJpcOZBtpV9kxx7LdPqrsj26rnj11qA@mail.gmail.com>
	<CAArVCkS515CVbS1UfavFGAq0dGO=e_mGftMbhF_eBw3SSu3Xjg@mail.gmail.com>
	<877flswse5.fsf@lifelogs.com>
	<CAArVCkT0M8o4MDP1RaP-r9JqumoQaMbhANRrycSEyyCj+mqUcA@mail.gmail.com>
	<8737wgw7kf.fsf@lifelogs.com>
	<CA+5B0FOGrn01XZzKJvXdWLPL62ONUzoEBfQRwLiKqLmd6Ta3RA@mail.gmail.com>
	<87io5bv1it.fsf@lifelogs.com>
	<CA+5B0FOp8Ub1+V_2G4CC1r2aG1hLKmZdSic59MfOy=9QoovSRQ@mail.gmail.com>
	<87egfzuwca.fsf@lifelogs.com>
	<CAArVCkSEHxSd3X2PnEvRJk5n1wOR0y9neU7AxGYEHSqKRG+y3Q@mail.gmail.com>
	<876118u6f2.fsf@lifelogs.com>
	<CA+5B0FPz-vo+Y=38=21jRQuEHANzFG_cf3tPDiwEbK2TO4+JdA@mail.gmail.com>
	<CA+5B0FNW48d3S5CJfxHK9HHVHPmuYqaT3K9tn5MVTgv_qas5Rw@mail.gmail.com>
	<ryhmvud820v.fsf@dod.no>
	<CA+5B0FMU1Ry6mRSinyV5Ar8DaL4VciEUEbTe1NcXZUQ2-4y4TA@mail.gmail.com>
	<8737w3qero.fsf@lifelogs.com> <831tbn9g9j.fsf@gnu.org>
	<878u5upw7o.fsf@lifelogs.com>
	<83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
	<CAArVCkTwVbA58_wfj7O-Et83M8YJ9jfpCKhYn466BYO8T2cG0A@mail.gmail.com>
	<837fld6lps.fsf@gnu.org>
	<CAArVCkSTdg=EjSiN69TqLoH_ufkz_vzV6qLKNae2QbEXadYomg@mail.gmail.com>
	<83si3z4s5n.fsf@gnu.org>
	<CAArVCkQ0qUTUr5GZ+xmCub2tEWc0YzFKRsHEN-FFv3ioAc2n0w@mail.gmail.com>
	<83mvu74nhm.fsf@gnu.org>
	<CAArVCkR+LqXPbHnWKW+2FQ61z+AyWR6ThBAb5ens=mwN+rS_mQ@mail.gmail.com>
	<83d1v34hba.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a1148eec6d91d7d05251daf90
X-Trace: ger.gmane.org 1448184347 16226 80.91.229.3 (22 Nov 2015 09:25:47 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 22 Nov 2015 09:25:47 +0000 (UTC)
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Nov 22 10:25:46 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a0Qu0-0000sq-PS
	for ged-emacs-devel@m.gmane.org; Sun, 22 Nov 2015 10:25:45 +0100
Original-Received: from localhost ([::1]:55291 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a0Qu0-0002kx-HD
	for ged-emacs-devel@m.gmane.org; Sun, 22 Nov 2015 04:25:44 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:42669)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1a0Qte-0002kq-Dg
	for emacs-devel@gnu.org; Sun, 22 Nov 2015 04:25:24 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1a0Qtc-0007rs-WD
	for emacs-devel@gnu.org; Sun, 22 Nov 2015 04:25:22 -0500
Original-Received: from mail-wm0-x236.google.com ([2a00:1450:400c:c09::236]:34899)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>)
	id 1a0Qta-0007rL-KO; Sun, 22 Nov 2015 04:25:18 -0500
Original-Received: by wmuu63 with SMTP id u63so21796420wmu.0;
	Sun, 22 Nov 2015 01:25:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to
	:cc:content-type;
	bh=npe1XRt1Si51k1FNDPldQwLpyVRG3kcYguFC6D/mV5A=;
	b=dRUddMetUDo9rLQGJ5rz75/od0gbtyZXyuc1FllOJymgj3a2lXTXhVQMce1L9FuOUl
	lNe17+mNIsXgH3IqVUkq0pexJVfFoqRnR7cUcgRLkWnRd7TxbOSF6bGF0JuIEblw+rsY
	R6IcNlMFpFn1hAqr37Ayw3H56jfsBiCqVZl49h/tHWEhAuDKfsc0QTPYHB7X554P5Zsk
	QdNO6vIru9sW9hVlw8Yc/wIDP7rG7WyGOd/FTiSm5iE5xU0ASyD/+UdxXLKVWohHnQff
	vskYXJD/3oBGTSWfuCLtnzlR4j7/0MssfvVnczz1aXyqJC0M7BQhc47ijf772sGI5fXA
	mmSw==
X-Received: by 10.28.97.197 with SMTP id v188mr13168984wmb.63.1448184318072;
	Sun, 22 Nov 2015 01:25:18 -0800 (PST)
In-Reply-To: <83d1v34hba.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2a00:1450:400c:c09::236
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:194994
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/194994>

--001a1148eec6d91d7d05251daf90
Content-Type: text/plain; charset=UTF-8

Eli Zaretskii <eliz@gnu.org> wrote on Sat, 21 Nov 2015 at 14:23:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 21 Nov 2015 12:11:45 +0000
> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> >     No, we cannot, or rather should not. It is unreasonable to expect
> >     external modules to know the intricacies of the internal
> >     representation. Most Emacs hackers don't.
> >
> > Fine with me, but how would we then represent Emacs strings that are not valid
> > Unicode strings? Just raise an error?
>
> No need to raise an error.  Strings that are returned to modules
> should be encoded into UTF-8.  That encoding already takes care of
> these situations: it either produces the UTF-8 encoding of the
> equivalent Unicode characters, or outputs raw bytes.
>

Then we should document such situations and give module authors a way to
detect them. For example, what happens if a sequence of such raw bytes
happens to be a valid UTF-8 sequence? Is there a way for module code to
detect this situation?
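
To illustrate the ambiguity, here is a minimal sketch in C. It is a toy
model, not Emacs code: the `element` type and `encode_elements` are
hypothetical names, and the only assumption is the behavior described
above (characters are emitted as UTF-8, raw bytes are emitted as-is).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one string element: either a Unicode scalar
   value or a single raw byte (0x80..0xFF).  Not an Emacs type.  */
typedef struct
{
  int is_raw_byte;
  uint32_t value;
} element;

/* Encode N elements into OUT, which must have room for 4 * N bytes.
   Scalar values become UTF-8; raw bytes are copied through unchanged.
   Returns the number of bytes written.  */
size_t
encode_elements (const element *in, size_t n, unsigned char *out)
{
  assert (n == 0 || (in != NULL && out != NULL));
  size_t o = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t c = in[i].value;
      if (in[i].is_raw_byte || c < 0x80)
        out[o++] = (unsigned char) c;
      else if (c < 0x800)
        {
          out[o++] = (unsigned char) (0xC0 | (c >> 6));
          out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
      else if (c < 0x10000)
        {
          out[o++] = (unsigned char) (0xE0 | (c >> 12));
          out[o++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
          out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
      else
        {
          out[o++] = (unsigned char) (0xF0 | (c >> 18));
          out[o++] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
          out[o++] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
          out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
  return o;
}
```

In this model the one-element string U+00E9 and the two-element string
of raw bytes 0xC3 0xA9 produce the same output bytes, so a module that
only sees the encoded result cannot tell them apart.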


>
> We are using this all the time when we save files or send stuff over
> the network.
>
> >     No, I meant strict UTF-8, not its Emacs extension.
> >
> > That would be possible and provide a clean interface. However, Emacs strings
> > are extended, so we'd need to specify how they interact with UTF-8 strings.
> >
> > * If a module passes a char sequence that's not a valid UTF-8 string, but a
> >   valid Emacs multibyte string, what should happen? Error, undefined behavior,
> >   silently accepted?
>
> We are quite capable of quietly accepting such strings, so that is
> what I would suggest.  Doing so would be in line with what Emacs does
> when such invalid sequences come from other sources, like files.
>

If we accept such strings, then we should document what the extensions are.
- Are UTF-8-like sequences encoding surrogate code points accepted?
- Are UTF-8-like sequences encoding integers outside the Unicode codespace
accepted?
- Are non-shortest forms accepted?
- Are other invalid code unit sequences accepted?
If the answer to any of these is "yes", we can't say we accept UTF-8,
because we don't. Rather we should say what is actually accepted.
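
To make the four cases concrete, here is a minimal sketch in C of a
strict validator that reports which of them a byte sequence falls
into. The names `utf8_status` and `utf8_check` are hypothetical and
not part of any Emacs API; the checks follow the usual definition of
well-formed UTF-8 (no surrogates, nothing above U+10FFFF, shortest
form only).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
  UTF8_OK,            /* strictly valid UTF-8 */
  UTF8_SURROGATE,     /* encodes U+D800..U+DFFF */
  UTF8_OUT_OF_RANGE,  /* encodes an integer above U+10FFFF */
  UTF8_OVERLONG,      /* non-shortest form */
  UTF8_MALFORMED      /* any other invalid code unit sequence */
} utf8_status;

/* Classify the first problem found in S[0..LEN-1], if any.  */
utf8_status
utf8_check (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t i = 0;
  while (i < len)
    {
      unsigned char b = s[i];
      uint32_t cp;   /* decoded integer */
      uint32_t min;  /* smallest value allowed for this length */
      size_t n;      /* number of continuation bytes expected */

      if (b < 0x80)
        {
          i++;
          continue;
        }
      else if ((b & 0xE0) == 0xC0)
        { cp = b & 0x1F; n = 1; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
        { cp = b & 0x0F; n = 2; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
        { cp = b & 0x07; n = 3; min = 0x10000; }
      else
        return UTF8_MALFORMED;  /* stray continuation or 0xF8..0xFF */

      if (len - i - 1 < n)
        return UTF8_MALFORMED;  /* truncated sequence */
      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return UTF8_MALFORMED;
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if (cp < min)
        return UTF8_OVERLONG;
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return UTF8_SURROGATE;
      if (cp > 0x10FFFF)
        return UTF8_OUT_OF_RANGE;
      i += n + 1;
    }
  return UTF8_OK;
}
```

A specification of what is accepted could then be phrased in terms of
which of these statuses are tolerated on input.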


>
> > * If copy_string_contents is passed an Emacs string that is not a valid Unicode
> >   string, what should happen?
>
> How can that happen?  The Emacs string comes from the Emacs bowels, so
> it must be a "valid" string by Emacs standards.  Or maybe I don't
> understand what you mean by "invalid Unicode string".
>

A sequence of integers where at least one element is not a Unicode scalar
value.
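
As a sketch of that definition in C (the function names are
hypothetical, and representing the elements as `int64_t` is an
assumption made only for illustration): a Unicode scalar value is any
code point in 0..0x10FFFF except the surrogate range 0xD800..0xDFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if C is a Unicode scalar value: a code point in the Unicode
   codespace that is not a surrogate.  */
bool
is_unicode_scalar_value (int64_t c)
{
  return (c >= 0 && c <= 0xD7FF) || (c >= 0xE000 && c <= 0x10FFFF);
}

/* Return the index of the first element of CHARS that is not a
   Unicode scalar value, or N if every element is one.  */
size_t
first_non_scalar (const int64_t *chars, size_t n)
{
  assert (chars != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!is_unicode_scalar_value (chars[i]))
      return i;
  return n;
}
```

A string is an invalid Unicode string in this sense exactly when
`first_non_scalar` returns something smaller than its length.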


>
> In any case, we already deal with any such problems when we save a
> buffer to a file, or send it over the network.  This isn't some new
> problem we need to cope with.
>

Yes, but the module interface is new, so it doesn't necessarily have to
have the same behavior. If we say we emit only UTF-8, then we should do so.


>
> > OK, then we can use that, of course. The question of handling invalid UTF-8
> > strings is still open, though, as make_multibyte_string doesn't enforce valid
> > UTF-8.
>
> It doesn't enforce valid UTF-8 because it can handle invalid UTF-8 as
> well.  That's by design.
>

Then whatever it handles needs to be specified.


>
> > If it's the contract of make_multibyte_string that it will always accept UTF-8,
> > then that should be added as a comment to that function. Currently I don't see
> > it documented anywhere.
>
> That part of the documentation is only revealed to veteran Emacs
> hackers, subject to swearing not to reveal that to the uninitiated and
> to some blood-letting that seals the oath ;-)
>

I see ;-)


--001a1148eec6d91d7d05251daf90--