From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Philipp Stephani <p.stephani2@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Dynamic loading progress
Date: Sun, 22 Nov 2015 19:37:43 +0000
Message-ID: <CAArVCkQB+F9f1JOP9uqYf2nk=NJ_PcfbwWkLK1Cp1jRE24gH2w@mail.gmail.com>
References: <CA+5B0FOuWbpBUTsrE4tzzoLxACPQ-mgxx7zJKyW2LR77QRM=Ug@mail.gmail.com>
	<83k2ptq5t3.fsf@gnu.org> <87h9kxx60e.fsf@lifelogs.com>
	<CA+5B0FPp9nYEmoyDLrutJpcOZBtpV9kxx7LdPqrsj26rnj11qA@mail.gmail.com>
	<CAArVCkS515CVbS1UfavFGAq0dGO=e_mGftMbhF_eBw3SSu3Xjg@mail.gmail.com>
	<877flswse5.fsf@lifelogs.com>
	<CAArVCkT0M8o4MDP1RaP-r9JqumoQaMbhANRrycSEyyCj+mqUcA@mail.gmail.com>
	<8737wgw7kf.fsf@lifelogs.com>
	<CA+5B0FOGrn01XZzKJvXdWLPL62ONUzoEBfQRwLiKqLmd6Ta3RA@mail.gmail.com>
	<87io5bv1it.fsf@lifelogs.com>
	<CA+5B0FOp8Ub1+V_2G4CC1r2aG1hLKmZdSic59MfOy=9QoovSRQ@mail.gmail.com>
	<87egfzuwca.fsf@lifelogs.com>
	<CAArVCkSEHxSd3X2PnEvRJk5n1wOR0y9neU7AxGYEHSqKRG+y3Q@mail.gmail.com>
	<876118u6f2.fsf@lifelogs.com>
	<CA+5B0FPz-vo+Y=38=21jRQuEHANzFG_cf3tPDiwEbK2TO4+JdA@mail.gmail.com>
	<CA+5B0FNW48d3S5CJfxHK9HHVHPmuYqaT3K9tn5MVTgv_qas5Rw@mail.gmail.com>
	<ryhmvud820v.fsf@dod.no>
	<CA+5B0FMU1Ry6mRSinyV5Ar8DaL4VciEUEbTe1NcXZUQ2-4y4TA@mail.gmail.com>
	<8737w3qero.fsf@lifelogs.com> <831tbn9g9j.fsf@gnu.org>
	<878u5upw7o.fsf@lifelogs.com>
	<83ziya8xph.fsf@gnu.org> <83y4du80xo.fsf@gnu.org>
	<CAArVCkTwVbA58_wfj7O-Et83M8YJ9jfpCKhYn466BYO8T2cG0A@mail.gmail.com>
	<837fld6lps.fsf@gnu.org>
	<CAArVCkSTdg=EjSiN69TqLoH_ufkz_vzV6qLKNae2QbEXadYomg@mail.gmail.com>
	<83si3z4s5n.fsf@gnu.org>
	<CAArVCkQ0qUTUr5GZ+xmCub2tEWc0YzFKRsHEN-FFv3ioAc2n0w@mail.gmail.com>
	<83mvu74nhm.fsf@gnu.org>
	<CAArVCkR+LqXPbHnWKW+2FQ61z+AyWR6ThBAb5ens=mwN+rS_mQ@mail.gmail.com>
	<83d1v34hba.fsf@gnu.org>
	<CAArVCkRBF7+yJcFiYA6KmZzKp5EGP6iauQ=0hkH5KJZbMRH7LA@mail.gmail.com>
	<83io4u2aze.fsf@gnu.org>
	<CAArVCkROBfCxh1qcSW9ApP-6m60YFyMR7H3W0xZ_rkYauF8umg@mail.gmail.com>
	<8337vx3kp6.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7bb04dc49b53d70525263efa
X-Trace: ger.gmane.org 1448221088 7473 80.91.229.3 (22 Nov 2015 19:38:08 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 22 Nov 2015 19:38:08 +0000 (UTC)
Cc: aurelien.aptel+emacs@gmail.com, tzz@lifelogs.com, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Nov 22 20:38:07 2015
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a0aSX-0000EJ-Jp
	for ged-emacs-devel@m.gmane.org; Sun, 22 Nov 2015 20:38:01 +0100
Original-Received: from localhost ([::1]:57215 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1a0aSX-00043T-HV
	for ged-emacs-devel@m.gmane.org; Sun, 22 Nov 2015 14:38:01 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:37666)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1a0aST-00043M-S8
	for emacs-devel@gnu.org; Sun, 22 Nov 2015 14:37:59 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1a0aSS-0000o8-08
	for emacs-devel@gnu.org; Sun, 22 Nov 2015 14:37:57 -0500
Original-Received: from mail-wm0-x232.google.com ([2a00:1450:400c:c09::232]:33274)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>)
	id 1a0aSP-0000nk-Kt; Sun, 22 Nov 2015 14:37:53 -0500
Original-Received: by wmec201 with SMTP id c201so134426866wme.0;
	Sun, 22 Nov 2015 11:37:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to
	:cc:content-type;
	bh=+WNZXhyrnyzuvdz9xDXDJVorav3q6iIhD22G9WJhKOM=;
	b=hh0+3ICcdYhiL8w3uW6pz0obLHKPbQ5y4so+a0WlSV6BX44uwm1SySUEaRjawPkOJ1
	rHhrQ/d5KyfizQs3NZ9eLaJ5vx0RQMuwx5oD+aLOCUF97zCo7b5SZ92dj+VSBhlTybO6
	5UBxmr99xgaGiUPKgrf2rMMmos6PywM3/q33EbdFKTS6eUjrUy8jMZXXvCW6IByKshux
	svZD5Xze7wsOhwTxb4bKnhgGmqY2qbJ7vSpVizQLV7SkOZanYKxZZhqbnuksHWcPt2qm
	oY3s8YZBLJbPCxHXLCdPwY3c0rScHvDeYcwVTiuSNCRCFb7GDRLfKsMhrGfKyj38VC1E
	6ehQ==
X-Received: by 10.195.13.135 with SMTP id ey7mr26123983wjd.25.1448221072902;
	Sun, 22 Nov 2015 11:37:52 -0800 (PST)
In-Reply-To: <8337vx3kp6.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-Received-From: 2a00:1450:400c:c09::232
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:195056
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/195056>

--047d7bb04dc49b53d70525263efa
Content-Type: text/plain; charset=UTF-8

Eli Zaretskii <eliz@gnu.org> wrote on Sun, 22 Nov 2015 at 20:20:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sun, 22 Nov 2015 18:19:29 +0000
> > Cc: tzz@lifelogs.com, aurelien.aptel+emacs@gmail.com, emacs-devel@gnu.org
> >
> >     I already suggested what we should say in the documentation: that
> >     these interfaces accept and produce UTF-8 encoded non-ASCII text.
> >
> >
> > If the interface accepts UTF-8, then it must signal an error for invalid
> > sequences; the Unicode standard mandates this.
>
> The Unicode standard cannot mandate anything for Emacs, because Emacs
> is not subject to Unicode standardization.
>

True, but I think we shouldn't make the terminology more confusing. If we
say "UTF-8", we should mean "UTF-8 as defined in the Unicode standard", not
the Emacs extension of UTF-8. That's all.
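
To illustrate what "UTF-8 as defined in the Unicode standard" rules out,
here is a minimal sketch of a strict well-formedness check. The name
utf8_is_valid is made up for this example and is not part of the module
API; the sketch only shows which sequences (overlong forms, surrogates,
code points above U+10FFFF, truncated or stray bytes) a strict reading
would reject, not how Emacs itself handles them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of emacs-module.h: return true if
   BUF[0..LEN) is well-formed UTF-8 in the sense of the Unicode standard.  */
bool
utf8_is_valid (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char b = buf[i];
      size_t n;           /* number of continuation bytes expected */
      unsigned long cp;   /* code point decoded so far */
      unsigned long min;  /* smallest code point allowed for this length */

      if (b < 0x80)
        {
          i++;            /* plain ASCII */
          continue;
        }
      else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
      else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
      else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
      else
        return false;     /* stray continuation byte or invalid lead byte */

      if (len - i <= n)
        return false;     /* sequence is truncated */

      for (size_t k = 1; k <= n; k++)
        {
          unsigned char c = buf[i + k];
          if ((c & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (c & 0x3F);
        }

      if (cp < min)
        return false;     /* overlong encoding */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;     /* UTF-16 surrogate */
      if (cp > 0x10FFFF)
        return false;     /* beyond the Unicode code space */

      i += n + 1;
    }
  return true;
}
```

With this check, "\xC3\xA9" (U+00E9) passes, while the overlong
"\xC0\x80" and the encoded surrogate "\xED\xA0\x80" are rejected.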


>
> > If the interface produces UTF-8, then it must only ever produce valid
> > sequences
>
> As I explained, this would violate the basic expectation from a text
> editing program.
>
> > That's why I propose to not encode raw bytes as bytes, but as the Emacs
> > integer codes used to represent them.
>
> If we do that, no external code will be able to do anything useful
> with such "bytes".  Module authors will have to write their own
> replacements for library functions.  This will never be accepted by
> our users.
>

I wouldn't be so pessimistic, but the consistency with
encode-coding-string has convinced me. So yes, let's use the raw bytes (and
document that).


>
> > If any byte sequence is accepted, then the behavior becomes more complex. We
> > need to exhaustively describe the behavior for any possible byte sequence,
> > otherwise module authors cannot make any assumption.
>
> We say that we accept valid UTF-8 encoded strings; anything else
> might produce invalid UTF-8 on output.
>

Couldn't we just say "it behaves as if encoding and decoding were done
using the utf-8-unix coding system"? Because I think that's what this boils
down to.


>
> > No matter what we expect or tolerate, we need to state that.
>
> No, we don't.  When the callers violate the contract, they cannot
> expect to know in detail what will happen.  If they want to know, they
> will have to read the source.
>

So you want this to be unspecified or undefined behavior? That might be OK
(we already have that in several places), but we still need to state what
the contract is.


>
> > Module authors are not end users.
>
> They are users like anyone who writes Lisp.  They came to expect that
> Emacs behaves in certain ways, and modules should follow suit.
>
> > I agree that end users should not see errors on decoding failure,
> > but modules use only programmatic access, where we can be more
> > strict.
>
> You cannot be more strict, unless you rewrite the whole
> encoding/decoding machinery, or write specialized code to detect and
> reject invalid UTF-8 before it is passed to a decoder.  There are no
> good reasons to do either, so let's not.
>
> > An Emacs string is a sequence of integers.
>
> No, it's a sequence of bytes.
>

From
https://www.gnu.org/software/emacs/manual/html_node/elisp/String-Basics.html:
"In Emacs Lisp, characters are simply integers ... A string is a fixed
sequence of characters"
How a string is represented internally shouldn't be the concern of module
authors.
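
To make the distinction concrete: the same text has one length as a
sequence of characters and another as a sequence of bytes. The following
is only a sketch that assumes its input is valid UTF-8; utf8_char_count
is a name invented for this example, not an existing function.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters (code points) in the valid
   UTF-8 text BUF[0..LEN) by counting every byte that is not a
   continuation byte (10xxxxxx).  */
size_t
utf8_char_count (const unsigned char *buf, size_t len)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    if ((buf[i] & 0xC0) != 0x80)
      count++;
  return count;
}
```

For the six bytes "a\xC3\xA9\xE2\x82\xAC" (the characters a, U+00E9 and
U+20AC) this yields 3, which is the number a module author reasoning
about characters would expect, regardless of the byte representation.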


>
> > I agree that we shouldn't add such limitations. But I disagree that we
> > should leave the behavior undocumented in such cases.
>
> OK, so let's agree to disagree.  If that disagreement gets in your way
> of fixing the issues related to this discussion, please say so, and I
> will fix them myself.
>
>
No, I will definitely fix it. I think our disagreement is much smaller than
it might look.

--047d7bb04dc49b53d70525263efa--