From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Philipp Stephani <p.stephani2@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
Date: Wed, 28 Dec 2016 18:45:43 +0000
Message-ID: <CAArVCkRybk6YF1zq_fvx5KLmN8M4vmiku3SYzq7TBrVGsu=wcg@mail.gmail.com>
References: <6d0c8c2e-8428-2fdb-0d6e-899f7b9d7ffd@nifty.com>
	<mvmvav6sjam.fsf@hawking.suse.de>
	<8053af81-80e1-a24a-f649-8ffc86963ed5@nifty.com>
	<0cc7fab4-9a2c-6a8d-def7-36bd50317ca3@yandex.ru>
	<7f9a799f-de88-fd78-0cdc-dac0928f1503@nifty.com>
	<m3a8cizi9i.fsf@gnus.org>
	<aec90d00-79f7-4995-3cee-4f4c0c9f9305@nifty.com>
	<a3849ba6-205c-69bd-4be1-da271867377d@yandex.ru>
	<308bb78f-8be3-092d-d877-e129d340242b@nifty.com>
	<4dc615e7-ec73-60a5-426e-0d6986f15d76@yandex.ru>
	<0cb406fb-ffc4-a4ad-557a-2cacc99b8e75@nifty.com>
	<86ccb4af-5719-c017-26bb-fc06b4c904d2@yandex.ru>
	<83r35uxkr5.fsf@gnu.org>
	<CAArVCkTdj6ZJKbJd1xYE5TySeOnKoUog_tCodqWgxPXQC9dMBg@mail.gmail.com>
	<4e12d4ad-cd6b-3087-5d7c-449d4c1886e2@yandex.ru>
	<83lgw1q9uu.fsf@gnu.org>
	<af68b5ee-287e-07e8-4c84-9a63c7fd7c2d@yandex.ru>
	<83eg1tq8is.fsf@gnu.org>
	<787e5206-53e0-752f-a339-4608d2f7ad39@yandex.ru>
	<m3r35tufye.fsf@gnus.org>
	<8360n5q6j4.fsf@gnu.org> <m3fum9ue5i.fsf@gnus.org>
	<8337i8rkbe.fsf@gnu.org>
	<CAArVCkRcHBPBWxwV+Q+kTc5yoefdPOTFu5=09-W56WpKhJEHhA@mail.gmail.com>
	<83polcpzwk.fsf@gnu.org>
	<CAArVCkSh-yNCD77mZdR4J=uOdNUX=KNSUdtHnaHBUnAUXjyVYQ@mail.gmail.com>
	<83lguzvr63.fsf@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a114b41a0ec7b480544bc6023
X-Trace: blaine.gmane.org 1482950859 9505 195.159.176.226 (28 Dec 2016 18:47:39 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Wed, 28 Dec 2016 18:47:39 +0000 (UTC)
Cc: larsi@gnus.org, dgutov@yandex.ru, kentaro.nakazawa@nifty.com,
	emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Dec 28 19:47:34 2016
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1cMJG3-00014I-UD
	for ged-emacs-devel@m.gmane.org; Wed, 28 Dec 2016 19:47:28 +0100
Original-Received: from localhost ([::1]:60519 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1cMJG8-0000Yr-Mi
	for ged-emacs-devel@m.gmane.org; Wed, 28 Dec 2016 13:47:32 -0500
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:44563)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1cMJEh-0000ID-TT
	for emacs-devel@gnu.org; Wed, 28 Dec 2016 13:46:07 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <p.stephani2@gmail.com>) id 1cMJEe-00034P-5a
	for emacs-devel@gnu.org; Wed, 28 Dec 2016 13:46:03 -0500
Original-Received: from mail-wm0-x231.google.com ([2a00:1450:400c:c09::231]:38866)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <p.stephani2@gmail.com>)
	id 1cMJEZ-00032a-H3; Wed, 28 Dec 2016 13:45:55 -0500
Original-Received: by mail-wm0-x231.google.com with SMTP id k184so120127342wme.1;
	Wed, 28 Dec 2016 10:45:55 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to
	:cc; bh=vsRktpA7D8WLb1zaC5kZbtDCJksNQD8TTg8C4YwtMcc=;
	b=nLz4ssfBYeKVLuXyb+G2WBIuJYSd2IKOMKiMta9VZhjcTolWAFW9tmVguqk2Wdejmw
	K6Y2Nt2dMgD5z/2cN0KZB7D+kMl6DuiO5Aj5aIGFftQCTRTAMg45yTLLGctoXZCbZlYJ
	LchNDd36750bGwTx5od2tzdBTSaAwWZXIxdKesEOaqt4RXYfJjs8A+wdlJwNJlCP7rfq
	1rM1lIsJhnJ9aZFOjqLCzu0Es4XIO3tiiDU82UASMGnf7QYtqjz0lSPh1kZLaD9n/HoX
	HuMU4rHy4RCGeoL5mRfxv5SgLrB3scA0tjXRIF+lEMV0k9iAt7AR20QgXZf3ceRwOfWp
	QfOA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:references:in-reply-to:from:date
	:message-id:subject:to:cc;
	bh=vsRktpA7D8WLb1zaC5kZbtDCJksNQD8TTg8C4YwtMcc=;
	b=pQ4D4AGNWp9d5rzpvZLgM2TsIyrdVdTuRCHpi7/t7nHOedBhuE4k4BvYn5K+spdIS6
	A9qoqbvh0sb/g4o/QwKVAEDtWYxaA+0Q944G3yUvwZfRxget9ZCUCidt0vEuo9Sirw2i
	lgpWLfKuCaCUrxTDV3p+3ediqCgrmldp26YBugtAgyINYIZ1hSu7KGHw0WZ4rs7xguHI
	lz5llGnIeFv9ZeMaW0xkWJAJaaXn881kb4vKx+aEGk2ckmiR6GhT2x4meJOJcNqmyiSu
	bKMHaRNZY/UTia/kdrFneNHpO4PBCuf2b9n41mxPzDtc3sKWHzXQPC39o+mbEj+XMPbj
	493A==
X-Gm-Message-State: AIkVDXJgSWZCi859NHsMJiz/WCnSW+35Zw2K1SKKa8jdebGvJ24JhgdXdCUnPSX8IsD+jsxQNoatjwUQz7xeAw==
X-Received: by 10.28.31.23 with SMTP id f23mr37952253wmf.94.1482950754202;
	Wed, 28 Dec 2016 10:45:54 -0800 (PST)
In-Reply-To: <83lguzvr63.fsf@gnu.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2a00:1450:400c:c09::231
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:210924
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/210924>

--001a114b41a0ec7b480544bc6023
Content-Type: text/plain; charset=UTF-8

Eli Zaretskii <eliz@gnu.org> wrote on Wed, 28 Dec 2016 at 19:35:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Wed, 28 Dec 2016 18:18:25 +0000
> > Cc: larsi@gnus.org, emacs-devel@gnu.org, kentaro.nakazawa@nifty.com,
> >       dgutov@yandex.ru
> >
> >  > > That's right -- why should any code care? Yet url.el does.
> >  >
> >  > No, it doesn't, not if the string is plain ASCII.
> >  >
> >  > But in that case it isn't, it's morally a byte array.
> >
> >  Yes, because the internal representation of characters in Emacs is a
> >  superset of UTF-8.
> >
> > That has nothing to do with characters. A byte array is conceptually
> > different from a character string.
>
> In Emacs, they are both implemented using very similar objects.
>

Yes, that's why I said "conceptually different". The concepts may be
different, but the implementation might still be the same.


>
> >  > What Emacs lacks is good support for byte arrays.
> >
> >  Unibyte strings are byte arrays. What do you think we lack in that
> >  regard?
> >
> > If unibyte strings should be used for byte arrays, then the URL
> > functions should indeed signal an error whenever url-request-data is a
> > multibyte string, as HTTP requests are conceptually byte arrays, not
> > character strings.
>
> Which is what we do now.
>

There is no such check for url-request-data. There's an overall check for
the complete request, but that also doesn't check for unibyte-ness.
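
To illustrate the kind of check I mean, here is a minimal sketch. It is
written in Python rather than Emacs Lisp, only because Python keeps
character strings (str) and byte arrays (bytes) as separate types;
require_bytes is a hypothetical helper, not anything that exists in
url.el. The point is that the request body is rejected if it is a
character string, instead of being encoded implicitly.

```python
def require_bytes(data):
    """Return data as bytes, or raise TypeError if it is not a byte array.

    Hypothetical helper for illustration only: a character string is
    rejected outright instead of being encoded with a guessed encoding.
    """
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError(
            "request body must be bytes, got %s" % type(data).__name__
        )
    return bytes(data)
```

With such a check, the caller has to decide on an encoding before the
data reaches the network layer.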


>
> >  > For HTTP, process-send-string shouldn't need to deal
> >  > with encoding or EOL conversion, it should just accept a byte array
> >  > and send that, unmodified.
> >
> >  I disagree. Handling unibyte strings is a nuisance, so Emacs allows
> >  most applications be oblivious about them, and just handle
> >  human-readable text.
> >
> > That is the wrong approach (byte arrays and character strings are
> > fundamentally different types, and mixing them together only causes
> > pain), and it cannot work when implementing network protocols. HTTP
> > requests are *not* human-readable text, they are byte arrays.
> > Attempting to handle Unicode strings can't work because we wouldn't
> > know the number of encoded bytes.
>
> You are arguing against a long and quite painful history of non-ASCII
> strings in Emacs.  What we have now is based on a lot of experience
> and at least two very large refactoring jobs.  Going back would be a
> very bad idea indeed, as we've been there already, and users didn't
> like that.  Some of us are old enough to remember the notorious \201
> bytes creeping into text files and mail messages, due to that.  Never
> again.
>

I'm not suggesting going back; too much would break.


>
> Our experience is that we should keep use of unibyte strings in Lisp
> application code to the absolute minimum, ideally zero.  Once we
> arrived at that conclusion, we've been living happily ever after.
> This minor issue we are discussing here is certainly not worth
> repeating past mistakes for which we paid plenty in sweat and blood.
>

If you want unibyte strings to represent octet streams, then unibyte
strings must be usable in application code, because octet streams are a
concept that exists in reality, and applications must be able to support
them in some way. If you don't want unibyte strings, then you need to
provide some different way to represent octet streams.
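
As a concrete case of an octet stream that application code has to
build, consider an HTTP request whose Content-Length header must count
encoded bytes, not characters. The following is only a sketch, again in
Python because its str/bytes split makes the point visible; the function
name, host, and path are made up for the example.

```python
def build_post_request(host, path, body_text):
    """Assemble a minimal HTTP/1.1 POST request as a byte array.

    Illustrative sketch: the body is encoded exactly once, and
    Content-Length is computed from the encoded bytes.
    """
    body = body_text.encode("utf-8")  # characters -> octets, explicitly
    head = (
        "POST {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Content-Type: application/json; charset=utf-8\r\n"
        "Content-Length: {}\r\n"
        "\r\n"
    ).format(path, host, len(body))
    # The header block is ASCII here, so encoding it cannot change its length.
    return head.encode("ascii") + body


# The body has 13 characters but 14 bytes in UTF-8 (U+00E9 takes two).
request = build_post_request("example.org", "/api", '{"name": "\u00e9"}')
```

Everything after the explicit encode step is a byte array, so nothing
further down needs to know about characters or coding systems.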

--001a114b41a0ec7b480544bc6023--