bug#23750: 25.0.95; bug in url-retrieve or json.el

* bug#23750: 25.0.95; bug in url-retrieve or json.el
@ 2016-06-12  2:22 Leo Liu
  2016-06-13 15:02 ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Leo Liu @ 2016-06-12  2:22 UTC (permalink / raw)
  To: 23750

I have been trying to debug an issue in TernJs¹ on and off for a few
months now and it seems the cause is some nasty bug in Emacs 25. Could
someone follow the steps detailed in
https://github.com/ternjs/tern/issues/719 to reproduce the issue?

I have verified that the bug is not in Tern but Emacs i.e. under some
circumstances emacs's URL package strips some chars in the request body
which, in this case, leads to unbalanced parentheses in the JSON doc.

Leo

Footnotes: 
¹  https://github.com/ternjs/tern/issues/719


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-12  2:22 bug#23750: 25.0.95; bug in url-retrieve or json.el Leo Liu
@ 2016-06-13 15:02 ` Dmitry Gutov
  2016-06-13 17:55   ` Stefan Monnier
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-13 15:02 UTC (permalink / raw)
  To: Leo Liu, 23750; +Cc: Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 531 bytes --]

On 06/12/2016 05:22 AM, Leo Liu wrote:

> ¹  https://github.com/ternjs/tern/issues/719

Investigation shows that the problem occurs when url-http-data is 
multibyte and (length url-http-data) differs from (length 
(string-as-unibyte url-http-data)), because we send a wrong value in 
Content-length.

Changing url-http-create-request like this will make the problem more 
obvious for anyone else that hits it, patch attached.

Stefan, did you have a particular situation in mind where this might be 
bad, when you wrote the FIXME?
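
For readers unfamiliar with the distinction the patch relies on, here is a
minimal sketch of how `string-to-unibyte' treats its input. The helper name
my/try-unibyte is hypothetical and is not part of url-http.el.

```elisp
;; Hypothetical helper, for illustration only.
(defun my/try-unibyte (string)
  "Return STRING converted by `string-to-unibyte', or the symbol `failed'."
  (condition-case nil
      (string-to-unibyte string)
    ;; `string-to-unibyte' signals an error when STRING contains a
    ;; character that cannot be represented as a single raw byte.
    (error 'failed)))

(my/try-unibyte "abc")         ; pure ASCII, converts without error
(my/try-unibyte "na\u00efve")  ; holds U+00EF, so the conversion fails
```

With `string-as-unibyte' the second call would not fail; the caller would
get the internal byte representation back with no indication that anything
was wrong.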

[-- Attachment #2: url-http-unibyte.diff --]
[-- Type: text/x-patch, Size: 1077 bytes --]

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 5832e92..f7ec640 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -278,14 +278,10 @@ url-http-create-request
           ;; We used to concat directly, but if one of the strings happens
           ;; to being multibyte (even if it only contains pure ASCII) then
           ;; every string gets converted with `string-MAKE-multibyte' which
-          ;; turns the 127-255 codes into things like latin-1 accented chars
-          ;; (it would work right if it used `string-TO-multibyte' instead).
+          ;; turns the 127-255 codes into things like latin-1 accented chars.
           ;; So to avoid the problem we force every string to be unibyte.
           (mapconcat
-           ;; FIXME: Instead of `string-AS-unibyte' we'd want
-           ;; `string-to-unibyte', so as to properly signal an error if one
-           ;; of the strings contains a multibyte char.
-           'string-as-unibyte
+           'string-to-unibyte
            (delq nil
             (list
              ;; The request


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-13 15:02 ` Dmitry Gutov
@ 2016-06-13 17:55   ` Stefan Monnier
  2016-06-13 19:26     ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Stefan Monnier @ 2016-06-13 17:55 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, Leo Liu

>> ¹  https://github.com/ternjs/tern/issues/719
> Investigation shows that the problem occurs when url-http-data is multibyte
> and (length url-http-data) differs from (length (string-as-unibyte
> url-http-data)), because we send a wrong value in Content-length.
> Changing url-http-create-request like this will make the problem more
> obvious for anyone else that hits it, patch attached.
> Stefan, did you have a particular situation in mind where this might be bad,
> when you wrote the FIXME?

No, nothing in particular.  Just that `string-as-unibyte` is generally
synonymous with "the author is confused about how coding systems work",
aka "trouble".


        Stefan






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-13 17:55   ` Stefan Monnier
@ 2016-06-13 19:26     ` Dmitry Gutov
  2016-06-14  0:30       ` Stefan Monnier
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-13 19:26 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 23750, Leo Liu

On 06/13/2016 08:55 PM, Stefan Monnier wrote:

> No, nothing in particular.  Just that `string-as-unibyte` is generally
> synonymous with "the author is confused about how coding systems work",
> aka "trouble".

You were also the author in this case. The same commit added both the 
use of string-as-unibyte and the FIXME comment.






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-13 19:26     ` Dmitry Gutov
@ 2016-06-14  0:30       ` Stefan Monnier
  2016-06-19 18:14         ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Stefan Monnier @ 2016-06-14  0:30 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, Leo Liu

>> No, nothing in particular.  Just that `string-as-unibyte` is generally
>> synonymous with "the author is confused about how coding systems work",
>> aka "trouble".
> You were also the author in this case. The same commit added both the use of
> string-as-unibyte and the FIXME comment.

Can't remember why I did so.  My best guess is that I tried to mimic
some earlier behavior.


        Stefan






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-14  0:30       ` Stefan Monnier
@ 2016-06-19 18:14         ` Dmitry Gutov
  2016-06-19 18:25           ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-19 18:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 23750, Leo Liu

On 06/14/2016 03:30 AM, Stefan Monnier wrote:

> Can't remember why I did so.  My best guess is that I tried to mimic
> some earlier behavior.

OK, thanks anyway. I've pushed the patch to master as 
2ede29575fa22eb7c265117d7511cff9fe02c606.

Eli, could we have it in emacs-25 as well? It's not critical, but it should
make the life of our users easier by flagging problems with the usage of
url-http earlier, in a more appropriate place, with an error, rather 
than leaving that up to them to deduce why their HTTP server truncates 
the request body.

While the truncation bug itself is quite old, it's been exacerbated in 
Emacs 25 by my own work to make json.el faster: one side-effect is that
it doesn't \u-quote multibyte characters anymore, or at least not all of 
them.

FWIW, I've been running with it applied to emacs-25 for the past week 
with no problems.


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 18:14         ` Dmitry Gutov
@ 2016-06-19 18:25           ` Eli Zaretskii
  2016-06-19 18:30             ` John Wiegley
  2016-06-19 18:36             ` Dmitry Gutov
  0 siblings, 2 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-19 18:25 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, monnier, sdl.web

> Cc: 23750@debbugs.gnu.org, Leo Liu <sdl.web@gmail.com>,
>  Eli Zaretskii <eliz@gnu.org>
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sun, 19 Jun 2016 21:14:55 +0300
> 
> On 06/14/2016 03:30 AM, Stefan Monnier wrote:
> 
> > Can't remember why I did so.  My best guess is that I tried to mimic
> > some earlier behavior.
> 
> OK, thanks anyway. I've pushed the patch to master as 
> 2ede29575fa22eb7c265117d7511cff9fe02c606.
> 
> Eli, could we have it in emacs-25 as well? It's not critical, but it should
> make the life of our users easier by flagging problems with the usage of
> url-http earlier, in a more appropriate place, with an error, rather 
> than leaving that up to them to deduce why their HTTP server truncates 
> the request body.

I'd need a very detailed description of the bug, and why this
particular solution was used.  IME, neither string-to-unibyte nor
string-as-unibyte should ever be used in applications, their use is
more often than not a sign of some basic misunderstanding of text
encoding.  For starters, how come 8-bit bytes wind up in that
function, and what do they stand for?






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 18:25           ` Eli Zaretskii
@ 2016-06-19 18:30             ` John Wiegley
  2016-06-19 18:45               ` Dmitry Gutov
  2016-06-19 18:36             ` Dmitry Gutov
  1 sibling, 1 reply; 125+ messages in thread
From: John Wiegley @ 2016-06-19 18:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 23750, Dmitry Gutov, sdl.web, monnier

[-- Attachment #1: Type: text/plain, Size: 934 bytes --]

>>>>> Eli Zaretskii <eliz@gnu.org> writes:

>> Eli, could we have it in emacs-25 as well? It's not critical, but it should
>> make the life of our users easier by flagging problems with the usage of
>> url-http earlier, in a more appropriate place, with an error, rather than
>> leaving that up to them to deduce why their HTTP server truncates the
>> request body.

Bear in mind that 25.2 can be released as soon after as we want it to. If
anything is "optional" at this point in time, it should be deferred.

We shouldn't try to race anything into the release, just because we think
users will then have to live with some minor inferior behavior for a long time
after. The description above certainly does not sound like something that
needs to happen for 25.1.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 18:30             ` John Wiegley
@ 2016-06-19 18:45               ` Dmitry Gutov
  2016-06-19 19:56                 ` John Wiegley
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-19 18:45 UTC (permalink / raw)
  To: John Wiegley, Eli Zaretskii; +Cc: 23750, sdl.web, monnier

On 06/19/2016 09:30 PM, John Wiegley wrote:

> Bear in mind that 25.2 can be released as soon after as we want it to. If
> anything is "optional" at this point in time, it should be deferred.

Let's apply the few outstanding patches and release 25.2 the next day, then?

Traditionally, releases are separated by at least several months, even 
ones with no big changes.

> We shouldn't try to race anything into the release, just because we think
> users will then have to live with some minor inferior behavior for a long time
> after. The description above certainly does not sound like something that
> needs to happen for 25.1.

Just to be clear: the patch doesn't change the behavior of any working 
code. It just catches a particular kind of bug earlier than it would 
manifest through a cryptic behavior.

Behavior which is non-trivial to debug, and thus adds to the already 
non-trivial effort required of a person writing an advanced language 
support code (using an external daemon talking over HTTP is fairly 
common for this these days).


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 18:45               ` Dmitry Gutov
@ 2016-06-19 19:56                 ` John Wiegley
  2016-06-19 20:05                   ` Dmitry Gutov
                                     ` (2 more replies)
  0 siblings, 3 replies; 125+ messages in thread
From: John Wiegley @ 2016-06-19 19:56 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, sdl.web, monnier

[-- Attachment #1: Type: text/plain, Size: 955 bytes --]

>>>>> Dmitry Gutov <dgutov@yandex.ru> writes:

> Just to be clear: the patch doesn't change the behavior of any working code.
> It just catches a particular kind of bug earlier than it would manifest
> through a cryptic behavior.
>
> Behavior which is non-trivial to debug, and thus adds to the already
> non-trivial effort required of a person writing an advanced language support
> code (using an external daemon talking over HTTP is fairly common for this
> these days).

I get that. But right now, if it doesn't *have* to happen, it should wait.
We're thinking about cutting the release candidate in just a few days, pending
one issue that Eli is looking into. Any change -- and I mean _any_ change --
has the potential to introduce unforeseen effects that could delay us further.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 19:56                 ` John Wiegley
@ 2016-06-19 20:05                   ` Dmitry Gutov
  2016-06-19 21:07                     ` John Wiegley
  2016-06-20  1:26                   ` Glenn Morris
  2016-06-20  2:58                   ` Dmitry Gutov
  2 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-19 20:05 UTC (permalink / raw)
  To: John Wiegley; +Cc: 23750, sdl.web, monnier

On 06/19/2016 10:56 PM, John Wiegley wrote:

> We're thinking about cutting the release candidate in just a few days, pending
> one issue that Eli is looking into. Any change -- and I mean _any_ change --
> has the potential to introduce unforeseen effects that could delay us further.

By how much?

Even if that change causes problems (which is unlikely), we'd only have 
to revert it, and, unless other issues have come in the meantime, we 
could build and release Emacs 25.1 right then, more or less.

It's not like a regression there has a significant potential to obscure 
other problems. We've tested the current state of the URL package pretty 
well by now anyway.


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 20:05                   ` Dmitry Gutov
@ 2016-06-19 21:07                     ` John Wiegley
  2016-06-20  1:28                       ` Glenn Morris
  0 siblings, 1 reply; 125+ messages in thread
From: John Wiegley @ 2016-06-19 21:07 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, sdl.web, monnier

[-- Attachment #1: Type: text/plain, Size: 850 bytes --]

>>>>> Dmitry Gutov <dgutov@yandex.ru> writes:

> By how much?
> 
> Even if that change causes problems (which is unlikely), we'd only have to
> revert it, and, unless other issues have come in the meantime, we could
> build and release Emacs 25.1 right then, more or less.

A day comes when a line has to be drawn in the sand, otherwise we could nickel
and dime ourselves into the next century. That line is drawn; the time for
25.1 is at hand. Let's start thinking about 25.2 as we think about these types
of improvements, and how we might accelerate its release so it happens in 1-2
months time. There can be many 25.x's, without disrupting the feature work
happening on master.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 21:07                     ` John Wiegley
@ 2016-06-20  1:28                       ` Glenn Morris
  2016-06-20  4:22                         ` John Wiegley
  0 siblings, 1 reply; 125+ messages in thread
From: Glenn Morris @ 2016-06-20  1:28 UTC (permalink / raw)
  To: John Wiegley; +Cc: 23750, Dmitry Gutov, sdl.web, monnier

John Wiegley wrote:

> There can be many 25.x's, without disrupting the feature work
> happening on master.

Then why is master STILL advertising itself as the forerunner to 25.2?
Why are we closing a bunch of bugs as "fixed in 25.2" if they won't be
fixed till 26.1?






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20  1:28                       ` Glenn Morris
@ 2016-06-20  4:22                         ` John Wiegley
  2016-06-20 12:39                           ` Lars Ingebrigtsen
                                             ` (2 more replies)
  0 siblings, 3 replies; 125+ messages in thread
From: John Wiegley @ 2016-06-20  4:22 UTC (permalink / raw)
  To: Glenn Morris; +Cc: 23750, Dmitry Gutov, sdl.web, monnier

>>>>> Glenn Morris <rgm@gnu.org> writes:

> Then why is master STILL advertising itself as the forerunner to 25.2? Why
> are we closing a bunch of bugs as "fixed in 25.2" if they won't be fixed
> till 26.1?

I guess to avoid having the reported version number in bug reports keep
jumping around? Master is really working toward 26.1 at this point.

Once we start working on 25.2, we should cherry-pick over all the fixes for
bugs that are marked "fixed in 25.2". Otherwise, they should be marked "fixed in
26.1".

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20  4:22                         ` John Wiegley
@ 2016-06-20 12:39                           ` Lars Ingebrigtsen
  2016-07-01 20:49                             ` John Wiegley
  2016-06-20 14:42                           ` Eli Zaretskii
  2016-06-23 17:14                           ` Glenn Morris
  2 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-06-20 12:39 UTC (permalink / raw)
  To: John Wiegley; +Cc: 23750, Dmitry Gutov, sdl.web, monnier

John Wiegley <jwiegley@gmail.com> writes:

>>>>>> Glenn Morris <rgm@gnu.org> writes:
>
>> Then why is master STILL advertising itself as the forerunner to 25.2? Why
>> are we closing a bunch of bugs as "fixed in 25.2" if they won't be fixed
>> till 26.1?
>
> I guess to avoid having the reported version number in bug reports keep
> jumping around? Master is really working toward 26.1 at this point.
>
> Once we start working on 25.2, we should cherry-pick over all the fixes for
> bugs that are marked "fixed in 25.2". Otherwise, they should be marked "fixed in
> 26.1".

Most bugs fixed in master are marked "fixed in 25.2" (since that is what
master is announcing itself as being the forerunner to), so that doesn't
make much sense, I'm afraid.

Which is what Glenn is telling us, once again.  I really don't
understand why master hasn't been changed to say that it's the
forerunner to 26.1.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20 12:39                           ` Lars Ingebrigtsen
@ 2016-07-01 20:49                             ` John Wiegley
  0 siblings, 0 replies; 125+ messages in thread
From: John Wiegley @ 2016-07-01 20:49 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 23750, Dmitry Gutov, sdl.web, monnier

[-- Attachment #1: Type: text/plain, Size: 2034 bytes --]

>>>>> Lars Ingebrigtsen <larsi@gnus.org> writes:

> Most bugs fixed in master are marked "fixed in 25.2" (since that is what
> master is announcing itself as being the forerunner to), so that doesn't
> make much sense, I'm afraid.
> 
> Which is what Glenn is telling us, once again. I really don't understand why
> master hasn't been changed to say that it's the forerunner to 26.1.

The last time we had our long discussion about what the various branches mean,
the conclusion was that emacs-25 is for the next release, and master is for
all other work.

Most people did NOT want master to be toward the next release (25.2), as that
leaves nowhere for changes meant for 26 only.

However, this also leaves nowhere for fixes to go that are only for 25.2. But
since no additional branches were desired, the compromise was that both types
of changes will go into master, and we will backport certain changes into
emacs-25 toward 25.2 after the release.

Marking a bug as "fixed in 25.2" seems wrong to me, because it implies a
guarantee that the fix will get cherry picked into emacs-25 after 25.1 is
released, although I highly doubt this will happen for every such fix. There
is just too much work to be done.

What we should do is mark every commit intended for 25.2 in a way that lets us
find them all automatically after the release, with a link to the bugs they
fix so that we can safely state "fixed in 25.2". Since this hasn't happened, I
imagine it will be a very manual process, and will be missing several of those
fixes.

This is why I personally argued for 3 branches, but it's not what the people
doing the real work wanted, so this is what we have.

After 25.1, we'll just have to see what happens to emacs-25 and to the
bug-tracker. I imagine several of the "fixed in 25.2" bugs will need to be
adjusted to "fixed in 26.1".

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20  4:22                         ` John Wiegley
  2016-06-20 12:39                           ` Lars Ingebrigtsen
@ 2016-06-20 14:42                           ` Eli Zaretskii
  2016-06-23 17:14                           ` Glenn Morris
  2 siblings, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-20 14:42 UTC (permalink / raw)
  To: John Wiegley; +Cc: 23750, dgutov, sdl.web, monnier

> From: John Wiegley <jwiegley@gmail.com>
> Date: Sun, 19 Jun 2016 21:22:25 -0700
> Cc: 23750@debbugs.gnu.org, Dmitry Gutov <dgutov@yandex.ru>, sdl.web@gmail.com,
> 	monnier@IRO.UMontreal.CA
> 
> Once we start working on 25.2, we should cherry-pick over all the fixes for
> bugs that are marked "fixed in 25.2".

I don't think this is practical.  The only practical way is to cut a
new release branch off master.






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20  4:22                         ` John Wiegley
  2016-06-20 12:39                           ` Lars Ingebrigtsen
  2016-06-20 14:42                           ` Eli Zaretskii
@ 2016-06-23 17:14                           ` Glenn Morris
  2 siblings, 0 replies; 125+ messages in thread
From: Glenn Morris @ 2016-06-23 17:14 UTC (permalink / raw)
  To: John Wiegley; +Cc: 23750, Dmitry Gutov, sdl.web, monnier

John Wiegley wrote:

>> Then why is master STILL advertising itself as the forerunner to 25.2? Why
>> are we closing a bunch of bugs as "fixed in 25.2" if they won't be fixed
>> till 26.1?
>
> I guess to avoid having the reported version number in bug reports keep
> jumping around? Master is really working toward 26.1 at this point.

This doesn't make any sense to me. (And why are you guessing? Isn't
there a plan?)

> Once we start working on 25.2, we should cherry-pick over all the fixes for
> bugs that are marked "fixed in 25.2". Otherwise, they should be marked "fixed in
> 26.1".

I don't think that will work well, but good luck with it.







* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 19:56                 ` John Wiegley
  2016-06-19 20:05                   ` Dmitry Gutov
@ 2016-06-20  1:26                   ` Glenn Morris
  2016-06-20  2:58                   ` Dmitry Gutov
  2 siblings, 0 replies; 125+ messages in thread
From: Glenn Morris @ 2016-06-20  1:26 UTC (permalink / raw)
  To: John Wiegley; +Cc: 23750, Dmitry Gutov, sdl.web, monnier

John Wiegley wrote:

> We're thinking about cutting the release candidate in just a few days

Please see admin/release-process for some tasks that should happen
before that.






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 19:56                 ` John Wiegley
  2016-06-19 20:05                   ` Dmitry Gutov
  2016-06-20  1:26                   ` Glenn Morris
@ 2016-06-20  2:58                   ` Dmitry Gutov
  2 siblings, 0 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-20  2:58 UTC (permalink / raw)
  To: John Wiegley; +Cc: 23750, sdl.web, monnier

On 06/19/2016 10:56 PM, John Wiegley wrote:

> We're thinking about cutting the release candidate in just a few days, pending
> one issue that Eli is looking into.

Do you mean bug#23779? I wouldn't call it critical (judging by the 
number of years it went unreported), and it's not a regression, so it 
doesn't make a lot of sense to fix it without taking care of the bug 
that resulted in it being reported (bug#23769).


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 18:25           ` Eli Zaretskii
  2016-06-19 18:30             ` John Wiegley
@ 2016-06-19 18:36             ` Dmitry Gutov
  2016-06-20  0:15               ` Leo Liu
  2016-06-20  2:40               ` Eli Zaretskii
  1 sibling, 2 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-19 18:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 23750, monnier, sdl.web

On 06/19/2016 09:25 PM, Eli Zaretskii wrote:

> I'd need a very detailed description of the bug, and why this
> particular solution was used.

This particular bug came from this:

"Content-length: " (number-to-string (length url-http-data))

Which gives wrong value when url-http-data is multibyte (it should be 
length in bytes). So then, the HTTP server on the other side saw the 
wrong body length and truncated the body when reading the request. Or 
something along these lines.
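
The mismatch described above can be seen with a one-word body. This is
only a sketch: the sample string and the choice of UTF-8 are assumptions
made for illustration, not taken from the Tern report.

```elisp
(let* ((body "na\u00efve")                         ; multibyte, 5 characters
       (bytes (encode-coding-string body 'utf-8))) ; unibyte, as sent on the wire
  (list (length body)     ; character count, which the header reported
        (length bytes)))  ; byte count, which Content-length should carry
;; => (5 6)
```

A server that trusts the smaller number stops reading one byte early, which
for a JSON document means losing the closing brace or bracket.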

> IME, neither string-to-unibyte nor
> string-as-unibyte should ever be used in applications, their use is
> more often than not a sign of some basic misunderstanding of text
> encoding.  For starters, how come 8-bit bytes wind up in that
> function, and what do they stand for?

Some 8-bit encoding of the HTTP request body.

Anyway, yes, the hope is that the programmer uses something like 
encode-coding-string to produce that value (and picks the encoding, and 
indicates it in the appropriate HTTP header). Then string-to-unibyte 
will simply be a no-op. But we need to catch the case when they don't, 
and this seems to be the easiest way to do this.
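
On the caller's side, the approach described above might look like the
following sketch. The function name my/post-json is hypothetical, and UTF-8
is an assumed choice of encoding; `url-request-method',
`url-request-extra-headers' and `url-request-data' are the variables the URL
package reads when it builds the request.

```elisp
(require 'url)
(require 'json)

;; Hypothetical helper, for illustration only.
(defun my/post-json (url object)
  "POST OBJECT to URL as a UTF-8 encoded JSON document.
Return the buffer holding the server's response."
  (let ((url-request-method "POST")
        ;; Tell the server which encoding the body uses.
        (url-request-extra-headers
         '(("Content-Type" . "application/json; charset=utf-8")))
        ;; Encode explicitly so the body is a unibyte string whose
        ;; `length' equals the number of bytes that will be sent.
        (url-request-data
         (encode-coding-string (json-encode object) 'utf-8)))
    (url-retrieve-synchronously url)))
```

With the body encoded this way, the `string-to-unibyte' call in the patch
has nothing left to convert.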


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 18:36             ` Dmitry Gutov
@ 2016-06-20  0:15               ` Leo Liu
  2016-06-20 14:39                 ` Eli Zaretskii
  2016-06-20  2:40               ` Eli Zaretskii
  1 sibling, 1 reply; 125+ messages in thread
From: Leo Liu @ 2016-06-20  0:15 UTC (permalink / raw)
  To: 23750

On 2016-06-19 21:36 +0300, Dmitry Gutov wrote:
> This particular bug came from this:
>
> "Content-length: " (number-to-string (length url-http-data))
>
> Which gives wrong value when url-http-data is multibyte (it should be
> length in bytes). So then, the HTTP server on the other side saw the
> wrong body length and truncated the body when reading the request.

As Dmitry mentioned earlier json-encode in 25.1 produces multibyte
strings and makes it easier to hit this bug when consuming JSON API's.
There are three parties that are suspicious: 1) JSON API server 2)
JSON.el 3) URL. It took me a while to realise it's URL's fault IOW the
bug isn't easy to debug. This is somewhat related to changes brought in
by 25.1.

Leo


* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20  0:15               ` Leo Liu
@ 2016-06-20 14:39                 ` Eli Zaretskii
  0 siblings, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-20 14:39 UTC (permalink / raw)
  To: Leo Liu; +Cc: 23750

> From: Leo Liu <sdl.web@gmail.com>
> Date: Mon, 20 Jun 2016 08:15:26 +0800
> 
> > This particular bug came from this:
> >
> > "Content-length: " (number-to-string (length url-http-data))
> >
> > Which gives wrong value when url-http-data is multibyte (it should be
> > length in bytes). So then, the HTTP server on the other side saw the
> > wrong body length and truncated the body when reading the request.
> 
> As Dmitry mentioned earlier json-encode in 25.1 produces multibyte
> strings and makes it easier to hit this bug when consuming JSON API's.
> There are three parties that are suspicious: 1) JSON API server 2)
> JSON.el 3) URL. It took me a while to realise it's URL's fault IOW the
> bug isn't easy to debug. This is somewhat related to changes brought in
> by 25.1.

I understand that url-http expects unibyte strings.  So my suggestion
is to test that, and signal an error if the requirement is violated,
with an error message text that could be understood by users and
developers.

Alternatively, we could encode multibyte strings in UTF-8, if we want
to attempt to silently cope with such strings.

In any case, using string-*-unibyte functions for that is not needed,
and I'm quite sure their use in this case is a left-over from an era
long gone.
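
The two alternatives suggested above could be sketched roughly as follows.
Both function names are hypothetical, and neither is the code that was
committed for this bug.

```elisp
;; Alternative 1: insist on unibyte input and fail with a readable message.
(defun my/http-unibyte-or-error (string)
  "Return STRING if it is unibyte; otherwise signal a descriptive error."
  (when (multibyte-string-p string)
    (error "Multibyte text in HTTP request: %s" string))
  string)

;; Alternative 2: silently cope by encoding multibyte text as UTF-8.
(defun my/http-unibyte-or-utf-8 (string)
  "Return STRING unchanged if it is unibyte, else encoded as UTF-8."
  (if (multibyte-string-p string)
      (encode-coding-string string 'utf-8)
    string))
```

Note that the strict variant also rejects multibyte strings that happen to
contain only ASCII, which `string-to-unibyte' would accept.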






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-19 18:36             ` Dmitry Gutov
  2016-06-20  0:15               ` Leo Liu
@ 2016-06-20  2:40               ` Eli Zaretskii
  2016-06-20  2:51                 ` Dmitry Gutov
  1 sibling, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-20  2:40 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, monnier, sdl.web

> Cc: 23750@debbugs.gnu.org, monnier@IRO.UMontreal.CA, sdl.web@gmail.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sun, 19 Jun 2016 21:36:25 +0300
> 
> This particular bug came from this:
> 
> "Content-length: " (number-to-string (length url-http-data))
> 
> Which gives wrong value when url-http-data is multibyte (it should be 
> length in bytes). So then, the HTTP server on the other side saw the 
> wrong body length and truncated the body when reading the request. Or 
> something along these lines.

So this is not a bug in Emacs, but a diagnostic facility to let bugs
in applications be discovered?

> > IME, neither string-to-unibyte nor
> > string-as-unibyte should ever be used in applications, their use is
> > more often than not a sign of some basic misunderstanding of text
> > encoding.  For starters, how come 8-bit bytes wind up in that
> > function, and what do they stand for?
> 
> > Some 8-bit encoding of the HTTP request body.
> 
> Anyway, yes, the hope is that the programmer uses something like 
> encode-coding-string to produce that value (and picks the encoding, and 
> indicates it in the appropriate HTTP header). Then string-to-unibyte 
> will simply be a no-op. But we need to catch the case when they don't, 
> and this seems to be the easiest way to do this.

If this is what you need, why not simply test the payload for being a
unibyte string?  There's a function, multibyte-string-p, for that.






* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20  2:40               ` Eli Zaretskii
@ 2016-06-20  2:51                 ` Dmitry Gutov
  2016-06-20 14:38                   ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-20  2:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 23750, monnier, sdl.web

On 06/20/2016 05:40 AM, Eli Zaretskii wrote:

> So this is not a bug in Emacs, but a diagnostic facility to let bugs
> in applications be discovered?

It's a bug. Accepting invalid input and behaving badly with it is 
definitely a bug.

> If this is what you need, why not simply test the payload for being a
> unibyte string?  There is a function, multibyte-string-p, for that.

There are a lot of variables to test (see the comment above the 
mapconcat call).

I'm fine either way, but my patch changes two characters, and yours will 
be longer. And you'll have to come up with the error message(s).

^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20  2:51                 ` Dmitry Gutov
@ 2016-06-20 14:38                   ` Eli Zaretskii
  2016-06-20 14:54                     ` Dmitry Gutov
  2016-06-20 17:16                     ` Dmitry Gutov
  0 siblings, 2 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-20 14:38 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, monnier, sdl.web

> Cc: 23750@debbugs.gnu.org, monnier@IRO.UMontreal.CA, sdl.web@gmail.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 20 Jun 2016 05:51:06 +0300

This all sounds like my response is not welcome, but in that case why
did you ask the question?

Anyway:

>     So this is not a bug in Emacs, but a diagnostic facility to let bugs
>     in applications be discovered?
> 
> It's a bug. Accepting invalid input and behaving badly with it is definitely a bug.

No, the bug is where the invalid input is generated in the first
place.  Each API has its contract; if you violate the contract, you
invoke undefined behavior.

>     If this is what you need, why not simply test the payload for being a
>     unibyte string?  There is a function, multibyte-string-p, for that.
> 
> There are a lot of variables to test (see the comment above the mapconcat call).

Looks like mapc will be able to deal with that.  Or just use concat,
and test the result with multibyte-string-p before sending.  Or encode
it with UTF-8, if it is not unibyte already.

Btw, I don't think the comment which explains why we started using
mapconcat is accurate these days.  It was written before the move to
Unicode in Emacs 23, but we stopped converting raw bytes into Latin-1
characters in Emacs 23 and later.  So maybe we should just go back to
using concat (with erroring out, if the result is multibyte, and/or
maybe with replacing 'length' with 'string-bytes').

Bottom line: like I said, there should be no reason to use
string-*-unibyte in modern Emacs code on the url-http level or higher
(maybe not at all).  Its use is a sign of some basic misunderstanding,
or a bug elsewhere, or remnant of old problems that no longer exist.
So I think we should reconsider the solution on master as well.

> I'm fine either way, but my patch changes two characters, and yours will be longer.

I don't think the quality of a change should be judged by the number
of characters in the patch.  That is a very strange criterion, to say
the least.  It would mean, for example, that changes with comments are
worse than changes without comments, or that saving newlines in C code
(which makes the code less readable) is a virtue.

> And you'll have to come up with the error message(s).

Are you saying you like the error message from string-to-unibyte?

  Cannot convert 123th character to unibyte

Doesn't really strike me as something that a user or an average
developer will understand.  I thought you wanted something more
human-readable, like

  Invalid multibyte text in HTTP request %s

^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20 14:38                   ` Eli Zaretskii
@ 2016-06-20 14:54                     ` Dmitry Gutov
  2016-06-20 15:03                       ` Eli Zaretskii
  2016-06-20 17:16                     ` Dmitry Gutov
  1 sibling, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-20 14:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 23750, monnier, sdl.web

On 06/20/2016 05:38 PM, Eli Zaretskii wrote:

 > This all sounds like my response is not welcome, but in that case why
 > did you ask the question?

I was kind of hoping for "yes, let's get it into 25.1!"? :)

> No, the bug is where the invalid input is generated in the first
> place.  Each API has its contract; if you violate the contract, you
> invoke undefined behavior.

It's a bug in the API, or bad API, if you will. It needs stricter 
contract, and the submitted patch added it.

Or to look at it another way, the current contract allows url-http-data 
to be multibyte, because the requirement to the contrary is not 
documented anywhere that I can see. The variable is simply undocumented.

>>     If this is what you need, why not simply test the payload for being a
>>     unibyte string?  There is a function, multibyte-string-p, for that.
>>
>> There are a lot of variables to test (see the comment above the mapconcat call).
>
> Looks like mapc will be able to deal with that.  Or just use concat,
> and test the result with multibyte-string-p before sending.  Or encode
> it with UTF-8, if it is not unibyte already.

I don't know if we want to be that permissive that we'll encode to UTF-8 
silently.

> Btw, I don't think the comment which explains why we started using
> mapconcat is accurate these days.  It was written before the move to
> Unicode in Emacs 23, but we stopped converting raw bytes into Latin-1
> characters in Emacs 23 and later.  So maybe we should just go back to
> using concat (with erroring out, if the result is multibyte, and/or
> maybe with replacing 'length' with 'string-bytes').

Better error out: the payload's encoding is something only the caller 
should be concerned with. Unless we're fine with the users assuming that 
Emacs's internal encoding is close enough to UTF-8.

> Bottom line: like I said, there should be no reason to use
> string-*-unibyte in modern Emacs code on the url-http level or higher
> (maybe not at all).  Its use is a sign of some basic misunderstanding,
> or a bug elsewhere, or remnant of old problems that no longer exist.
> So I think we should reconsider the solution on master as well.

I don't mind. Would you advocate for having this fix on emacs-25 if I 
implement it the way you described?

>> And you'll have to come up with the error message(s).
>
> Are you saying you like the error message from string-to-unibyte?
>
>   Cannot convert 123th character to unibyte

It's an order of magnitude better than what was before (no error and 
silent corruption), but yes, there is space for improvement.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20 14:54                     ` Dmitry Gutov
@ 2016-06-20 15:03                       ` Eli Zaretskii
  0 siblings, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-20 15:03 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, monnier, sdl.web

> Cc: 23750@debbugs.gnu.org, monnier@IRO.UMontreal.CA, sdl.web@gmail.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 20 Jun 2016 17:54:23 +0300
> 
> On 06/20/2016 05:38 PM, Eli Zaretskii wrote:
> 
>  > This all sounds like my response is not welcome, but in that case why
>  > did you ask the question?
> 
> I was kind of hoping for "yes, let's get it into 25.1!"? :)

I'm not that kind of guy, as you know ;-)

> > Bottom line: like I said, there should be no reason to use
> > string-*-unibyte in modern Emacs code on the url-http level or higher
> > (maybe not at all).  Its use is a sign of some basic misunderstanding,
> > or a bug elsewhere, or remnant of old problems that no longer exist.
> > So I think we should reconsider the solution on master as well.
> 
> I don't mind. Would you advocate for having this fix on emacs-25 if I 
> implement it the way you described?

A single test and an error message is safe enough to go to emacs-25,
yes.





^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20 14:38                   ` Eli Zaretskii
  2016-06-20 14:54                     ` Dmitry Gutov
@ 2016-06-20 17:16                     ` Dmitry Gutov
  2016-06-20 20:17                       ` Eli Zaretskii
  1 sibling, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-20 17:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 23750, monnier, sdl.web

On 06/20/2016 05:38 PM, Eli Zaretskii wrote:

> Or just use concat,
> and test the result with multibyte-string-p before sending.

Actually, here's a reason why we might prefer not to replace 
string-as/to-unibyte with multibyte-string-p: string-to-unibyte works 
fine if the string's contents only contain ASCII/8-bit characters, even 
if the string itself is multibyte. But multibyte-string-p returns t
for such strings anyway.

So doing like you suggest might make some (arguably not well-written) 
programs fail, which otherwise could function fine, provided they only 
operate on ASCII strings. And having a multibyte string with ASCII-only 
contents is fairly common when the string is produced with 
buffer-substring from a source code buffer.

While it might be good to discourage this kind of programming practice 
(that doesn't handle non-ASCII text properly), it seems like this would 
be better for master rather than the impending release.

WDYT?

^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20 17:16                     ` Dmitry Gutov
@ 2016-06-20 20:17                       ` Eli Zaretskii
  2016-06-20 20:27                         ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-20 20:17 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, monnier, sdl.web

> Cc: 23750@debbugs.gnu.org, monnier@IRO.UMontreal.CA, sdl.web@gmail.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 20 Jun 2016 20:16:37 +0300
> 
> On 06/20/2016 05:38 PM, Eli Zaretskii wrote:
> 
> > Or just use concat,
> > and test the result with multibyte-string-p before sending.
> 
> Actually, here's a reason why we might prefer not to replace 
> string-as/to-unibyte with multibyte-string-p: string-to-unibyte works 
> fine if the string's contents only contain ASCII/8-bit characters, even 
> if the string itself is multibyte. But multibyte-string-p returns t
> for such strings anyway.

We can replace the call to multibyte-string-p with a comparison of
what 'length' and 'string-bytes' return.  That should overcome this
issue.
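
As a rough sketch of that comparison (the helper and variable names
below are made up for illustration and are not part of url-http.el),
something along these lines should accept ASCII-only multibyte strings
and already-encoded unibyte strings, while rejecting unencoded
non-ASCII text:

```elisp
;; Hypothetical helper for illustration; not part of url-http.el.
(defun my-request-one-byte-per-char-p (request)
  "Return non-nil if REQUEST has exactly one byte per character."
  (= (length request) (string-bytes request)))

;; A multibyte string whose contents are pure ASCII.
(defvar my-ascii-multibyte (string-to-multibyte "GET / HTTP/1.1"))

;; Unencoded non-ASCII text ("\u307b\u3052" is the two characters ほげ).
(defvar my-raw-text "\u307b\u3052")

;; The same text encoded to UTF-8 bytes (a unibyte string).
(defvar my-encoded-text (encode-coding-string my-raw-text 'utf-8))

(list (multibyte-string-p my-ascii-multibyte)              ; multibyte, yet...
      (my-request-one-byte-per-char-p my-ascii-multibyte)  ; ...it passes
      (my-request-one-byte-per-char-p my-raw-text)         ; rejected
      (my-request-one-byte-per-char-p my-encoded-text))    ; passes
```

Unlike a plain multibyte-string-p test, this does not penalize callers
that hand over ASCII text taken from a multibyte buffer.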





^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20 20:17                       ` Eli Zaretskii
@ 2016-06-20 20:27                         ` Dmitry Gutov
  2016-06-21  2:30                           ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-20 20:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 23750, monnier, sdl.web

On 06/20/2016 11:17 PM, Eli Zaretskii wrote:

> We can replace the call to multibyte-string-p with a comparison of
> what 'length' and 'string-bytes' return.  That should overcome this
> issue.

Why not just call string-to-unibyte? Do you expect different results?





^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-20 20:27                         ` Dmitry Gutov
@ 2016-06-21  2:30                           ` Eli Zaretskii
  2016-06-21 13:51                             ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-21  2:30 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 23750, monnier, sdl.web

> Cc: 23750@debbugs.gnu.org, monnier@IRO.UMontreal.CA, sdl.web@gmail.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 20 Jun 2016 23:27:01 +0300
> 
> On 06/20/2016 11:17 PM, Eli Zaretskii wrote:
> 
> > We can replace the call to multibyte-string-p with a comparison of
> > what 'length' and 'string-bytes' return.  That should overcome this
> > issue.
> 
> Why not just call string-to-unibyte?

Because (a) I don't want to see that function in our sources, ever,
and (b) you don't have any control on the error message it produces,
which is not appropriate for application-level checks.





^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-21  2:30                           ` Eli Zaretskii
@ 2016-06-21 13:51                             ` Dmitry Gutov
  2016-06-21 15:18                               ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-21 13:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 23750, monnier, sdl.web

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

On 06/21/2016 05:30 AM, Eli Zaretskii wrote:

> Because (a) I don't want to see that function in our sources, ever,
> and (b) you don't have any control on the error message it produces,
> which is not appropriate for application-level checks.

Please take a look at the attachment. OK to install?

I recall John saying we shouldn't push any more changes to emacs-25.

[-- Attachment #2: url-http-multibyte.diff --]
[-- Type: text/x-patch, Size: 1588 bytes --]

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 5832e92..7156e6f 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -275,19 +275,7 @@ url-http-create-request
     ;; allows us to elide null lines directly, at the cost of making
     ;; the layout less clear.
     (setq request
-          ;; We used to concat directly, but if one of the strings happens
-          ;; to being multibyte (even if it only contains pure ASCII) then
-          ;; every string gets converted with `string-MAKE-multibyte' which
-          ;; turns the 127-255 codes into things like latin-1 accented chars
-          ;; (it would work right if it used `string-TO-multibyte' instead).
-          ;; So to avoid the problem we force every string to be unibyte.
-          (mapconcat
-           ;; FIXME: Instead of `string-AS-unibyte' we'd want
-           ;; `string-to-unibyte', so as to properly signal an error if one
-           ;; of the strings contains a multibyte char.
-           'string-as-unibyte
-           (delq nil
-            (list
+          (concat
              ;; The request
              (or url-http-method "GET") " "
              (if using-proxy (url-recreate-url url-http-target-url) real-fname)
@@ -365,7 +353,10 @@ url-http-create-request
              "\r\n"
              ;; Any data
              url-http-data))
-           ""))
+    ;; Bug#23750
+    (unless (= (string-bytes request)
+               (length request))
+      (error "Multibyte text in HTTP request: %s" request))
     (url-http-debug "Request is: \n%s" request)
     request))
 

^ permalink raw reply related	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-21 13:51                             ` Dmitry Gutov
@ 2016-06-21 15:18                               ` Eli Zaretskii
  2016-06-22  1:08                                 ` John Wiegley
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-21 15:18 UTC (permalink / raw)
  To: Dmitry Gutov, John Wiegley; +Cc: 23750, monnier, sdl.web

> Cc: 23750@debbugs.gnu.org, monnier@IRO.UMontreal.CA, sdl.web@gmail.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 21 Jun 2016 16:51:59 +0300
> 
> > Because (a) I don't want to see that function in our sources, ever,
> > and (b) you don't have any control on the error message it produces,
> > which is not appropriate for application-level checks.
> 
> Please take a look at the attachment. OK to install?

Yes, but let's wait for John.

> I recall John saying we shouldn't push any more changes to emacs-25.

He did?  John, this change is IMO safe for emacs-25.  Is it OK to
push there?

Thanks.





^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-21 15:18                               ` Eli Zaretskii
@ 2016-06-22  1:08                                 ` John Wiegley
  2016-06-22  2:36                                   ` Eli Zaretskii
  2016-06-22 18:21                                   ` Dmitry Gutov
  0 siblings, 2 replies; 125+ messages in thread
From: John Wiegley @ 2016-06-22  1:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 23750, Dmitry Gutov, sdl.web, monnier

[-- Attachment #1: Type: text/plain, Size: 335 bytes --]

>>>>> Eli Zaretskii <eliz@gnu.org> writes:

> He did? John, this change is IMO safe for emacs-25. Is it OK to push there?

If you think it's safe, Eli, then I'm good with it.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 629 bytes --]

^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-22  1:08                                 ` John Wiegley
@ 2016-06-22  2:36                                   ` Eli Zaretskii
  2016-06-22 18:21                                   ` Dmitry Gutov
  1 sibling, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-06-22  2:36 UTC (permalink / raw)
  To: John Wiegley; +Cc: 23750, dgutov, sdl.web, monnier

> From: John Wiegley <jwiegley@gmail.com>
> Cc: Dmitry Gutov <dgutov@yandex.ru>,  23750@debbugs.gnu.org,  monnier@IRO.UMontreal.CA,  sdl.web@gmail.com
> Date: Tue, 21 Jun 2016 18:08:44 -0700
> 
> >>>>> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > He did? John, this change is IMO safe for emacs-25. Is it OK to push there?
> 
> If you think it's safe, Eli, then I'm good with it.

OK, thanks.





^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-06-22  1:08                                 ` John Wiegley
  2016-06-22  2:36                                   ` Eli Zaretskii
@ 2016-06-22 18:21                                   ` Dmitry Gutov
  1 sibling, 0 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-06-22 18:21 UTC (permalink / raw)
  To: John Wiegley, Eli Zaretskii; +Cc: 23750-done, sdl.web, monnier

On 06/22/2016 04:08 AM, John Wiegley wrote:

> If you think it's safe, Eli, then I'm good with it.

Thanks!

Pushed, and closing.






^ permalink raw reply	[flat|nested] 125+ messages in thread

* bug#23750: 25.0.95; bug in url-retrieve or json.el
@ 2016-11-29  8:22 Kentaro NAKAZAWA
  2016-11-29  9:54 ` Andreas Schwab
  0 siblings, 1 reply; 125+ messages in thread
From: Kentaro NAKAZAWA @ 2016-11-29  8:22 UTC (permalink / raw)
  To: dgutov, emacs-devel

Why can't I use multibyte text in HTTP requests?
The following correct http request will fail.

(require 'json)
(let* ((content "ほげ <- VALID utf-8 Japanese multibyte text")
       (url "https://api.github.com/gists")
       (url-request-method "POST")
       (url-request-data
        (json-encode
         `(("description" . "test")
           ("public" . false)
           ("files" . (("test.txt" . (("content" . ,content)))))))))
  (with-current-buffer (url-retrieve-synchronously url)
    (buffer-string)))
=> url-http-create-request: Multibyte text in HTTP request: POST /gists
HTTP/1.1

Please apply the following patch.

--- url-http.el.orig	2016-09-15 17:16:04.000000000 +0900
+++ url-http.el	2016-11-29 17:10:57.018703500 +0900
@@ -351,16 +351,12 @@
              (if url-http-data
                  (concat
                   "Content-length: " (number-to-string
-                                      (length url-http-data))
+                                      (string-bytes url-http-data))
                   "\r\n"))
              ;; End request
              "\r\n"
              ;; Any data
              url-http-data))
-    ;; Bug#23750
-    (unless (= (string-bytes request)
-               (length request))
-      (error "Multibyte text in HTTP request: %s" request))
     (url-http-debug "Request is: \n%s" request)
     request))



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29  8:22 Kentaro NAKAZAWA
@ 2016-11-29  9:54 ` Andreas Schwab
  2016-11-29 10:06   ` Kentaro NAKAZAWA
  0 siblings, 1 reply; 125+ messages in thread
From: Andreas Schwab @ 2016-11-29  9:54 UTC (permalink / raw)
  To: Kentaro NAKAZAWA; +Cc: emacs-devel, dgutov

On Nov 29 2016, Kentaro NAKAZAWA <kentaro.nakazawa@nifty.com> wrote:

> Why can't I use multibyte text in HTTP requests?

You need to encode it.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29  9:54 ` Andreas Schwab
@ 2016-11-29 10:06   ` Kentaro NAKAZAWA
  2016-11-29 10:08     ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Kentaro NAKAZAWA @ 2016-11-29 10:06 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel, dgutov

On 2016/11/29 18:54, Andreas Schwab wrote:

> You need to encode it.

The text is encoded with utf-8.
The correct utf-8 text also contains multibyte text.
(Multibyte text is (/= (string-bytes text) (length text)) => t)

How can I correctly POST multibyte text?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:06   ` Kentaro NAKAZAWA
@ 2016-11-29 10:08     ` Dmitry Gutov
  2016-11-29 10:23       ` Kentaro NAKAZAWA
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-11-29 10:08 UTC (permalink / raw)
  To: Kentaro NAKAZAWA, Andreas Schwab; +Cc: emacs-devel

On 29.11.2016 12:06, Kentaro NAKAZAWA wrote:

> The text is encoded with utf-8.
> The correct utf-8 text also contains multibyte text.
> (Multibyte text is (/= (string-bytes text) (length text)) => t)
>
> How can I correctly POST multibyte text?

You encode it to a unibyte string using encode-coding-string.
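
To illustrate (a minimal sketch; the variable names are only for this
example), encoding turns the characters into the bytes that actually
go over the wire, so 'length' of the encoded string is the byte count
that Content-length needs:

```elisp
;; Illustrative names only; "\u307b\u3052" is the two characters ほげ.
(defvar my-text "\u307b\u3052")
(defvar my-bytes (encode-coding-string my-text 'utf-8))

(length my-text)               ; 2 characters
(multibyte-string-p my-bytes)  ; nil, the encoded result is unibyte
(length my-bytes)              ; 6 bytes, 3 per character in UTF-8
```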



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:08     ` Dmitry Gutov
@ 2016-11-29 10:23       ` Kentaro NAKAZAWA
  2016-11-29 10:34         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Kentaro NAKAZAWA @ 2016-11-29 10:23 UTC (permalink / raw)
  To: Dmitry Gutov, Andreas Schwab; +Cc: emacs-devel



On 2016/11/29 19:08, Dmitry Gutov wrote:

> You encode it to a unibyte string using encode-coding-string.

(let* ((content (encode-coding-string
                 "ほげ <- VALID utf-8 Japanese multibyte text"
                 'us-ascii))
=> The following text was POSTed.
?? <- VALID utf-8 Japanese multibyte text
^^Two question marks

(let* ((content (encode-coding-string
                 "ほげ <- VALID utf-8 Japanese multibyte text"
                 'raw-text))
=> url-http-create-request: Multibyte text in HTTP request: POST /gists
HTTP/1.1

I tried various things but I do not know how to do it ...



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:23       ` Kentaro NAKAZAWA
@ 2016-11-29 10:34         ` Lars Ingebrigtsen
  2016-11-29 10:38           ` Kentaro NAKAZAWA
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-11-29 10:34 UTC (permalink / raw)
  To: Kentaro NAKAZAWA; +Cc: emacs-devel

Kentaro NAKAZAWA <kentaro.nakazawa@nifty.com> writes:

> (let* ((content (encode-coding-string
>                  "ほげ <- VALID utf-8 Japanese multibyte text"
>                  'us-ascii))

Use

(encode-coding-string "ほげ <- VALID utf-8 Japanese multibyte text" 'utf-8)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:34         ` Lars Ingebrigtsen
@ 2016-11-29 10:38           ` Kentaro NAKAZAWA
  2016-11-29 10:42             ` Lars Ingebrigtsen
  2016-11-29 10:50             ` Dmitry Gutov
  0 siblings, 2 replies; 125+ messages in thread
From: Kentaro NAKAZAWA @ 2016-11-29 10:38 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

On 2016/11/29 19:34, Lars Ingebrigtsen wrote:

> (encode-coding-string "ほげ <- VALID utf-8 Japanese multibyte text" 'utf-8)

=> url-http-create-request: Multibyte text in HTTP request: POST /gists
HTTP/1.1

It is the same result.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:38           ` Kentaro NAKAZAWA
@ 2016-11-29 10:42             ` Lars Ingebrigtsen
  2016-11-29 10:48               ` Kentaro NAKAZAWA
  2016-11-29 10:49               ` Dmitry Gutov
  2016-11-29 10:50             ` Dmitry Gutov
  1 sibling, 2 replies; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-11-29 10:42 UTC (permalink / raw)
  To: Kentaro NAKAZAWA; +Cc: emacs-devel

Kentaro NAKAZAWA <kentaro.nakazawa@nifty.com> writes:

> On 2016/11/29 19:34, Lars Ingebrigtsen wrote:
>
>> (encode-coding-string "ほげ <- VALID utf-8 Japanese multibyte text" 'utf-8)
>
> => url-http-create-request: Multibyte text in HTTP request: POST /gists
> HTTP/1.1
>
> It is the same result.

Uhm...  how about

(string-as-unibyte
 (encode-coding-string "ほげ <- VALID utf-8 Japanese multibyte text" 'utf-8))

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:42             ` Lars Ingebrigtsen
@ 2016-11-29 10:48               ` Kentaro NAKAZAWA
  2016-11-29 10:49               ` Dmitry Gutov
  1 sibling, 0 replies; 125+ messages in thread
From: Kentaro NAKAZAWA @ 2016-11-29 10:48 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

On 2016/11/29 19:42, Lars Ingebrigtsen wrote:

> (string-as-unibyte
>  (encode-coding-string "ほげ <- VALID utf-8 Japanese multibyte text" 'utf-8))

=> url-http-create-request: Multibyte text in HTTP request: POST /gists
HTTP/1.1

This is also the same result...



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:42             ` Lars Ingebrigtsen
  2016-11-29 10:48               ` Kentaro NAKAZAWA
@ 2016-11-29 10:49               ` Dmitry Gutov
  1 sibling, 0 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-11-29 10:49 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Kentaro NAKAZAWA; +Cc: emacs-devel

On 29.11.2016 12:42, Lars Ingebrigtsen wrote:

> (string-as-unibyte
>  (encode-coding-string "ほげ <- VALID utf-8 Japanese multibyte text" 'utf-8))

That shouldn't be necessary.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:38           ` Kentaro NAKAZAWA
  2016-11-29 10:42             ` Lars Ingebrigtsen
@ 2016-11-29 10:50             ` Dmitry Gutov
  2016-11-29 10:55               ` Kentaro NAKAZAWA
  1 sibling, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-11-29 10:50 UTC (permalink / raw)
  To: Kentaro NAKAZAWA, Lars Ingebrigtsen; +Cc: emacs-devel

On 29.11.2016 12:38, Kentaro NAKAZAWA wrote:
> On 2016/11/29 19:34, Lars Ingebrigtsen wrote:
>
>> (encode-coding-string "ほげ <- VALID utf-8 Japanese multibyte text" 'utf-8)
>
> => url-http-create-request: Multibyte text in HTTP request: POST /gists
> HTTP/1.1
>
> It is the same result.

Do you have a full example to reproduce this?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:50             ` Dmitry Gutov
@ 2016-11-29 10:55               ` Kentaro NAKAZAWA
  2016-11-29 10:59                 ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Kentaro NAKAZAWA @ 2016-11-29 10:55 UTC (permalink / raw)
  To: Dmitry Gutov, Lars Ingebrigtsen; +Cc: emacs-devel

On 2016/11/29 19:50, Dmitry Gutov wrote:

> Do you have a full example to reproduce this?

(require 'json)
(let* ((content "ほげ <- VALID utf-8 Japanese multibyte text")
       (url "https://api.github.com/gists")
       (url-request-method "POST")
       (url-request-data
        (json-encode
         `(("description" . "test")
           ("public" . false)
           ("files" . (("test.txt" . (("content" . ,content)))))))))
  (with-current-buffer (url-retrieve-synchronously url)
    (buffer-string)))

Evaluate the above in *scratch*; it posts to a private anonymous gist.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:55               ` Kentaro NAKAZAWA
@ 2016-11-29 10:59                 ` Dmitry Gutov
  2016-11-29 11:03                   ` Kentaro NAKAZAWA
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-11-29 10:59 UTC (permalink / raw)
  To: Kentaro NAKAZAWA, Lars Ingebrigtsen; +Cc: emacs-devel

On 29.11.2016 12:55, Kentaro NAKAZAWA wrote:
> On 2016/11/29 19:50, Dmitry Gutov wrote:
>
>> Do you have a full example to reproduce this?
>
> (require 'json)
> (let* ((content "ほげ <- VALID utf-8 Japanese multibyte text")
>        (url "https://api.github.com/gists")
>        (url-request-method "POST")
>        (url-request-data
>         (json-encode
>          `(("description" . "test")
>            ("public" . false)
>            ("files" . (("test.txt" . (("content" . ,content)))))))))
>   (with-current-buffer (url-retrieve-synchronously url)
>     (buffer-string)))

Where is the encode-coding-string call?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 10:59                 ` Dmitry Gutov
@ 2016-11-29 11:03                   ` Kentaro NAKAZAWA
  2016-11-29 11:05                     ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Kentaro NAKAZAWA @ 2016-11-29 11:03 UTC (permalink / raw)
  To: Dmitry Gutov, Lars Ingebrigtsen; +Cc: emacs-devel

On 2016/11/29 19:59, Dmitry Gutov wrote:

> Where is the encode-coding-string call?

Sorry, this is it.

(let* ((content (encode-coding-string
                 "ほげ <- VALID utf-8 Japanese multibyte text"
                 'utf-8))
       (url "https://api.github.com/gists")
       (url-request-method "POST")
       (url-request-data
        (json-encode
         `(("description" . "test")
           ("public" . false)
           ("files" . (("test.txt" . (("content" . ,content)))))))))
  (with-current-buffer (url-retrieve-synchronously url)
    (buffer-string)))




^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 11:03                   ` Kentaro NAKAZAWA
@ 2016-11-29 11:05                     ` Dmitry Gutov
  2016-11-29 11:12                       ` Kentaro NAKAZAWA
  2016-11-29 17:23                       ` Eli Zaretskii
  0 siblings, 2 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-11-29 11:05 UTC (permalink / raw)
  To: Kentaro NAKAZAWA, Lars Ingebrigtsen; +Cc: emacs-devel

On 29.11.2016 13:03, Kentaro NAKAZAWA wrote:

> (let* ((content (encode-coding-string
>                  "ほげ <- VALID utf-8 Japanese multibyte text"
>                  'utf-8))
>        (url "https://api.github.com/gists")
>        (url-request-method "POST")
>        (url-request-data
>         (json-encode
>          `(("description" . "test")
>            ("public" . false)
>            ("files" . (("test.txt" . (("content" . ,content)))))))))
>   (with-current-buffer (url-retrieve-synchronously url)
>     (buffer-string)))

json-encode returns a multibyte string. Try this:

(let* ((content "ほげ <- VALID utf-8 Japanese multibyte text")
        (url "https://api.github.com/gists")
        (url-request-method "POST")
        (url-request-data
         (encode-coding-string
          (json-encode
           `(("description" . "test")
             ("public" . false)
             ("files" . (("test.txt" . (("content" . ,content)))))))
          'utf-8)))
   (with-current-buffer (url-retrieve-synchronously url)
     (buffer-string)))
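
A minimal sketch of why the order matters, using only built-in functions
(`demo-utf8-body' is a hypothetical helper name, not part of url.el or
json.el): encoding the finished JSON text last yields a unibyte string,
so its length is the number of bytes that go on the wire.

```elisp
(require 'json)

;; Hypothetical helper: serialize OBJECT to JSON first, then encode the
;; whole result as UTF-8 so the request body is a unibyte string.
(defun demo-utf8-body (object)
  (encode-coding-string (json-encode object) 'utf-8))

(let ((body (demo-utf8-body '(("content" . "\u307b\u3052")))))
  (list (multibyte-string-p body)                ; nil: the body is unibyte
        (= (length body) (string-bytes body))))  ; t: one element per byte
```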



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 11:05                     ` Dmitry Gutov
@ 2016-11-29 11:12                       ` Kentaro NAKAZAWA
  2016-11-29 17:23                       ` Eli Zaretskii
  1 sibling, 0 replies; 125+ messages in thread
From: Kentaro NAKAZAWA @ 2016-11-29 11:12 UTC (permalink / raw)
  To: Dmitry Gutov, Lars Ingebrigtsen; +Cc: emacs-devel

On 2016/11/29 20:05, Dmitry Gutov wrote:

> json-encode returns a multibyte string. Try this:

It worked! Thank you for telling me the correct code!
I confirmed the correct result below.

(let* ((content "ほげ <- VALID utf-8 Japanese multibyte text")
       (url "https://api.github.com/gists")
       (url-request-method "POST")
       (url-request-data
        (encode-coding-string
         (json-encode
          `(("description" . "test")
            ("public" . false)
            ("files" . (("test.txt" . (("content" . ,content)))))))
         'utf-8)))
  (with-current-buffer (url-retrieve-synchronously url)
    (when (url-http-parse-headers)
      (search-forward-regexp "\n\\s-*\n" nil t)
      (browse-url (cdr (assoc 'html_url (json-read)))))))



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 11:05                     ` Dmitry Gutov
  2016-11-29 11:12                       ` Kentaro NAKAZAWA
@ 2016-11-29 17:23                       ` Eli Zaretskii
  2016-11-29 23:09                         ` Philipp Stephani
  1 sibling, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-29 17:23 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: larsi, kentaro.nakazawa, emacs-devel

> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 29 Nov 2016 13:05:39 +0200
> Cc: emacs-devel@gnu.org
> 
> On 29.11.2016 13:03, Kentaro NAKAZAWA wrote:
> 
> > (let* ((content (encode-coding-string
> >                  "ほげ <- VALID utf-8 Japanese multibyte text"
> >                  'utf-8))
> >        (url "https://api.github.com/gists")
> >        (url-request-method "POST")
> >        (url-request-data
> >         (json-encode
> >          `(("description" . "test")
> >            ("public" . false)
> >            ("files" . (("test.txt" . (("content" . ,content)))))))))
> >   (with-current-buffer (url-retrieve-synchronously url)
> >     (buffer-string)))
> 
> json-encode returns a multibyte string.

Any idea why?  Is it again that 'concat' misfeature, when one of the
strings is pure-ASCII, but happens to be multibyte?  Maybe we should
do something about that.

Thanks.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 17:23                       ` Eli Zaretskii
@ 2016-11-29 23:09                         ` Philipp Stephani
  2016-11-29 23:18                           ` Philipp Stephani
                                             ` (2 more replies)
  0 siblings, 3 replies; 125+ messages in thread
From: Philipp Stephani @ 2016-11-29 23:09 UTC (permalink / raw)
  To: Eli Zaretskii, Dmitry Gutov; +Cc: larsi, kentaro.nakazawa, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1622 bytes --]

Eli Zaretskii <eliz@gnu.org> wrote on Tue, 29 Nov 2016 at 18:24:

> > From: Dmitry Gutov <dgutov@yandex.ru>
> > Date: Tue, 29 Nov 2016 13:05:39 +0200
> > Cc: emacs-devel@gnu.org
> >
> > On 29.11.2016 13:03, Kentaro NAKAZAWA wrote:
> >
> > > (let* ((content (encode-coding-string
> > >                  "ほげ <- VALID utf-8 Japanese multibyte text"
> > >                  'utf-8))
> > >        (url "https://api.github.com/gists")
> > >        (url-request-method "POST")
> > >        (url-request-data
> > >         (json-encode
> > >          `(("description" . "test")
> > >            ("public" . false)
> > >            ("files" . (("test.txt" . (("content" . ,content)))))))))
> > >   (with-current-buffer (url-retrieve-synchronously url)
> > >     (buffer-string)))
> >
> > json-encode returns a multibyte string.
>
> Any idea why?


Because (symbol-name 'false) returns a multibyte string. I guess the
ultimate reason is that the reader always creates multibyte strings for
symbol names.


> Is it again that 'concat' misfeature, when one of the
> strings is pure-ASCII, but happens to be multibyte?


Why is it a misfeature? I'd expect a concatenation of multibyte and unibyte
strings to either implicitly upgrade to a multibyte string (as in Python
2) or raise a signal (as in Python 3).
That url-retrieve breaks in this case is unfortunate, but I guess we can't
do much about it without breaking other stuff. Maybe the behavior regarding
unibyte and multibyte strings (e.g. what kinds of strings the reader and
`concat' generate) should simply be documented.
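
The implicit upgrade can be sketched as follows. This is an illustration
under assumptions: `demo-concat-kinds' is a hypothetical name, and
`string-to-multibyte' stands in for an ASCII string that happens to be
multibyte, such as a symbol name produced by the reader.

```elisp
;; Returns the multibyteness of two inputs and of their concatenation.
(defun demo-concat-kinds ()
  (let* ((ascii-multibyte (string-to-multibyte "false"))
         (utf8-bytes (encode-coding-string "\u307b\u3052" 'utf-8))
         (joined (concat ascii-multibyte utf8-bytes)))
    (list (multibyte-string-p ascii-multibyte)  ; t
          (multibyte-string-p utf8-bytes)       ; nil
          (multibyte-string-p joined)           ; t: the result is upgraded
          ;; The six UTF-8 bytes become six raw-byte characters, so the
          ;; character count no longer matches the internal byte count.
          (= (length joined) (string-bytes joined)))))  ; nil
```

The last element is the kind of mismatch that leads to a wrong length
being computed for a request body.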


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 23:09                         ` Philipp Stephani
@ 2016-11-29 23:18                           ` Philipp Stephani
  2016-11-30 15:11                             ` Eli Zaretskii
  2016-11-30  0:16                           ` Dmitry Gutov
  2016-11-30 15:06                           ` Eli Zaretskii
  2 siblings, 1 reply; 125+ messages in thread
From: Philipp Stephani @ 2016-11-29 23:18 UTC (permalink / raw)
  To: Eli Zaretskii, Dmitry Gutov; +Cc: larsi, kentaro.nakazawa, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 327 bytes --]

Philipp Stephani <p.stephani2@gmail.com> wrote on Wed, 30 Nov 2016 at 00:09:

> That url-retrieve breaks in this case is unfortunate, but I guess we can't
> do much about it without breaking other stuff.
>

Ah, I guess the URL functions could simply call string-to-unibyte, that
should do the right thing in all cases.


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 23:18                           ` Philipp Stephani
@ 2016-11-30 15:11                             ` Eli Zaretskii
  2016-11-30 15:20                               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 15:11 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: larsi, emacs-devel, kentaro.nakazawa, dgutov

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 29 Nov 2016 23:18:21 +0000
> Cc: larsi@gnus.org, kentaro.nakazawa@nifty.com, emacs-devel@gnu.org
> 
> Ah, I guess the URL functions could simply call string-to-unibyte, that should do the right thing in all cases. 

That would bring back the problem which caused us to introduce the
test which triggered this bug report.  string-to-unibyte can produce
results that might surprise naïve users, and it also can signal an
error whose text is not fit for showing it to users.

We are trying to avoid using that function, for these very reasons.
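
A small sketch of the two outcomes mentioned above, assuming only
built-in functions (`demo-try-unibyte' is a hypothetical wrapper used
for illustration):

```elisp
;; Hypothetical wrapper: try `string-to-unibyte' and report what happened
;; instead of letting the error propagate to the user.
(defun demo-try-unibyte (string)
  (condition-case err
      (list 'ok (string-to-unibyte string))
    (error (list 'failed (error-message-string err)))))

(demo-try-unibyte (string-to-multibyte "abc"))  ; succeeds: ASCII only
(demo-try-unibyte "\u307b\u3052")               ; fails: non-ASCII characters
```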



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:11                             ` Eli Zaretskii
@ 2016-11-30 15:20                               ` Lars Ingebrigtsen
  2016-11-30 15:43                                 ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-11-30 15:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Philipp Stephani, emacs-devel, kentaro.nakazawa, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

> We are trying to avoid using that function, for these very reasons.

Indeed.

The entire url-retrieve interface is more than a little broken in many
small ways.

In the next-generation URL library interface (the `with-url' thing
discussed intermittently the past few years) I think it would make sense
to supply the caller with a method to say what charset you want stuff
like this to be encoded with.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:20                               ` Lars Ingebrigtsen
@ 2016-11-30 15:43                                 ` Eli Zaretskii
  2016-11-30 15:46                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 15:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Philipp Stephani <p.stephani2@gmail.com>,  dgutov@yandex.ru,  kentaro.nakazawa@nifty.com,  emacs-devel@gnu.org
> Date: Wed, 30 Nov 2016 16:20:20 +0100
> 
> In the next-generation URL library interface (the `with-url' thing
> discussed intermittently the past few years) I think it would make sense
> to supply the caller with a method to say what charset you want stuff
> like this to be encoded with.

Would they ever want anything except utf-8?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:43                                 ` Eli Zaretskii
@ 2016-11-30 15:46                                   ` Lars Ingebrigtsen
  0 siblings, 0 replies; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-11-30 15:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

> Would they ever want anything except utf-8?

Standard HTTP values should be URL-encoded (or similar) anyway, so
non-URL-encoded values are for pretty non-standard use.  So I would
expect people to create interfaces in whatever charset they happen to
think of.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 23:09                         ` Philipp Stephani
  2016-11-29 23:18                           ` Philipp Stephani
@ 2016-11-30  0:16                           ` Dmitry Gutov
  2016-11-30 15:13                             ` Eli Zaretskii
  2016-11-30 15:06                           ` Eli Zaretskii
  2 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-11-30  0:16 UTC (permalink / raw)
  To: Philipp Stephani, Eli Zaretskii; +Cc: larsi, kentaro.nakazawa, emacs-devel

On 30.11.2016 01:09, Philipp Stephani wrote:

> Because (symbol-name 'false) returns a multibyte string. I guess the ultimate reason is that the reader always creates multibyte strings for symbol names.

Yes. For the same reason,

(json-encode-alist '((a . "abc")))

also returns a multibyte string. And we're likely to see symbols as keys 
a lot.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30  0:16                           ` Dmitry Gutov
@ 2016-11-30 15:13                             ` Eli Zaretskii
  2016-11-30 15:17                               ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 15:13 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

> Cc: larsi@gnus.org, kentaro.nakazawa@nifty.com, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 30 Nov 2016 02:16:36 +0200
> 
> On 30.11.2016 01:09, Philipp Stephani wrote:
> 
> > Because (symbol-name 'false) returns a multibyte string. I guess the ultimate reason is that the reader always creates multibyte strings for symbol names.
> 
> Yes. For the same reason,
> 
> (json-encode-alist '((a . "abc")))
> 
> also returns a multibyte string. And we're likely to see symbols as keys 
> a lot.

Can we do something about that in json-encode-* functions?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:13                             ` Eli Zaretskii
@ 2016-11-30 15:17                               ` Dmitry Gutov
  2016-11-30 15:32                                 ` Stefan Monnier
  2016-11-30 15:42                                 ` Eli Zaretskii
  0 siblings, 2 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-11-30 15:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

On 30.11.2016 17:13, Eli Zaretskii wrote:

> Can we do something about that in json-encode-* functions?

json-encode uses the previously mentioned symbol-name, which returns 
multibyte values. What would we do about that?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:17                               ` Dmitry Gutov
@ 2016-11-30 15:32                                 ` Stefan Monnier
  2016-11-30 15:42                                 ` Eli Zaretskii
  1 sibling, 0 replies; 125+ messages in thread
From: Stefan Monnier @ 2016-11-30 15:32 UTC (permalink / raw)
  To: emacs-devel

>> Can we do something about that in json-encode-* functions?
> json-encode uses the previously mentioned symbol-name, which returns
> multibyte values. What would we do about that?

We need to encode the symbol name since it's a plain string which can
contain non-ASCII chars.


        Stefan




^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:17                               ` Dmitry Gutov
  2016-11-30 15:32                                 ` Stefan Monnier
@ 2016-11-30 15:42                                 ` Eli Zaretskii
  2016-11-30 15:45                                   ` Dmitry Gutov
  1 sibling, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 15:42 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

> Cc: p.stephani2@gmail.com, larsi@gnus.org, kentaro.nakazawa@nifty.com,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 30 Nov 2016 17:17:18 +0200
> 
> On 30.11.2016 17:13, Eli Zaretskii wrote:
> 
> > Can we do something about that in json-encode-* functions?
> 
> json-encode uses the previously mentioned symbol-name, which returns 
> multibyte values. What would we do about that?

Check that the value returned by symbol-name is pure-ASCII, and if so,
make it unibyte?
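
Such a check could look roughly like this. It is a sketch only:
`demo-ascii-to-unibyte' is a hypothetical helper, not existing json.el
code.

```elisp
;; Hypothetical helper, not part of json.el: return a unibyte copy of
;; STRING when it is multibyte but contains only ASCII characters.
(defun demo-ascii-to-unibyte (string)
  (if (and (multibyte-string-p string)
           ;; ASCII characters occupy one byte each internally, so equal
           ;; character and byte counts mean the string is pure ASCII.
           (= (length string) (string-bytes string)))
      (string-to-unibyte string)
    string))
```

Strings containing non-ASCII characters are returned unchanged, so the
caller would still have to encode them explicitly.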



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:42                                 ` Eli Zaretskii
@ 2016-11-30 15:45                                   ` Dmitry Gutov
  2016-11-30 15:48                                     ` Lars Ingebrigtsen
  2016-11-30 16:23                                     ` Eli Zaretskii
  0 siblings, 2 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-11-30 15:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

On 30.11.2016 17:42, Eli Zaretskii wrote:

>> json-encode uses the previously mentioned symbol-name, which returns
>> multibyte values. What would we do about that?
>
> Check that the value returned by symbol-name is pure-ASCII, and if so,
> make it unibyte?

In json-encode? Should it really deal with that concern explicitly?

I could understand an idea along the lines of "use a different 
algorithm", but calling encode-coding-string inside json-encode sounds odd.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:45                                   ` Dmitry Gutov
@ 2016-11-30 15:48                                     ` Lars Ingebrigtsen
  2016-11-30 16:25                                       ` Eli Zaretskii
  2016-12-28 18:22                                       ` Philipp Stephani
  2016-11-30 16:23                                     ` Eli Zaretskii
  1 sibling, 2 replies; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-11-30 15:48 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, emacs-devel, p.stephani2, kentaro.nakazawa

Dmitry Gutov <dgutov@yandex.ru> writes:

> In json-encode? Should it really deal with that concern explicitly?
>
> I could understand an idea along the lines of "use a different
> algorithm", but calling encode-coding-string inside json-encode sounds
> odd.

Yes, this is not a json.el problem at all.  It does the correct thing,
and shouldn't be changed.

It's just url.el lacking features, as usual.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:48                                     ` Lars Ingebrigtsen
@ 2016-11-30 16:25                                       ` Eli Zaretskii
  2016-11-30 16:27                                         ` Lars Ingebrigtsen
  2016-11-30 18:23                                         ` Philipp Stephani
  2016-12-28 18:22                                       ` Philipp Stephani
  1 sibling, 2 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 16:25 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  p.stephani2@gmail.com,  kentaro.nakazawa@nifty.com,  emacs-devel@gnu.org
> Date: Wed, 30 Nov 2016 16:48:09 +0100
> 
> Yes, this is not a json.el problem at all.  It does the correct thing,
> and shouldn't be changed.

??? Why should any code care whether a pure-ASCII string is marked as
unibyte or as multibyte?  Both are "correct".



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 16:25                                       ` Eli Zaretskii
@ 2016-11-30 16:27                                         ` Lars Ingebrigtsen
  2016-11-30 16:42                                           ` Eli Zaretskii
  2016-11-30 18:23                                         ` Philipp Stephani
  1 sibling, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-11-30 16:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> Yes, this is not a json.el problem at all.  It does the correct thing,
>> and shouldn't be changed.
>
> ??? Why should any code care whether a pure-ASCII string is marked as
> unibyte or as multibyte?  Both are "correct".

That's right -- why should any code care?  Yet url.el does.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 16:27                                         ` Lars Ingebrigtsen
@ 2016-11-30 16:42                                           ` Eli Zaretskii
  2016-11-30 18:25                                             ` Philipp Stephani
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 16:42 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: dgutov@yandex.ru,  p.stephani2@gmail.com,  kentaro.nakazawa@nifty.com,  emacs-devel@gnu.org
> Date: Wed, 30 Nov 2016 17:27:05 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Yes, this is not a json.el problem at all.  It does the correct thing,
> >> and shouldn't be changed.
> >
> > ??? Why should any code care whether a pure-ASCII string is marked as
> > unibyte or as multibyte?  Both are "correct".
> 
> That's right -- why should any code care?  Yet url.el does.

No, it doesn't, not if the string is plain ASCII.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 16:42                                           ` Eli Zaretskii
@ 2016-11-30 18:25                                             ` Philipp Stephani
  2016-11-30 18:48                                               ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Philipp Stephani @ 2016-11-30 18:25 UTC (permalink / raw)
  To: Eli Zaretskii, Lars Ingebrigtsen; +Cc: dgutov, kentaro.nakazawa, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 934 bytes --]

Eli Zaretskii <eliz@gnu.org> wrote on Wed, 30 Nov 2016 at 17:42:

> > From: Lars Ingebrigtsen <larsi@gnus.org>
> > Cc: dgutov@yandex.ru,  p.stephani2@gmail.com,  kentaro.nakazawa@nifty.com,  emacs-devel@gnu.org
> > Date: Wed, 30 Nov 2016 17:27:05 +0100
> >
> > Eli Zaretskii <eliz@gnu.org> writes:
> >
> > >> Yes, this is not a json.el problem at all.  It does the correct thing,
> > >> and shouldn't be changed.
> > >
> > > ??? Why should any code care whether a pure-ASCII string is marked as
> > > unibyte or as multibyte?  Both are "correct".
> >
> > That's right -- why should any code care?  Yet url.el does.
>
> No, it doesn't, not if the string is plain ASCII.
>
>
But in that case it isn't, it's morally a byte array.
What Emacs lacks is good support for byte arrays. For HTTP,
process-send-string shouldn't need to deal with encoding or EOL conversion,
it should just accept a byte array and send that, unmodified.


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 18:25                                             ` Philipp Stephani
@ 2016-11-30 18:48                                               ` Eli Zaretskii
  2016-12-28 18:18                                                 ` Philipp Stephani
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 18:48 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: larsi, dgutov, kentaro.nakazawa, emacs-devel

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 30 Nov 2016 18:25:09 +0000
> Cc: emacs-devel@gnu.org, kentaro.nakazawa@nifty.com, dgutov@yandex.ru
> 
>  > That's right -- why should any code care? Yet url.el does.
> 
>  No, it doesn't, not if the string is plain ASCII.
> 
> But in that case it isn't, it's morally a byte array.

Yes, because the internal representation of characters in Emacs is a
superset of UTF-8.

> What Emacs lacks is good support for byte arrays.

Unibyte strings are byte arrays.  What do you think we lack in that regard?

> For HTTP, process-send-string shouldn't need to deal
> with encoding or EOL conversion, it should just accept a byte array and send that, unmodified.

I disagree.  Handling unibyte strings is a nuisance, so Emacs allows
most applications to be oblivious to them, and just handle
human-readable text.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 18:48                                               ` Eli Zaretskii
@ 2016-12-28 18:18                                                 ` Philipp Stephani
  2016-12-28 18:34                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Philipp Stephani @ 2016-12-28 18:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, dgutov, kentaro.nakazawa, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

Eli Zaretskii <eliz@gnu.org> wrote on Wed, 30 Nov 2016 at 19:48:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Wed, 30 Nov 2016 18:25:09 +0000
> > Cc: emacs-devel@gnu.org, kentaro.nakazawa@nifty.com, dgutov@yandex.ru
> >
> >  > That's right -- why should any code care? Yet url.el does.
> >
> >  No, it doesn't, not if the string is plain ASCII.
> >
> > But in that case it isn't, it's morally a byte array.
>
> Yes, because the internal representation of characters in Emacs is a
> superset of UTF-8.
>

That has nothing to do with characters. A byte array is conceptually
different from a character string.


>
> > What Emacs lacks is good support for byte arrays.
>
> Unibyte strings are byte arrays.  What do you think we lack in that regard?
>

If unibyte strings should be used for byte arrays, then the URL functions
should indeed signal an error whenever url-request-data is a multibyte
string, as HTTP requests are conceptually byte arrays, not character
strings.


>
> > For HTTP, process-send-string shouldn't need to deal
> > with encoding or EOL conversion, it should just accept a byte array and
> > send that, unmodified.
>
> I disagree.  Handling unibyte strings is a nuisance, so Emacs allows
> most applications to be oblivious to them, and just handle
> human-readable text.
>

That is the wrong approach (byte arrays and character strings are
fundamentally different types, and mixing them together only causes pain),
and it cannot work when implementing network protocols. HTTP requests are
*not* human-readable text, they are byte arrays. Attempting to handle
Unicode strings can't work because we wouldn't know the number of encoded
bytes.
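
To illustrate the last point, a minimal sketch, assuming the body is
encoded explicitly before sending (`demo-encoded-length' is a
hypothetical helper name): the byte count is only defined once a coding
system has been chosen.

```elisp
;; Hypothetical helper: the number of bytes TEXT occupies once encoded
;; with CODING-SYSTEM, i.e. the value a Content-Length header would need.
(defun demo-encoded-length (text coding-system)
  (string-bytes (encode-coding-string text coding-system)))

(demo-encoded-length "\u307b\u3052" 'utf-8)     ; 6: three bytes per character
(demo-encoded-length "\u307b\u3052" 'utf-16le)  ; a different count, same text
```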


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:18                                                 ` Philipp Stephani
@ 2016-12-28 18:34                                                   ` Eli Zaretskii
  2016-12-28 18:45                                                     ` Philipp Stephani
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-28 18:34 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: larsi, dgutov, kentaro.nakazawa, emacs-devel

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 28 Dec 2016 18:18:25 +0000
> Cc: larsi@gnus.org, emacs-devel@gnu.org, kentaro.nakazawa@nifty.com, 
> 	dgutov@yandex.ru
> 
>  > > That's right -- why should any code care? Yet url.el does.
>  >
>  > No, it doesn't, not if the string is plain ASCII.
>  >
>  > But in that case it isn't, it's morally a byte array.
> 
>  Yes, because the internal representation of characters in Emacs is a
>  superset of UTF-8.
> 
> That has nothing to do with characters. A byte array is conceptually different from a character string.

In Emacs, they are both implemented using very similar objects.

>  > What Emacs lacks is good support for byte arrays.
> 
>  Unibyte strings are byte arrays. What do you think we lack in that regard?
> 
> If unibyte strings should be used for byte arrays, then the URL functions should indeed signal an error
> whenever url-request-data is a multibyte string, as HTTP requests are conceptually byte arrays, not character
> strings.

Which is what we do now.

>  > For HTTP, process-send-string shouldn't need to deal
>  > with encoding or EOL conversion, it should just accept a byte array and send that, unmodified.
> 
>  I disagree. Handling unibyte strings is a nuisance, so Emacs allows
>  most applications to be oblivious to them, and just handle
>  human-readable text.
> 
> That is the wrong approach (byte arrays and character strings are fundamentally different types, and mixing
> them together only causes pain), and it cannot work when implementing network protocols. HTTP requests
> are *not* human-readable text, they are byte arrays. Attempting to handle Unicode strings can't work because
> we wouldn't know the number of encoded bytes.

You are arguing against a long and quite painful history of non-ASCII
strings in Emacs.  What we have now is based on a lot of experience
and at least two very large refactoring jobs.  Going back would be a
very bad idea indeed, as we've been there already, and users didn't
like that.  Some of us are old enough to remember the notorious \201
bytes creeping into text files and mail messages, due to that.  Never
again.

Our experience is that we should keep use of unibyte strings in Lisp
application code to the absolute minimum, ideally zero.  Once we
arrived at that conclusion, we've been living happily ever after.
This minor issue we are discussing here is certainly not worth
repeating past mistakes for which we paid plenty in sweat and blood.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:34                                                   ` Eli Zaretskii
@ 2016-12-28 18:45                                                     ` Philipp Stephani
  2016-12-28 18:55                                                       ` Eli Zaretskii
  2016-12-28 19:03                                                       ` Andreas Schwab
  0 siblings, 2 replies; 125+ messages in thread
From: Philipp Stephani @ 2016-12-28 18:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, dgutov, kentaro.nakazawa, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3366 bytes --]

Eli Zaretskii <eliz@gnu.org> wrote on Wed, 28 Dec 2016 at 19:35:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Wed, 28 Dec 2016 18:18:25 +0000
> > Cc: larsi@gnus.org, emacs-devel@gnu.org, kentaro.nakazawa@nifty.com,
> >       dgutov@yandex.ru
> >
> >  > > That's right -- why should any code care? Yet url.el does.
> >  >
> >  > No, it doesn't, not if the string is plain ASCII.
> >  >
> >  > But in that case it isn't, it's morally a byte array.
> >
> >  Yes, because the internal representation of characters in Emacs is a
> >  superset of UTF-8.
> >
> > That has nothing to do with characters. A byte array is conceptually
> > different from a character string.
>
> In Emacs, they are both implemented using very similar objects.
>

Yes, that's why I said "conceptually different". The concepts may be
different, but the implementation might still be the same.


>
> >  > What Emacs lacks is good support for byte arrays.
> >
> >  Unibyte strings are byte arrays. What do you think we lack in that
> >  regard?
> >
> > If unibyte strings should be used for byte arrays, then the URL
> > functions should indeed signal an error whenever url-request-data is
> > a multibyte string, as HTTP requests are conceptually byte arrays,
> > not character strings.
>
> Which is what we do now.
>

There is no such check for url-request-data. There's an overall check for
the complete request, but that also doesn't check for unibyte-ness.


>
> >  > For HTTP, process-send-string shouldn't need to deal
> >  > with encoding or EOL conversion, it should just accept a byte array
> >  > and send that, unmodified.
> >
> >  I disagree. Handling unibyte strings is a nuisance, so Emacs allows
> >  most applications to be oblivious to them, and just handle
> >  human-readable text.
> >
> > That is the wrong approach (byte arrays and character strings are
> > fundamentally different types, and mixing them together only causes
> > pain), and it cannot work when implementing network protocols. HTTP
> > requests are *not* human-readable text, they are byte arrays.
> > Attempting to handle Unicode strings can't work because we wouldn't
> > know the number of encoded bytes.
>
> You are arguing against a long and quite painful history of non-ASCII
> strings in Emacs.  What we have now is based on a lot of experience
> and at least two very large refactoring jobs.  Going back would be a
> very bad idea indeed, as we've been there already, and users didn't
> like that.  Some of us are old enough to remember the notorious \201
> bytes creeping into text files and mail messages, due to that.  Never
> again.
>

I'm not suggesting going back, too much would be broken.


>
> Our experience is that we should keep use of unibyte strings in Lisp
> application code to the absolute minimum, ideally zero.  Once we
> arrived at that conclusion, we've been living happily ever after.
> This minor issue we are discussing here is certainly not worth
> repeating past mistakes for which we paid plenty in sweat and blood.
>

If you want unibyte strings to represent octet streams, then unibyte
strings must be usable in application code, because octet streams are a
concept that exists in reality, and applications must be able to support
them in some way. If you don't want unibyte strings, then you need to
provide some different way to represent octet streams.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:45                                                     ` Philipp Stephani
@ 2016-12-28 18:55                                                       ` Eli Zaretskii
  2016-12-28 19:03                                                       ` Andreas Schwab
  1 sibling, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-28 18:55 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: larsi, dgutov, kentaro.nakazawa, emacs-devel

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 28 Dec 2016 18:45:43 +0000
> Cc: larsi@gnus.org, emacs-devel@gnu.org, kentaro.nakazawa@nifty.com, 
> 	dgutov@yandex.ru
> 
>  > That has nothing to do with characters. A byte array is conceptually different from a character string.
> 
>  In Emacs, they are both implemented using very similar objects.
> 
> Yes, that's why I said "conceptually different". The concepts may be different, but the implementation
> might still be the same.

If the implementation is the same, then concepts are not very
different to begin with, and the abstraction will sooner or later
leak into applications.

>  Our experience is that we should keep use of unibyte strings in Lisp
>  application code to the absolute minimum, ideally zero. Once we
>  arrived at that conclusion, we've been living happily ever after.
>  This minor issue we are discussing here is certainly not worth
>  repeating past mistakes for which we paid plenty in sweat and blood.
> 
> If you want unibyte strings to represent octet streams, then unibyte strings must be usable in application
> code

They are usable, but using them requires knowledge and proficiency
that's unusual with many Lisp developers, and it also has some
unpleasant pitfalls.

> because octet streams are a concept that exists in reality, and applications must be able to support
> them in some way. If you don't want unibyte strings, then you need to provide some different way to represent
> octet streams. 

We use unibyte strings where we must, and otherwise prefer multibyte
ones.  In most cases the unibyte strings exist in Emacs internals, so
that Lisp applications will not have to deal with them.  This case is
one of the few exceptions.

If you are still unconvinced and think that we need some separate
representation for byte arrays, consider this: when Emacs starts, it
takes some time until it bootstraps itself enough to learn how to
decode non-ASCII strings, such as file names.  Until then, all file
names are unibyte strings, and Emacs still must handle them correctly,
because otherwise it would be impossible to build or start it in a
directory that includes non-ASCII characters.

This and other similar subtleties are the reason why using anything
but a string for raw byte arrays is not a good idea, IMO and IME.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:45                                                     ` Philipp Stephani
  2016-12-28 18:55                                                       ` Eli Zaretskii
@ 2016-12-28 19:03                                                       ` Andreas Schwab
  1 sibling, 0 replies; 125+ messages in thread
From: Andreas Schwab @ 2016-12-28 19:03 UTC (permalink / raw)
  To: Philipp Stephani
  Cc: Eli Zaretskii, emacs-devel, kentaro.nakazawa, larsi, dgutov

On Dec 28 2016, Philipp Stephani <p.stephani2@gmail.com> wrote:

> If you want unibyte strings to represent octet streams, then unibyte
> strings must be usable in application code, because octet streams are a
> concept that exists in reality, and applications must be able to support
> them in some way. If you don't want unibyte strings, then you need to
> provide some different way to represent octet streams.

Octet streams are basically encoded strings, and we use unibyte strings
for encoded strings.  That's the only place where unibyte strings should
be used in Emacs.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 16:25                                       ` Eli Zaretskii
  2016-11-30 16:27                                         ` Lars Ingebrigtsen
@ 2016-11-30 18:23                                         ` Philipp Stephani
  2016-11-30 18:44                                           ` Eli Zaretskii
  1 sibling, 1 reply; 125+ messages in thread
From: Philipp Stephani @ 2016-11-30 18:23 UTC (permalink / raw)
  To: Eli Zaretskii, Lars Ingebrigtsen; +Cc: emacs-devel, kentaro.nakazawa, dgutov

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

Eli Zaretskii <eliz@gnu.org> wrote on Wed, 30 Nov 2016 at 17:25:

> > From: Lars Ingebrigtsen <larsi@gnus.org>
> > Cc: Eli Zaretskii <eliz@gnu.org>,  p.stephani2@gmail.com,
> >       kentaro.nakazawa@nifty.com,  emacs-devel@gnu.org
> > Date: Wed, 30 Nov 2016 16:48:09 +0100
> >
> > Yes, this is not a json.el problem at all.  It does the correct thing,
> > and shouldn't be changed.
>
> ??? Why should any code care whether a pure-ASCII string is marked as
> unibyte or as multibyte?  Both are "correct".
>

I guess the problem is that process-send-string cares. If it didn't, we
wouldn't have the problem.
For URL, we'd need functions like
  (byte-array-length s) = (length (string-to-unibyte s))
  (process-send-bytes s) = (process-send-string (string-to-unibyte s))
(conceptually; process-send-string also does EOL conversion, which should
never be done for HTTP bodies.)
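
The two equations above can be written out as a minimal sketch. Both
function names are hypothetical (neither exists in Emacs), and setting the
process coding system to `binary' is an assumption about how one might
avoid the EOL conversion mentioned above:

```emacs-lisp
;; Hypothetical helpers for illustration only; not part of Emacs or url.el.

(defun my-byte-array-length (s)
  "Return the number of bytes in S.
S must contain only ASCII characters and raw bytes; otherwise
`string-to-unibyte' signals an error instead of miscounting."
  (length (string-to-unibyte s)))

(defun my-process-send-bytes (process s)
  "Send the bytes in S to PROCESS without text conversion.
Assumes a binary coding system is what the caller wants."
  ;; Disable encoding and EOL conversion for this process.
  (set-process-coding-system process 'binary 'binary)
  (process-send-string process (string-to-unibyte s)))
```

With these definitions, a string containing real non-ASCII characters
raises an error rather than silently yielding a wrong length.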

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 18:23                                         ` Philipp Stephani
@ 2016-11-30 18:44                                           ` Eli Zaretskii
  2016-12-28 18:09                                             ` Philipp Stephani
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 18:44 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: larsi, emacs-devel, kentaro.nakazawa, dgutov

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 30 Nov 2016 18:23:14 +0000
> Cc: dgutov@yandex.ru, kentaro.nakazawa@nifty.com, emacs-devel@gnu.org
> 
>  > Yes, this is not a json.el problem at all. It does the correct thing,
>  > and shouldn't be changed.
> 
>  ??? Why should any code care whether a pure-ASCII string is marked as
>  unibyte or as multibyte? Both are "correct".
> 
> I guess the problem is that process-send-string cares. If it didn't, we wouldn't have the problem.

I don't think I follow.  The error we are talking about is signaled
from url-http-create-request, not from process-send-string.

> For URL, we'd need functions like
> (byte-array-length s) = (length (string-to-unibyte s))

Why do you need this?  string-to-unibyte is well-defined only for
unibyte or ASCII strings (if we forget the raw bytes for a moment), so
length will do.

> (process-send-bytes s) = (process-send-string (string-to-unibyte s))

Why is this needed?  process-send-string already encodes its argument,
which produces a unibyte string.

> (conceptually; process-send-string also does EOL conversion, which should never be done for HTTP
> bodies.) 

I don't understand why.  There are protocols that require CR-LF, no?




^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 18:44                                           ` Eli Zaretskii
@ 2016-12-28 18:09                                             ` Philipp Stephani
  2016-12-28 18:27                                               ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Philipp Stephani @ 2016-12-28 18:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel, kentaro.nakazawa, dgutov

[-- Attachment #1: Type: text/plain, Size: 2051 bytes --]

Eli Zaretskii <eliz@gnu.org> wrote on Wed, 30 Nov 2016 at 19:45:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Wed, 30 Nov 2016 18:23:14 +0000
> > Cc: dgutov@yandex.ru, kentaro.nakazawa@nifty.com, emacs-devel@gnu.org
> >
> >  > Yes, this is not a json.el problem at all. It does the correct thing,
> >  > and shouldn't be changed.
> >
> >  ??? Why should any code care whether a pure-ASCII string is marked as
> >  unibyte or as multibyte? Both are "correct".
> >
> > I guess the problem is that process-send-string cares. If it didn't, we
> > wouldn't have the problem.
>
> I don't think I follow.  The error we are talking about is signaled
> from url-http-create-request, not from process-send-string.
>

Yes, but url-http-create-request only cares about unibyte strings because
the request it creates is passed to process-send-string, which
special-cases unibyte strings.


>
> > For URL, we'd need functions like
> > (byte-array-length s) = (length (string-to-unibyte s))
>
> Why do you need this?  string-to-unibyte is well-defined only for
> unibyte or ASCII strings (if we forget the raw bytes for a moment), so
> length will do.
>

We need it because we have to send the byte length in a header. We can't
just use (length s) because it would silently give a wrong result.


>
> > (process-send-bytes s) = (process-send-string (string-to-unibyte s))
>
> Why is this needed?  process-send-string already encodes its argument,
> which produces a unibyte string.
>

We can't give a multibyte string to process-send-string, because we have to
pass the length in bytes in a header first. Therefore we have to encode any
string before passing it to process-send-string.


>
> > (conceptually; process-send-string also does EOL conversion, which
> > should never be done for HTTP bodies.)
>
> I don't understand why.  There are protocols that require CR-LF, no?
>
>
Yes, but HTTP request/response bodies should just be byte arrays and no
conversion whatsoever should happen. After all, the body could be a binary
data format.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:09                                             ` Philipp Stephani
@ 2016-12-28 18:27                                               ` Eli Zaretskii
  2016-12-28 18:35                                                 ` Philipp Stephani
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-28 18:27 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: larsi, emacs-devel, kentaro.nakazawa, dgutov

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 28 Dec 2016 18:09:52 +0000
> Cc: larsi@gnus.org, dgutov@yandex.ru, kentaro.nakazawa@nifty.com, 
> 	emacs-devel@gnu.org
> 
> 
> Eli Zaretskii <eliz@gnu.org> wrote on Wed, 30 Nov 2016 at 19:45:
> 
>  > From: Philipp Stephani <p.stephani2@gmail.com>
>  > Date: Wed, 30 Nov 2016 18:23:14 +0000
>  > Cc: dgutov@yandex.ru, kentaro.nakazawa@nifty.com, emacs-devel@gnu.org
>  >
>  > > Yes, this is not a json.el problem at all. It does the correct thing,
>  > > and shouldn't be changed.
>  >
>  > ??? Why should any code care whether a pure-ASCII string is marked as
>  > unibyte or as multibyte? Both are "correct".
>  >
>  > I guess the problem is that process-send-string cares. If it didn't, we wouldn't have the problem.
> 
>  I don't think I follow. The error we are talking about is signaled
>  from url-http-create-request, not from process-send-string.
> 
> Yes, but url-http-create-request only cares about unibyte strings because the request it creates is passed to
> process-send-string, which special-cases unibyte strings.

How do you see that process-send-string special-cases unibyte strings?

>  > For URL, we'd need functions like
>  > (byte-array-length s) = (length (string-to-unibyte s))
> 
>  Why do you need this? string-to-unibyte is well-defined only for
>  unibyte or ASCII strings (if we forget the raw bytes for a moment), so
>  length will do.
> 
> We need it because we have to send the byte length in a header. We can't just use (length s) because it
> would silently give a wrong result.

We are miscommunicating.  string-to-unibyte can only meaningfully be
called on a pure-ASCII string, and for pure-ASCII strings 'length'
will count bytes.  So I see no need for 'byte-array-length' if its
implementation is as you indicated.

>  > (process-send-bytes s) = (process-send-string (string-to-unibyte s))
> 
>  Why is this needed? process-send-string already encodes its argument,
>  which produces a unibyte string.
> 
> We can't give a multibyte string to process-send-string, because we have to pass the length in bytes in a
> header first. Therefore we have to encode any string before passing it to process-send-string.

Once you encoded the string, why do you need anything except calling
process-send-string?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:27                                               ` Eli Zaretskii
@ 2016-12-28 18:35                                                 ` Philipp Stephani
  2016-12-28 18:45                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Philipp Stephani @ 2016-12-28 18:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, dgutov, kentaro.nakazawa, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3307 bytes --]

Eli Zaretskii <eliz@gnu.org> wrote on Wed, 28 Dec 2016 at 19:28:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Wed, 28 Dec 2016 18:09:52 +0000
> > Cc: larsi@gnus.org, dgutov@yandex.ru, kentaro.nakazawa@nifty.com,
> >       emacs-devel@gnu.org
> >
> >
> > Eli Zaretskii <eliz@gnu.org> wrote on Wed, 30 Nov 2016 at 19:45:
> >
> >  > From: Philipp Stephani <p.stephani2@gmail.com>
> >  > Date: Wed, 30 Nov 2016 18:23:14 +0000
> >  > Cc: dgutov@yandex.ru, kentaro.nakazawa@nifty.com, emacs-devel@gnu.org
> >  >
> >  > > Yes, this is not a json.el problem at all. It does the correct thing,
> >  > > and shouldn't be changed.
> >  >
> >  > ??? Why should any code care whether a pure-ASCII string is marked as
> >  > unibyte or as multibyte? Both are "correct".
> >  >
> >  > I guess the problem is that process-send-string cares. If it didn't,
> >  > we wouldn't have the problem.
> >
> >  I don't think I follow. The error we are talking about is signaled
> >  from url-http-create-request, not from process-send-string.
> >
> > Yes, but url-http-create-request only cares about unibyte strings
> > because the request it creates is passed to process-send-string, which
> > special-cases unibyte strings.
>
> How do you see that process-send-string special-cases unibyte strings?
>

The send_process function has two branches, one for unibyte, one for
multibyte.


>
> >  > For URL, we'd need functions like
> >  > (byte-array-length s) = (length (string-to-unibyte s))
> >
> >  Why do you need this? string-to-unibyte is well-defined only for
> >  unibyte or ASCII strings (if we forget the raw bytes for a moment), so
> >  length will do.
> >
> > We need it because we have to send the byte length in a header. We can't
> > just use (length s) because it would silently give a wrong result.
>
> We are miscommunicating.  string-to-unibyte can only meaningfully be
> called on a pure-ASCII string, and for pure-ASCII strings 'length'
> will count bytes.  So I see no need for 'byte-array-length' if its
> implementation is as you indicated.
>

That depends on how you want to represent byte arrays/octet streams in
Emacs. If you want to represent them using unibyte strings, then you indeed
only need `length'. But some earlier messages sounded like you wanted to
represent byte arrays either using unibyte strings or byte-only multibyte
strings. In that case `string-to-unibyte' is necessary.


>
> >  > (process-send-bytes s) = (process-send-string (string-to-unibyte s))
> >
> >  Why is this needed? process-send-string already encodes its argument,
> >  which produces a unibyte string.
> >
> > We can't give a multibyte string to process-send-string, because we have
> > to pass the length in bytes in a header first. Therefore we have to encode
> > any string before passing it to process-send-string.
>
> Once you encoded the string, why do you need anything except calling
> process-send-string?
>
>
The byte size should be added as a Content-length HTTP header. If
url-request-data is a unibyte string, that's not a problem (except for the
newline conversion behavior in send_string), you can just use `length'. But
if it's a multibyte string, you need to encode first to find the byte
length.
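
To make the difference concrete, here is a small sketch. The JSON body
and the choice of UTF-8 are assumptions made only for this example:

```emacs-lisp
;; A multibyte body: `length' counts characters, not bytes.
(let* ((body "{\"name\": \"\u00e9\"}")                ; 13 characters, one non-ASCII
       (encoded (encode-coding-string body 'utf-8)))  ; unibyte result, 14 bytes
  (list (length body)                  ; character count
        (length encoded)               ; byte count, the value Content-length needs
        (multibyte-string-p encoded))) ; nil: the encoded string is unibyte
```

The two lengths differ by one because the single non-ASCII character
occupies two bytes in UTF-8.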

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:35                                                 ` Philipp Stephani
@ 2016-12-28 18:45                                                   ` Eli Zaretskii
  0 siblings, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-28 18:45 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: larsi, dgutov, kentaro.nakazawa, emacs-devel

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Wed, 28 Dec 2016 18:35:58 +0000
> Cc: larsi@gnus.org, emacs-devel@gnu.org, kentaro.nakazawa@nifty.com, 
> 	dgutov@yandex.ru
> 
>  How do you see that process-send-string special-cases unibyte strings?
> 
> The send_process function has two branches, one for unibyte, one for multibyte.

That's not special-casing.  That's polymorphism, if you like: Emacs
silently does TRT for both.

>  We are miscommunicating. string-to-unibyte can only meaningfully be
>  called on a pure-ASCII string, and for pure-ASCII strings 'length'
>  will count bytes. So I see no need for 'byte-array-length' if its
>  implementation is as you indicated.
> 
> That depends on how you want to represent byte arrays/octet streams in Emacs. If you want to represent
> them using unibyte strings, then you indeed only need `length'. But some earlier messages sounded like you
> wanted to represent byte arrays either using unibyte strings or byte-only multibyte strings. In that case
> `string-to-unibyte' is necessary.

No, it's not.  Multibyte strings that include raw bytes are converted
to single bytes when you encode them.
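
A small sketch of that behavior, for illustration. The byte value \351
and the UTF-8 coding system are arbitrary choices for the example:

```emacs-lisp
;; A multibyte string whose only character is the raw byte \351.
(let* ((raw (string-to-multibyte "\351"))
       (encoded (encode-coding-string raw 'utf-8)))
  (list (multibyte-string-p raw)       ; the input is multibyte
        (multibyte-string-p encoded)   ; the encoded result is unibyte
        (length encoded)))             ; number of bytes after encoding
```

If the claim above holds, the raw byte passes through encoding as a
single byte instead of being expanded into a multi-byte sequence.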

>  Once you encoded the string, why do you need anything except calling
>  process-send-string?
> 
> The byte size should be added as a Content-length HTTP header. If url-request-data is a unibyte string, that's
> not a problem (except for the newline conversion behavior in send_string), you can just use `length'. But if it's
> a multibyte string, you need to encode first to find the byte length. 

I thought we've just agreed that multibyte strings there should not be
allowed.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:48                                     ` Lars Ingebrigtsen
  2016-11-30 16:25                                       ` Eli Zaretskii
@ 2016-12-28 18:22                                       ` Philipp Stephani
  2016-12-28 18:57                                         ` Lars Ingebrigtsen
  1 sibling, 1 reply; 125+ messages in thread
From: Philipp Stephani @ 2016-12-28 18:22 UTC (permalink / raw)
  To: Lars Ingebrigtsen, Dmitry Gutov
  Cc: Eli Zaretskii, kentaro.nakazawa, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 857 bytes --]

Lars Ingebrigtsen <larsi@gnus.org> wrote on Wed, 30 Nov 2016 at 16:48:

> Dmitry Gutov <dgutov@yandex.ru> writes:
>
> > In json-encode? Should it really deal with that concern explicitly?
> >
> > I could understand an idea along the lines of "use a different
> > algorithm", but calling encode-coding-string inside json-encode sounds
> > odd.
>
> Yes, this is not a json.el problem at all.  It does the correct thing,
> and shouldn't be changed.
>

Agreed. Neither symbol-function nor concat nor the JSON function do
anything wrong here.


>
> It's just url.el being lacking in features, as usual.
>
>
>
I don't think url.el needs to grow features for encoding; after all, Emacs
already has functions for that. I'd rather add an explicit check for
unibyte-ness of url-request-data and document that url-request-data must be
a unibyte string or nil.
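
A minimal sketch of such a check, assuming it would run before the
request is assembled. The function name is hypothetical and this is not
the code in url-http.el:

```emacs-lisp
(require 'url-vars)  ; defines `url-request-data'

;; Hypothetical helper for illustration only.
(defun my-url-check-request-data ()
  "Signal an error unless `url-request-data' is nil or a unibyte string."
  (when (and url-request-data
             (or (not (stringp url-request-data))
                 (multibyte-string-p url-request-data)))
    (error "url-request-data must be a unibyte string or nil")))
```

A caller holding a multibyte string would then encode it explicitly, for
example with (encode-coding-string data 'utf-8), before binding
url-request-data.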

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:22                                       ` Philipp Stephani
@ 2016-12-28 18:57                                         ` Lars Ingebrigtsen
  2016-12-30  0:07                                           ` Richard Stallman
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-28 18:57 UTC (permalink / raw)
  To: Philipp Stephani
  Cc: Eli Zaretskii, emacs-devel, kentaro.nakazawa, Dmitry Gutov

Philipp Stephani <p.stephani2@gmail.com> writes:

> I don't think url.el needs to grow features for encoding; after all, Emacs
> already has functions for that. I'd rather add an explicit check for
> unibyte-ness of url-request-data and document that url-request-data must be
> a unibyte string or nil. 

Nah.  If you want to do something here, just compute the correct length
header (as previously discussed), and virtually all callers will be happy.
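
One way to read "compute the correct length header" is sketched below.
The function name is hypothetical, and encoding multibyte data as UTF-8
is an assumption; the thread does not say which coding system url.el
should pick:

```emacs-lisp
;; Hypothetical helper for illustration only; not the url.el implementation.
(defun my-content-length-header (data)
  "Return a Content-length header line for the request body DATA.
Multibyte DATA is encoded as UTF-8 first (an assumption of this sketch),
so that the header reports bytes rather than characters."
  (let ((bytes (if (multibyte-string-p data)
                   (encode-coding-string data 'utf-8)
                 data)))
    (format "Content-length: %d\r\n" (length bytes))))
```

The request would then have to send the same encoded bytes that were
measured, otherwise header and body can disagree again.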

I've started working on a `with-url' functionality that'll replace the
current mess.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-28 18:57                                         ` Lars Ingebrigtsen
@ 2016-12-30  0:07                                           ` Richard Stallman
  2016-12-30 14:15                                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Richard Stallman @ 2016-12-30  0:07 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: p.stephani2, dgutov, kentaro.nakazawa, eliz, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I've started working on a `with-url' functionality that'll replace the
  > current mess.

The name `with-url' suggests that Emacs has some sort of "current URL",
and that this macro temporarily specifies some particular URL as current.

That's not the case, is it?  So the name `with-url' doesn't fit
what it does.  (What does it do?)

We should change the name to something that fits what it does.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-30  0:07                                           ` Richard Stallman
@ 2016-12-30 14:15                                             ` Lars Ingebrigtsen
  2016-12-30 16:59                                               ` Eli Zaretskii
  2016-12-30 21:38                                               ` Richard Stallman
  0 siblings, 2 replies; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-30 14:15 UTC (permalink / raw)
  To: Richard Stallman; +Cc: p.stephani2, dgutov, kentaro.nakazawa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> The name `with-url' suggests that Emacs has some sort of "current URL",
> and that this macro temporarily specifies some particular URL as current.
>
> That's not the case, is it?  So the name `with-url' doesn't fit
> what it does.  (What does it do?)

It's like `with-temp-buffer' and its cousins: It generates a new
buffer, executes the body in that buffer, and kills the buffer when the
form finishes.

The contents of the buffer come from the specified URL, of course.  See
the recent discussion of with-url on emacs-devel.
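
The behavior described here could be approximated as follows. This is a
rough sketch only, not the actual `with-url' implementation: the macro
name is hypothetical, and it relies on the existing
`url-retrieve-synchronously' rather than any new machinery:

```emacs-lisp
(require 'url)

;; Hypothetical macro for illustration only; not the proposed `with-url'.
(defmacro my-with-fetched-url (url &rest body)
  "Fetch URL synchronously and evaluate BODY in the response buffer.
The buffer is killed when BODY finishes, even on a non-local exit."
  (declare (indent 1))
  (let ((u (make-symbol "url"))
        (buf (make-symbol "buf")))
    `(let* ((,u ,url)
            (,buf (url-retrieve-synchronously ,u)))
       ;; `url-retrieve-synchronously' returns nil if nothing was retrieved.
       (unless ,buf
         (error "Could not retrieve %s" ,u))
       (unwind-protect
           (with-current-buffer ,buf
             ,@body)
         (kill-buffer ,buf)))))
```

For HTTP URLs the buffer returned by `url-retrieve-synchronously' also
contains the response headers, so BODY would have to skip past them.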

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-30 14:15                                             ` Lars Ingebrigtsen
@ 2016-12-30 16:59                                               ` Eli Zaretskii
  2017-01-21 15:39                                                 ` Lars Ingebrigtsen
  2016-12-30 21:38                                               ` Richard Stallman
  1 sibling, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-30 16:59 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, rms, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Fri, 30 Dec 2016 15:15:26 +0100
> Cc: p.stephani2@gmail.com, dgutov@yandex.ru, kentaro.nakazawa@nifty.com,
> 	emacs-devel@gnu.org
> 
> Richard Stallman <rms@gnu.org> writes:
> 
> > The name `with-url' suggests that Emacs has some sort of "current URL",
> > and that this macro temporarily specifies some particular URL as current.
> >
> > That's not the case, is it?  So the name `with-url' doesn't fit
> > what it does.  (What does it do?)
> 
> It's like `with-temp-buffer' and its cousins: It generates a new
> buffer, executes the body in that buffer, and kills the buffer when the
> form finishes.

How about 'with-fetched-url', then?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-30 16:59                                               ` Eli Zaretskii
@ 2017-01-21 15:39                                                 ` Lars Ingebrigtsen
  2017-01-21 15:56                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2017-01-21 15:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, rms, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> It's like `with-temp-buffer' and its cousins: It generates a new
>> buffer, executes the body in that buffer, and kills the buffer when the
>> form finishes.
>
> How about 'with-fetched-url', then?

Hm...  I'm not sure it gives us more clarity.  It should really be
`with-content-fetched-from-specified-url', but that's a bit long, right?
So I think `with-url' is fine for anybody who's working with these
things.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2017-01-21 15:39                                                 ` Lars Ingebrigtsen
@ 2017-01-21 15:56                                                   ` Eli Zaretskii
  2017-01-21 16:30                                                     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2017-01-21 15:56 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: p.stephani2, dgutov, kentaro.nakazawa, rms, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Sat, 21 Jan 2017 16:39:12 +0100
> Cc: p.stephani2@gmail.com, emacs-devel@gnu.org, kentaro.nakazawa@nifty.com,
> 	rms@gnu.org, dgutov@yandex.ru
> 
> > How about 'with-fetched-url', then?
> 
> Hm...  I'm not sure it gives us more clarity.  It should really be
> `with-content-fetched-from-specified-url', but that's a bit long, right?
> So I think `with-url' is fine for anybody who's working with these
> things.

Both Richard and myself came up with almost identical comments on
with-url, so I hope you will reconsider.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2017-01-21 15:56                                                   ` Eli Zaretskii
@ 2017-01-21 16:30                                                     ` Lars Ingebrigtsen
  2017-01-21 22:58                                                       ` Stefan Monnier
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2017-01-21 16:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, dgutov, kentaro.nakazawa, rms, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Both Richard and myself came up with almost identical comments on
> with-url, so I hope you will reconsider.

Perhaps we could have a vote.  The contenders are `with-url',
`with-fetched-url', `with-url-contents' and
`with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url'.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2017-01-21 16:30                                                     ` Lars Ingebrigtsen
@ 2017-01-21 22:58                                                       ` Stefan Monnier
  2017-01-24 20:04                                                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Stefan Monnier @ 2017-01-21 22:58 UTC (permalink / raw)
  To: emacs-devel

>>>>> "Lars" == Lars Ingebrigtsen <larsi@gnus.org> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>> Both Richard and myself came up with almost identical comments on
>> with-url, so I hope you will reconsider.

> Perhaps we could have a vote.  The contenders are `with-url',
> `with-fetched-url', `with-url-contents' and
> `with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url'.

I vote against with-url and
with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url.
The other two seem fine,


        Stefan




^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2017-01-21 22:58                                                       ` Stefan Monnier
@ 2017-01-24 20:04                                                         ` Lars Ingebrigtsen
  2017-01-28  9:52                                                           ` Elias Mårtenson
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2017-01-24 20:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Perhaps we could have a vote.  The contenders are `with-url',
>> `with-fetched-url', `with-url-contents' and
>> `with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url'.
>
> I vote against with-url and
> with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url.
> The other two seem fine,

OK, then we have 1 vote for `with-url', 1.5 votes for `with-fetched-url'
and `with-url-contents' each, and zero for
`with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url'.

The competition is heating up!

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2017-01-24 20:04                                                         ` Lars Ingebrigtsen
@ 2017-01-28  9:52                                                           ` Elias Mårtenson
  2017-01-28 14:16                                                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Elias Mårtenson @ 2017-01-28  9:52 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

Who is allowed to vote? I consider with-url to be less than ideal and not
very clear. with-url-contents is a lot better.

Regards,
Elias


On 25 Jan 2017 4:06 AM, "Lars Ingebrigtsen" <larsi@gnus.org> wrote:

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Perhaps we could have a vote.  The contenders are `with-url',
>> `with-fetched-url', `with-url-contents' and
>> `with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url'.
>
> I vote against with-url and
> with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url.
> The other two seem fine,

OK, then we have 1 vote for `with-url', 1.5 votes for `with-fetched-url'
and `with-url-contents' each, and zero for
`with-contents-in-a-buffer-fetched-from-somewhere-specified-by-the-following-url'.

The competition is heating up!

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

[-- Attachment #2: Type: text/html, Size: 1695 bytes --]

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2017-01-28  9:52                                                           ` Elias Mårtenson
@ 2017-01-28 14:16                                                             ` Lars Ingebrigtsen
  0 siblings, 0 replies; 125+ messages in thread
From: Lars Ingebrigtsen @ 2017-01-28 14:16 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: Stefan Monnier, emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

> Who is allowed to vote? I consider with-url to be less than ideal and not very
> clear. with-url-contents is a lot better. 

OK, `with-url-contents' is now the clear leader here with 2.5 votes!

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-30 14:15                                             ` Lars Ingebrigtsen
  2016-12-30 16:59                                               ` Eli Zaretskii
@ 2016-12-30 21:38                                               ` Richard Stallman
  1 sibling, 0 replies; 125+ messages in thread
From: Richard Stallman @ 2016-12-30 21:38 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, dgutov

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > That's not the case, is it?  So the name `with-url' doesn't fit
  > > what it does.  (What does it do?)

  > It's like `with-temp-buffer' and its cousins: It generates a new
  > buffer, executes the body in that buffer, and kills the buffer when the
  > form finishes.

It sounds useful, but the name isn't clear.  Let's call it
`with-url-contents'; that fits what it does.
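
A minimal sketch of a macro with the behavior described above, for
illustration only.  It assumes a synchronous fetch through
`url-insert-file-contents'; the name `my-with-url-contents' is
hypothetical, and this is not the implementation being discussed in
this thread.

```elisp
;; Hypothetical sketch, not the macro proposed in this thread.
(require 'url-handlers)   ; provides `url-insert-file-contents'

(defmacro my-with-url-contents (url &rest body)
  "Fetch URL into a temporary buffer and evaluate BODY there.
The buffer is killed when BODY finishes, as with `with-temp-buffer'."
  (declare (indent 1) (debug t))
  `(with-temp-buffer
     ;; Insert the decoded contents of URL at point.
     (url-insert-file-contents ,url)
     ,@body))
```

A call such as (my-with-url-contents "https://www.gnu.org/" (buffer-size))
would then run its body in the buffer holding the fetched contents.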

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:45                                   ` Dmitry Gutov
  2016-11-30 15:48                                     ` Lars Ingebrigtsen
@ 2016-11-30 16:23                                     ` Eli Zaretskii
  2016-12-01  0:30                                       ` Dmitry Gutov
  1 sibling, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 16:23 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

> Cc: p.stephani2@gmail.com, larsi@gnus.org, kentaro.nakazawa@nifty.com,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 30 Nov 2016 17:45:25 +0200
> 
> On 30.11.2016 17:42, Eli Zaretskii wrote:
> 
> >> json-encode uses the previously mentioned symbol-name, which returns
> >> multibyte values. What would we do about that?
> >
> > Check that the value returned by symbol-name is pure-ASCII, and if so,
> > make it unibyte?
> 
> In json-encode? Should it really deal with that concern explicitly?

Since both the original issue and this one are at least indirectly
caused by json.el, it might make sense.

> I could understand an idea along the lines of "use a different 
> algorithm", but calling encode-coding-string inside json-encode sounds odd.

I didn't mean encode-coding-string, I meant string-make-unibyte, which
for a pure-ASCII string doesn't touch the contents.
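
A minimal sketch of that idea, assuming a helper named
`my-ascii-to-unibyte' (hypothetical, not an existing json.el function).
Note that `string-make-unibyte' is marked obsolete in later Emacs
releases, so this is only meant to show the effect:

```elisp
;; Hypothetical helper: only pure-ASCII strings are converted.
(defun my-ascii-to-unibyte (string)
  "Return STRING as unibyte if it is pure ASCII, else return it unchanged."
  (if (string-match-p "[^[:ascii:]]" string)
      string                          ; non-ASCII: leave for explicit encoding
    (string-make-unibyte string)))    ; ASCII: same bytes, unibyte flag

;; `string-to-multibyte' gives an ASCII string that is marked multibyte.
(let ((s (string-to-multibyte "abc")))
  (list (multibyte-string-p s)
        (multibyte-string-p (my-ascii-to-unibyte s))))
;; => (t nil)
```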



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 16:23                                     ` Eli Zaretskii
@ 2016-12-01  0:30                                       ` Dmitry Gutov
  2016-12-01 17:17                                         ` Eli Zaretskii
  2016-12-28 18:25                                         ` Philipp Stephani
  0 siblings, 2 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-12-01  0:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

On 30.11.2016 18:23, Eli Zaretskii wrote:

> Since both the original issue and this one are at least indirectly
> caused by json.el, it might make sense.

Triggered, more like. JSON is a frequently-used format, but there are 
others. And the same problems will remain when e.g. plain text is used.

> I didn't mean encode-coding-string, I meant string-make-unibyte, which
> for a pure-ASCII string doesn't touch the contents.

Either way, I don't think it's a great idea. Quite the opposite: by 
allowing the programmer to avoid calling `encode-coding-string' in more 
cases, we'll just make the problem in their code harder to find, until 
some user of that code really does need to transfer multibyte content.

Further, now that Emacs 25 is out, and we are allowed to have more 
breaking changes in Emacs 26, I think we should change the check at the 
end of url-http-create-request to just use multibyte-string-p.

Barring some unforeseen consequences, this will solidify the requirement 
that the caller needs to deal with encoding explicitly in all cases, 
before passing the request body to the transport level.
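
To illustrate what "dealing with encoding explicitly" means on the
caller's side, here is a small sketch.  The helper name
`my-http-encode-body' is hypothetical, and UTF-8 is an assumption about
what the server expects:

```elisp
;; Hypothetical helper: turn text into the octets that go on the wire.
(defun my-http-encode-body (text)
  "Return TEXT encoded as UTF-8, i.e. as a unibyte string of octets."
  (encode-coding-string text 'utf-8))

(let* ((text "na\u00efve")                 ; 5 characters, one non-ASCII
       (octets (my-http-encode-body text)))
  (list (length text)                      ; characters in the Lisp string
        (length octets)                    ; octets actually transmitted
        (multibyte-string-p octets)))
;; => (5 6 nil)
```

The character count and the octet count differ, which is why a length
computed on the unencoded string gives a wrong Content-length.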

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-01  0:30                                       ` Dmitry Gutov
@ 2016-12-01 17:17                                         ` Eli Zaretskii
  2016-12-02 13:18                                           ` Dmitry Gutov
  2016-12-28 18:25                                         ` Philipp Stephani
  1 sibling, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-01 17:17 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

> Cc: p.stephani2@gmail.com, larsi@gnus.org, kentaro.nakazawa@nifty.com,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 1 Dec 2016 02:30:15 +0200
> 
> On 30.11.2016 18:23, Eli Zaretskii wrote:
> 
> > Since both the original issue and this one are at least indirectly
> > caused by json.el, it might make sense.
> 
> Triggered, more like.

Nothing wrong with that.  If some issue isn't a bug, but gets in the
way of a broad class of applications, it is okay to silently DTRT for
that class only, in some central place that serves the class.

> Either way, I don't think it's a great idea. Quite the opposite: by 
> allowing the programmer to avoid calling `encode-coding-string' in more 
> cases, we'll just make the problem in their code harder to find, until 
> some user of that code really does need to transfer multibyte content.

I don't think we will win any hearts by nagging application
programmers when we could silently DTRT ourselves.

> Further, now that Emacs 25 is out, and we are allowed to have more 
> breaking changes in Emacs 26, I think we should change the check at the 
> end of url-http-create-request to just use multibyte-string-p.
> 
> Barring some unforeseen consequences, this will solidify the requirement 
> that the caller needs to deal with encoding explicitly in all cases, 
> before passing the request body to the transport level.

Can you show me a patch to that effect, or point me to where it was
posted in the past?  I'm afraid I no longer remember those details.

Thanks.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-01 17:17                                         ` Eli Zaretskii
@ 2016-12-02 13:18                                           ` Dmitry Gutov
  2016-12-02 14:24                                             ` Eli Zaretskii
  2016-12-02 15:29                                             ` Lars Ingebrigtsen
  0 siblings, 2 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-12-02 13:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

On 01.12.2016 19:17, Eli Zaretskii wrote:

> Nothing wrong with that.  If some issue isn't a bug, but gets in the
> way of a broad class of applications,

I don't think it's useful to extract applications that use JSON+HTTP 
with ASCII-only payloads into a separate class.

Most of the time (or at least very often) it depends on the user, what 
kind of payload gets sent (with multibyte characters or not).

> it is okay to silently DTRT for
> that class only, in some central place that serves the class.

Those central places are coding.c and url/url-*.el. Not sure what can be 
done there, though.

> I don't think we will win any hearts by nagging application
> programmers when we could silently DTRT ourselves.

We can win the hearts of some users, long term, by making the API such 
that it's harder to do the wrong thing.

You yourself suggested multibyte-string-p originally, and I suggested 
the current more permissive approach more or less because the new 
release was very close:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=23750#83

> Can you show me a patch to that effect, or point me to where it was
> posted in the past?  I'm afraid I no longer remember those details.

Something like this:

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index e0e080e..affd5c2 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -358,9 +358,8 @@ url-http-create-request
               ;; Any data
               url-http-data))
      ;; Bug#23750
-    (unless (= (string-bytes request)
-               (length request))
-      (error "Multibyte text in HTTP request: %s" request))
+    (when (multibyte-string-p request)
+      (error "Multibyte text in HTTP request: %s, please translate any multibyte components to unibyte using `encode-coding-string'" request))
      (url-http-debug "Request is: \n%s" request)
      request))





^ permalink raw reply related	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 13:18                                           ` Dmitry Gutov
@ 2016-12-02 14:24                                             ` Eli Zaretskii
  2016-12-02 14:35                                               ` Dmitry Gutov
  2016-12-02 14:53                                               ` Yuri Khan
  2016-12-02 15:29                                             ` Lars Ingebrigtsen
  1 sibling, 2 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-02 14:24 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

> Cc: p.stephani2@gmail.com, larsi@gnus.org, kentaro.nakazawa@nifty.com,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Fri, 2 Dec 2016 15:18:48 +0200
> 
> > it is okay to silently DTRT for
> > that class only, in some central place that serves the class.
> 
> Those central places are coding.c and url/url-*.el.

That's not what I meant (and coding.c is definitely not the place),
but let's leave this alone.

> diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
> index e0e080e..affd5c2 100644
> --- a/lisp/url/url-http.el
> +++ b/lisp/url/url-http.el
> @@ -358,9 +358,8 @@ url-http-create-request
>                ;; Any data
>                url-http-data))
>       ;; Bug#23750
> -    (unless (= (string-bytes request)
> -               (length request))
> -      (error "Multibyte text in HTTP request: %s" request))
> +    (when (multibyte-string-p request)
> +      (error "Multibyte text in HTTP request: %s, please translate any 
> multibyte components to unibyte using `encode-coding-string'" request))
>       (url-http-debug "Request is: \n%s" request)
>       request))

This will also reject pure-ASCII strings that just happen to be
multibyte, although there will be no problem with such an HTTP
request.  Do we really want to disallow that use case?
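
The difference between the two checks can be seen in a short sketch.
The function names `my-lenient-request-ok-p' and
`my-strict-request-ok-p' are hypothetical stand-ins for the old and the
proposed test in url-http-create-request:

```elisp
;; Hypothetical stand-ins for the two checks in the patch.
(defun my-lenient-request-ok-p (request)
  "Current check: every character of REQUEST occupies exactly one byte."
  (= (string-bytes request) (length request)))

(defun my-strict-request-ok-p (request)
  "Proposed check: REQUEST must be a unibyte string."
  (not (multibyte-string-p request)))

(let ((ascii-multibyte (string-to-multibyte "GET / HTTP/1.1"))
      (non-ascii       "caf\u00e9")
      (encoded         (encode-coding-string "caf\u00e9" 'utf-8)))
  (mapcar (lambda (s)
            (list (my-lenient-request-ok-p s) (my-strict-request-ok-p s)))
          (list ascii-multibyte non-ascii encoded)))
;; => ((t nil) (nil nil) (t t))
```

Only the first case, pure ASCII that happens to be multibyte, is
treated differently by the two checks.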



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 14:24                                             ` Eli Zaretskii
@ 2016-12-02 14:35                                               ` Dmitry Gutov
  2016-12-02 15:20                                                 ` Eli Zaretskii
  2016-12-02 14:53                                               ` Yuri Khan
  1 sibling, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-12-02 14:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

On 02.12.2016 16:24, Eli Zaretskii wrote:

> This will also reject pure-ASCII strings that just happen to be
> multibyte, although there will be no problem with such an HTTP
> request.  Do we really want to disallow that use case?

That's the whole point of the patch. I think I've explained why in the 
previous message.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 14:35                                               ` Dmitry Gutov
@ 2016-12-02 15:20                                                 ` Eli Zaretskii
  0 siblings, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-02 15:20 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, emacs-devel, larsi, kentaro.nakazawa

> Cc: p.stephani2@gmail.com, larsi@gnus.org, kentaro.nakazawa@nifty.com,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Fri, 2 Dec 2016 16:35:32 +0200
> 
> On 02.12.2016 16:24, Eli Zaretskii wrote:
> 
> > This will also reject pure-ASCII strings that just happen to be
> > multibyte, although there will be no problem with such an HTTP
> > request.  Do we really want to disallow that use case?
> 
> That's the whole point of the patch. I think I've explained why in the 
> previous message.

Fine, let's try.

Thanks.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 14:24                                             ` Eli Zaretskii
  2016-12-02 14:35                                               ` Dmitry Gutov
@ 2016-12-02 14:53                                               ` Yuri Khan
  2016-12-02 15:45                                                 ` Eli Zaretskii
  2016-12-02 15:51                                                 ` Lars Ingebrigtsen
  1 sibling, 2 replies; 125+ messages in thread
From: Yuri Khan @ 2016-12-02 14:53 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Philipp Stephani, Emacs developers, kentaro.nakazawa,
	Lars Magne Ingebrigtsen, Dmitry Gutov

On Fri, Dec 2, 2016 at 9:24 PM, Eli Zaretskii <eliz@gnu.org> wrote:

>> +    (when (multibyte-string-p request)
>> +      (error "Multibyte text in HTTP request: %s, please translate any
>> multibyte components to unibyte using `encode-coding-string'" request))
>>       (url-http-debug "Request is: \n%s" request)
>>       request))
>
> This will also reject pure-ASCII strings that just happen to be
> multibyte, although there will be no problem with such an HTTP
> request.  Do we really want to disallow that use case?

It is really unfortunate that we talk about ASCII strings, unibyte
strings, multibyte strings, as if that was a meaningful
classification.

The real dichotomy is between text (aka strings) and MIME-type-tagged
byte arrays. In order to send a string over HTTP, one must encode it
to a byte array and tag it as "text/plain; charset=utf-8" or
"text/html; charset=utf-8" or application/json (no charset parameter
because json must always be encoded in one of utf-* for transmission).
Conversely, a byte array received over HTTP can, MIME type allowing,
be decoded into a string.

The fact that there exist strings for which encoding and decoding are
identity transforms should be regarded only as an implementation
detail. Attempts by libraries and frameworks to silently DTRT for this
subset lead to applications neglecting to properly encode or tag
strings, leading, in turn, to breakage in presence of multilingual
text.
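
In terms of the existing URL package, "encode and tag" could look
roughly like the sketch below.  The function names
`my-json-request-parts' and `my-post-json' are hypothetical; the
`url-request-*' variables are the ones the URL library already reads.

```elisp
(require 'url)

;; Hypothetical helper: pair the MIME tag with the encoded octets.
(defun my-json-request-parts (json-text)
  "Return (HEADERS . DATA) for sending JSON-TEXT as an HTTP body."
  (cons '(("Content-Type" . "application/json"))
        (encode-coding-string json-text 'utf-8)))

;; Hypothetical wrapper around `url-retrieve-synchronously'.
(defun my-post-json (url json-text)
  "POST JSON-TEXT to URL and return the response buffer."
  (let* ((parts (my-json-request-parts json-text))
         (url-request-method "POST")
         (url-request-extra-headers (car parts))
         (url-request-data (cdr parts)))   ; unibyte octets, not text
    (url-retrieve-synchronously url)))
```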

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 14:53                                               ` Yuri Khan
@ 2016-12-02 15:45                                                 ` Eli Zaretskii
  2016-12-02 15:51                                                 ` Lars Ingebrigtsen
  1 sibling, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-02 15:45 UTC (permalink / raw)
  To: Yuri Khan; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, larsi, dgutov

> From: Yuri Khan <yuri.v.khan@gmail.com>
> Date: Fri, 2 Dec 2016 21:53:16 +0700
> Cc: Dmitry Gutov <dgutov@yandex.ru>, Philipp Stephani <p.stephani2@gmail.com>, 
> It is really unfortunate that we talk about ASCII strings, unibyte
> strings, multibyte strings, as if that was a meaningful
> classification.

It is meaningful when you work on Emacs code.

> The real dichotomy is between text (aka strings) and MIME-type-tagged
> byte arrays.

That might be so in the context of HTTP, but in general, byte arrays
("raw bytes" in Emacs parlance) are not limited to MIME types.
Moreover, there are very frequent use cases where Emacs code needs to
work with a byte array whose type is unknown, or even cannot be known
at all, because it doesn't come with any meta-data of any kind.

> In order to send a string over HTTP, one must encode it
> to a byte array and tag it as "text/plain; charset=utf-8" or
> "text/html; charset=utf-8" or application/json (no charset parameter
> because json must always be encoded in one of utf-* for transmission).
> Conversely, a byte array received over HTTP can, MIME type allowing,
> be decoded into a string.
> 
> The fact that there exist strings for which encoding and decoding are
> identity transforms should be regarded only as an implementation
> detail.

You are talking generalities here, whereas this discussion is about
Emacs-specific internal issues.  In Emacs, a plain-ASCII string is
indistinguishable from a "byte array" whose bytes are all below 128.
They have the same representation.  To muddy the water even more, a
plain-ASCII string can be "marked" as multibyte (again, internally),
but it should be clear that such a "mark" has no meaning at all for
ASCII text.

From the Lisp application POV, whether a plain-ASCII string it
receives or processes is marked as unibyte or multibyte is entirely
random.  So if some ASCII text is accepted by an Emacs API involved in
sending HTTP requests, while an identical ASCII string is rejected,
it could be a source of surprises and bug reports.

That is the core of the issues discussed here.

> Attempts by libraries and frameworks to silently DTRT for this
> subset lead to applications neglecting to properly encode or tag
> strings, leading, in turn, to breakage in presence of multilingual
> text.

Based on Emacs experience of dealing with multibyte text and its
encoding/decoding, the conclusion was that it is better to silently
DTRT where we can be sure we know how.  Making a point of educating
users by harsh measures such as signaling errors where Emacs could
easily proceed, is generally not welcome.  We will see if this case is
any different.

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 14:53                                               ` Yuri Khan
  2016-12-02 15:45                                                 ` Eli Zaretskii
@ 2016-12-02 15:51                                                 ` Lars Ingebrigtsen
  2016-12-02 15:58                                                   ` Eli Zaretskii
  1 sibling, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-02 15:51 UTC (permalink / raw)
  To: Yuri Khan
  Cc: Eli Zaretskii, Dmitry Gutov, kentaro.nakazawa, Philipp Stephani,
	Emacs developers

Yuri Khan <yuri.v.khan@gmail.com> writes:

> The real dichotomy is between text (aka strings) and MIME-type-tagged
> byte arrays.

To nit-pick (this is emacs-devel, after all): "Byte array" isn't very
meaningful, either.  The standards talk about octet streams.  :-)

But you're right, of course: This function has a string-based interface,
which is pretty meaningless, since no protocols (well, extremely few)
deal with characters -- they only deal with octet streams.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 15:51                                                 ` Lars Ingebrigtsen
@ 2016-12-02 15:58                                                   ` Eli Zaretskii
  0 siblings, 0 replies; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-02 15:58 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: p.stephani2, emacs-devel, dgutov, kentaro.nakazawa, yuri.v.khan

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  Philipp Stephani <p.stephani2@gmail.com>,  Emacs developers <emacs-devel@gnu.org>,  kentaro.nakazawa@nifty.com,  Dmitry Gutov <dgutov@yandex.ru>
> Date: Fri, 02 Dec 2016 16:51:28 +0100
> 
> But you're right, of course: This function has a string-based interface,
> which is pretty meaningless, since no protocols (well, extremely few)
> deal with characters -- they only deal with octet streams.

The Emacs implementation of an octet stream is a unibyte string.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 13:18                                           ` Dmitry Gutov
  2016-12-02 14:24                                             ` Eli Zaretskii
@ 2016-12-02 15:29                                             ` Lars Ingebrigtsen
  2016-12-02 15:32                                               ` Dmitry Gutov
  1 sibling, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-02 15:29 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, kentaro.nakazawa, p.stephani2, emacs-devel

Dmitry Gutov <dgutov@yandex.ru> writes:

> -    (unless (= (string-bytes request)
> -               (length request))
> -      (error "Multibyte text in HTTP request: %s" request))
> +    (when (multibyte-string-p request)
> +      (error "Multibyte text in HTTP request: %s, please translate

This is going to break many current callers.  Most people aren't doing
anything as weird as trying to transmit non-ASCII text via any of these
headers (it's a very uncommon thing to do), but are just passing in
normal Emacs strings (containing nothing but ASCII, as is proper).

These will all fail if you do this, for no real gain.

Sorry to keep harping on about this, but the current url-* interface is
inadequate.  We should leave it be and move on to create a new,
well-defined url-fetching interface.

I hope to get time to do that during my next holiday, which should be in
February.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 15:29                                             ` Lars Ingebrigtsen
@ 2016-12-02 15:32                                               ` Dmitry Gutov
  2016-12-02 15:48                                                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-12-02 15:32 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: Eli Zaretskii, kentaro.nakazawa, p.stephani2, emacs-devel

On 02.12.2016 17:29, Lars Ingebrigtsen wrote:

> This is going to break many current callers.  Most people aren't doing
> anything as weird as trying to transmit non-ASCII text via any of these
> headers (it's a very uncommon thing to do), but are just passing in
> normal Emacs strings (containing nothing but ASCII, as is proper).

Do you have some examples?

> These will all fail if you do this, for no real gain.

That's debatable.

> Sorry to keep harping on about this, but the current url-* interface is
> inadequate.  We should leave it be and move on to create a new,
> well-defined url-fetching interface.

I'm sure a well-defined interface will need to have a required 
"encoding" step, or an argument somewhere, at least.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 15:32                                               ` Dmitry Gutov
@ 2016-12-02 15:48                                                 ` Lars Ingebrigtsen
  2016-12-02 15:56                                                   ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-02 15:48 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, Eli Zaretskii, kentaro.nakazawa, emacs-devel

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 02.12.2016 17:29, Lars Ingebrigtsen wrote:
>
>> This is going to break many current callers.  Most people aren't doing
>> anything as weird as trying to transmit non-ASCII text via any of these
>> headers (it's a very uncommon thing to do), but are just passing in
>> normal Emacs strings (containing nothing but ASCII, as is proper).
>
> Do you have some examples?

(multibyte-string-p (symbol-name 'a))
=> t

> I'm sure a well-defined interface will need to have a required
> "encoding" step, or an argument somewhere, at least.

Yes, of course.  The interface will allow the caller to specify the
charset of the data.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 15:48                                                 ` Lars Ingebrigtsen
@ 2016-12-02 15:56                                                   ` Dmitry Gutov
  2016-12-02 16:02                                                     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-12-02 15:56 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: p.stephani2, Eli Zaretskii, kentaro.nakazawa, emacs-devel

On 02.12.2016 17:48, Lars Ingebrigtsen wrote:
> Dmitry Gutov <dgutov@yandex.ru> writes:
>
>> On 02.12.2016 17:29, Lars Ingebrigtsen wrote:
>>
>>> This is going to break many current callers.  Most people aren't doing
>>> anything as weird as trying to transmit non-ASCII text via any of these
>>> headers (it's a very uncommon thing to do), but are just passing in
>>> normal Emacs strings (containing nothing but ASCII, as is proper).
>>
>> Do you have some examples?
>
> (multibyte-string-p (symbol-name 'a))
> => t

Examples of things "most people" are doing "trying to transmit" "nothing 
but ASCII" using the URL package, please.

>> I'm sure a well-defined interface will need to have a required
>> "encoding" step, or an argument somewhere, at least.
>
> Yes, of course.  The interface will allow the caller to specify the
> charset of the data.

And at least make it clear that the parameter will default to UTF-8, right?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 15:56                                                   ` Dmitry Gutov
@ 2016-12-02 16:02                                                     ` Lars Ingebrigtsen
  2016-12-02 16:06                                                       ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-02 16:02 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, Eli Zaretskii, kentaro.nakazawa, emacs-devel

Dmitry Gutov <dgutov@yandex.ru> writes:

> Examples of things "most people" are doing "trying to transmit"
> "nothing but ASCII" using the URL package, please.

I'm not sure what you want an example of.  That most people try to
transmit nothing but ASCII?  That they may end up with multibyte ASCII
strings without having "meaning" to (because it should make no
difference)?

The first thing is trivially true, and the second I think is also pretty
much self-evident:

(multibyte-string-p (buffer-substring (point) (- (point) 10)))
=> t

> And at least make it clear that the parameter will default to UTF-8, right?

Of course.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 16:02                                                     ` Lars Ingebrigtsen
@ 2016-12-02 16:06                                                       ` Dmitry Gutov
  2016-12-02 16:31                                                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-12-02 16:06 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: p.stephani2, Eli Zaretskii, kentaro.nakazawa, emacs-devel

On 02.12.2016 18:02, Lars Ingebrigtsen wrote:

>> Examples of things "most people" are doing "trying to transmit"
>> "nothing but ASCII" using the URL package, please.
>
> I'm not sure what you want an example of.  That most people try to
> transmit nothing but ASCII?

Yes.

> That they may end up with multibyte ASCII
> strings without having "meaning" to (because it should make no
> difference)?

> The first thing is trivially true, and the second I think is also pretty
> much self-evident:
>
> (multibyte-string-p (buffer-substring (point) (- (point) 10)))
> => t

It's absolutely not a given that most applications or libraries that 
people write with Elisp will end up sending ASCII-only text.

Especially if those applications are then available publicly for other 
people to use.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 16:06                                                       ` Dmitry Gutov
@ 2016-12-02 16:31                                                         ` Lars Ingebrigtsen
  2016-12-02 23:13                                                           ` Dmitry Gutov
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-02 16:31 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, Eli Zaretskii, kentaro.nakazawa, emacs-devel

Dmitry Gutov <dgutov@yandex.ru> writes:

>> I'm not sure what you want an example of.  That most people try to
>> transmit nothing but ASCII?
>
> Yes.

Normal web applications require that you URL-encode (or similar) any
data you send to them.  These encodings are ASCII only.

Here's a typical example of how this is used:

   (let ((url-request-method "POST")
	 (url-request-extra-headers
	  (list (cons "Content-Type"
		      (concat "multipart/form-data; boundary="
			      boundary))))
	 (url-request-data
	  (mm-url-encode-multipart-form-data values boundary)))

The output from mm-url-encode-multipart-form-data is ASCII, and is
typically multibyte.
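
A minimal sketch of how a caller could hand the URL package a unibyte
body in this situation.  The helper `my-url-encode-body' and the
variable `my-ascii-body' are hypothetical names used only for
illustration, not part of url.el, and the default of utf-8 is an
assumption taken from the discussion above.

```elisp
(require 'url-vars)  ; defines `url-request-method' and `url-request-data'

;; Hypothetical helper, not part of url.el.
(defun my-url-encode-body (body &optional coding-system)
  "Return BODY as a unibyte string encoded in CODING-SYSTEM.
CODING-SYSTEM defaults to utf-8.  A unibyte BODY is returned as is."
  (if (multibyte-string-p body)
      (encode-coding-string body (or coding-system 'utf-8))
    body))

;; A pure-ASCII string that happens to be multibyte.
(defvar my-ascii-body (string-to-multibyte "name=value&other=1"))

(let ((url-request-method "POST")
      (url-request-data (my-url-encode-body my-ascii-body)))
  ;; For a unibyte string, `length' is the number of octets.
  (list (multibyte-string-p url-request-data)
        (length url-request-data)))
;; => (nil 18)
```

For pure-ASCII input the encoded string has the same contents, so the
only thing that changes is the string's representation.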

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 16:31                                                         ` Lars Ingebrigtsen
@ 2016-12-02 23:13                                                           ` Dmitry Gutov
  2016-12-03  0:37                                                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Dmitry Gutov @ 2016-12-02 23:13 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: p.stephani2, Eli Zaretskii, kentaro.nakazawa, emacs-devel

On 02.12.2016 18:31, Lars Ingebrigtsen wrote:

> Normal web applications require that you URL-encode (or similar) any
> data you send to them.  These encodings are ASCII only.
>
> Here's a typical example of how this is used:
>
>    (let ((url-request-method "POST")
> 	 (url-request-extra-headers
> 	  (list (cons "Content-Type"
> 		      (concat "multipart/form-data; boundary="
> 			      boundary))))
> 	 (url-request-data
> 	  (mm-url-encode-multipart-form-data values boundary)))

Thanks!

> The output from mm-url-encode-multipart-form-data is ASCII, and is
> typically multibyte.

If we make the proposed change, this function will violate the contract 
on url-request-data (if the use described above is its main use case).

Luckily, this function is part of Emacs, so we can fix it in the same patch.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-02 23:13                                                           ` Dmitry Gutov
@ 2016-12-03  0:37                                                             ` Lars Ingebrigtsen
  2016-12-03  1:27                                                               ` Dmitry Gutov
  2016-12-03  8:12                                                               ` Eli Zaretskii
  0 siblings, 2 replies; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-03  0:37 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: p.stephani2, emacs-devel, Eli Zaretskii, kentaro.nakazawa

Dmitry Gutov <dgutov@yandex.ru> writes:

> If we make the proposed change, this function will violate the
> contract on url-request-data (if the use described above is its main
> use case).
>
> Luckily, this function is part of Emacs, so we can fix it in the same patch.

I'm sorry, I'm not sure how to respond to this without making
accusations of a bad faith response on your part.

This is a function with an ill-defined interface, but virtually all
callers here understand what the interface is ("don't put anything into
the body that isn't ASCII").  Even if wonkily defined, this works for
virtually all callers, in-tree or not.

You're proposing a change that would make virtually all these usages of
this (ill-defined) function fail.

The real fix for this extremely obscure problem is 1) to remove the
`error' call you introduced in Emacs 25.1, and 2) make the
Content-Length header reflect the number of octets transferred instead
of the number of bytes in the URL string.  This would have moved the
number of successful calls to `url-retrieve' from (I'm guesstimating)
99.9995% to 99.999995%, and people who wanted to send iso8859-1 text to
web servers would still fail.  But these people are pretty rare.

Your proposal would move the number of successful calls to
`url-retrieve' with a body to around 0%.

At this point I'm not sure what else to say.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-03  0:37                                                             ` Lars Ingebrigtsen
@ 2016-12-03  1:27                                                               ` Dmitry Gutov
  2016-12-03  8:12                                                               ` Eli Zaretskii
  1 sibling, 0 replies; 125+ messages in thread
From: Dmitry Gutov @ 2016-12-03  1:27 UTC (permalink / raw)
  To: Lars Ingebrigtsen
  Cc: p.stephani2, emacs-devel, Eli Zaretskii, kentaro.nakazawa

On 03.12.2016 02:37, Lars Ingebrigtsen wrote:

> I'm sorry, I'm not sure how to respond to this without making
> accusations of a bad faith response on your part.

All I'm trying to do here is to introduce a more meaningful, stronger 
typing. See Yuri's comment on why that can be important.

I don't know whether the benefits really outweigh the inconvenience, 
but the only example you gave so far can be trivially solved from our side.

That leaves clients that perform "url encoding" manually using their own 
code, but there might be none of them, for all I know.

IME, JSON encoding is more popular than that, and those users are 
affected already.

> This is a function with an ill-defined interface, but virtually all
> callers here understand what the interface is ("don't put anything into
> the body that isn't ASCII").  Even if wonkily defined, this works for
> virtually all callers, in-tree or not.

> You're proposing a change that would make virtually all these usages of
> this (ill-defined) function fail.

True.

> The real fix for this extremely obscure problem is 1) to remove the
> `error' call you introduced in Emacs 25.1, and 2) make the
> Content-Length header reflect the number of octets transferred instead
> of the number of bytes in the URL string.  This would have moved the
> number of successful calls to `url-retrieve' from (I'm guesstimating)
> 99.9995% to 99.999995%, and people who wanted to send iso8859-1 text to
> web servers would still fail.  But these people are pretty rare.
>
> Your proposal would move the number of successful calls to
> `url-retrieve' with a body to around 0%.

Not true. All current users of json.el, at least, who have updated their 
code for Emacs 25, won't be affected. And I imagine they represent a 
significant fraction of `url-retrieve' users.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-03  0:37                                                             ` Lars Ingebrigtsen
  2016-12-03  1:27                                                               ` Dmitry Gutov
@ 2016-12-03  8:12                                                               ` Eli Zaretskii
  2016-12-03 10:01                                                                 ` Lars Ingebrigtsen
  1 sibling, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-12-03  8:12 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, dgutov

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: p.stephani2@gmail.com,  Eli Zaretskii <eliz@gnu.org>,  kentaro.nakazawa@nifty.com,  emacs-devel@gnu.org
> Date: Sat, 03 Dec 2016 01:37:19 +0100
> 
> I'm sorry, I'm not sure how to respond to this without making
> accusations of a bad faith response on your part.

Please don't.  There's no bad faith on anyone's side here.

> make the Content-Length header reflect the number of octets
> transferred instead of the number of bytes in the URL string.

How do you propose to compute the number of transferred octets, given
that the URL request payload is a string?



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-03  8:12                                                               ` Eli Zaretskii
@ 2016-12-03 10:01                                                                 ` Lars Ingebrigtsen
  2016-12-03 16:00                                                                   ` Stefan Monnier
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-03 10:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel, kentaro.nakazawa, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

> How do you propose to compute the number of transferred octets, given
> that the URL request payload is a string?

Just use `string-bytes' instead of `length'.  This happens to work since
almost all web services expect utf-8, and our strings happen to be
utf-8, too.  (The few callers that are sending a different charset
already presumably know to encode their data, or their applications
would be failing already.)

Yes, it's yucky, but this is an ill-defined function.  And we should
emphasise backwards compatibility instead of breaking people's code, I
think.
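
A small sketch of the difference between the two functions for
well-formed multibyte text.  The variable name `my-text' is
illustrative only.

```elisp
;; "é€" built from code points, so the string is multibyte.
(defvar my-text (string #xe9 #x20ac))

(list (length my-text)        ; number of characters
      (string-bytes my-text)  ; bytes in the internal representation
      ;; octets actually produced when encoding as UTF-8
      (length (encode-coding-string my-text 'utf-8)))
;; => (2 5 5)
```

The last two numbers agree here because the text contains only Unicode
characters; strings that contain raw bytes behave differently.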

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-03 10:01                                                                 ` Lars Ingebrigtsen
@ 2016-12-03 16:00                                                                   ` Stefan Monnier
  2016-12-03 20:01                                                                     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 125+ messages in thread
From: Stefan Monnier @ 2016-12-03 16:00 UTC (permalink / raw)
  To: emacs-devel

> Just use `string-bytes' instead of `length'.

IIRC the problem with that is if the string is the result of
concatenating a unibyte and a multibyte string, in which case the string
may only contain bytes (and hence `length` gives the right result) yet
`string-bytes` and `length` will return different results (because the
≥128 bytes are encoded as 2 bytes in the multibyte representation).

        Stefan
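
The case described above can be sketched as follows.  The variable
names are illustrative, and `string-to-multibyte' is used here only to
obtain a pure-ASCII multibyte string for the demonstration.

```elisp
;; Already-encoded UTF-8 for "é": a unibyte string of 2 octets.
(defvar my-encoded (encode-coding-string (string #xe9) 'utf-8))

;; Concatenating it with a multibyte (but pure-ASCII) string gives a
;; multibyte result in which the two octets are stored as raw bytes.
(defvar my-mixed (concat my-encoded (string-to-multibyte "a")))

(list (multibyte-string-p my-mixed)
      (length my-mixed)         ; 3 octets would go on the wire
      (string-bytes my-mixed))  ; each raw byte takes 2 bytes internally
;; => (t 3 5)
```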

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-03 16:00                                                                   ` Stefan Monnier
@ 2016-12-03 20:01                                                                     ` Lars Ingebrigtsen
  2016-12-03 20:57                                                                       ` Andreas Schwab
  0 siblings, 1 reply; 125+ messages in thread
From: Lars Ingebrigtsen @ 2016-12-03 20:01 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> IIRC the problem with that is if the string is the result of
> concatenating a unibyte and a multibyte string, in which case the string
> may only contain bytes (and hence `length` gives the right result) yet
> `string-bytes` and `length` will return different results (because the
> ≥128 bytes are encoded as 2 bytes in the multibyte representation).

Hm...  I see...  I think...  :-)

Can `string-bytes' return a different number than

(with-temp-buffer
  (set-buffer-multibyte nil)
  (insert string)
  (buffer-size))

?

In any case, this latter is what we want, because those are the octets
that will be transmitted to the server.  Unless there's another
subtlety I'm not aware of, which seems likely.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-03 20:01                                                                     ` Lars Ingebrigtsen
@ 2016-12-03 20:57                                                                       ` Andreas Schwab
  0 siblings, 0 replies; 125+ messages in thread
From: Andreas Schwab @ 2016-12-03 20:57 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Stefan Monnier, emacs-devel

On Dez 03 2016, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Can `string-bytes' return a different number than
>
> (with-temp-buffer
>   (set-buffer-multibyte nil)
>   (insert string)
>   (buffer-size))
>
> ?

ELISP> (string-bytes "\200")
1 (#o1, #x1, ?\C-a)
ELISP> (string-bytes (string-make-multibyte "\200"))
2 (#o2, #x2, ?\C-b)
ELISP> (let ((string "\200")) (with-temp-buffer
  (set-buffer-multibyte nil)
  (insert string)
  (buffer-size)))
1 (#o1, #x1, ?\C-a)
ELISP> (let ((string (string-make-multibyte "\200"))) (with-temp-buffer
  (set-buffer-multibyte nil)
  (insert string)
  (buffer-size)))
1 (#o1, #x1, ?\C-a)
ELISP> 

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-12-01  0:30                                       ` Dmitry Gutov
  2016-12-01 17:17                                         ` Eli Zaretskii
@ 2016-12-28 18:25                                         ` Philipp Stephani
  1 sibling, 0 replies; 125+ messages in thread
From: Philipp Stephani @ 2016-12-28 18:25 UTC (permalink / raw)
  To: Dmitry Gutov, Eli Zaretskii; +Cc: larsi, kentaro.nakazawa, emacs-devel

Dmitry Gutov <dgutov@yandex.ru> schrieb am Do., 1. Dez. 2016 um 01:30 Uhr:

>
> Further, now that Emacs 25 is out, and we are allowed to have more
> breaking changes in Emacs 26, I think we should change the check at the
> end of url-http-create-request to just use multibyte-string-p.
>
>
I think that's a good idea. (The check should also be moved to the front
and documented, but those are minor nits.)
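
A rough sketch of what such an up-front check might look like.  The
function name `my-check-request-data' and the error message are
hypothetical; this is not the code in url-http-create-request.

```elisp
;; Hypothetical validation helper, for illustration only.
(defun my-check-request-data (data)
  "Signal an error if DATA is a multibyte string; otherwise return DATA."
  (when (and (stringp data) (multibyte-string-p data))
    (error "Request data must be a unibyte string; encode it first"))
  data)

;; A unibyte string passes through unchanged.
(my-check-request-data (encode-coding-string (string #xe9) 'utf-8))
```

Note that under such a check a pure-ASCII multibyte string would be
rejected as well.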

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-29 23:09                         ` Philipp Stephani
  2016-11-29 23:18                           ` Philipp Stephani
  2016-11-30  0:16                           ` Dmitry Gutov
@ 2016-11-30 15:06                           ` Eli Zaretskii
  2016-11-30 15:31                             ` Stefan Monnier
  2 siblings, 1 reply; 125+ messages in thread
From: Eli Zaretskii @ 2016-11-30 15:06 UTC (permalink / raw)
  To: Philipp Stephani; +Cc: larsi, emacs-devel, kentaro.nakazawa, dgutov

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 29 Nov 2016 23:09:57 +0000
> Cc: larsi@gnus.org, kentaro.nakazawa@nifty.com, emacs-devel@gnu.org
> 
>  > json-encode returns a multibyte string.
> 
>  Any idea why? 
> 
> Because (symbol-name 'false) returns a multibyte string. I guess the ultimate reason is that the reader always
> creates multibyte strings for symbol names.

I'm not sure I understand how symbol-name comes into play here.  Can
you help me understand this?

>  Is it again that 'concat' misfeature, when one of the
>  strings is pure-ASCII, but happens to be multibyte?
> 
> Why is it a misfeature?

Because a pure-ASCII string doesn't need to be multibyte; it only
becomes that by accident.  The net result is that this misfeature
gets in the way when you want to produce a unibyte string by
concatenating an encoded string and some ASCII text.

> I'd expect a concatenation of multibyte and unibyte strings to either implicitly upgrade
> to a multibyte string (as in Python 2) or raise a signal (as in Python 3).

But when all the strings are either unibyte or pure-ASCII, we could
produce a unibyte string without losing anything.



^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
  2016-11-30 15:06                           ` Eli Zaretskii
@ 2016-11-30 15:31                             ` Stefan Monnier
  0 siblings, 0 replies; 125+ messages in thread
From: Stefan Monnier @ 2016-11-30 15:31 UTC (permalink / raw)
  To: emacs-devel

> But when all the strings are either unibyte or pure-ASCII, we could
> produce a unibyte string without losing anything.

Actually, technically, if we take a multibyte string which only contains
pure-ASCII and convert it to unibyte, we lose information: with
a multibyte string, we can compare the `size` and the `size_byte`
fields, and if they're equal we know we have a pure-ASCII string,
whereas with a unibyte string, we'd have to scan the whole string
looking for a byte >= 128 to determine that it's pure-ASCII.

So maybe the change should be that when concat has to combine a unibyte
string and a multibyte string, it should first look to see if the
multibyte string has `size == size_byte` and if so, generate
a unibyte string.

        Stefan
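
The idea can be approximated in Lisp, as a sketch only.  The proposal
above concerns the C implementation of `concat'; the functions
`my-ascii-multibyte-p' and `my-concat-prefer-unibyte' below are
hypothetical names that mimic the effect from the outside.

```elisp
(defun my-ascii-multibyte-p (s)
  "Return non-nil if S is a multibyte string containing only ASCII.
This is the `size == size_byte' test expressed in Lisp."
  (and (multibyte-string-p s)
       (= (length s) (string-bytes s))))

(defun my-concat-prefer-unibyte (&rest strings)
  "Concatenate STRINGS, converting pure-ASCII multibyte ones to unibyte.
The result is unibyte unless some argument has non-ASCII multibyte text."
  (apply #'concat
         (mapcar (lambda (s)
                   (if (my-ascii-multibyte-p s)
                       (string-to-unibyte s)
                     s))
                 strings)))

;; Encoded UTF-8 plus a pure-ASCII multibyte string stays unibyte.
(let ((result (my-concat-prefer-unibyte
               (encode-coding-string (string #xe9) 'utf-8)
               (string-to-multibyte "abc"))))
  (list (multibyte-string-p result) (length result)))
;; => (nil 5)
```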

^ permalink raw reply	[flat|nested] 125+ messages in thread

end of thread, other threads:[~2017-01-28 14:16 UTC | newest]

