unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* [PATCH] url: Wrap cookie headers in url-http--encode-string.
@ 2016-09-07 15:30 Toke Høiland-Jørgensen
  2016-09-07 16:40 ` Stefan Monnier
  2016-09-07 19:14 ` [PATCH] url: Wrap cookie headers in url-http--encode-string Lars Ingebrigtsen
  0 siblings, 2 replies; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-07 15:30 UTC (permalink / raw)
  To: emacs-devel; +Cc: Toke Høiland-Jørgensen

In some cases the output of url-cookie-generate-header can be multibyte,
which will trip the length check that was added at the end of
url-http-create-request. This seems to happen only when there's a
request body which contains UTF-8-encoded non-ASCII characters.

Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 lisp/url/url-http.el | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 927d0bb..81bb9b4 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -366,9 +366,10 @@ The string is based on `url-privacy-level' and `url-user-agent'."
              auth
              ;; Cookies
 	     (when (url-use-cookies url-http-target-url)
-	       (url-cookie-generate-header-lines
-		host real-fname
-		(equal "https" (url-type url-http-target-url))))
+               (url-http--encode-string
+                (url-cookie-generate-header-lines
+                 host real-fname
+                 (equal "https" (url-type url-http-target-url)))))
              ;; If-modified-since
              (if (and (not no-cache)
                       (member url-http-method '("GET" nil)))
-- 
2.9.3



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-07 15:30 [PATCH] url: Wrap cookie headers in url-http--encode-string Toke Høiland-Jørgensen
@ 2016-09-07 16:40 ` Stefan Monnier
  2016-09-07 16:52   ` Toke Høiland-Jørgensen
  2016-09-07 19:14 ` [PATCH] url: Wrap cookie headers in url-http--encode-string Lars Ingebrigtsen
  1 sibling, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2016-09-07 16:40 UTC (permalink / raw)
  To: emacs-devel

> In some cases the output of url-cookie-generate-header can be multibyte,

Hmmm... I think this is a problem in itself.  So rather than encode the
result of url-cookie-generate-header-lines, we should change
url-cookie-generate-header-lines so that it always returns
a unibyte string.


        Stefan




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-07 16:40 ` Stefan Monnier
@ 2016-09-07 16:52   ` Toke Høiland-Jørgensen
  2016-09-07 17:15     ` Eli Zaretskii
  0 siblings, 1 reply; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-07 16:52 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> In some cases the output of url-cookie-generate-header can be multibyte,
>
> Hmmm... I think this is a problem in itself. So rather than encode the
> result of url-cookie-generate-header-lines, we should change
> url-cookie-generate-header-lines so that it always returns a unibyte
> string.

Right, well, I'm not actually sure that there's any multibyte *content*
in the cookie header. But somehow, without this patch, if there's an
UTF-8 payload, (string-bytes request) returns an extra byte for each
UTF-8 octet. I have no idea why that is; basically, what I would see was:

(string-bytes url-http-data) == (length url-http-data)

but

(string-bytes (substring request (* -1 (length (url-http-data))))) being
two bytes longer (for a single UTF-8 character in the payload).

-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-07 16:52   ` Toke Høiland-Jørgensen
@ 2016-09-07 17:15     ` Eli Zaretskii
  2016-09-07 18:25       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-07 17:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: monnier, emacs-devel

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Wed, 07 Sep 2016 18:52:15 +0200
> Cc: emacs-devel@gnu.org
> 
> Right, well, I'm not actually sure that there's any multibyte *content*
> in the cookie header. But somehow, without this patch, if there's an
> UTF-8 payload, (string-bytes request) returns an extra byte for each
> UTF-8 octet. I have no idea why that is; basically, what I would see was:
> 
> (string-bytes url-http-data) == (length url-http-data)
> 
> but
> 
> (string-bytes (substring request (* -1 (length (url-http-data))))) being
> two bytes longer (for a single UTF-8 character in the payload).

Can you show a test case where this happens?



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-07 17:15     ` Eli Zaretskii
@ 2016-09-07 18:25       ` Toke Høiland-Jørgensen
  2016-09-08 14:06         ` Dmitry Gutov
  0 siblings, 1 reply; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-07 18:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Toke Høiland-Jørgensen <toke@toke.dk>
>> Date: Wed, 07 Sep 2016 18:52:15 +0200
>> Cc: emacs-devel@gnu.org
>> 
>> Right, well, I'm not actually sure that there's any multibyte *content*
>> in the cookie header. But somehow, without this patch, if there's an
>> UTF-8 payload, (string-bytes request) returns an extra byte for each
>> UTF-8 octet. I have no idea why that is; basically, what I would see was:
>> 
>> (string-bytes url-http-data) == (length url-http-data)
>> 
>> but
>> 
>> (string-bytes (substring request (* -1 (length (url-http-data))))) being
>> two bytes longer (for a single UTF-8 character in the payload).
>
> Can you show a test case where this happens?

Appears to be any request with an UTF-8 encoded payload that goes
to a host that has cookies stored.

This snippet triggers the error for me if I replace example.org with a
domain that has cookies stored in ~/.emacs.d/url/cookies:

(let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
       (url-retrieve-synchronously "http://example.org"))

-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-07 15:30 [PATCH] url: Wrap cookie headers in url-http--encode-string Toke Høiland-Jørgensen
  2016-09-07 16:40 ` Stefan Monnier
@ 2016-09-07 19:14 ` Lars Ingebrigtsen
  2016-09-07 20:49   ` Toke Høiland-Jørgensen
  2016-09-08  2:47   ` Eli Zaretskii
  1 sibling, 2 replies; 42+ messages in thread
From: Lars Ingebrigtsen @ 2016-09-07 19:14 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: emacs-devel

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> In some cases the output of url-cookie-generate-header can be multibyte,
> which will trip the length check that was added at the end of
> url-http-create-request.

I think that test should be removed so that Emacs 25.1 can be released
(with the same level of workingness that that function has in Emacs
24.4).

It was a mistake to add that test to Emacs 25.

And then these things can be worked on for the next release.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-07 19:14 ` [PATCH] url: Wrap cookie headers in url-http--encode-string Lars Ingebrigtsen
@ 2016-09-07 20:49   ` Toke Høiland-Jørgensen
  2016-09-08  2:47   ` Eli Zaretskii
  1 sibling, 0 replies; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-07 20:49 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>> In some cases the output of url-cookie-generate-header can be multibyte,
>> which will trip the length check that was added at the end of
>> url-http-create-request.
>
> I think that test should be removed so that Emacs 25.1 can be released
> (with the same level of workingness that that function has in Emacs
> 24.4).
>
> It was a mistake to add that test to Emacs 25.
>
> And then these things can be worked on for the next release.

Fine with me :)

-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-07 19:14 ` [PATCH] url: Wrap cookie headers in url-http--encode-string Lars Ingebrigtsen
  2016-09-07 20:49   ` Toke Høiland-Jørgensen
@ 2016-09-08  2:47   ` Eli Zaretskii
  2016-09-08  9:07     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-08  2:47 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: toke, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Wed, 07 Sep 2016 21:14:10 +0200
> Cc: emacs-devel@gnu.org
> 
> I think that test should be removed so that Emacs 25.1 can be released
> (with the same level of workingness that that function has in Emacs
> 24.4).

That makes very little sense to me, as it will re-introduce the
problems which caused us to add the test.  And it's a code change, so
it will require at least the same amount of testing that whatever fix
we come up with for this use case.  Sounds like lose-lose to me.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08  2:47   ` Eli Zaretskii
@ 2016-09-08  9:07     ` Lars Ingebrigtsen
  2016-09-08 17:23       ` Eli Zaretskii
  0 siblings, 1 reply; 42+ messages in thread
From: Lars Ingebrigtsen @ 2016-09-08  9:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: toke, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> That makes very little sense to me, as it will re-introduce the
> problems which caused us to add the test.  And it's a code change, so
> it will require at least the same amount of testing that whatever fix
> we come up with for this use case.  Sounds like lose-lose to me.

The problems were pretty obscure, since most servers don't care much
about what the length header says.  And removing the error-out has a
very small chance of breaking anything, to put it mildly.

And we have no idea how many of these unencoded parameters remain to be
encoded (because of the rather wonky url calling conventions), so we
could be adding these encoding patches piecemeal to Emacs 25.1 pretests
for the next couple of years (at the current going rate of one per
month) while waiting to find them all.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-07 18:25       ` Toke Høiland-Jørgensen
@ 2016-09-08 14:06         ` Dmitry Gutov
  2016-09-08 14:14           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 42+ messages in thread
From: Dmitry Gutov @ 2016-09-08 14:06 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Eli Zaretskii; +Cc: monnier, emacs-devel

On 07.09.2016 21:25, Toke Høiland-Jørgensen wrote:

> Appears to be any request with an UTF-8 encoded payload that goes
> to a host that has cookies stored.
>
> This snippet triggers the error for me if I replace example.org with a
> domain that has cookies stored in ~/.emacs.d/url/cookies:
>
> (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
>        (url-retrieve-synchronously "http://example.org"))

What are the contents of ~/.emacs.d/url/cookies in this example? Without 
them, the test case is hard to reproduce.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 14:06         ` Dmitry Gutov
@ 2016-09-08 14:14           ` Toke Høiland-Jørgensen
  2016-09-08 14:25             ` Dmitry Gutov
  0 siblings, 1 reply; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-08 14:14 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, monnier, emacs-devel

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 07.09.2016 21:25, Toke Høiland-Jørgensen wrote:
>
>> Appears to be any request with an UTF-8 encoded payload that goes
>> to a host that has cookies stored.
>>
>> This snippet triggers the error for me if I replace example.org with a
>> domain that has cookies stored in ~/.emacs.d/url/cookies:
>>
>> (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
>>        (url-retrieve-synchronously "http://example.org"))
>
> What are the contents of ~/.emacs.d/url/cookies in this example? Without them,
> the test case is hard to reproduce.

I believe you can generate that by first retriving a site that will set
a cookie (and making sure cookie support is enabled). At runtime they
are stored in url-cookie-storage or url-cookie-secure-storage.

-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 14:14           ` Toke Høiland-Jørgensen
@ 2016-09-08 14:25             ` Dmitry Gutov
  2016-09-08 15:58               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 42+ messages in thread
From: Dmitry Gutov @ 2016-09-08 14:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Eli Zaretskii, monnier, emacs-devel

On 08.09.2016 17:14, Toke Høiland-Jørgensen wrote:

> I believe you can generate that by first retriving a site that will set
> a cookie (and making sure cookie support is enabled). At runtime they
> are stored in url-cookie-storage or url-cookie-secure-storage.

Could you post the full recipe?




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 14:25             ` Dmitry Gutov
@ 2016-09-08 15:58               ` Toke Høiland-Jørgensen
  2016-09-08 17:20                 ` Eli Zaretskii
                                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-08 15:58 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, monnier, emacs-devel

Dmitry Gutov <dgutov@yandex.ru> writes:

> On 08.09.2016 17:14, Toke Høiland-Jørgensen wrote:
>
>> I believe you can generate that by first retriving a site that will set
>> a cookie (and making sure cookie support is enabled). At runtime they
>> are stored in url-cookie-storage or url-cookie-secure-storage.
>
> Could you post the full recipe?

(url-retrieve-synchronously "http://google.se") ; sets a cookie
(let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
       (url-retrieve-synchronously "http://google.se")) ; crashes



-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 15:58               ` Toke Høiland-Jørgensen
@ 2016-09-08 17:20                 ` Eli Zaretskii
  2016-09-08 17:43                   ` Toke Høiland-Jørgensen
  2016-09-08 17:47                   ` Stefan Monnier
  2016-09-09 14:56                 ` Alain Schneble
  2016-09-09 18:02                 ` Alain Schneble
  2 siblings, 2 replies; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-08 17:20 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: emacs-devel, monnier, dgutov

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Cc: Eli Zaretskii <eliz@gnu.org>, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Thu, 08 Sep 2016 17:58:31 +0200
> 
> (url-retrieve-synchronously "http://google.se") ; sets a cookie
> (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
>        (url-retrieve-synchronously "http://google.se")) ; crashes

Can the cookies file include non-ASCII text?  E.g., could the domain
be non-ASCII?

If that file is guaranteed to be pure ASCII, we can indeed force the
string returned by url-cookie-generate-header to be unibyte.
Otherwise we need to encode it.

Btw, if the cookies file can indeed include non-ASCII characters, IMO
url-cookie needs to take more care of forcing specific encoding when
it reads and writes that file, otherwise this is a problem waiting to
happen.  But we can solve this on master, if needed; the urgent thing
is to fix the release branch for this particular use case.

Thanks.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08  9:07     ` Lars Ingebrigtsen
@ 2016-09-08 17:23       ` Eli Zaretskii
  0 siblings, 0 replies; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-08 17:23 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: toke, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: toke@toke.dk,  emacs-devel@gnu.org
> Date: Thu, 08 Sep 2016 11:07:17 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > That makes very little sense to me, as it will re-introduce the
> > problems which caused us to add the test.  And it's a code change, so
> > it will require at least the same amount of testing that whatever fix
> > we come up with for this use case.  Sounds like lose-lose to me.
> 
> The problems were pretty obscure, since most servers don't care much
> about what the length header says.  And removing the error-out has a
> very small chance of breaking anything, to put it mildly.

It sounds like you don't consider this issue serious enough to be
solved, in Emacs 25.1 or at all.  If so, I certainly disagree, and we
already decided we want to fix this.  So undoing that decision now
would be a step backwards for no good reason.

> And we have no idea how many of these unencoded parameters remain to be
> encoded (because of the rather wonky url calling conventions), so we
> could be adding these encoding patches piecemeal to Emacs 25.1 pretests
> for the next couple of years (at the current going rate of one per
> month) while waiting to find them all.

If we think other parts of the request can produce similar problems,
we can encode them all now.  At the time, the consensus was that this
couldn't happen, but if we made a mistake, it can be easily and safely
fixed.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 17:20                 ` Eli Zaretskii
@ 2016-09-08 17:43                   ` Toke Høiland-Jørgensen
  2016-09-08 18:01                     ` Eli Zaretskii
  2016-09-08 17:47                   ` Stefan Monnier
  1 sibling, 1 reply; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-08 17:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, monnier, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Toke Høiland-Jørgensen <toke@toke.dk>
>> Cc: Eli Zaretskii <eliz@gnu.org>, monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> Date: Thu, 08 Sep 2016 17:58:31 +0200
>> 
>> (url-retrieve-synchronously "http://google.se") ; sets a cookie
>> (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
>>        (url-retrieve-synchronously "http://google.se")) ; crashes
>
> Can the cookies file include non-ASCII text?  E.g., could the domain
> be non-ASCII?

>From glancing at the code, it seems those are the non-puny-coded
hostnames that are stored in that file. But that doesn't really matter,
as those are only lookup variables in an the array. The question is
whether the cookie values themselves can be. As for that, well...:

http://stackoverflow.com/questions/1969232/allowed-characters-in-cookies

My reading of that is in most cases they will be. But really, it depends
on what the url code itself accepts; of which I have no idea.

-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 17:20                 ` Eli Zaretskii
  2016-09-08 17:43                   ` Toke Høiland-Jørgensen
@ 2016-09-08 17:47                   ` Stefan Monnier
  2016-09-08 18:04                     ` Eli Zaretskii
  1 sibling, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2016-09-08 17:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Toke Høiland-Jørgensen, emacs-devel, dgutov

> If that file is guaranteed to be pure ASCII, we can indeed force the
> string returned by url-cookie-generate-header to be unibyte.
> Otherwise we need to encode it.

AFAIK the cookie data is "sequence of bytes", i.e. not "pure ASCII".

And yes the domain names may include non-ASCII characters, so we have to
encode those somehow (unless the URL library already encodes them for
us before url-cookie receives them).


        Stefan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 17:43                   ` Toke Høiland-Jørgensen
@ 2016-09-08 18:01                     ` Eli Zaretskii
  0 siblings, 0 replies; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-08 18:01 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: emacs-devel, monnier, dgutov

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Cc: dgutov@yandex.ru,  monnier@iro.umontreal.ca,  emacs-devel@gnu.org
> Date: Thu, 08 Sep 2016 19:43:41 +0200
> 
> > Can the cookies file include non-ASCII text?  E.g., could the domain
> > be non-ASCII?
> 
> >From glancing at the code, it seems those are the non-puny-coded
> hostnames that are stored in that file. But that doesn't really matter,
> as those are only lookup variables in an the array.

According to this:

  https://en.wikipedia.org/wiki/HTTP_cookie

a cookie response could include "Domain=example.com", which I read to
mean the domain can appear in the response for a cookie.  Am I
mistaken?

> The question is whether the cookie values themselves can be.

The same Wikipedia article says no:

  The value of a cookie may consist of any printable ASCII character
  (! through ~, Unicode \u0021 through \u007E) excluding , and ; and
  whitespace characters. The name of a cookie excludes the same
  characters, as well as =, since that is the delimiter between the
  name and value. The cookie standard RFC 2965 is more restrictive but
  not implemented by browsers.

So AFAIU, the only problem is the domain name (and maybe also Path).



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 17:47                   ` Stefan Monnier
@ 2016-09-08 18:04                     ` Eli Zaretskii
  2016-09-08 20:29                       ` Alain Schneble
  0 siblings, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-08 18:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: toke, emacs-devel, dgutov

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: Toke Høiland-Jørgensen <toke@toke.dk>,
>         dgutov@yandex.ru, emacs-devel@gnu.org
> Date: Thu, 08 Sep 2016 13:47:46 -0400
> 
> > If that file is guaranteed to be pure ASCII, we can indeed force the
> > string returned by url-cookie-generate-header to be unibyte.
> > Otherwise we need to encode it.
> 
> AFAIK the cookie data is "sequence of bytes", i.e. not "pure ASCII".

Not according to my reading of the Wikipedia, but maybe that is
inaccurate.

> And yes the domain names may include non-ASCII characters, so we have to
> encode those somehow (unless the URL library already encodes them for
> us before url-cookie receives them).

I meant the domain that is recorded in the cookies file.  It is read
with 'load', which AFAIU assumes UTF-8 in the absence of a coding
cookie.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 18:04                     ` Eli Zaretskii
@ 2016-09-08 20:29                       ` Alain Schneble
  2016-09-09  7:57                         ` Eli Zaretskii
  0 siblings, 1 reply; 42+ messages in thread
From: Alain Schneble @ 2016-09-08 20:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: toke, dgutov, Stefan Monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
>> Cc: Toke Høiland-Jørgensen <toke@toke.dk>,
>>         dgutov@yandex.ru, emacs-devel@gnu.org
>> Date: Thu, 08 Sep 2016 13:47:46 -0400
>> 
>> > If that file is guaranteed to be pure ASCII, we can indeed force the
>> > string returned by url-cookie-generate-header to be unibyte.
>> > Otherwise we need to encode it.
>> 
>> AFAIK the cookie data is "sequence of bytes", i.e. not "pure ASCII".
>
> Not according to my reading of the Wikipedia, but maybe that is
> inaccurate.

FWIW, Mozilla refers to RFC6265
https://tools.ietf.org/html/rfc6265#section-5.4:

Bottom of Page 26 says:

   NOTE: Despite its name, the cookie-string is actually a sequence of
   octets, not a sequence of characters.  To convert the cookie-string
   (or components thereof) into a sequence of characters (e.g., for
   presentation to the user), the user agent might wish to try using the
   UTF-8 character encoding [RFC3629] to decode the octet sequence.
   This decoding might fail, however, because not every sequence of
   octets is valid UTF-8.

See also
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cookie.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 20:29                       ` Alain Schneble
@ 2016-09-09  7:57                         ` Eli Zaretskii
  0 siblings, 0 replies; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-09  7:57 UTC (permalink / raw)
  To: Alain Schneble; +Cc: toke, emacs-devel, monnier, dgutov

> From: Alain Schneble <a.s@realize.ch>
> CC: Stefan Monnier <monnier@IRO.UMontreal.CA>, <toke@toke.dk>,
> 	<emacs-devel@gnu.org>, <dgutov@yandex.ru>
> Date: Thu, 8 Sep 2016 22:29:47 +0200
> 
> FWIW, Mozilla refers to RFC6265
> https://tools.ietf.org/html/rfc6265#section-5.4:
> 
> Bottom of Page 26 says:
> 
>    NOTE: Despite its name, the cookie-string is actually a sequence of
>    octets, not a sequence of characters.  To convert the cookie-string
>    (or components thereof) into a sequence of characters (e.g., for
>    presentation to the user), the user agent might wish to try using the
>    UTF-8 character encoding [RFC3629] to decode the octet sequence.
>    This decoding might fail, however, because not every sequence of
>    octets is valid UTF-8.
> 
> See also
> https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cookie.

If the cookies file can include non-ASCII characters, then IMO Stefan
is right, and url-cookie-generate-header should produce a unibyte
string consisting of UTF-8 sequences.

However, I think for the emacs-25 branch, it would be safer to encode
the cookie header explicitly in url-http-create-request; we could make
the change in url-cookie-generate-header on master.  Any objections?

P.S. We should decide about the change on emacs-25 quickly, because
the current plan calls for the final 25.1 release tarball on Sep 12.

Thanks.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 15:58               ` Toke Høiland-Jørgensen
  2016-09-08 17:20                 ` Eli Zaretskii
@ 2016-09-09 14:56                 ` Alain Schneble
  2016-09-09 15:04                   ` Eli Zaretskii
  2016-09-09 15:06                   ` Stefan Monnier
  2016-09-09 18:02                 ` Alain Schneble
  2 siblings, 2 replies; 42+ messages in thread
From: Alain Schneble @ 2016-09-09 14:56 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Eli Zaretskii, emacs-devel, monnier, Dmitry Gutov

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Dmitry Gutov <dgutov@yandex.ru> writes:
>>
>> Could you post the full recipe?
>
> (url-retrieve-synchronously "http://google.se") ; sets a cookie
> (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
>        (url-retrieve-synchronously "http://google.se")) ; crashes

I was able to reproduce it but am a bit confused, since it doesn't
signal an error when message-body "æøå" is replaced by "abc", while
reusing the same cookie.

I tried to track it down with the following example. `cookie-val' is the
value of the cookie-string:

(string-bytes cookie-val)
=> 131
(string-bytes (encode-coding-string "æøå" 'utf-8))
=> 6
(string-bytes (concat (encode-coding-string "æøå" 'utf-8) cookie-val))
=> 143 ' why?
(string-bytes (concat (string-as-unibyte "abc") ans-cookie-val))
=> 134

Why does concat behave that strangely?  What am I missing here?  Is the
behavior of concatenating a unibyte and a multibyte string simply
undefined?  But if so, shouldn't that be mentioned in the docstring?  Or
is it a general rule that has to be followed, to not mix uni-/multibyte
in any function?

Please excuse me if that seems to be a silly question to you.  I just
want to learn and understand how that really works behind the scenes.

Thanks for any help.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 14:56                 ` Alain Schneble
@ 2016-09-09 15:04                   ` Eli Zaretskii
  2016-09-09 15:16                     ` Alain Schneble
  2016-09-09 15:06                   ` Stefan Monnier
  1 sibling, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-09 15:04 UTC (permalink / raw)
  To: Alain Schneble; +Cc: toke, emacs-devel, monnier, dgutov

> From: Alain Schneble <a.s@realize.ch>
> CC: Dmitry Gutov <dgutov@yandex.ru>, Eli Zaretskii <eliz@gnu.org>,
> 	<monnier@iro.umontreal.ca>, <emacs-devel@gnu.org>
> Date: Fri, 9 Sep 2016 16:56:54 +0200
> 
> > (url-retrieve-synchronously "http://google.se") ; sets a cookie
> > (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
> >        (url-retrieve-synchronously "http://google.se")) ; crashes
> 
> I was able to reproduce it but am a bit confused, since it doesn't
> signal an error when message-body "æøå" is replaced by "abc", while
> reusing the same cookie.
> 
> I tried to track it down with the following example. `cookie-val' is the
> value of the cookie-string:
> 
> (string-bytes cookie-val)
> => 131
> (string-bytes (encode-coding-string "æøå" 'utf-8))
> => 6
> (string-bytes (concat (encode-coding-string "æøå" 'utf-8) cookie-val))
> => 143 ' why?
> (string-bytes (concat (string-as-unibyte "abc") ans-cookie-val))
> => 134

Because a multibyte string with ASCII-only text has the same number of
bytes as it has characters.  While a multibyte string with non-ASCII
text has more bytes than characters, due to the way Emacs represents
characters internally (which is actually a superset of UTF-8).

> Why does concat behave that strangely?  What am I missing here?  Is the
> behavior of concatenating a unibyte and a multibyte string simply
> undefined?

No, it isn't undefined.  When some of the arguments are multibyte
strings, concat returns a multibyte string.  Nothing else would make
sense.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 14:56                 ` Alain Schneble
  2016-09-09 15:04                   ` Eli Zaretskii
@ 2016-09-09 15:06                   ` Stefan Monnier
  2016-09-09 15:15                     ` Alain Schneble
  1 sibling, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2016-09-09 15:06 UTC (permalink / raw)
  To: emacs-devel

> (string-bytes cookie-val)
> => 131

This says that the internal representation of the cookie-val string uses
up 131 bytes.

> (string-bytes (encode-coding-string "æøå" 'utf-8))
> => 6

Note that `length' should return the same value, since the string
returned by encode-coding-string should be unibyte (i.e. is a sequence of
bytes, rather than sequence of characters).

> (string-bytes (concat (encode-coding-string "æøå" 'utf-8) cookie-val))
> => 143 ' why?

Because the concatenation needs to convert the bytes held in the first
string into chars.  The internal representation of bytes >=128 as chars
takes up 2 bytes.


        Stefan




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 15:06                   ` Stefan Monnier
@ 2016-09-09 15:15                     ` Alain Schneble
  0 siblings, 0 replies; 42+ messages in thread
From: Alain Schneble @ 2016-09-09 15:15 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> (string-bytes cookie-val)
>> => 131
>
> This says that the internal representation of the cookie-val string uses
> up 131 bytes.
>
>> (string-bytes (encode-coding-string "æøå" 'utf-8))
>> => 6
>
> Note that `length' should return the same value, since the string
> returned by encode-coding-string should be unibyte (i.e. is a sequence of
> bytes, rather than sequence of characters).
>
>> (string-bytes (concat (encode-coding-string "æøå" 'utf-8) cookie-val))
>> => 143 ' why?
>
> Because the concatenation needs to convert the bytes held in the first
> string into chars.  The internal representation of bytes >=128 as chars
> takes up 2 bytes.

Your last sentence explains the behavior I did not understand.  Thanks!




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 15:04                   ` Eli Zaretskii
@ 2016-09-09 15:16                     ` Alain Schneble
  0 siblings, 0 replies; 42+ messages in thread
From: Alain Schneble @ 2016-09-09 15:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: toke, dgutov, monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> (string-bytes cookie-val)
>> => 131
>> (string-bytes (encode-coding-string "æøå" 'utf-8))
>> => 6
>> (string-bytes (concat (encode-coding-string "æøå" 'utf-8) cookie-val))
>> => 143 ' why?
>> (string-bytes (concat (string-as-unibyte "abc") ans-cookie-val))
>> => 134
>
> Because a multibyte string with ASCII-only text has the same number of
> bytes as it has characters.  While a multibyte string with non-ASCII
> text has more bytes than characters, due to the way Emacs represents
> characters internally (which is actually a superset of UTF-8).
>
>> Why does concat behave that strangely?  What am I missing here?  Is the
>> behavior of concatenating a unibyte and a multibyte string simply
>> undefined?
>
> No, it isn't undefined.  When some of the arguments are multibyte
> strings, concat returns a multibyte string.  Nothing else would make
> sense.

Thanks!




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-08 15:58               ` Toke Høiland-Jørgensen
  2016-09-08 17:20                 ` Eli Zaretskii
  2016-09-09 14:56                 ` Alain Schneble
@ 2016-09-09 18:02                 ` Alain Schneble
  2016-09-09 18:07                   ` Toke Høiland-Jørgensen
  2016-09-09 18:54                   ` Eli Zaretskii
  2 siblings, 2 replies; 42+ messages in thread
From: Alain Schneble @ 2016-09-09 18:02 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Eli Zaretskii, emacs-devel, monnier, Dmitry Gutov

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Dmitry Gutov <dgutov@yandex.ru> writes:
>>
>> Could you post the full recipe?
>
> (url-retrieve-synchronously "http://google.se") ; sets a cookie
> (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
>        (url-retrieve-synchronously "http://google.se")) ; crashes

Just two notes on this:

1. This example uses a GET request with some body data.  I don't think
   that this is a real use case.  Of course, with other requests such as
   POST, where some body data is typically sent, we might run into the
   same issue.

2. The reason why it crashes in this example is not because the cookie
   contains some non-ascii characters but because the ascii-only
   cookie-value gets converted into a multibyte string internally.  If
   you save and reload the cookie file after the first
   `url-retrieve-synchronously' that sets the cookie, the error goes
   away.  So this ...

   (url-retrieve-synchronously "http://google.se")
   (url-cookie-write-file)
   (url-cookie-parse-file)
   (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
         (url-retrieve-synchronously "http://google.se"))

   ... works properly.

   The point is that while accepting and parsing the cookie, the
   cookie-value gets turned into a multibyte string in `url-parse-args'
   called from `url-cookie-handle-set-cookie'.  Not sure if that should
   happen if a multibyte string is not really needed...
   
   And while reloading the cookie file, the cookie-value is read back as
   a non multibyte-string, as I would have expected it to be in the
   first place.

Alain




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 18:02                 ` Alain Schneble
@ 2016-09-09 18:07                   ` Toke Høiland-Jørgensen
  2016-09-09 18:54                   ` Eli Zaretskii
  1 sibling, 0 replies; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-09 18:07 UTC (permalink / raw)
  To: Alain Schneble; +Cc: Eli Zaretskii, emacs-devel, monnier, Dmitry Gutov

Alain Schneble <a.s@realize.ch> writes:

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>> Dmitry Gutov <dgutov@yandex.ru> writes:
>>>
>>> Could you post the full recipe?
>>
>> (url-retrieve-synchronously "http://google.se") ; sets a cookie
>> (let* ((url-request-data (encode-coding-string "æøå" 'utf-8)))
>>        (url-retrieve-synchronously "http://google.se")) ; crashes
>
> Just two notes on this:
>
> 1. This example uses a GET request with some body data.  I don't think
>    that this is a real use case.  Of course, with other requests such as
>    POST, where some body data is typically sent, we might run into the
>    same issue.

Yeah, that was not supposed to be a real-world use case, just the
shortest code snippet I could get to crash. The actual request that I
ran into this bug with was a PUT.

-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 18:02                 ` Alain Schneble
  2016-09-09 18:07                   ` Toke Høiland-Jørgensen
@ 2016-09-09 18:54                   ` Eli Zaretskii
  2016-09-09 19:21                     ` Alain Schneble
  1 sibling, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-09 18:54 UTC (permalink / raw)
  To: Alain Schneble; +Cc: toke, emacs-devel, monnier, dgutov

> From: Alain Schneble <a.s@realize.ch>
> CC: Dmitry Gutov <dgutov@yandex.ru>, Eli Zaretskii <eliz@gnu.org>,
> 	<monnier@iro.umontreal.ca>, <emacs-devel@gnu.org>
> Date: Fri, 9 Sep 2016 20:02:43 +0200
> 
> 2. The reason why it crashes in this example is not because the cookie
>    contains some non-ascii characters but because the ascii-only
>    cookie-value gets converted into a multibyte string internally.

Of course, that part is clear.  ASCII strings can be either unibyte or
multibyte, depending on how they were produced.

But why did you think this was worth mentioning?  How does it help us
find the proper solution?

Thanks.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 18:54                   ` Eli Zaretskii
@ 2016-09-09 19:21                     ` Alain Schneble
  2016-09-09 19:32                       ` Eli Zaretskii
  0 siblings, 1 reply; 42+ messages in thread
From: Alain Schneble @ 2016-09-09 19:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: toke, dgutov, monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: Dmitry Gutov <dgutov@yandex.ru>, Eli Zaretskii <eliz@gnu.org>,
>> 	<monnier@iro.umontreal.ca>, <emacs-devel@gnu.org>
>> Date: Fri, 9 Sep 2016 20:02:43 +0200
>> 
>> 2. The reason why it crashes in this example is not because the cookie
>>    contains some non-ascii characters but because the ascii-only
>>    cookie-value gets converted into a multibyte string internally.
>
> Of course, that part is clear.  ASCII strings can be either unibyte or
> multibyte, depending on how they were produced.

I was not fully aware of the details how and when that happens, sorry.

> But why did you think this was worth mentioning?  How does it help us
> find the proper solution?

Just because the question was raised in this thread whether or not a
cookie-value can or will ever contain any non-ascii characters.  And I
think we still do not have any strong evidence that any server out there
really uses non-ascii characters on purpose.  So my comment was just to
consider all alternatives that fix this issue.  Because another approach
would be to explicitly convert the cookie-value to unibyte if it's an
us-ascii string in `url-cookie-store', for instance.  That would fix the
example that was shown and maybe all real use cases while still
signalling an error for the "yet unexpected non-ascii" ones.  I'm not
saying that this is better than what was proposed.

So sorry for the noise.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 19:21                     ` Alain Schneble
@ 2016-09-09 19:32                       ` Eli Zaretskii
  2016-09-09 19:47                         ` Alain Schneble
  2016-09-09 20:01                         ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Stefan Monnier
  0 siblings, 2 replies; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-09 19:32 UTC (permalink / raw)
  To: Alain Schneble; +Cc: toke, dgutov, monnier, emacs-devel

> From: Alain Schneble <a.s@realize.ch>
> CC: <toke@toke.dk>, <emacs-devel@gnu.org>, <monnier@iro.umontreal.ca>,
> 	<dgutov@yandex.ru>
> Date: Fri, 9 Sep 2016 21:21:18 +0200
> 
> >> 2. The reason why it crashes in this example is not because the cookie
> >>    contains some non-ascii characters but because the ascii-only
> >>    cookie-value gets converted into a multibyte string internally.
> >
> > Of course, that part is clear.  ASCII strings can be either unibyte or
> > multibyte, depending on how they were produced.
> 
> I was not fully aware of the details how and when that happens, sorry.

If you just generate an ASCII string from ASCII characters, it will
usually be unibyte.  If you take it as a substring from a multibyte
buffer, it will usually be multibyte.

> > But why did you think this was worth mentioning?  How does it help us
> > find the proper solution?
> 
> Just because the question was raised in this thread whether or not a
> cookie-value can or will ever contain any non-ascii characters.  And I
> think we still do not have any strong evidence that any server out there
> really uses non-ascii characters on purpose.

That's not the issue.  The issue is whether a cookie-value can
legitimately have non-ASCII characters.  If it can, then we must
_encode_ the cookie-value, as that is the only correct way of getting
a unibyte string from non-ASCII characters.  And you pointed to an RFC
that seems to say non-ASCII characters in cookies are possible.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 19:32                       ` Eli Zaretskii
@ 2016-09-09 19:47                         ` Alain Schneble
  2016-09-09 19:49                           ` Eli Zaretskii
  2016-09-09 20:01                         ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Stefan Monnier
  1 sibling, 1 reply; 42+ messages in thread
From: Alain Schneble @ 2016-09-09 19:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: toke, emacs-devel, monnier, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

> If you just generate an ASCII string from ASCII characters, it will
> usually be unibyte.  If you take it as a substring from a multibyte
> buffer, it will usually be multibyte.

Thanks for your patience.

> That's not the issue.  The issue is whether a cookie-value can
> legitimately have non-ASCII characters.  If it can, then we must
> _encode_ the cookie-value, as that is the only correct way of getting
> a unibyte string from non-ASCII characters.  And you pointed to an RFC
> that seems to say non-ASCII characters in cookies are possible.

Yes true, but I thought that maybe fixing this as described could be a
viable non-invasive alternative for the upcoming 25.1 release.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 19:47                         ` Alain Schneble
@ 2016-09-09 19:49                           ` Eli Zaretskii
  2016-09-09 19:56                             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-09 19:49 UTC (permalink / raw)
  To: Alain Schneble; +Cc: toke, emacs-devel, monnier, dgutov

> From: Alain Schneble <a.s@realize.ch>
> CC: <toke@toke.dk>, <dgutov@yandex.ru>, <monnier@iro.umontreal.ca>,
> 	<emacs-devel@gnu.org>
> Date: Fri, 9 Sep 2016 21:47:23 +0200
> 
> > That's not the issue.  The issue is whether a cookie-value can
> > legitimately have non-ASCII characters.  If it can, then we must
> > _encode_ the cookie-value, as that is the only correct way of getting
> > a unibyte string from non-ASCII characters.  And you pointed to an RFC
> > that seems to say non-ASCII characters in cookies are possible.
> 
> Yes true, but I thought that maybe fixing this as described could be a
> viable non-invasive alternative for the upcoming 25.1 release.

It wouldn't be safe if cookies could include non-ASCII characters.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 19:49                           ` Eli Zaretskii
@ 2016-09-09 19:56                             ` Toke Høiland-Jørgensen
  2016-09-10  5:42                               ` Eli Zaretskii
  0 siblings, 1 reply; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-09 19:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alain Schneble, emacs-devel, monnier, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Alain Schneble <a.s@realize.ch>
>> CC: <toke@toke.dk>, <dgutov@yandex.ru>, <monnier@iro.umontreal.ca>,
>> 	<emacs-devel@gnu.org>
>> Date: Fri, 9 Sep 2016 21:47:23 +0200
>> 
>> > That's not the issue.  The issue is whether a cookie-value can
>> > legitimately have non-ASCII characters.  If it can, then we must
>> > _encode_ the cookie-value, as that is the only correct way of getting
>> > a unibyte string from non-ASCII characters.  And you pointed to an RFC
>> > that seems to say non-ASCII characters in cookies are possible.
>> 
>> Yes true, but I thought that maybe fixing this as described could be a
>> viable non-invasive alternative for the upcoming 25.1 release.
>
> It wouldn't be safe if cookies could include non-ASCII characters.

Well, according to this:

http://stackoverflow.com/a/1969339

Safari, at least, will reject non-ASCII cookies. Which implies that in
practice no sites will use non-ASCII values because they would break.

How would url react if it loaded a page that contained a non-ASCII
cookie string, is really the question to be asking here. Presumably
there's some kind of input sanitation somewhere?

-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.)
  2016-09-09 19:32                       ` Eli Zaretskii
  2016-09-09 19:47                         ` Alain Schneble
@ 2016-09-09 20:01                         ` Stefan Monnier
  2016-09-09 20:17                           ` distinguishing multibyte/unibyte ASCII Toke Høiland-Jørgensen
                                             ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: Stefan Monnier @ 2016-09-09 20:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alain Schneble, toke, dgutov, emacs-devel

> If you just generate an ASCII string from ASCII characters, it will
> usually be unibyte.  If you take it as a substring from a multibyte
> buffer, it will usually be multibyte.

And it's arguably a wart in Emacs's handling of chars-vs-bytes.
But it's kind of hard to fix now.

At some point I tried to change this handling (not exactly fix it) by
treating multibyte ASCII strings specially (it's easy to recognize by
checking that the char length is equal to the byte length and both are
readily available in the "struct Lisp_String" object).  Then when we
read an ASCII string, instead of making it unibyte, I'd keep it as
multibyte.  And then change things like "concat" so that those "ASCII
multibyte" strings don't force the result to be multibyte.

My local Emacs still runs with those changes, but in the end I don't
think the result is really better (or sufficiently better to justify
the subtle incompatibilities it introduces).

[ Also, I wouldn't be surprised to hear that such a change causes real
  problems with utf-7 or EBCDIC, or other systems where decoding/encoding
  a string of bytes/chars all <127 is not a no-op.  ]


        Stefan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: distinguishing multibyte/unibyte ASCII
  2016-09-09 20:01                         ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Stefan Monnier
@ 2016-09-09 20:17                           ` Toke Høiland-Jørgensen
  2016-09-09 20:46                             ` Stefan Monnier
  2016-09-09 21:02                           ` Alain Schneble
  2016-09-10  5:50                           ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Eli Zaretskii
  2 siblings, 1 reply; 42+ messages in thread
From: Toke Høiland-Jørgensen @ 2016-09-09 20:17 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, Alain Schneble, dgutov, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> If you just generate an ASCII string from ASCII characters, it will
>> usually be unibyte.  If you take it as a substring from a multibyte
>> buffer, it will usually be multibyte.
>
> And it's arguably a wart in Emacs's handling of chars-vs-bytes.
> But it's kind of hard to fix now.
>
> At some point I tried to change this handling (not exactly fix it) by
> treating multibyte ASCII strings specially (it's easy to recognize by
> checking that the char length is equal to the byte length and both are
> readily available in the "struct Lisp_String" object).  Then when we
> read an ASCII string, instead of making it unibyte, I'd keep it as
> multibyte.  And then change things like "concat" so that those "ASCII
> multibyte" strings don't force the result to be multibyte.
>
> My local Emacs still runs with those changes, but in the end I don't
> think the result is really better (or sufficiently better to justify
> the subtle incompatibilities it introduces).
>
> [ Also, I wouldn't be surprised to hear that such a change causes real
>   problems with utf-7 or EBCDIC, or other systems where decoding/encoding
>   a string of bytes/chars all <127 is not a no-op.  ]

Isn't Unicode fun? :)

-Toke



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: distinguishing multibyte/unibyte ASCII
  2016-09-09 20:17                           ` distinguishing multibyte/unibyte ASCII Toke Høiland-Jørgensen
@ 2016-09-09 20:46                             ` Stefan Monnier
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2016-09-09 20:46 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Eli Zaretskii, Alain Schneble, dgutov, emacs-devel

> Isn't Unicode fun? :)

Actually it's not really due to Unicode.  It's due to the fact that
Emacs Lisp doesn't clearly distinguish chars and bytes (and strings of
bytes and strings of chars) so we sometimes have to guess which one
is meant.

In practice it works surprisingly well, and lets most coders live their
life without having to know and understand the difference, but the
actual precise details are sometimes rather brittle because of its
DWIMish nature.


        Stefan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: distinguishing multibyte/unibyte ASCII
  2016-09-09 20:01                         ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Stefan Monnier
  2016-09-09 20:17                           ` distinguishing multibyte/unibyte ASCII Toke Høiland-Jørgensen
@ 2016-09-09 21:02                           ` Alain Schneble
  2016-09-10  5:50                           ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Eli Zaretskii
  2 siblings, 0 replies; 42+ messages in thread
From: Alain Schneble @ 2016-09-09 21:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, toke, emacs-devel, dgutov

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> If you just generate an ASCII string from ASCII characters, it will
>> usually be unibyte.  If you take it as a substring from a multibyte
>> buffer, it will usually be multibyte.
>
> And it's arguably a wart in Emacs's handling of chars-vs-bytes.
> But it's kind of hard to fix now.
>
> At some point I tried to change this handling (not exactly fix it) by
> treating multibyte ASCII strings specially (it's easy to recognize by
> checking that the char length is equal to the byte length and both are
> readily available in the "struct Lisp_String" object).  Then when we
> read an ASCII string, instead of making it unibyte, I'd keep it as
> multibyte.  And then change things like "concat" so that those "ASCII
> multibyte" strings don't force the result to be multibyte.
>
> My local Emacs still runs with those changes, but in the end I don't
> think the result is really better (or sufficiently better to justify
> the subtle incompatibilities it introduces).

I'm relieved to hear that :)  Thanks for sharing it.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-09 19:56                             ` Toke Høiland-Jørgensen
@ 2016-09-10  5:42                               ` Eli Zaretskii
  2016-09-10  8:34                                 ` Dmitry Gutov
  0 siblings, 1 reply; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-10  5:42 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: a.s, emacs-devel, monnier, dgutov

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Cc: Alain Schneble <a.s@realize.ch>, dgutov@yandex.ru, monnier@iro.umontreal.ca,
> 	emacs-devel@gnu.org
> Date: Fri, 09 Sep 2016 21:56:44 +0200
> 
> > It wouldn't be safe if cookies could include non-ASCII characters.
> 
> Well, according to this:
> 
> http://stackoverflow.com/a/1969339
> 
> Safari, at least, will reject non-ASCII cookies. Which implies that in
> practice no sites will use non-ASCII values because they would break.

Then your original patch should be fine for the emacs-25 branch.  If
no objections are voiced here by the end of the day, I think we should
install it.

Thanks.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.)
  2016-09-09 20:01                         ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Stefan Monnier
  2016-09-09 20:17                           ` distinguishing multibyte/unibyte ASCII Toke Høiland-Jørgensen
  2016-09-09 21:02                           ` Alain Schneble
@ 2016-09-10  5:50                           ` Eli Zaretskii
  2 siblings, 0 replies; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-10  5:50 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: a.s, toke, dgutov, emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: Alain Schneble <a.s@realize.ch>, toke@toke.dk, emacs-devel@gnu.org,
>         dgutov@yandex.ru
> Date: Fri, 09 Sep 2016 16:01:57 -0400
> 
> At some point I tried to change this handling (not exactly fix it) by
> treating multibyte ASCII strings specially (it's easy to recognize by
> checking that the char length is equal to the byte length and both are
> readily available in the "struct Lisp_String" object).  Then when we
> read an ASCII string, instead of making it unibyte, I'd keep it as
> multibyte.  And then change things like "concat" so that those "ASCII
> multibyte" strings don't force the result to be multibyte.
> 
> My local Emacs still runs with those changes, but in the end I don't
> think the result is really better (or sufficiently better to justify
> the subtle incompatibilities it introduces).
> 
> [ Also, I wouldn't be surprised to hear that such a change causes real
>   problems with utf-7 or EBCDIC, or other systems where decoding/encoding
>   a string of bytes/chars all <127 is not a no-op.  ]

We could change concat (and other relevant functions, like format) to
recognize ASCII strings safely and reliably, but I think this would
make those functions slower, which would be a performance hit, since
those functions are used a lot in the inner loops.

This issue is actually rather marginal in Emacs, the url-http use case
is one of a very few that care, because they need to report length in
bytes.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-10  5:42                               ` Eli Zaretskii
@ 2016-09-10  8:34                                 ` Dmitry Gutov
  2016-09-10 19:12                                   ` Eli Zaretskii
  0 siblings, 1 reply; 42+ messages in thread
From: Dmitry Gutov @ 2016-09-10  8:34 UTC (permalink / raw)
  To: Eli Zaretskii, Toke Høiland-Jørgensen; +Cc: a.s, monnier, emacs-devel

On 10.09.2016 08:42, Eli Zaretskii wrote:

>> Well, according to this:
>>
>> http://stackoverflow.com/a/1969339
>>
>> Safari, at least, will reject non-ASCII cookies. Which implies that in
>> practice no sites will use non-ASCII values because they would break.
>
> Then your original patch should be fine for the emacs-25 branch.  If
> no objections are voiced here by the end of the day, I think we should
> install it.

Agreed.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] url: Wrap cookie headers in url-http--encode-string.
  2016-09-10  8:34                                 ` Dmitry Gutov
@ 2016-09-10 19:12                                   ` Eli Zaretskii
  0 siblings, 0 replies; 42+ messages in thread
From: Eli Zaretskii @ 2016-09-10 19:12 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: a.s, toke, monnier, emacs-devel

> Cc: a.s@realize.ch, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 10 Sep 2016 11:34:28 +0300
> 
> On 10.09.2016 08:42, Eli Zaretskii wrote:
> 
> >> Well, according to this:
> >>
> >> http://stackoverflow.com/a/1969339
> >>
> >> Safari, at least, will reject non-ASCII cookies. Which implies that in
> >> practice no sites will use non-ASCII values because they would break.
> >
> > Then your original patch should be fine for the emacs-25 branch.  If
> > no objections are voiced here by the end of the day, I think we should
> > install it.
> 
> Agreed.

Pushed.



^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2016-09-10 19:12 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-07 15:30 [PATCH] url: Wrap cookie headers in url-http--encode-string Toke Høiland-Jørgensen
2016-09-07 16:40 ` Stefan Monnier
2016-09-07 16:52   ` Toke Høiland-Jørgensen
2016-09-07 17:15     ` Eli Zaretskii
2016-09-07 18:25       ` Toke Høiland-Jørgensen
2016-09-08 14:06         ` Dmitry Gutov
2016-09-08 14:14           ` Toke Høiland-Jørgensen
2016-09-08 14:25             ` Dmitry Gutov
2016-09-08 15:58               ` Toke Høiland-Jørgensen
2016-09-08 17:20                 ` Eli Zaretskii
2016-09-08 17:43                   ` Toke Høiland-Jørgensen
2016-09-08 18:01                     ` Eli Zaretskii
2016-09-08 17:47                   ` Stefan Monnier
2016-09-08 18:04                     ` Eli Zaretskii
2016-09-08 20:29                       ` Alain Schneble
2016-09-09  7:57                         ` Eli Zaretskii
2016-09-09 14:56                 ` Alain Schneble
2016-09-09 15:04                   ` Eli Zaretskii
2016-09-09 15:16                     ` Alain Schneble
2016-09-09 15:06                   ` Stefan Monnier
2016-09-09 15:15                     ` Alain Schneble
2016-09-09 18:02                 ` Alain Schneble
2016-09-09 18:07                   ` Toke Høiland-Jørgensen
2016-09-09 18:54                   ` Eli Zaretskii
2016-09-09 19:21                     ` Alain Schneble
2016-09-09 19:32                       ` Eli Zaretskii
2016-09-09 19:47                         ` Alain Schneble
2016-09-09 19:49                           ` Eli Zaretskii
2016-09-09 19:56                             ` Toke Høiland-Jørgensen
2016-09-10  5:42                               ` Eli Zaretskii
2016-09-10  8:34                                 ` Dmitry Gutov
2016-09-10 19:12                                   ` Eli Zaretskii
2016-09-09 20:01                         ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Stefan Monnier
2016-09-09 20:17                           ` distinguishing multibyte/unibyte ASCII Toke Høiland-Jørgensen
2016-09-09 20:46                             ` Stefan Monnier
2016-09-09 21:02                           ` Alain Schneble
2016-09-10  5:50                           ` distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.) Eli Zaretskii
2016-09-07 19:14 ` [PATCH] url: Wrap cookie headers in url-http--encode-string Lars Ingebrigtsen
2016-09-07 20:49   ` Toke Høiland-Jørgensen
2016-09-08  2:47   ` Eli Zaretskii
2016-09-08  9:07     ` Lars Ingebrigtsen
2016-09-08 17:23       ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).