all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* Bug in url-retrieve-synchronously from url.el on redirect
@ 2020-07-10 18:18 Daniele Nicolodi
  2020-07-10 18:49 ` Yuri Khan
  0 siblings, 1 reply; 10+ messages in thread
From: Daniele Nicolodi @ 2020-07-10 18:18 UTC (permalink / raw)
  To: Emacs developers

[-- Attachment #1: Type: text/plain, Size: 1422 bytes --]

Hello,

url-retrieve-synchronously fails to obey redirect responses if the
returned "Location" header contains spaces: it redirects to the URL
truncated to the first space. It seems that spaces in the Location
header value are allowed (at least ngnix produces headers like that).

If I understand the code correctly, the redirect response in interpreted
in url-http-parse-headers where the header is explicitly truncated:

	 (when redirect-uri
	   ;; Clean off any whitespace and/or <...> cruft.
	   (if (string-match "\\([^ \t]+\\)[ \t]" redirect-uri)
	       (setq redirect-uri (match-string 1 redirect-uri)))
	   (if (string-match "^<\\(.*\\)>$" redirect-uri)
	       (setq redirect-uri (match-string 1 redirect-uri)))

I think the first regular expression is wrong. I believe that its intent
is to remove leading and trailing white space, but it actually truncates
the value to the first white space character. Also, redirect-uri is
obtained with

       (let ((redirect-uri (or (mail-fetch-field "Location")
			       (mail-fetch-field "URI"))))

and mail-fetch-field already removes leading and trailing whitespace.

I think the attached patch should fix the problem.

Finally, the removal of the < > delimiters seems unnecessary too as they
are not allowed delimiters in HTTP headers (in my reading of
https://tools.ietf.org/html/rfc7230) however there are no adverse
consequences in leaving it there.

Cheers,
Dan

[-- Attachment #2: 0001-url-http-Fix-handling-of-redirect-locations-containi.patch --]
[-- Type: text/plain, Size: 939 bytes --]

From f2408eaaa1e2afb6be15588e0f8f8a2c3cfe1ec6 Mon Sep 17 00:00:00 2001
From: Daniele Nicolodi <daniele@grinta.net>
Date: Fri, 10 Jul 2020 12:16:01 -0600
Subject: [PATCH] url-http: Fix handling of redirect locations containing
 whitespace

---
 lisp/url/url-http.el | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 669c24571f..b7d6f42ed5 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -702,9 +702,7 @@ should be shown to the user."
 	    ;; Treat everything like '300'
 	    nil))
 	 (when redirect-uri
-	   ;; Clean off any whitespace and/or <...> cruft.
-	   (if (string-match "\\([^ \t]+\\)[ \t]" redirect-uri)
-	       (setq redirect-uri (match-string 1 redirect-uri)))
+	   ;; Clean off any <...> cruft.
 	   (if (string-match "^<\\(.*\\)>$" redirect-uri)
 	       (setq redirect-uri (match-string 1 redirect-uri)))
 
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-10 18:18 Bug in url-retrieve-synchronously from url.el on redirect Daniele Nicolodi
@ 2020-07-10 18:49 ` Yuri Khan
  2020-07-10 19:37   ` Yuri Khan
  2020-07-10 19:43   ` Daniele Nicolodi
  0 siblings, 2 replies; 10+ messages in thread
From: Yuri Khan @ 2020-07-10 18:49 UTC (permalink / raw)
  To: Daniele Nicolodi; +Cc: Emacs developers

On Sat, 11 Jul 2020 at 01:18, Daniele Nicolodi <daniele@grinta.net> wrote:

> url-retrieve-synchronously fails to obey redirect responses if the
> returned "Location" header contains spaces: it redirects to the URL
> truncated to the first space. It seems that spaces in the Location
> header value are allowed (at least ngnix produces headers like that).

They are not, and you should report it as a bug against nginx. It
should be percent-encoding the space. It should also be
percent-encoding any non-ASCII characters.

The Location header is defined in RFC 7231, section 7.1.2, with a
value of URI-reference as defined in RFC 3986. The complete BNF
grammar is listed in Appendix A, and none of the productions there
contain the raw space character, nor raw non-ASCII characters.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-10 18:49 ` Yuri Khan
@ 2020-07-10 19:37   ` Yuri Khan
  2020-07-10 19:46     ` Daniele Nicolodi
  2020-07-10 19:43   ` Daniele Nicolodi
  1 sibling, 1 reply; 10+ messages in thread
From: Yuri Khan @ 2020-07-10 19:37 UTC (permalink / raw)
  To: Daniele Nicolodi; +Cc: Emacs developers

On Sat, 11 Jul 2020 at 01:49, Yuri Khan <yuri.v.khan@gmail.com> wrote:
>
> On Sat, 11 Jul 2020 at 01:18, Daniele Nicolodi <daniele@grinta.net> wrote:
>
> > url-retrieve-synchronously fails to obey redirect responses if the
> > returned "Location" header contains spaces: it redirects to the URL
> > truncated to the first space. It seems that spaces in the Location
> > header value are allowed (at least ngnix produces headers like that).
>
> They are not, and you should report it as a bug against nginx. It
> should be percent-encoding the space. It should also be
> percent-encoding any non-ASCII characters.

Actually, I went along and reported it myself, with an example that
clearly demonstrates a bug in nginx:

https://trac.nginx.org/nginx/ticket/2016



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-10 18:49 ` Yuri Khan
  2020-07-10 19:37   ` Yuri Khan
@ 2020-07-10 19:43   ` Daniele Nicolodi
  2020-07-10 20:25     ` Yuri Khan
  1 sibling, 1 reply; 10+ messages in thread
From: Daniele Nicolodi @ 2020-07-10 19:43 UTC (permalink / raw)
  To: emacs-devel

On 10/07/2020 12:49, Yuri Khan wrote:
> On Sat, 11 Jul 2020 at 01:18, Daniele Nicolodi <daniele@grinta.net> wrote:
> 
>> url-retrieve-synchronously fails to obey redirect responses if the
>> returned "Location" header contains spaces: it redirects to the URL
>> truncated to the first space. It seems that spaces in the Location
>> header value are allowed (at least ngnix produces headers like that).
> 
> They are not, and you should report it as a bug against nginx. It
> should be percent-encoding the space. It should also be
> percent-encoding any non-ASCII characters.
> 
> The Location header is defined in RFC 7231, section 7.1.2, with a
> value of URI-reference as defined in RFC 3986. The complete BNF
> grammar is listed in Appendix A, and none of the productions there
> contain the raw space character, nor raw non-ASCII characters.

Thanks for the explanation.

As far as I understand the RFCs (and being wrong before, I may be wrong
again) do not allow for < > quoting either. Why does url-http.el strip
them? Why does it break the URI at the first space if spaces are not
allowed?

I would apply the old "be liberal in what you accent and strict in what
you emit" wisdom here. Indeed, all the HTTP implementations I tested,
except Emacs, interpret the Location header with spaces as most likely
intended.

Cheers,
Dan



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-10 19:37   ` Yuri Khan
@ 2020-07-10 19:46     ` Daniele Nicolodi
  0 siblings, 0 replies; 10+ messages in thread
From: Daniele Nicolodi @ 2020-07-10 19:46 UTC (permalink / raw)
  To: emacs-devel

On 10/07/2020 13:37, Yuri Khan wrote:
> On Sat, 11 Jul 2020 at 01:49, Yuri Khan <yuri.v.khan@gmail.com> wrote:
>>
>> On Sat, 11 Jul 2020 at 01:18, Daniele Nicolodi <daniele@grinta.net> wrote:
>>
>>> url-retrieve-synchronously fails to obey redirect responses if the
>>> returned "Location" header contains spaces: it redirects to the URL
>>> truncated to the first space. It seems that spaces in the Location
>>> header value are allowed (at least ngnix produces headers like that).
>>
>> They are not, and you should report it as a bug against nginx. It
>> should be percent-encoding the space. It should also be
>> percent-encoding any non-ASCII characters.
> 
> Actually, I went along and reported it myself, with an example that
> clearly demonstrates a bug in nginx:
> 
> https://trac.nginx.org/nginx/ticket/2016

Thanks Yury! Much better than what I could have done.

Cheers,
Dan



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-10 19:43   ` Daniele Nicolodi
@ 2020-07-10 20:25     ` Yuri Khan
  2020-07-10 20:32       ` Daniele Nicolodi
  0 siblings, 1 reply; 10+ messages in thread
From: Yuri Khan @ 2020-07-10 20:25 UTC (permalink / raw)
  To: Daniele Nicolodi; +Cc: Emacs developers

On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi <daniele@grinta.net> wrote:

> As far as I understand the RFCs (and being wrong before, I may be wrong
> again) do not allow for < > quoting either. Why does url-http.el strip
> them? Why does it break the URI at the first space if spaces are not
> allowed?

I cannot answer that, maybe someone who is knowledgeable about
uri-http.el chimes in.

RFC 7231 allows clients to attempt to DTRT with invalid Location URIs
in any way they deem appropriate; you could argue for a different
recovery heuristic. Me, I’d rather have things break loudly on each
violation, so that it does not go unnoticed for too long. Postel’s
Razor is how we got HTML in its current shape.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-10 20:25     ` Yuri Khan
@ 2020-07-10 20:32       ` Daniele Nicolodi
  2020-07-11  0:55         ` Daniele Nicolodi
  0 siblings, 1 reply; 10+ messages in thread
From: Daniele Nicolodi @ 2020-07-10 20:32 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Emacs developers

On 10/07/2020 14:25, Yuri Khan wrote:
> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi <daniele@grinta.net> wrote:
> 
>> As far as I understand the RFCs (and being wrong before, I may be wrong
>> again) do not allow for < > quoting either. Why does url-http.el strip
>> them? Why does it break the URI at the first space if spaces are not
>> allowed?
> 
> I cannot answer that, maybe someone who is knowledgeable about
> uri-http.el chimes in.
> 
> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs
> in any way they deem appropriate; you could argue for a different
> recovery heuristic. Me, I’d rather have things break loudly on each
> violation, so that it does not go unnoticed for too long. Postel’s
> Razor is how we got HTML in its current shape.

I tend to agree with you, but, in this specific case, being compatible
with other HTTP implementations is a worthwhile goal.

Unfortunately, re-defining url-http-parse-headers is the only
work-around I found to make Emacs do the less bad thing when dealing
with this malformed URIs.

Cheers,
Dan



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-10 20:32       ` Daniele Nicolodi
@ 2020-07-11  0:55         ` Daniele Nicolodi
  2020-07-13 18:15           ` chad
  0 siblings, 1 reply; 10+ messages in thread
From: Daniele Nicolodi @ 2020-07-11  0:55 UTC (permalink / raw)
  To: emacs-devel, wmperry, monnier

[-- Attachment #1: Type: text/plain, Size: 1494 bytes --]

On 10-07-2020 14:32, Daniele Nicolodi wrote:
> On 10/07/2020 14:25, Yuri Khan wrote:
>> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi <daniele@grinta.net> wrote:
>>
>>> As far as I understand the RFCs (and being wrong before, I may be wrong
>>> again) do not allow for < > quoting either. Why does url-http.el strip
>>> them? Why does it break the URI at the first space if spaces are not
>>> allowed?
>>
>> I cannot answer that, maybe someone who is knowledgeable about
>> uri-http.el chimes in.
>>
>> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs
>> in any way they deem appropriate; you could argue for a different
>> recovery heuristic. Me, I’d rather have things break loudly on each
>> violation, so that it does not go unnoticed for too long. Postel’s
>> Razor is how we got HTML in its current shape.
> 
> I tend to agree with you, but, in this specific case, being compatible
> with other HTTP implementations is a worthwhile goal.
> 
> Unfortunately, re-defining url-http-parse-headers is the only
> work-around I found to make Emacs do the less bad thing when dealing
> with this malformed URIs.

Bill, you seem to be the author of this code, although Stefan is the one
that introduced it to the Emacs accordingly to git blame. Do any of you
know why the redirect Location is handled like that?

I would like to suggest the two attached patches. The first fixes actual
issues I encountered, the second simply adjusts a comment.

Thank you.

Cheers,
Dan


[-- Attachment #2: 0001-url-http-Fix-handling-of-redirect-locations.patch --]
[-- Type: text/plain, Size: 1493 bytes --]

From 052a9934380fdd7517b5bf38aca37aaf951a644e Mon Sep 17 00:00:00 2001
From: Daniele Nicolodi <daniele@grinta.net>
Date: Fri, 10 Jul 2020 12:16:01 -0600
Subject: [PATCH 1/2] url-http: Fix handling of redirect locations

Do not break the redirect Location header when it contain spaces. In
RFC 7231 the Location header is defined to carry a URI-reference.
According to RFC 3986 it should be percent-encoded and thus should not
contain spaces. However, there are HTTP server implementation (notably
nginx) that do not do that. This makes Emacs url-http.el behave like
most other HTTP client implementatios. Also remove the stripping of
angle bracket quotes as they are not valid according to the RFCs.
---
 lisp/url/url-http.el | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 669c24571f..a746be8475 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -702,12 +702,6 @@ should be shown to the user."
 	    ;; Treat everything like '300'
 	    nil))
 	 (when redirect-uri
-	   ;; Clean off any whitespace and/or <...> cruft.
-	   (if (string-match "\\([^ \t]+\\)[ \t]" redirect-uri)
-	       (setq redirect-uri (match-string 1 redirect-uri)))
-	   (if (string-match "^<\\(.*\\)>$" redirect-uri)
-	       (setq redirect-uri (match-string 1 redirect-uri)))
-
 	   ;; Some stupid sites (like sourceforge) send a
 	   ;; non-fully-qualified URL (ie: /), which royally confuses
 	   ;; the URL library.
-- 
2.24.3 (Apple Git-128)


[-- Attachment #3: 0002-url-http-Do-not-suggest-a-broken-HTTP-server-impleme.patch --]
[-- Type: text/plain, Size: 1083 bytes --]

From 8da3326f64aaeb47f4306a4bacc3067d4bc38141 Mon Sep 17 00:00:00 2001
From: Daniele Nicolodi <daniele@grinta.net>
Date: Fri, 10 Jul 2020 18:50:39 -0600
Subject: [PATCH 2/2] url-http: Do not suggest a broken HTTP server
 implementation

Relative URIs are allowed in the Location header by the relvant RFCs.
---
 lisp/url/url-http.el | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index a746be8475..e0c925e13f 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -702,9 +702,7 @@ should be shown to the user."
 	    ;; Treat everything like '300'
 	    nil))
 	 (when redirect-uri
-	   ;; Some stupid sites (like sourceforge) send a
-	   ;; non-fully-qualified URL (ie: /), which royally confuses
-	   ;; the URL library.
+	   ;; Handle relative redirect URIs.
 	   (if (not (string-match url-nonrelative-link redirect-uri))
                ;; Be careful to use the real target URL, otherwise we may
                ;; compute the redirection relative to the URL of the proxy.
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-11  0:55         ` Daniele Nicolodi
@ 2020-07-13 18:15           ` chad
  2020-07-13 18:48             ` Daniele Nicolodi
  0 siblings, 1 reply; 10+ messages in thread
From: chad @ 2020-07-13 18:15 UTC (permalink / raw)
  To: Daniele Nicolodi; +Cc: wmperry, Stefan Monnier, EMACS development team

[-- Attachment #1: Type: text/plain, Size: 2084 bytes --]

Very likely the </> stripping code dates from a time period when code would
recognize strings inside angle-brackets as potential URLs/URIs, and passed
the entire string to the url library out of simplicity. If memory serves,
Bill Perry's original url code dates from the wild and wooly early days of
loose url encoding. I would expect that it can be changed safely.

Hope that helps,
~Chad

On Fri, Jul 10, 2020 at 5:55 PM Daniele Nicolodi <daniele@grinta.net> wrote:

> On 10-07-2020 14:32, Daniele Nicolodi wrote:
> > On 10/07/2020 14:25, Yuri Khan wrote:
> >> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi <daniele@grinta.net>
> wrote:
> >>
> >>> As far as I understand the RFCs (and being wrong before, I may be wrong
> >>> again) do not allow for < > quoting either. Why does url-http.el strip
> >>> them? Why does it break the URI at the first space if spaces are not
> >>> allowed?
> >>
> >> I cannot answer that, maybe someone who is knowledgeable about
> >> uri-http.el chimes in.
> >>
> >> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs
> >> in any way they deem appropriate; you could argue for a different
> >> recovery heuristic. Me, I’d rather have things break loudly on each
> >> violation, so that it does not go unnoticed for too long. Postel’s
> >> Razor is how we got HTML in its current shape.
> >
> > I tend to agree with you, but, in this specific case, being compatible
> > with other HTTP implementations is a worthwhile goal.
> >
> > Unfortunately, re-defining url-http-parse-headers is the only
> > work-around I found to make Emacs do the less bad thing when dealing
> > with this malformed URIs.
>
> Bill, you seem to be the author of this code, although Stefan is the one
> that introduced it to the Emacs accordingly to git blame. Do any of you
> know why the redirect Location is handled like that?
>
> I would like to suggest the two attached patches. The first fixes actual
> issues I encountered, the second simply adjusts a comment.
>
> Thank you.
>
> Cheers,
> Dan
>
>

[-- Attachment #2: Type: text/html, Size: 2665 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug in url-retrieve-synchronously from url.el on redirect
  2020-07-13 18:15           ` chad
@ 2020-07-13 18:48             ` Daniele Nicolodi
  0 siblings, 0 replies; 10+ messages in thread
From: Daniele Nicolodi @ 2020-07-13 18:48 UTC (permalink / raw)
  To: chad; +Cc: wmperry, Stefan Monnier, EMACS development team

On 13/07/2020 12:15, chad wrote:
> Very likely the </> stripping code dates from a time period when code
> would recognize strings inside angle-brackets as potential URLs/URIs,
> and passed the entire string to the url library out of simplicity. If
> memory serves, Bill Perry's original url code dates from the wild and
> wooly early days of loose url encoding. I would expect that it can be
> changed safely.
> 
> Hope that helps,
> ~Chad

Thanks Chad. This does not quite explain the presence of the </>
stripping in handling HTTP protocol headers. But it may be that with
subsequent refactoring, this coded ended where it is now.

It would be nice if someone with commit rights could find a couple of
spare cycles to comment on these patches and hopefully apply them.

PS: Emails to Bill Perry address are bouncing for me.

Cheers,
Dan


> 
> On Fri, Jul 10, 2020 at 5:55 PM Daniele Nicolodi <daniele@grinta.net
> <mailto:daniele@grinta.net>> wrote:
> 
>     On 10-07-2020 14:32, Daniele Nicolodi wrote:
>     > On 10/07/2020 14:25, Yuri Khan wrote:
>     >> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi
>     <daniele@grinta.net <mailto:daniele@grinta.net>> wrote:
>     >>
>     >>> As far as I understand the RFCs (and being wrong before, I may
>     be wrong
>     >>> again) do not allow for < > quoting either. Why does url-http.el
>     strip
>     >>> them? Why does it break the URI at the first space if spaces are not
>     >>> allowed?
>     >>
>     >> I cannot answer that, maybe someone who is knowledgeable about
>     >> uri-http.el chimes in.
>     >>
>     >> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs
>     >> in any way they deem appropriate; you could argue for a different
>     >> recovery heuristic. Me, I’d rather have things break loudly on each
>     >> violation, so that it does not go unnoticed for too long. Postel’s
>     >> Razor is how we got HTML in its current shape.
>     >
>     > I tend to agree with you, but, in this specific case, being compatible
>     > with other HTTP implementations is a worthwhile goal.
>     >
>     > Unfortunately, re-defining url-http-parse-headers is the only
>     > work-around I found to make Emacs do the less bad thing when dealing
>     > with this malformed URIs.
> 
>     Bill, you seem to be the author of this code, although Stefan is the one
>     that introduced it to the Emacs accordingly to git blame. Do any of you
>     know why the redirect Location is handled like that?
> 
>     I would like to suggest the two attached patches. The first fixes actual
>     issues I encountered, the second simply adjusts a comment.
> 
>     Thank you.
> 
>     Cheers,
>     Dan
> 




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-07-13 18:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-07-10 18:18 Bug in url-retrieve-synchronously from url.el on redirect Daniele Nicolodi
2020-07-10 18:49 ` Yuri Khan
2020-07-10 19:37   ` Yuri Khan
2020-07-10 19:46     ` Daniele Nicolodi
2020-07-10 19:43   ` Daniele Nicolodi
2020-07-10 20:25     ` Yuri Khan
2020-07-10 20:32       ` Daniele Nicolodi
2020-07-11  0:55         ` Daniele Nicolodi
2020-07-13 18:15           ` chad
2020-07-13 18:48             ` Daniele Nicolodi

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.