* Bug in url-retrieve-synchronously from url.el on redirect @ 2020-07-10 18:18 Daniele Nicolodi 2020-07-10 18:49 ` Yuri Khan 0 siblings, 1 reply; 10+ messages in thread From: Daniele Nicolodi @ 2020-07-10 18:18 UTC (permalink / raw) To: Emacs developers [-- Attachment #1: Type: text/plain, Size: 1422 bytes --] Hello, url-retrieve-synchronously fails to obey redirect responses if the returned "Location" header contains spaces: it redirects to the URL truncated to the first space. It seems that spaces in the Location header value are allowed (at least ngnix produces headers like that). If I understand the code correctly, the redirect response in interpreted in url-http-parse-headers where the header is explicitly truncated: (when redirect-uri ;; Clean off any whitespace and/or <...> cruft. (if (string-match "\\([^ \t]+\\)[ \t]" redirect-uri) (setq redirect-uri (match-string 1 redirect-uri))) (if (string-match "^<\\(.*\\)>$" redirect-uri) (setq redirect-uri (match-string 1 redirect-uri))) I think the first regular expression is wrong. I believe that its intent is to remove leading and trailing white space, but it actually truncates the value to the first white space character. Also, redirect-uri is obtained with (let ((redirect-uri (or (mail-fetch-field "Location") (mail-fetch-field "URI")))) and mail-fetch-field already removes leading and trailing whitespace. I think the attached patch should fix the problem. Finally, the removal of the < > delimiters seems unnecessary too as they are not allowed delimiters in HTTP headers (in my reading of https://tools.ietf.org/html/rfc7230) however there are no adverse consequences in leaving it there. Cheers, Dan [-- Attachment #2: 0001-url-http-Fix-handling-of-redirect-locations-containi.patch --] [-- Type: text/plain, Size: 939 bytes --] From f2408eaaa1e2afb6be15588e0f8f8a2c3cfe1ec6 Mon Sep 17 00:00:00 2001 From: Daniele Nicolodi <daniele@grinta.net> Date: Fri, 10 Jul 2020 12:16:01 -0600 Subject: [PATCH] url-http: Fix handling of redirect locations containing whitespace --- lisp/url/url-http.el | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el index 669c24571f..b7d6f42ed5 100644 --- a/lisp/url/url-http.el +++ b/lisp/url/url-http.el @@ -702,9 +702,7 @@ should be shown to the user." ;; Treat everything like '300' nil)) (when redirect-uri - ;; Clean off any whitespace and/or <...> cruft. - (if (string-match "\\([^ \t]+\\)[ \t]" redirect-uri) - (setq redirect-uri (match-string 1 redirect-uri))) + ;; Clean off any <...> cruft. (if (string-match "^<\\(.*\\)>$" redirect-uri) (setq redirect-uri (match-string 1 redirect-uri))) -- 2.24.3 (Apple Git-128) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-10 18:18 Bug in url-retrieve-synchronously from url.el on redirect Daniele Nicolodi @ 2020-07-10 18:49 ` Yuri Khan 2020-07-10 19:37 ` Yuri Khan 2020-07-10 19:43 ` Daniele Nicolodi 0 siblings, 2 replies; 10+ messages in thread From: Yuri Khan @ 2020-07-10 18:49 UTC (permalink / raw) To: Daniele Nicolodi; +Cc: Emacs developers On Sat, 11 Jul 2020 at 01:18, Daniele Nicolodi <daniele@grinta.net> wrote: > url-retrieve-synchronously fails to obey redirect responses if the > returned "Location" header contains spaces: it redirects to the URL > truncated to the first space. It seems that spaces in the Location > header value are allowed (at least ngnix produces headers like that). They are not, and you should report it as a bug against nginx. It should be percent-encoding the space. It should also be percent-encoding any non-ASCII characters. The Location header is defined in RFC 7231, section 7.1.2, with a value of URI-reference as defined in RFC 3986. The complete BNF grammar is listed in Appendix A, and none of the productions there contain the raw space character, nor raw non-ASCII characters. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-10 18:49 ` Yuri Khan @ 2020-07-10 19:37 ` Yuri Khan 2020-07-10 19:46 ` Daniele Nicolodi 2020-07-10 19:43 ` Daniele Nicolodi 1 sibling, 1 reply; 10+ messages in thread From: Yuri Khan @ 2020-07-10 19:37 UTC (permalink / raw) To: Daniele Nicolodi; +Cc: Emacs developers On Sat, 11 Jul 2020 at 01:49, Yuri Khan <yuri.v.khan@gmail.com> wrote: > > On Sat, 11 Jul 2020 at 01:18, Daniele Nicolodi <daniele@grinta.net> wrote: > > > url-retrieve-synchronously fails to obey redirect responses if the > > returned "Location" header contains spaces: it redirects to the URL > > truncated to the first space. It seems that spaces in the Location > > header value are allowed (at least ngnix produces headers like that). > > They are not, and you should report it as a bug against nginx. It > should be percent-encoding the space. It should also be > percent-encoding any non-ASCII characters. Actually, I went along and reported it myself, with an example that clearly demonstrates a bug in nginx: https://trac.nginx.org/nginx/ticket/2016 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-10 19:37 ` Yuri Khan @ 2020-07-10 19:46 ` Daniele Nicolodi 0 siblings, 0 replies; 10+ messages in thread From: Daniele Nicolodi @ 2020-07-10 19:46 UTC (permalink / raw) To: emacs-devel On 10/07/2020 13:37, Yuri Khan wrote: > On Sat, 11 Jul 2020 at 01:49, Yuri Khan <yuri.v.khan@gmail.com> wrote: >> >> On Sat, 11 Jul 2020 at 01:18, Daniele Nicolodi <daniele@grinta.net> wrote: >> >>> url-retrieve-synchronously fails to obey redirect responses if the >>> returned "Location" header contains spaces: it redirects to the URL >>> truncated to the first space. It seems that spaces in the Location >>> header value are allowed (at least ngnix produces headers like that). >> >> They are not, and you should report it as a bug against nginx. It >> should be percent-encoding the space. It should also be >> percent-encoding any non-ASCII characters. > > Actually, I went along and reported it myself, with an example that > clearly demonstrates a bug in nginx: > > https://trac.nginx.org/nginx/ticket/2016 Thanks Yury! Much better than what I could have done. Cheers, Dan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-10 18:49 ` Yuri Khan 2020-07-10 19:37 ` Yuri Khan @ 2020-07-10 19:43 ` Daniele Nicolodi 2020-07-10 20:25 ` Yuri Khan 1 sibling, 1 reply; 10+ messages in thread From: Daniele Nicolodi @ 2020-07-10 19:43 UTC (permalink / raw) To: emacs-devel On 10/07/2020 12:49, Yuri Khan wrote: > On Sat, 11 Jul 2020 at 01:18, Daniele Nicolodi <daniele@grinta.net> wrote: > >> url-retrieve-synchronously fails to obey redirect responses if the >> returned "Location" header contains spaces: it redirects to the URL >> truncated to the first space. It seems that spaces in the Location >> header value are allowed (at least ngnix produces headers like that). > > They are not, and you should report it as a bug against nginx. It > should be percent-encoding the space. It should also be > percent-encoding any non-ASCII characters. > > The Location header is defined in RFC 7231, section 7.1.2, with a > value of URI-reference as defined in RFC 3986. The complete BNF > grammar is listed in Appendix A, and none of the productions there > contain the raw space character, nor raw non-ASCII characters. Thanks for the explanation. As far as I understand the RFCs (and being wrong before, I may be wrong again) do not allow for < > quoting either. Why does url-http.el strip them? Why does it break the URI at the first space if spaces are not allowed? I would apply the old "be liberal in what you accent and strict in what you emit" wisdom here. Indeed, all the HTTP implementations I tested, except Emacs, interpret the Location header with spaces as most likely intended. Cheers, Dan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-10 19:43 ` Daniele Nicolodi @ 2020-07-10 20:25 ` Yuri Khan 2020-07-10 20:32 ` Daniele Nicolodi 0 siblings, 1 reply; 10+ messages in thread From: Yuri Khan @ 2020-07-10 20:25 UTC (permalink / raw) To: Daniele Nicolodi; +Cc: Emacs developers On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi <daniele@grinta.net> wrote: > As far as I understand the RFCs (and being wrong before, I may be wrong > again) do not allow for < > quoting either. Why does url-http.el strip > them? Why does it break the URI at the first space if spaces are not > allowed? I cannot answer that, maybe someone who is knowledgeable about uri-http.el chimes in. RFC 7231 allows clients to attempt to DTRT with invalid Location URIs in any way they deem appropriate; you could argue for a different recovery heuristic. Me, I’d rather have things break loudly on each violation, so that it does not go unnoticed for too long. Postel’s Razor is how we got HTML in its current shape. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-10 20:25 ` Yuri Khan @ 2020-07-10 20:32 ` Daniele Nicolodi 2020-07-11 0:55 ` Daniele Nicolodi 0 siblings, 1 reply; 10+ messages in thread From: Daniele Nicolodi @ 2020-07-10 20:32 UTC (permalink / raw) To: Yuri Khan; +Cc: Emacs developers On 10/07/2020 14:25, Yuri Khan wrote: > On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi <daniele@grinta.net> wrote: > >> As far as I understand the RFCs (and being wrong before, I may be wrong >> again) do not allow for < > quoting either. Why does url-http.el strip >> them? Why does it break the URI at the first space if spaces are not >> allowed? > > I cannot answer that, maybe someone who is knowledgeable about > uri-http.el chimes in. > > RFC 7231 allows clients to attempt to DTRT with invalid Location URIs > in any way they deem appropriate; you could argue for a different > recovery heuristic. Me, I’d rather have things break loudly on each > violation, so that it does not go unnoticed for too long. Postel’s > Razor is how we got HTML in its current shape. I tend to agree with you, but, in this specific case, being compatible with other HTTP implementations is a worthwhile goal. Unfortunately, re-defining url-http-parse-headers is the only work-around I found to make Emacs do the less bad thing when dealing with this malformed URIs. Cheers, Dan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-10 20:32 ` Daniele Nicolodi @ 2020-07-11 0:55 ` Daniele Nicolodi 2020-07-13 18:15 ` chad 0 siblings, 1 reply; 10+ messages in thread From: Daniele Nicolodi @ 2020-07-11 0:55 UTC (permalink / raw) To: emacs-devel, wmperry, monnier [-- Attachment #1: Type: text/plain, Size: 1494 bytes --] On 10-07-2020 14:32, Daniele Nicolodi wrote: > On 10/07/2020 14:25, Yuri Khan wrote: >> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi <daniele@grinta.net> wrote: >> >>> As far as I understand the RFCs (and being wrong before, I may be wrong >>> again) do not allow for < > quoting either. Why does url-http.el strip >>> them? Why does it break the URI at the first space if spaces are not >>> allowed? >> >> I cannot answer that, maybe someone who is knowledgeable about >> uri-http.el chimes in. >> >> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs >> in any way they deem appropriate; you could argue for a different >> recovery heuristic. Me, I’d rather have things break loudly on each >> violation, so that it does not go unnoticed for too long. Postel’s >> Razor is how we got HTML in its current shape. > > I tend to agree with you, but, in this specific case, being compatible > with other HTTP implementations is a worthwhile goal. > > Unfortunately, re-defining url-http-parse-headers is the only > work-around I found to make Emacs do the less bad thing when dealing > with this malformed URIs. Bill, you seem to be the author of this code, although Stefan is the one that introduced it to the Emacs accordingly to git blame. Do any of you know why the redirect Location is handled like that? I would like to suggest the two attached patches. The first fixes actual issues I encountered, the second simply adjusts a comment. Thank you. Cheers, Dan [-- Attachment #2: 0001-url-http-Fix-handling-of-redirect-locations.patch --] [-- Type: text/plain, Size: 1493 bytes --] From 052a9934380fdd7517b5bf38aca37aaf951a644e Mon Sep 17 00:00:00 2001 From: Daniele Nicolodi <daniele@grinta.net> Date: Fri, 10 Jul 2020 12:16:01 -0600 Subject: [PATCH 1/2] url-http: Fix handling of redirect locations Do not break the redirect Location header when it contain spaces. In RFC 7231 the Location header is defined to carry a URI-reference. According to RFC 3986 it should be percent-encoded and thus should not contain spaces. However, there are HTTP server implementation (notably nginx) that do not do that. This makes Emacs url-http.el behave like most other HTTP client implementatios. Also remove the stripping of angle bracket quotes as they are not valid according to the RFCs. --- lisp/url/url-http.el | 6 ------ 1 file changed, 6 deletions(-) diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el index 669c24571f..a746be8475 100644 --- a/lisp/url/url-http.el +++ b/lisp/url/url-http.el @@ -702,12 +702,6 @@ should be shown to the user." ;; Treat everything like '300' nil)) (when redirect-uri - ;; Clean off any whitespace and/or <...> cruft. - (if (string-match "\\([^ \t]+\\)[ \t]" redirect-uri) - (setq redirect-uri (match-string 1 redirect-uri))) - (if (string-match "^<\\(.*\\)>$" redirect-uri) - (setq redirect-uri (match-string 1 redirect-uri))) - ;; Some stupid sites (like sourceforge) send a ;; non-fully-qualified URL (ie: /), which royally confuses ;; the URL library. -- 2.24.3 (Apple Git-128) [-- Attachment #3: 0002-url-http-Do-not-suggest-a-broken-HTTP-server-impleme.patch --] [-- Type: text/plain, Size: 1083 bytes --] From 8da3326f64aaeb47f4306a4bacc3067d4bc38141 Mon Sep 17 00:00:00 2001 From: Daniele Nicolodi <daniele@grinta.net> Date: Fri, 10 Jul 2020 18:50:39 -0600 Subject: [PATCH 2/2] url-http: Do not suggest a broken HTTP server implementation Relative URIs are allowed in the Location header by the relvant RFCs. --- lisp/url/url-http.el | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el index a746be8475..e0c925e13f 100644 --- a/lisp/url/url-http.el +++ b/lisp/url/url-http.el @@ -702,9 +702,7 @@ should be shown to the user." ;; Treat everything like '300' nil)) (when redirect-uri - ;; Some stupid sites (like sourceforge) send a - ;; non-fully-qualified URL (ie: /), which royally confuses - ;; the URL library. + ;; Handle relative redirect URIs. (if (not (string-match url-nonrelative-link redirect-uri)) ;; Be careful to use the real target URL, otherwise we may ;; compute the redirection relative to the URL of the proxy. -- 2.24.3 (Apple Git-128) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-11 0:55 ` Daniele Nicolodi @ 2020-07-13 18:15 ` chad 2020-07-13 18:48 ` Daniele Nicolodi 0 siblings, 1 reply; 10+ messages in thread From: chad @ 2020-07-13 18:15 UTC (permalink / raw) To: Daniele Nicolodi; +Cc: wmperry, Stefan Monnier, EMACS development team [-- Attachment #1: Type: text/plain, Size: 2084 bytes --] Very likely the </> stripping code dates from a time period when code would recognize strings inside angle-brackets as potential URLs/URIs, and passed the entire string to the url library out of simplicity. If memory serves, Bill Perry's original url code dates from the wild and wooly early days of loose url encoding. I would expect that it can be changed safely. Hope that helps, ~Chad On Fri, Jul 10, 2020 at 5:55 PM Daniele Nicolodi <daniele@grinta.net> wrote: > On 10-07-2020 14:32, Daniele Nicolodi wrote: > > On 10/07/2020 14:25, Yuri Khan wrote: > >> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi <daniele@grinta.net> > wrote: > >> > >>> As far as I understand the RFCs (and being wrong before, I may be wrong > >>> again) do not allow for < > quoting either. Why does url-http.el strip > >>> them? Why does it break the URI at the first space if spaces are not > >>> allowed? > >> > >> I cannot answer that, maybe someone who is knowledgeable about > >> uri-http.el chimes in. > >> > >> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs > >> in any way they deem appropriate; you could argue for a different > >> recovery heuristic. Me, I’d rather have things break loudly on each > >> violation, so that it does not go unnoticed for too long. Postel’s > >> Razor is how we got HTML in its current shape. > > > > I tend to agree with you, but, in this specific case, being compatible > > with other HTTP implementations is a worthwhile goal. > > > > Unfortunately, re-defining url-http-parse-headers is the only > > work-around I found to make Emacs do the less bad thing when dealing > > with this malformed URIs. > > Bill, you seem to be the author of this code, although Stefan is the one > that introduced it to the Emacs accordingly to git blame. Do any of you > know why the redirect Location is handled like that? > > I would like to suggest the two attached patches. The first fixes actual > issues I encountered, the second simply adjusts a comment. > > Thank you. > > Cheers, > Dan > > [-- Attachment #2: Type: text/html, Size: 2665 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug in url-retrieve-synchronously from url.el on redirect 2020-07-13 18:15 ` chad @ 2020-07-13 18:48 ` Daniele Nicolodi 0 siblings, 0 replies; 10+ messages in thread From: Daniele Nicolodi @ 2020-07-13 18:48 UTC (permalink / raw) To: chad; +Cc: wmperry, Stefan Monnier, EMACS development team On 13/07/2020 12:15, chad wrote: > Very likely the </> stripping code dates from a time period when code > would recognize strings inside angle-brackets as potential URLs/URIs, > and passed the entire string to the url library out of simplicity. If > memory serves, Bill Perry's original url code dates from the wild and > wooly early days of loose url encoding. I would expect that it can be > changed safely. > > Hope that helps, > ~Chad Thanks Chad. This does not quite explain the presence of the </> stripping in handling HTTP protocol headers. But it may be that with subsequent refactoring, this coded ended where it is now. It would be nice if someone with commit rights could find a couple of spare cycles to comment on these patches and hopefully apply them. PS: Emails to Bill Perry address are bouncing for me. Cheers, Dan > > On Fri, Jul 10, 2020 at 5:55 PM Daniele Nicolodi <daniele@grinta.net > <mailto:daniele@grinta.net>> wrote: > > On 10-07-2020 14:32, Daniele Nicolodi wrote: > > On 10/07/2020 14:25, Yuri Khan wrote: > >> On Sat, 11 Jul 2020 at 02:43, Daniele Nicolodi > <daniele@grinta.net <mailto:daniele@grinta.net>> wrote: > >> > >>> As far as I understand the RFCs (and being wrong before, I may > be wrong > >>> again) do not allow for < > quoting either. Why does url-http.el > strip > >>> them? Why does it break the URI at the first space if spaces are not > >>> allowed? > >> > >> I cannot answer that, maybe someone who is knowledgeable about > >> uri-http.el chimes in. > >> > >> RFC 7231 allows clients to attempt to DTRT with invalid Location URIs > >> in any way they deem appropriate; you could argue for a different > >> recovery heuristic. Me, I’d rather have things break loudly on each > >> violation, so that it does not go unnoticed for too long. Postel’s > >> Razor is how we got HTML in its current shape. > > > > I tend to agree with you, but, in this specific case, being compatible > > with other HTTP implementations is a worthwhile goal. > > > > Unfortunately, re-defining url-http-parse-headers is the only > > work-around I found to make Emacs do the less bad thing when dealing > > with this malformed URIs. > > Bill, you seem to be the author of this code, although Stefan is the one > that introduced it to the Emacs accordingly to git blame. Do any of you > know why the redirect Location is handled like that? > > I would like to suggest the two attached patches. The first fixes actual > issues I encountered, the second simply adjusts a comment. > > Thank you. > > Cheers, > Dan > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-07-13 18:48 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-07-10 18:18 Bug in url-retrieve-synchronously from url.el on redirect Daniele Nicolodi 2020-07-10 18:49 ` Yuri Khan 2020-07-10 19:37 ` Yuri Khan 2020-07-10 19:46 ` Daniele Nicolodi 2020-07-10 19:43 ` Daniele Nicolodi 2020-07-10 20:25 ` Yuri Khan 2020-07-10 20:32 ` Daniele Nicolodi 2020-07-11 0:55 ` Daniele Nicolodi 2020-07-13 18:15 ` chad 2020-07-13 18:48 ` Daniele Nicolodi
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).