* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
@ 2013-01-31 17:26 Jonas Hoersch
2013-02-07 18:13 ` Jonas Hörsch
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Jonas Hoersch @ 2013-01-31 17:26 UTC (permalink / raw)
To: 13598
hej, everyone,
i just finished hunting down an improbable bug in url-http.el, which
appears when a url is retrieved from a server which sends the headers
line-by-line instead of in one junk, like it is the case for the
BaseHTTPServer classes coming along in python 2.
A simple test-case looks like the following (sorry for the long
non-emacs setup stuff, but it's the minimalist example i could come up
with)
cd into a directory containing only a single minimal text file and
start python's SimpleHTTPServer so it serves it.
$ cd $(mktemp -d)
$ echo "hello world" > textfile
$ python -m SimpleHTTPServer 8000 # works only for python 2.x
(switch-to-buffer (url-retrieve-synchronously
"http://127.0.0.1:8000/textfile"))
now correctly will retrieve the "hello world" but
the buffer-local-variables url-http-content-type and
url-http-content-length are nil in the returned buffer, although one
sees that they have been transmitted by python.
adding an extra debug line to url-http's
url-http-wait-for-headers-change-function around line 1043,
------
(when (re-search-forward "^\r*$" nil t)
;; Saw the end of the headers
(url-http-debug "Saw end of headers... (%s)" (buffer-name))
+ (url-http-debug "when the buffer contained...\n%s" (buffer-substring (point-min) (point-max)))
(setq url-http-end-of-headers (set-marker (make-marker)
(point))
end-of-headers t)
-------
will show you in *URL-DEBUG* (url-debug being t)
-------
http -> Saw end of headers... ( *http 127.0.0.1:8000*-273882)
http -> when the buffer contained...
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.3
http -> url-http-parse-response called in ( *http 127.0.0.1:8000*-273882)
http -> No content-length, being dumb.
-------
that the headers haven't completely arrived yet, when url-http decides
it has seen the end of them.
changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
solves the problem for me, but i'm unsure about what i might possibly be
breaking that way.
thanks for looking into it,
jonas hörsch
In GNU Emacs 24.3.50.1 (x86_64-unknown-linux-gnu, X toolkit, Xaw3d scroll bars)
of 2013-01-29 on kafka
Bzr revision: michael.albinus@gmx.de-20130129081211-mmthn9p4bh75h5pr
Windowing system distributor `The X.Org Foundation', version 11.0.11302000
Configured using:
`configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
--libexecdir=/usr/lib --mandir=/usr/share/man --without-sound
--with-xft --with-x-toolkit=lucid'
Important settings:
value of $LANG: en_GB.UTF-8
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
2013-01-31 17:26 bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line Jonas Hoersch
@ 2013-02-07 18:13 ` Jonas Hörsch
2013-02-13 17:19 ` Bastien
2014-02-26 9:32 ` bug#13598: 24.3.50; Blazej Adamczyk
2014-02-26 16:54 ` bug#13598: 24.3.50 Blazej Adamczyk
2 siblings, 1 reply; 12+ messages in thread
From: Jonas Hörsch @ 2013-02-07 18:13 UTC (permalink / raw)
To: 13598
On Thu, Jan 31 2013, Jonas Hoersch wrote:
> changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
> solves the problem for me, but i'm unsure about what i might possibly be
> breaking that way.
i'm positive now, that changing the regex to "^\r+$" is the way to go.
i would be happy to supply a patch, but i understand it is probably to
trivial a matter to justify going through the legal requirements first.
the following advice can serve as a hotfix:
(defadvice url-http-wait-for-headers-change-function (around
url-http-properly-wait-for-headers-advice
activate)
(save-excursion
(goto-char (point-min))
(if (re-search-forward "^\r+$" nil t)
ad-do-it
(url-http-debug "Incomplete headers...: %d" (point-max)))))
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
2013-02-07 18:13 ` Jonas Hörsch
@ 2013-02-13 17:19 ` Bastien
2013-02-13 19:30 ` Stefan Monnier
2013-02-13 19:42 ` Glenn Morris
0 siblings, 2 replies; 12+ messages in thread
From: Bastien @ 2013-02-13 17:19 UTC (permalink / raw)
To: Jonas Hörsch; +Cc: 13598
Hi Jonas,
coroa@online.de (Jonas Hörsch) writes:
> On Thu, Jan 31 2013, Jonas Hoersch wrote:
>
>> changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
>> solves the problem for me, but i'm unsure about what i might possibly be
>> breaking that way.
>
> i'm positive now, that changing the regex to "^\r+$" is the way to go.
>
> i would be happy to supply a patch, but i understand it is probably to
> trivial a matter to justify going through the legal requirements first.
>
> the following advice can serve as a hotfix:
>
> (defadvice url-http-wait-for-headers-change-function (around
> url-http-properly-wait-for-headers-advice
> activate)
> (save-excursion
> (goto-char (point-min))
> (if (re-search-forward "^\r+$" nil t)
> ad-do-it
> (url-http-debug "Incomplete headers...: %d" (point-max)))))
I confirm both the problem and the fix.
It does not look critical though. Stefan, Glenn, should I
commit the patch into trunk (or emacs-24)?
--
Bastien
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
2013-02-13 17:19 ` Bastien
@ 2013-02-13 19:30 ` Stefan Monnier
2013-02-13 19:42 ` Glenn Morris
1 sibling, 0 replies; 12+ messages in thread
From: Stefan Monnier @ 2013-02-13 19:30 UTC (permalink / raw)
To: Bastien; +Cc: 13598, Jonas Hörsch
> It does not look critical though. Stefan, Glenn, should I
> commit the patch into trunk (or emacs-24)?
AFAIK this is not a regression, so => trunk
And thanks for taking care of it,
Stefan
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
2013-02-13 17:19 ` Bastien
2013-02-13 19:30 ` Stefan Monnier
@ 2013-02-13 19:42 ` Glenn Morris
2013-02-13 21:38 ` Glenn Morris
1 sibling, 1 reply; 12+ messages in thread
From: Glenn Morris @ 2013-02-13 19:42 UTC (permalink / raw)
To: Bastien; +Cc: 13598, Jonas Hörsch
Bastien wrote:
>> i'm positive now, that changing the regex to "^\r+$" is the way to go.
I don't understand how this can be correct. What is this supposed to be
matching?
http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3
The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers,
recognize a single LF as a line terminator and ignore the leading CR.
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
2013-02-13 19:42 ` Glenn Morris
@ 2013-02-13 21:38 ` Glenn Morris
2013-02-14 6:08 ` Bastien
2013-02-16 2:06 ` Glenn Morris
0 siblings, 2 replies; 12+ messages in thread
From: Glenn Morris @ 2013-02-13 21:38 UTC (permalink / raw)
To: Bastien; +Cc: 13598, Jonas Hörsch
>>> i'm positive now, that changing the regex to "^\r+$" is the way to go.
Would "^\r?\n" be better?
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
2013-02-13 21:38 ` Glenn Morris
@ 2013-02-14 6:08 ` Bastien
2013-02-16 2:06 ` Glenn Morris
1 sibling, 0 replies; 12+ messages in thread
From: Bastien @ 2013-02-14 6:08 UTC (permalink / raw)
To: Glenn Morris; +Cc: 13598, Jonas Hörsch
Hi Glenn,
Glenn Morris <rgm@gnu.org> writes:
>>>> i'm positive now, that changing the regex to "^\r+$" is the way to go.
>
> Would "^\r?\n" be better?
The quote of the OP is misleading -- he proposed to change the regexp
in (re-search-forward "^\r*$" nil t) to "^\r*\n", which is the fix I'm
talking about.
But yes, "^\r?\n" is slightly better than "^\r*\n" because AFAIK there
can be only one CR in the line separating the headers from the body.
Let me know if you want to fix this yourself or if I should do it.
Thanks,
--
Bastien
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
2013-02-13 21:38 ` Glenn Morris
2013-02-14 6:08 ` Bastien
@ 2013-02-16 2:06 ` Glenn Morris
1 sibling, 0 replies; 12+ messages in thread
From: Glenn Morris @ 2013-02-16 2:06 UTC (permalink / raw)
To: 13598-done
Version: 24.4
Glenn Morris wrote:
> Would "^\r?\n" be better?
Applied.
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50;
2013-01-31 17:26 bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line Jonas Hoersch
2013-02-07 18:13 ` Jonas Hörsch
@ 2014-02-26 9:32 ` Blazej Adamczyk
2014-02-26 16:54 ` bug#13598: 24.3.50 Blazej Adamczyk
2 siblings, 0 replies; 12+ messages in thread
From: Blazej Adamczyk @ 2014-02-26 9:32 UTC (permalink / raw)
To: 13598
Hello,
I had to reopen the bug because I faced the same problem as OP. His didn't make himself clear:
By example:
When parsing response we may get in state when we will receive only the following:
"HTTP/1.0 200 OK^M
"
without double quotes (I added them to show the newline character).
In case of current implementation the regexp "^\r?$" and the previous regexp "^\r*$" both are matching the end of string. That is wrong because there will be something in the new line after a while.
RFC 2616 states clear:
generic-message = start-line
*(message-header CRLF)
CRLF
[ message-body ]
start-line = Request-Line | Status-Line
there has to be one (exactly one) CR in a single line between headers and body. Thus I propose a simple regexp "^\r$".
--
Blazej
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50
2013-01-31 17:26 bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line Jonas Hoersch
2013-02-07 18:13 ` Jonas Hörsch
2014-02-26 9:32 ` bug#13598: 24.3.50; Blazej Adamczyk
@ 2014-02-26 16:54 ` Blazej Adamczyk
2014-02-27 22:43 ` Glenn Morris
2 siblings, 1 reply; 12+ messages in thread
From: Blazej Adamczyk @ 2014-02-26 16:54 UTC (permalink / raw)
To: 13598
[-- Attachment #1: Type: text/plain, Size: 878 bytes --]
Hello,
I had to reopen the bug because I faced the same problem as OP. His didn't make himself clear:
By example:
When parsing response we may get in state when we will receive only the following:
"HTTP/1.0 200 OK^M
"
without double quotes (I added them to show the newline character).
In case of current implementation the regexp "^\r?$" and the previous regexp "^\r*$" both are matching the end of string. That is wrong because there will be something in the new line after a while.
RFC 2616 states clear:
generic-message = start-line
*(message-header CRLF)
CRLF
[ message-body ]
start-line = Request-Line | Status-Line
there has to be one (exactly one) CR in a single line between headers and body. Thus I propose a simple regexp "^\r$".
--
Blazej
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50
2014-02-26 16:54 ` bug#13598: 24.3.50 Blazej Adamczyk
@ 2014-02-27 22:43 ` Glenn Morris
2014-03-03 6:10 ` Blazej Adamczyk
0 siblings, 1 reply; 12+ messages in thread
From: Glenn Morris @ 2014-02-27 22:43 UTC (permalink / raw)
To: Blazej Adamczyk; +Cc: 13598
Blazej Adamczyk wrote:
> By example:
> When parsing response we may get in state when we will receive only
> the following:
>
> "HTTP/1.0 200 OK^M
> "
>
> without double quotes (I added them to show the newline character).
>
> In case of current implementation the regexp "^\r?$" and the previous
> regexp "^\r*$" both are matching the end of string. That is wrong
> because there will be something in the new line after a while.
The current implementation uses "^\r?\n", not "^\r?$".
Where did you get "^\r?$" from?
As such I do not see that it will match your example.
> RFC 2616 states clear:
> generic-message = start-line
> *(message-header CRLF)
> CRLF
> [ message-body ]
> start-line = Request-Line | Status-Line
>
> there has to be one (exactly one) CR in a single line between headers
> and body. Thus I propose a simple regexp "^\r$".
Yes, but as I already quoted in
http://debbugs.gnu.org/13598#17
it also recommends tolerance:
The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers,
recognize a single LF as a line terminator and ignore the leading CR.
^ permalink raw reply [flat|nested] 12+ messages in thread
* bug#13598: 24.3.50
2014-02-27 22:43 ` Glenn Morris
@ 2014-03-03 6:10 ` Blazej Adamczyk
0 siblings, 0 replies; 12+ messages in thread
From: Blazej Adamczyk @ 2014-03-03 6:10 UTC (permalink / raw)
To: Glenn Morris; +Cc: 13598
[-- Attachment #1: Type: text/plain, Size: 1454 bytes --]
Ahh yes my mistake! I was looking at some wrong sources. Obviously the current "^\r?\n" is correct.
Sorry and thanks!
Blazej
From Glenn Morris <rgm@gnu.org> w dniu 27 lut 2014, o godz. 23:43:
Blazej Adamczyk wrote:
By example:
When parsing response we may get in state when we will receive only
the following:
"HTTP/1.0 200 OK^M
"
without double quotes (I added them to show the newline character).
In case of current implementation the regexp "^\r?$" and the previous
regexp "^\r*$" both are matching the end of string. That is wrong
because there will be something in the new line after a while.
The current implementation uses "^\r?\n", not "^\r?$".
Where did you get "^\r?$" from?
As such I do not see that it will match your example.
RFC 2616 states clear:
generic-message = start-line
*(message-header CRLF)
CRLF
[ message-body ]
start-line = Request-Line | Status-Line
there has to be one (exactly one) CR in a single line between headers
and body. Thus I propose a simple regexp "^\r$".
Yes, but as I already quoted in
http://debbugs.gnu.org/13598#17
it also recommends tolerance:
The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers,
recognize a single LF as a line terminator and ignore the leading CR.
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-03-03 6:10 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-31 17:26 bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line Jonas Hoersch
2013-02-07 18:13 ` Jonas Hörsch
2013-02-13 17:19 ` Bastien
2013-02-13 19:30 ` Stefan Monnier
2013-02-13 19:42 ` Glenn Morris
2013-02-13 21:38 ` Glenn Morris
2013-02-14 6:08 ` Bastien
2013-02-16 2:06 ` Glenn Morris
2014-02-26 9:32 ` bug#13598: 24.3.50; Blazej Adamczyk
2014-02-26 16:54 ` bug#13598: 24.3.50 Blazej Adamczyk
2014-02-27 22:43 ` Glenn Morris
2014-03-03 6:10 ` Blazej Adamczyk
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.