unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
@ 2013-01-31 17:26 Jonas Hoersch
  2013-02-07 18:13 ` Jonas Hörsch
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jonas Hoersch @ 2013-01-31 17:26 UTC (permalink / raw)
  To: 13598

hej, everyone,

i just finished hunting down an improbable bug in url-http.el, which
appears when a url is retrieved from a server which sends the headers
line-by-line instead of in one junk, like it is the case for the
BaseHTTPServer classes coming along in python 2.

A simple test-case looks like the following (sorry for the long
non-emacs setup stuff, but it's the minimalist example i could come up
with)

cd into a directory containing only a single minimal text file and
start python's SimpleHTTPServer so it serves it.

$ cd $(mktemp -d)
$ echo "hello world" > textfile
$ python -m SimpleHTTPServer 8000  # works only for python 2.x

(switch-to-buffer (url-retrieve-synchronously
"http://127.0.0.1:8000/textfile"))

now correctly will retrieve the "hello world" but

the buffer-local-variables url-http-content-type and
url-http-content-length are nil in the returned buffer, although one
sees that they have been transmitted by python.

adding an extra debug line to url-http's
url-http-wait-for-headers-change-function around line 1043,

------
	(when (re-search-forward "^\r*$" nil t)
	  ;; Saw the end of the headers
	  (url-http-debug "Saw end of headers... (%s)" (buffer-name))          
+         (url-http-debug "when the buffer contained...\n%s" (buffer-substring (point-min) (point-max)))

	  (setq url-http-end-of-headers (set-marker (make-marker)
						    (point))
		end-of-headers t)
-------

will show you in *URL-DEBUG* (url-debug being t)

-------
http -> Saw end of headers... ( *http 127.0.0.1:8000*-273882)
http -> when the buffer contained...
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.3

http -> url-http-parse-response called in ( *http 127.0.0.1:8000*-273882)
http -> No content-length, being dumb.
-------

that the headers haven't completely arrived yet, when url-http decides
it has seen the end of them.

changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
solves the problem for me, but i'm unsure about what i might possibly be
breaking that way.

thanks for looking into it,

jonas hörsch


In GNU Emacs 24.3.50.1 (x86_64-unknown-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2013-01-29 on kafka
Bzr revision: michael.albinus@gmx.de-20130129081211-mmthn9p4bh75h5pr
Windowing system distributor `The X.Org Foundation', version 11.0.11302000
Configured using:
 `configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
 --libexecdir=/usr/lib --mandir=/usr/share/man --without-sound
 --with-xft --with-x-toolkit=lucid'

Important settings:
  value of $LANG: en_GB.UTF-8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t





^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
  2013-01-31 17:26 bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line Jonas Hoersch
@ 2013-02-07 18:13 ` Jonas Hörsch
  2013-02-13 17:19   ` Bastien
  2014-02-26  9:32 ` bug#13598: 24.3.50; Blazej Adamczyk
  2014-02-26 16:54 ` bug#13598: 24.3.50 Blazej Adamczyk
  2 siblings, 1 reply; 12+ messages in thread
From: Jonas Hörsch @ 2013-02-07 18:13 UTC (permalink / raw)
  To: 13598

On Thu, Jan 31 2013, Jonas Hoersch wrote:

> changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
> solves the problem for me, but i'm unsure about what i might possibly be
> breaking that way.

i'm positive now, that changing the regex to "^\r+$" is the way to go.

i would be happy to supply a patch, but i understand it is probably to
trivial a matter to justify going through the legal requirements first.

the following advice can serve as a hotfix:

(defadvice url-http-wait-for-headers-change-function (around
                                                      url-http-properly-wait-for-headers-advice
                                                      activate)
  (save-excursion
    (goto-char (point-min))
    (if (re-search-forward "^\r+$" nil t)
        ad-do-it
      (url-http-debug "Incomplete headers...: %d" (point-max)))))






^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
  2013-02-07 18:13 ` Jonas Hörsch
@ 2013-02-13 17:19   ` Bastien
  2013-02-13 19:30     ` Stefan Monnier
  2013-02-13 19:42     ` Glenn Morris
  0 siblings, 2 replies; 12+ messages in thread
From: Bastien @ 2013-02-13 17:19 UTC (permalink / raw)
  To: Jonas Hörsch; +Cc: 13598

Hi Jonas,

coroa@online.de (Jonas Hörsch) writes:

> On Thu, Jan 31 2013, Jonas Hoersch wrote:
>
>> changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
>> solves the problem for me, but i'm unsure about what i might possibly be
>> breaking that way.
>
> i'm positive now, that changing the regex to "^\r+$" is the way to go.
>
> i would be happy to supply a patch, but i understand it is probably to
> trivial a matter to justify going through the legal requirements first.
>
> the following advice can serve as a hotfix:
>
> (defadvice url-http-wait-for-headers-change-function (around
>                                                       url-http-properly-wait-for-headers-advice
>                                                       activate)
>   (save-excursion
>     (goto-char (point-min))
>     (if (re-search-forward "^\r+$" nil t)
>         ad-do-it
>       (url-http-debug "Incomplete headers...: %d" (point-max)))))

I confirm both the problem and the fix.

It does not look critical though.  Stefan, Glenn, should I
commit the patch into trunk (or emacs-24)?

-- 
 Bastien





^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
  2013-02-13 17:19   ` Bastien
@ 2013-02-13 19:30     ` Stefan Monnier
  2013-02-13 19:42     ` Glenn Morris
  1 sibling, 0 replies; 12+ messages in thread
From: Stefan Monnier @ 2013-02-13 19:30 UTC (permalink / raw)
  To: Bastien; +Cc: 13598, Jonas Hörsch

> It does not look critical though.  Stefan, Glenn, should I
> commit the patch into trunk (or emacs-24)?

AFAIK this is not a regression, so => trunk
And thanks for taking care of it,


        Stefan





^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
  2013-02-13 17:19   ` Bastien
  2013-02-13 19:30     ` Stefan Monnier
@ 2013-02-13 19:42     ` Glenn Morris
  2013-02-13 21:38       ` Glenn Morris
  1 sibling, 1 reply; 12+ messages in thread
From: Glenn Morris @ 2013-02-13 19:42 UTC (permalink / raw)
  To: Bastien; +Cc: 13598, Jonas Hörsch

Bastien wrote:

>> i'm positive now, that changing the regex to "^\r+$" is the way to go.

I don't understand how this can be correct. What is this supposed to be
matching?

http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3

   The line terminator for message-header fields is the sequence CRLF.
   However, we recommend that applications, when parsing such headers,
   recognize a single LF as a line terminator and ignore the leading CR.





^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
  2013-02-13 19:42     ` Glenn Morris
@ 2013-02-13 21:38       ` Glenn Morris
  2013-02-14  6:08         ` Bastien
  2013-02-16  2:06         ` Glenn Morris
  0 siblings, 2 replies; 12+ messages in thread
From: Glenn Morris @ 2013-02-13 21:38 UTC (permalink / raw)
  To: Bastien; +Cc: 13598, Jonas Hörsch


>>> i'm positive now, that changing the regex to "^\r+$" is the way to go.

Would "^\r?\n" be better?





^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
  2013-02-13 21:38       ` Glenn Morris
@ 2013-02-14  6:08         ` Bastien
  2013-02-16  2:06         ` Glenn Morris
  1 sibling, 0 replies; 12+ messages in thread
From: Bastien @ 2013-02-14  6:08 UTC (permalink / raw)
  To: Glenn Morris; +Cc: 13598, Jonas Hörsch

Hi Glenn,

Glenn Morris <rgm@gnu.org> writes:

>>>> i'm positive now, that changing the regex to "^\r+$" is the way to go.
>
> Would "^\r?\n" be better?

The quote of the OP is misleading -- he proposed to change the regexp
in (re-search-forward "^\r*$" nil t) to "^\r*\n", which is the fix I'm
talking about.

But yes, "^\r?\n" is slightly better than "^\r*\n" because AFAIK there
can be only one CR in the line separating the headers from the body.

Let me know if you want to fix this yourself or if I should do it.

Thanks,

-- 
 Bastien





^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
  2013-02-13 21:38       ` Glenn Morris
  2013-02-14  6:08         ` Bastien
@ 2013-02-16  2:06         ` Glenn Morris
  1 sibling, 0 replies; 12+ messages in thread
From: Glenn Morris @ 2013-02-16  2:06 UTC (permalink / raw)
  To: 13598-done

Version: 24.4

Glenn Morris wrote:

> Would "^\r?\n" be better?

Applied.





^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50;
  2013-01-31 17:26 bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line Jonas Hoersch
  2013-02-07 18:13 ` Jonas Hörsch
@ 2014-02-26  9:32 ` Blazej Adamczyk
  2014-02-26 16:54 ` bug#13598: 24.3.50 Blazej Adamczyk
  2 siblings, 0 replies; 12+ messages in thread
From: Blazej Adamczyk @ 2014-02-26  9:32 UTC (permalink / raw)
  To: 13598

Hello, 
I had to reopen the bug because I faced the same problem as OP. His didn't make himself clear:

By example:
When parsing response we may get in state when we will receive only the following:

"HTTP/1.0 200 OK^M
"

without double quotes (I added them to show the newline character).

In case of current implementation the regexp "^\r?$" and the previous regexp "^\r*$" both are matching the end of string. That is wrong because there will be something in the new line after a while. 

RFC 2616 states clear:
        generic-message = start-line
                          *(message-header CRLF)
                          CRLF
                          [ message-body ]
        start-line      = Request-Line | Status-Line

there has to be one (exactly one) CR in a single line between headers and body. Thus I propose a simple regexp "^\r$".

--
Blazej




^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50
  2013-01-31 17:26 bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line Jonas Hoersch
  2013-02-07 18:13 ` Jonas Hörsch
  2014-02-26  9:32 ` bug#13598: 24.3.50; Blazej Adamczyk
@ 2014-02-26 16:54 ` Blazej Adamczyk
  2014-02-27 22:43   ` Glenn Morris
  2 siblings, 1 reply; 12+ messages in thread
From: Blazej Adamczyk @ 2014-02-26 16:54 UTC (permalink / raw)
  To: 13598

[-- Attachment #1: Type: text/plain, Size: 878 bytes --]

Hello, 
I had to reopen the bug because I faced the same problem as OP. His didn't make himself clear:

By example:
When parsing response we may get in state when we will receive only the following:

"HTTP/1.0 200 OK^M
"

without double quotes (I added them to show the newline character).

In case of current implementation the regexp "^\r?$" and the previous regexp "^\r*$" both are matching the end of string. That is wrong because there will be something in the new line after a while. 

RFC 2616 states clear:
       generic-message = start-line
                         *(message-header CRLF)
                         CRLF
                         [ message-body ]
       start-line      = Request-Line | Status-Line

there has to be one (exactly one) CR in a single line between headers and body. Thus I propose a simple regexp "^\r$".

--
Blazej

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50
  2014-02-26 16:54 ` bug#13598: 24.3.50 Blazej Adamczyk
@ 2014-02-27 22:43   ` Glenn Morris
  2014-03-03  6:10     ` Blazej Adamczyk
  0 siblings, 1 reply; 12+ messages in thread
From: Glenn Morris @ 2014-02-27 22:43 UTC (permalink / raw)
  To: Blazej Adamczyk; +Cc: 13598

Blazej Adamczyk wrote:

> By example:
> When parsing response we may get in state when we will receive only
> the following:
>
> "HTTP/1.0 200 OK^M
> "
>
> without double quotes (I added them to show the newline character).
>
> In case of current implementation the regexp "^\r?$" and the previous
> regexp "^\r*$" both are matching the end of string. That is wrong
> because there will be something in the new line after a while.

The current implementation uses "^\r?\n", not "^\r?$".
Where did you get "^\r?$" from?

As such I do not see that it will match your example.

> RFC 2616 states clear:
>        generic-message = start-line
>                          *(message-header CRLF)
>                          CRLF
>                          [ message-body ]
>        start-line      = Request-Line | Status-Line
>
> there has to be one (exactly one) CR in a single line between headers
> and body. Thus I propose a simple regexp "^\r$".

Yes, but as I already quoted in

http://debbugs.gnu.org/13598#17

it also recommends tolerance:

   The line terminator for message-header fields is the sequence CRLF.
   However, we recommend that applications, when parsing such headers,
   recognize a single LF as a line terminator and ignore the leading CR.





^ permalink raw reply	[flat|nested] 12+ messages in thread

* bug#13598: 24.3.50
  2014-02-27 22:43   ` Glenn Morris
@ 2014-03-03  6:10     ` Blazej Adamczyk
  0 siblings, 0 replies; 12+ messages in thread
From: Blazej Adamczyk @ 2014-03-03  6:10 UTC (permalink / raw)
  To: Glenn Morris; +Cc: 13598

[-- Attachment #1: Type: text/plain, Size: 1454 bytes --]

Ahh yes my mistake! I was looking at some wrong sources. Obviously the current "^\r?\n" is correct. 

Sorry and thanks!
Blazej

From Glenn Morris <rgm@gnu.org> w dniu 27 lut 2014, o godz. 23:43:

Blazej Adamczyk wrote:

By example:
When parsing response we may get in state when we will receive only
the following:

"HTTP/1.0 200 OK^M
"

without double quotes (I added them to show the newline character).

In case of current implementation the regexp "^\r?$" and the previous
regexp "^\r*$" both are matching the end of string. That is wrong
because there will be something in the new line after a while.

The current implementation uses "^\r?\n", not "^\r?$".
Where did you get "^\r?$" from?

As such I do not see that it will match your example.

RFC 2616 states clear:
      generic-message = start-line
                        *(message-header CRLF)
                        CRLF
                        [ message-body ]
      start-line      = Request-Line | Status-Line

there has to be one (exactly one) CR in a single line between headers
and body. Thus I propose a simple regexp "^\r$".

Yes, but as I already quoted in

http://debbugs.gnu.org/13598#17

it also recommends tolerance:

  The line terminator for message-header fields is the sequence CRLF.
  However, we recommend that applications, when parsing such headers,
  recognize a single LF as a line terminator and ignore the leading CR.


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-03-03  6:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-31 17:26 bug#13598: 24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line Jonas Hoersch
2013-02-07 18:13 ` Jonas Hörsch
2013-02-13 17:19   ` Bastien
2013-02-13 19:30     ` Stefan Monnier
2013-02-13 19:42     ` Glenn Morris
2013-02-13 21:38       ` Glenn Morris
2013-02-14  6:08         ` Bastien
2013-02-16  2:06         ` Glenn Morris
2014-02-26  9:32 ` bug#13598: 24.3.50; Blazej Adamczyk
2014-02-26 16:54 ` bug#13598: 24.3.50 Blazej Adamczyk
2014-02-27 22:43   ` Glenn Morris
2014-03-03  6:10     ` Blazej Adamczyk

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).