* [happy@mcplaksin.org: HTTP redirects make url-retrieve-synchronously asynchronous]
@ 2006-01-17 19:59 Richard Stallman
  2006-01-18  3:13 ` Fwd: HTTP redirects make url-retrieve-synchronously asynchronous Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2006-01-17 19:59 UTC (permalink / raw)

[I sent this message a week ago but did not get a response.]

Would someone please look at this, and then ack?

------- Start of forwarded message -------
To: emacs-pretest-bug@gnu.org
From: Mark Plaksin <happy@mcplaksin.org>
Date: Mon, 09 Jan 2006 13:18:46 -0500
Subject: HTTP redirects make url-retrieve-synchronously asynchronous

url-retrieve-synchronously becomes asynchronous when HTTP redirects are
involved.  When it encounters a redirect, url-http-parse-headers calls
url-retrieve instead of url-retrieve-synchronously.  Naively switching to
the latter doesn't solve the problem, and I haven't been able to find a
fix.

I encountered the problem in Gnus using nnrss.  I have an old URL for
Slashdot's RSS feed and was experimenting with setting mm-url-use-external
to nil.
To reproduce, evaluate this in *scratch*:

(mm-url-insert "http://slashdot.org/index.rss")

You'll get this:

("http://slashdot.org/index.rss" 316)
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://rss.slashdot.org/Slashdot/slashdot">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.33 Server at slashdot.org Port 80</ADDRESS>
</BODY></HTML>

To make the problem go away, add a breakpoint before "(when redirect-uri"
in url-http.el.  Then re-run the test and wait a few seconds after hitting
the breakpoint.  Tell the debugger to continue and you will get the
expected contents of Slashdot's RSS feed instead of the redirect message
above.

I'll keep trying to find a way to fix this, but maybe it's trivial for
somebody who already understands the URL library.

_______________________________________________
emacs-pretest-bug mailing list
emacs-pretest-bug@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug
------- End of forwarded message -------

^ permalink raw reply	[flat|nested] 9+ messages in thread
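The same misbehavior can be observed through the URL library directly, without going through mm-url. This is a sketch only; the Slashdot URL is the one from the report above, and the redirect it returned in 2006 no longer applies:

```elisp
;; Sketch: observe the bug without mm-url.  With the bug present, the
;; buffer returned by `url-retrieve-synchronously' holds the 301 page
;; (or is incomplete), because after the redirect the real download
;; continues asynchronously in a different buffer.
(require 'url)

(with-current-buffer
    (url-retrieve-synchronously "http://slashdot.org/index.rss")
  (buffer-string))
```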
* Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
  2006-01-17 19:59 [happy@mcplaksin.org: HTTP redirects make url-retrieve-synchronously asynchronous] Richard Stallman
@ 2006-01-18  3:13 ` Stefan Monnier
  2006-01-19 17:44   ` Richard M. Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2006-01-18 3:13 UTC (permalink / raw)
Cc: emacs-devel

> [I sent this message a week ago but did not get a response.]

> Would someone please look at this, and then ack?

I've looked at it and understand the problem, but I can't think of a quick
fix, and the real fix will take some work (and enough changes that it'll
risk introducing bugs that'll need more testing than I can do myself).


        Stefan
* Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
  2006-01-18  3:13 ` Fwd: HTTP redirects make url-retrieve-synchronously asynchronous Stefan Monnier
@ 2006-01-19 17:44   ` Richard M. Stallman
  2006-01-20 21:56     ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Richard M. Stallman @ 2006-01-19 17:44 UTC (permalink / raw)
Cc: emacs-devel

    I've looked at it and understand the problem, but I can't think of a
    quickfix, and the real fix will take some work (and enough changes
    that it'll risk introducing bugs that'll need more testing than I can
    do myself).

Could you explain the issue to us?  Maybe someone will have an idea.
* Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
  2006-01-19 17:44 ` Richard M. Stallman
@ 2006-01-20 21:56   ` Stefan Monnier
  2006-01-22  3:59     ` Richard M. Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2006-01-20 21:56 UTC (permalink / raw)
Cc: emacs-devel

> I've looked at it and understand the problem, but I can't think of a
> quickfix, and the real fix will take some work (and enough changes that
> it'll risk introducing bugs that'll need more testing than I can do
> myself).

> Could you explain the issue to us?  Maybe someone will have an idea.

Here is what I believe to be the relevant information:

1. When an HTTP server returns a redirect, url-http.el currently calls
   url-retrieve to retrieve the target of the redirect (i.e. to
   transparently follow the redirection).

2. url-retrieve takes a URL and a callback function and returns a new
   buffer into which the requested "page" will be asynchronously inserted.
   When the buffer is complete, it calls the callback function (in that
   buffer).

3. Some backends (at least url-http, maybe others) sometimes decide not to
   call the callback, presumably as a way to signal an error (the
   operation can't be completed, so the callback can't be called,
   basically).  This is a bug, but I don't know of anyone who's tried to
   tackle it yet.

4. url-retrieve-synchronously is implemented on top of url-retrieve by
   busy-looping with accept-process-output, waiting for a variable to be
   set to t by the callback function.  Now, because of number 3 above,
   url-retrieve-synchronously can't assume that the callback will
   eventually be called, so it also stops the busy-waiting if it notices
   that there's no more process running in the buffer that url-retrieve
   returned.
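Point 4 can be sketched as follows. This is a simplified illustration of the busy-wait, not the actual url.el code; the function name is made up, and error handling and timeouts are omitted:

```elisp
;; Simplified sketch of the busy-wait in `url-retrieve-synchronously'.
(require 'url)

(defun my-retrieve-synchronously (url)
  "Sketch: wait until the `url-retrieve' callback fires, or give up."
  (let* ((retrieval-done nil)
         (asynch-buffer
          (url-retrieve url (lambda (&rest _)
                              (setq retrieval-done t)))))
    (let ((proc (get-buffer-process asynch-buffer)))
      (while (and (not retrieval-done)
                  ;; Point 3: the callback may never run, so also stop
                  ;; when no process is left in the returned buffer.
                  (get-buffer-process asynch-buffer))
        (accept-process-output proc 1)))
    asynch-buffer))
```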
So when a redirect is encountered, the inner call to url-retrieve creates a
brand new buffer, different from the one returned by the original
url-retrieve call; the subsequent async process runs in that buffer, and
the callback will be called in *that* buffer as well.  So
url-retrieve-synchronously gets all confused: in the buffer in which it
expects the output, after the redirect there's no more process running
(it's running in the newly generated buffer), so it stops busy-waiting and
returns, thinking the download has completed whereas it's actually still
going on, just in another buffer.

So there are fundamentally two bugs in the code:

1. Sometimes the callback doesn't get called.

2. Sometimes the callback gets called in another buffer than the one
   returned by url-retrieve.

I think the first is due to a bug in the API, because there's no
documented way for a backend to indicate to the callback function that the
operation failed.  And the second is an internal bug, which I think is due
to a bug in the internal API (the one used between the generic part of the
URL library and each backend), where the url-<foo> function necessarily
returns a new buffer.  They (and url-retrieve) should either take an
optional destination buffer as a parameter, or they should simply not
return any buffer at all and the destination buffer should only be made
known when the callback is called.  This second option is simpler but
would cause problems for code that wants to start processing data before
it's fully downloaded.


        Stefan
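Bug 2 described above is easy to observe directly (a sketch; the Slashdot URL is the one from the original report, and any URL answering with a 301 or 302 would do):

```elisp
;; Sketch: after a redirect, the callback runs in a buffer different
;; from the one `url-retrieve' originally returned.
(require 'url)

(let ((orig-buffer
       (url-retrieve "http://slashdot.org/index.rss"
                     (lambda (&rest _)
                       ;; With the bug, this buffer name differs from
                       ;; the one printed below.
                       (message "callback ran in %s" (buffer-name))))))
  (message "url-retrieve returned %s" (buffer-name orig-buffer)))
```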
* Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
  2006-01-20 21:56 ` Stefan Monnier
@ 2006-01-22  3:59   ` Richard M. Stallman
  2006-01-23 16:40     ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Richard M. Stallman @ 2006-01-22 3:59 UTC (permalink / raw)
Cc: emacs-devel

    3. some backends (at least url-http, maybe others) sometimes decide
    not to call the callback, presumably as a way to signal an error (the
    operation can't be completed so the callback can't be called,
    basically).  This is a bug, but I don't know of anyone who's tried to
    tackle it yet.

What exactly is the bug?  It isn't clear to me.  Are you saying that they
should also call the callback function to report failure?

That seems like a good idea on general principles.  I don't know how much
of a change it would be.  Is the calling convention easy to extend for
this?

    2. sometimes the callback gets called in another buffer than the one
    returned by url-retrieve.

One solution would be to give the first buffer a local variable that
would, in this case, point to the second buffer.  Then
url-retrieve-synchronously could check the local variable, which would
tell it to check the process in the other buffer.

That solution would be safe, since it would not involve changing general
calling conventions.

    They (and url-retrieve) should either take an optional destination
    buffer as parameter or they should simply not return any buffer at all
    and the destination buffer should only be made known when the callback
    is called.

I don't much like the idea of that big a change in calling conventions.
* Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
  2006-01-22  3:59 ` Richard M. Stallman
@ 2006-01-23 16:40   ` Stefan Monnier
  2006-01-23 20:38     ` Stefan Monnier
  2006-01-24 16:47     ` Richard M. Stallman
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Monnier @ 2006-01-23 16:40 UTC (permalink / raw)
Cc: emacs-devel

> 3. some backends (at least url-http, maybe others) sometimes decide
> not to call the callback, presumably as a way to signal an error (the
> operation can't be completed so the callback can't be called,
> basically).  This is a bug, but I don't know of anyone who's tried to
> tackle it yet.

> What exactly is the bug?  It isn't clear to me.  Are you saying that
> they should also call the callback function to report failure?

Yes.

> That seems like a good idea on general principles.  I don't know how
> much of a change it would be.  Is the calling convention easy to
> extend for this?

Not directly.  The most natural way to do that is to pass the
success/failure information as a parameter to the callback function, but
if we don't want to break existing code, we could pass the info via a
local variable in the destination buffer.

> 2. sometimes the callback gets called in another buffer than the one
> returned by url-retrieve.

> One solution would be to give the first buffer a local variable that
> would, in this case, point to the second buffer.
> Then url-retrieve-synchronously could check the local variable, which
> would tell it to check the process in the other buffer.

Yes, sounds like a good quick-fix.

> They (and url-retrieve) should either take an optional destination
> buffer as parameter or they should simply not return any buffer at all
> and the destination buffer should only be made known when the callback
> is called.

> I don't much like the idea of that big a change in calling conventions.

Neither do I, especially not that close to a release.  OTOH we haven't
released URL yet, so it's the last opportunity to clean up the API.
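The buffer-local alternative Stefan mentions might look like this on the backend side. The variable and helper names here are hypothetical; no such convention existed in the library at the time:

```elisp
;; Hypothetical sketch: report failure to the callback via a
;; buffer-local variable instead of a new callback argument, so that
;; existing callbacks keep working unchanged.
(defvar url-retrieval-error nil
  "Hypothetical: non-nil in the destination buffer if retrieval failed.")

(defun my-finish-retrieval (buffer callback err)
  "Record ERR buffer-locally in BUFFER, then always call CALLBACK there.
ERR is nil on success, or an error description on failure."
  (with-current-buffer buffer
    (set (make-local-variable 'url-retrieval-error) err)
    (funcall callback)))
```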
        Stefan
* Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
  2006-01-23 16:40 ` Stefan Monnier
@ 2006-01-23 20:38   ` Stefan Monnier
  2006-02-19 20:15     ` Mark Plaksin
  2006-01-24 16:47   ` Richard M. Stallman
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2006-01-23 20:38 UTC (permalink / raw)
Cc: emacs-devel

>> 2. sometimes the callback gets called in another buffer than the one
>> returned by url-retrieve.

>> One solution would be to give the first buffer a local variable that
>> would, in this case, point to the second buffer.
>> Then url-retrieve-synchronously could check the local variable, which
>> would tell it to check the process in the other buffer.

> Yes, sounds like a good quick-fix.

The patch below uses this approach.  Mark, does it work well for you?


        Stefan


Index: lisp/url/url.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/url/url.el,v
retrieving revision 1.20
diff -u -r1.20 url.el
--- lisp/url/url.el	10 Jan 2006 19:31:15 -0000	1.20
+++ lisp/url/url.el	23 Jan 2006 20:37:41 -0000
@@ -114,6 +114,13 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;; Retrieval functions
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(defvar url-redirect-buffer nil
+  "New buffer into which the retrieval will take place.
+Sometimes while retrieving a URL, the URL library needs to use another buffer
+than the one returned initially by `url-retrieve'.  In this case, it sets this
+variable in the original buffer as a forwarding pointer.")
+
 ;;;###autoload
 (defun url-retrieve (url callback &optional cbargs)
   "Retrieve URL asynchronously and call CALLBACK with CBARGS when finished.
@@ -189,10 +189,14 @@
          (url-debug 'retrieval
                     "Spinning in url-retrieve-synchronously: %S (%S)"
                     retrieval-done asynch-buffer)
+         (if (buffer-local-value 'url-redirect-buffer asynch-buffer)
+             (setq proc (get-buffer-process
+                         (setq asynch-buffer
+                               (buffer-local-value 'url-redirect-buffer
+                                                   asynch-buffer))))
          (if (and proc (memq (process-status proc)
                              '(closed exit signal failed))
-                  ;; Make sure another process hasn't been started, as can
-                  ;; happen with http redirections.
+                  ;; Make sure another process hasn't been started.
                   (eq proc (or (get-buffer-process asynch-buffer) proc)))
              ;; FIXME: It's not clear whether url-retrieve's callback is
              ;; guaranteed to be called or not.  It seems that url-http
@@ -200,7 +204,7 @@
              ;; clear that it's a bug, but even then we need to decide how
              ;; url-http can then warn us that the download has completed.
              ;; In the mean time, we use this here workaround.
-             (setq retrieval-done t)
+             (setq retrieval-done t))
          ;; We used to use `sit-for' here, but in some cases it wouldn't
          ;; work because apparently pending keyboard input would always
          ;; interrupt it before it got a chance to handle process input.
Index: lisp/url/url-http.el
===================================================================
RCS file: /sources/emacs/emacs/lisp/url/url-http.el,v
retrieving revision 1.23
diff -u -r1.23 url-http.el
--- lisp/url/url-http.el	18 Nov 2005 16:55:54 -0000	1.23
+++ lisp/url/url-http.el	23 Jan 2006 20:37:41 -0000
@@ -1,6 +1,6 @@
 ;;; url-http.el --- HTTP retrieval routines
 
-;; Copyright (C) 1999, 2001, 2004, 2005 Free Software Foundation, Inc.
+;; Copyright (C) 1999, 2001, 2004, 2005, 2006 Free Software Foundation, Inc.
 ;; Author: Bill Perry <wmperry@gnu.org>
 ;; Keywords: comm, data, processes
@@ -35,10 +35,8 @@
 (require 'url-cookie)
 (require 'mail-parse)
 (require 'url-auth)
-(autoload 'url-retrieve-synchronously "url")
-(autoload 'url-retrieve "url")
+(require 'url)
 (autoload 'url-cache-create-filename "url-cache")
-(autoload 'url-mark-buffer-as-dead "url")
 
 (defconst url-http-default-port 80 "Default HTTP port.")
 (defconst url-http-asynchronous-p t "HTTP retrievals are asynchronous.")
@@ -509,10 +509,17 @@
            (let ((url-request-method url-http-method)
                  (url-request-data url-http-data)
                  (url-request-extra-headers url-http-extra-headers))
+             ;; Put in the current buffer a forwarding pointer to the new
+             ;; destination buffer.
+             ;; FIXME: This is a hack to fix url-retrieve-synchronously
+             ;; without changing the API.  Instead url-retrieve should
+             ;; either simply not return the "destination" buffer, or it
+             ;; should take an optional `dest-buf' argument.
+             (set (make-local-variable 'url-redirect-buffer)
              (url-retrieve redirect-uri url-callback-function
                            (cons :redirect
                                  (cons redirect-uri
-                                       url-callback-arguments)))
+                                       url-callback-arguments))))
             (url-mark-buffer-as-dead (current-buffer))))))
        (4                              ; Client error
         ;; 400 Bad Request
* Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
  2006-01-23 20:38 ` Stefan Monnier
@ 2006-02-19 20:15   ` Mark Plaksin
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Plaksin @ 2006-02-19 20:15 UTC (permalink / raw)

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> 2. sometimes the callback gets called in another buffer than the one
>>> returned by url-retrieve.
>
>>> One solution would be to give the first buffer a local variable that
>>> would, in this case, point to the second buffer.
>>> Then url-retrieve-synchronously could check the local variable, which
>>> would tell it to check the process in the other buffer.
>
>> Yes, sounds like a good quick-fix.
>
> The patch below uses this approach.  Mark, does it work well for you?

It does; thanks!

Are applications supposed to be aware of the possibility of redirect
buffers and clean up both the original buffer and the redirect buffer?
You could easily say "It's a workaround, so yes!" :)
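One answer to Mark's cleanup question, sketched from the application side. This assumes the url-redirect-buffer variable introduced by the patch earlier in the thread; the function name is made up:

```elisp
;; Sketch: kill the buffer returned by `url-retrieve-synchronously'
;; and, if the library forwarded the retrieval after a redirect, kill
;; the redirect buffer as well.
(defun my-url-kill-buffers (buffer)
  "Kill BUFFER and any `url-redirect-buffer' it forwards to."
  (let ((redirect (buffer-local-value 'url-redirect-buffer buffer)))
    (kill-buffer buffer)
    (when (buffer-live-p redirect)
      (kill-buffer redirect))))
```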
* Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
  2006-01-23 16:40 ` Stefan Monnier
  2006-01-23 20:38   ` Stefan Monnier
@ 2006-01-24 16:47   ` Richard M. Stallman
  1 sibling, 0 replies; 9+ messages in thread
From: Richard M. Stallman @ 2006-01-24 16:47 UTC (permalink / raw)
Cc: emacs-devel

    Neither do I, especially not that close to a release.  OTOH we
    haven't released URL yet so it's the last opportunity to clean up
    the API.

In that case, I'd say it is a good idea to pass the success/failure info
as another argument to the callback function.
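For reference, later versions of the URL library adopted essentially the convention RMS suggests: the callback's first argument is a STATUS list (plist-shaped) that can carry :error and :redirect entries. A callback written against that later convention looks roughly like this (a sketch; the function name is made up, and this API did not yet exist at the time of the thread):

```elisp
;; Sketch of a callback using the STATUS argument as found in later
;; Emacs versions of `url-retrieve'.
(require 'url)

(defun my-url-callback (status &rest _args)
  "Inspect the STATUS list passed to a `url-retrieve' callback."
  (let ((err (plist-get status :error))
        (redirect (plist-get status :redirect)))
    (cond (err      (message "retrieval failed: %S" err))
          (redirect (message "redirected to %s" redirect))
          (t        (message "retrieved %d bytes" (buffer-size))))))

;; Usage: (url-retrieve "https://example.com/" #'my-url-callback)
```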