From: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: emacs-devel@gnu.org
Subject: Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
Date: Fri, 20 Jan 2006 16:56:55 -0500
Message-ID: <87wtgu8trl.fsf-monnier+emacs@gnu.org>
In-Reply-To: <E1Ezdpa-0008Fg-J7@fencepost.gnu.org> (Richard M. Stallman's message of "Thu, 19 Jan 2006 12:44:18 -0500")

>     I've looked at it and understand the problem, but I can't think of a
>     quickfix, and the real fix will take some work (and enough changes that
>     it'll risk introducing bugs that'll need more testing than I can do
>     myself).

> Could you explain the issue to us?  Maybe someone will have an idea.

Here is what I believe to be the relevant information:

1. when an HTTP server returns a redirect, url-http.el currently calls
   url-retrieve to retrieve the target of the redirect (i.e. it transparently
   follows the redirection).

2. url-retrieve takes a url and a callback function and returns a new buffer
   into which the requested "page" will be asynchronously inserted.  When
   the buffer is complete, it calls the callback function (in that buffer).

3. some backends (at least url-http, maybe others) sometimes decide not to
   call the callback, presumably as a way to signal an error (the operation
   can't be completed so the callback can't be called, basically).  This is
   a bug, but I don't know of anyone who's tried to tackle it yet.

4. url-retrieve-synchronously is implemented on top of url-retrieve by
   busy-looping with accept-process-output, waiting for a variable to be set
   to t by the callback function.  Now, because of number 3 above,
   url-retrieve-synchronously can't assume that the callback will eventually
   be called, so it also stops the busy-waiting if it notices that there's
   no more process running in the buffer that url-retrieve returned.

So when a redirect is encountered, the inner call to url-retrieve creates
a brand new buffer, different from the one returned by the original
url-retrieve call, and the subsequent async process runs in that buffer and
the callback will be called in *that* buffer as well.

So url-retrieve-synchronously gets all confused: in the buffer in which it
expects the output, after the redirect there's no more process running
(it's running in the newly generated buffer), so it stops busy-waiting
and returns, thinking the download has completed whereas it's actually still
going on, just in another buffer.
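
As a rough illustration of the sequence above, here is a toy Python model.
It is not the url.el code: Buffer, Loop, retrieve and
retrieve_synchronously are made-up names that only mimic the behaviour
described in points 1 to 4, with process_alive standing in for "there is
still a process running in this buffer".

```python
# Toy model of the failure; none of these names exist in url.el.

class Buffer:
    """Stands in for an Emacs buffer that may have a process attached."""
    def __init__(self, name):
        self.name = name
        self.process_alive = True   # "a process is still running here"
        self.text = ""

class Loop:
    """Stands in for the event handling done by accept-process-output."""
    def __init__(self):
        self.pending = []           # queued events, run one per step

    def step(self):
        if self.pending:
            self.pending.pop(0)()

def retrieve(loop, url, callback, redirects):
    """Model of url-retrieve: always returns a brand new buffer."""
    buf = Buffer("buf-for-" + url)

    def on_response():
        buf.process_alive = False   # the connection for this buffer is done
        if url in redirects:
            # Redirect: the inner call creates yet another buffer,
            # and the callback will run for that one instead.
            retrieve(loop, redirects[url], callback, redirects)
        else:
            buf.text = "contents of " + url
            callback(buf)

    loop.pending.append(on_response)
    return buf

def retrieve_synchronously(loop, url, redirects):
    """Model of the busy-wait in url-retrieve-synchronously."""
    done = []
    buf = retrieve(loop, url, done.append, redirects)
    # Stop when the callback has run OR the returned buffer has no process.
    while not done and buf.process_alive:
        loop.step()
    return buf, bool(done)

loop = Loop()
buf, callback_ran = retrieve_synchronously(loop, "/old", {"/old": "/new"})
print(buf.name, repr(buf.text), callback_ran)
```

In this model the call returns the empty buffer for "/old" with the
callback not yet run, while the event for "/new" is still sitting in the
queue: the "synchronous" call has returned before the download finished.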

So there are fundamentally two bugs in the code:

1. sometimes the callback doesn't get called.
2. sometimes the callback gets called in a different buffer from the one
   returned by url-retrieve.

I think the first is due to a bug in the API because there's no documented
way for a backend to indicate to the callback function that the
operation failed.
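
One possible shape for such a convention, sketched as a toy Python model
(purely hypothetical: the status dictionary, fetch and record are invented
for the illustration and are not part of the URL library), is to always
call the callback and pass it a status describing success or failure:

```python
# Hypothetical convention: the callback always runs and receives a status.

def fetch(url, callback, known_pages):
    """Toy backend: reports failure through the callback instead of going silent."""
    if url in known_pages:
        callback({"ok": True, "error": None}, known_pages[url])
    else:
        callback({"ok": False, "error": "cannot retrieve " + url}, None)

results = []

def record(status, body):
    # The caller learns about failures here, so it never waits forever.
    results.append((status["ok"], status["error"], body))

pages = {"/index": "hello"}
fetch("/index", record, pages)
fetch("/missing", record, pages)
print(results)
```

With a guarantee like this, a synchronous wrapper could wait on the
callback alone and would not need to check whether a process is still
running in the buffer.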

The second is an internal bug, but I think it stems from a flaw in the
internal API (the one used between the generic part of the URL lib and each
backend), where each url-<foo> function necessarily returns a new buffer.

These backend functions (and url-retrieve) should either take an optional destination buffer as
parameter or they should simply not return any buffer at all and the
destination buffer should only be made known when the callback is called.
This second option is simpler but would cause problems for code that wants to
start processing data before it's fully downloaded.
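
The first option can be sketched with a toy Python model (again not the
real code: the dest parameter, the pending queue and the other names are
assumptions made for the illustration).  When the redirect is followed,
the inner call is handed the buffer that was already returned to the
caller, so the callback ends up running on that same buffer:

```python
# Toy model of the "optional destination buffer" option; names are made up.

class Buffer:
    def __init__(self, name):
        self.name = name
        self.text = ""

def retrieve(url, callback, redirects, pending, dest=None):
    """Toy url-retrieve that can be told which buffer to fill."""
    buf = dest if dest is not None else Buffer("buf-for-" + url)

    def on_response():
        if url in redirects:
            # Follow the redirect into the SAME buffer.
            retrieve(redirects[url], callback, redirects, pending, dest=buf)
        else:
            buf.text = "contents of " + url
            callback(buf)

    pending.append(on_response)     # the response arrives asynchronously
    return buf

pending = []
seen = []
returned = retrieve("/old", seen.append, {"/old": "/new"}, pending)
while pending:                      # drain the queued events
    pending.pop(0)()
print(seen[0] is returned, returned.text)
```

Here the buffer handed back by the original call is the one that receives
the redirected contents, so a caller watching that buffer is no longer
misled.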


        Stefan
