From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Fwd: HTTP redirects make url-retrieve-synchronously asynchronous
Date: Fri, 20 Jan 2006 16:56:55 -0500
Message-ID: <87wtgu8trl.fsf-monnier+emacs@gnu.org>
References: <87y81ep6wf.fsf-monnier+emacs@gnu.org>
Original-To: rms@gnu.org
Cc: emacs-devel@gnu.org
In-Reply-To: (Richard M. Stallman's message of "Thu, 19 Jan 2006 12:44:18 -0500")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

> I've looked at it and understand the problem, but I can't think of a
> quickfix, and the real fix will take some work (and enough changes that
> it'll risk introducing bugs that'll need more testing than I can do
> myself).

> Could you explain the issue to us?  Maybe someone will have an idea.

Here is what I believe to be the relevant information:

1. When an HTTP server returns a redirect, url-http.el currently calls
   url-retrieve to retrieve the target of the redirect (i.e. it
   transparently follows the redirection).
2. url-retrieve takes a URL and a callback function and returns a new
   buffer into which the requested "page" will be asynchronously
   inserted.  When the buffer is complete, it calls the callback
   function (in that buffer).
3. Some backends (at least url-http, maybe others) sometimes decide not
   to call the callback, presumably as a way to signal an error (the
   operation can't be completed, so the callback can't be called,
   basically).  This is a bug, but I don't know of anyone who's tried to
   tackle it yet.
4. url-retrieve-synchronously is implemented on top of url-retrieve by
   busy-looping with accept-process-output, waiting for a variable to be
   set to t by the callback function.

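Points 2 and 4 can be sketched with a toy model.  This is Python, not
the actual Emacs Lisp, and every name in it (Buffer, pending,
run_one_event, the underscore versions of the url functions) is a
hypothetical stand-in chosen for illustration; it only assumes the
behaviour described in the list above.

```python
from collections import deque

# Toy event queue standing in for Emacs process I/O (an assumption of
# this sketch, not how Emacs is implemented).
pending = deque()


class Buffer:
    """Stand-in for an Emacs buffer that receives the downloaded page."""

    def __init__(self):
        self.text = ""


def url_retrieve(url, callback):
    """Return a new buffer at once; fill it and call `callback` later."""
    buf = Buffer()

    def finish():
        buf.text = "contents of " + url
        callback(buf)  # the callback runs "in" the buffer being filled

    pending.append(finish)  # nothing has been downloaded yet
    return buf


def run_one_event():
    """Run one queued event, like one round of waiting for process output."""
    if pending:
        pending.popleft()()


def url_retrieve_synchronously(url):
    """Busy-wait on a flag that only the callback sets (point 4)."""
    done = []
    buf = url_retrieve(url, done.append)
    # In this toy the callback always fires; if it never did, this loop
    # would spin forever.
    while not done:
        run_one_event()
    return buf
```

In this model, url_retrieve_synchronously("http://example.org/")
returns only once the callback has run, so the returned buffer is
already filled.
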
Now, because of number 3 above, url-retrieve-synchronously can't assume
that the callback will eventually be called, so it also stops the
busy-waiting if it notices that there's no more process running in the
buffer that url-retrieve returned.

So when a redirect is encountered, the inner call to url-retrieve
creates a brand new buffer, different from the one returned by the
original url-retrieve call, and the subsequent async process runs in
that buffer and the callback will be called in *that* buffer as well.

So url-retrieve-synchronously gets all confused: in the buffer in which
it expects the output, after the redirect there's no more process
running (it's running in the newly generated buffer), so it stops
busy-waiting and returns, thinking the download has completed whereas
it's actually still going on, just in another buffer.

So there are fundamentally two bugs in the code:
1. Sometimes the callback doesn't get called.
2. Sometimes the callback gets called in another buffer than the one
   returned by url-retrieve.

I think the first is due to a bug in the API, because there's no
documented way for a backend to indicate to the callback function that
the operation failed.  And the second is an internal bug, but one which
I think is due to a bug in the internal API (the one used between the
generic part of the URL lib and each backend), where the url- function
necessarily returns a new buffer.  They (and url-retrieve) should either
take an optional destination buffer as parameter, or they should simply
not return any buffer at all, and the destination buffer should only be
made known when the callback is called.  This second option is simpler
but would cause problems for code that wants to start processing data
before it's fully downloaded.


        Stefan
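
To make the redirect problem and the "optional destination buffer" idea
concrete, here is a toy model.  It is Python rather than Emacs Lisp,
and the REDIRECTS table, the process_alive flag, and the dest and
reuse_on_redirect parameters are all hypothetical names invented for
this sketch; it illustrates the behaviour described above and is not
the url library's actual code.

```python
from collections import deque

pending = deque()  # toy event queue standing in for process I/O

# Hypothetical redirect table: source URL -> target URL.
REDIRECTS = {"http://old.example/": "http://new.example/"}


class Buffer:
    def __init__(self):
        self.text = ""
        self.process_alive = False  # "a process is running in this buffer"


def url_retrieve(url, callback, dest=None, reuse_on_redirect=False):
    """Start a fetch; `dest` models the proposed destination buffer."""
    buf = dest if dest is not None else Buffer()
    buf.process_alive = True

    def on_response():
        buf.process_alive = False
        target = REDIRECTS.get(url)
        if target is None:
            buf.text = "contents of " + url
            callback(buf)
        else:
            # Without reuse, the inner call makes a brand new buffer,
            # as described above.  With reuse, it keeps the original.
            url_retrieve(target, callback,
                         dest=buf if reuse_on_redirect else None,
                         reuse_on_redirect=reuse_on_redirect)

    pending.append(on_response)
    return buf


def url_retrieve_synchronously(url, reuse_on_redirect=False):
    done = []
    buf = url_retrieve(url, done.append, reuse_on_redirect=reuse_on_redirect)
    # Stop when the callback fires, or when no process is left in the
    # buffer we were handed (the callback might never be called).
    while not done and buf.process_alive:
        pending.popleft()()
    return buf


# The redirect moves the download to another buffer, so the loop gives
# up early and the returned buffer is still empty.
broken = url_retrieve_synchronously("http://old.example/")
pending.clear()  # drop the orphaned fetch left behind by the run above

# Passing the original buffer along keeps the process where the caller
# is looking, so the wait ends only when the callback has run.
fixed = url_retrieve_synchronously("http://old.example/",
                                   reuse_on_redirect=True)
```

In this model, broken.text stays empty while fixed.text holds the
contents of the redirect target.  The sketch does nothing about the
first bug (a callback that is never called); that would still need a
documented way to report failure to the callback.
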