From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Matthias Dahl Newsgroups: gmane.emacs.devel Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches) Date: Wed, 15 Nov 2017 15:03:27 +0100 Message-ID: <03261534-6bf5-1a5d-915f-d3c55aaa35e9@binary-island.eu> References: <83lgjz8eiy.fsf@gnu.org> <831slp98ut.fsf@gnu.org> <83tvyj62qg.fsf@gnu.org> <83r2tetf90.fsf@gnu.org> <5150d198-8dd3-9cf4-5914-b7e945294452@binary-island.eu> <83tvy7s6wi.fsf@gnu.org> <83inemrqid.fsf@gnu.org> <398f8d17-b727-d5d6-4a31-772448c7ca0d@binary-island.eu> <56e722a6-95a4-0e42-185c-f26845d4f4bf@binary-island.eu> <21237e45-a353-92f9-01ec-7b51640d2031@cs.ucla.edu> <83vaickfu2.fsf@gnu.org> <83tvxwkexg.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Trace: blaine.gmane.org 1510754650 13801 195.159.176.226 (15 Nov 2017 14:04:10 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 15 Nov 2017 14:04:10 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 Cc: emacs-devel@gnu.org To: Paul Eggert , Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Nov 15 15:03:56 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eEyIE-0002cE-Kn for ged-emacs-devel@m.gmane.org; Wed, 15 Nov 2017 15:03:54 +0100 Original-Received: from localhost ([::1]:36352 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eEyII-0005ya-U2 for ged-emacs-devel@m.gmane.org; Wed, 15 Nov 2017 09:03:58 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:57300) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eEyHw-0005ru-Cx for emacs-devel@gnu.org; Wed, 15 Nov 2017 09:03:40 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eEyHs-0001AT-Dt for emacs-devel@gnu.org; Wed, 15 Nov 2017 09:03:36 -0500 Original-Received: from ud19.udmedia.de ([194.117.254.59]:38568 helo=mail.ud19.udmedia.de) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eEyHs-00019B-25 for emacs-devel@gnu.org; Wed, 15 Nov 2017 09:03:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=binary-island.eu; h= subject:to:cc:references:from:message-id:date:mime-version :in-reply-to:content-type:content-transfer-encoding; s=k1; bh=vR 1y9XyhNRE+xrnjo7Gt9ly6ojSR9CRGOJWBn62Xmec=; b=EKdIiP2zRwtor0sqVZ V7KiqeOYeXZh4Q4IdsEt88cFnuIPOPSpQsMOfBODhYGOTrcz6c57XcB8G1id5iaI M1CM0PPm9FyzDbqJytbDxGO5KnU2+hYrCqcnjSG1HbgyNEYhbm0izh4sxxghTywU CdaGrScYJn42wSAzd4qfMlWkM= Original-Received: (qmail 29162 invoked from network); 15 Nov 2017 15:03:29 +0100 Original-Received: from unknown (HELO ?IPv6:2a02:810b:c540:234:36aa:25b9:ca8f:d05f?) (ud19?126p1@2a02:810b:c540:234:36aa:25b9:ca8f:d05f) by mail.ud19.udmedia.de with ESMTPSA (ECDHE-RSA-AES128-GCM-SHA256 encrypted, authenticated); 15 Nov 2017 15:03:29 +0100 Openpgp: id=1E87ADA02EFE759EFC20B2D1042F47D273AA780C In-Reply-To: Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 194.117.254.59 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:220207 Archived-At: Hello Paul... On 14/11/17 22:54, Paul Eggert wrote: > I doubt whether anyone understands the problem well enough, which is why > I have been asking questions about the proposed solution - which as far > as I can see, is more of a band-aid rather than a real fix. What makes you think that, just out of curiosity? I poured quite a lot of time into debugging this bug, reading up on the relevant C sources and coming up with a "simple" enough that is easy to grasp and review. This was by no means a 5-minute look and fix... which is nothing I would ever do, period. If anything, I am usually too thorough for my own good. > When this happens, it appears that the original accept-process-output > acted by calling wait_reading_process (0, 0, 0, 0, Qnil, PROC, 0) where > PROC is the ispell-process. First, is that correct? (If not, my > remaining questions may be a wild goose chase....) Exactly. You can also have a deeper look at things, as there is a full and detailed backtrace posted a few messages back (emacs-bt-full.txt). > This meant the original wait_reading_process did the following: set wait > = INFINITY, run the timers (which apparently call wait_reading_process > recursively), then check whether update_tick != process_tick (line 5182 > of process.c in commit 79108894dbcd642121466bb6af6c98c6a56e9233). Is > update_tick equal to process_tick in the problematic call? I'll assume > so, but please check this. (If not, my remaining questions may need to > be changed.) This is pretty much irrelevant, if I am not missing some huge bit piece of the puzzle somewhere. If the branch update_tick != process_tick is taken, there is nothing in there that would eventually notice that the data from our wait_proc has already been read. If thread_select signaled that there is no more data available, we end up with status_notify for our wait_proc -- and that will also only try to read any remaining data, if at all. The crucial part is: ALL data has been read from our wait_proc while we were running timers or filters -- and no further data will become available until there is some interaction again with the process. That is the case with the ispell process. wait_reading_... currently has no chance/code to detect such an event since it solely relies on pselect calls and such -- which will come up empty handed if no further data is available, and there is nothing to change that. [ ... I skipped the remaining questions for now. If you still think those are relevant, let me know and I will do my best to answer those as well. ...] > The changes you're proposing essentially kick the code so that it > pretends that it read some bytes, even though it didn't (because the > bytes were actually read and processed by a subroutine), causing it to > exit the loop (and return nonzero instead of zero -- why?). But isn't > this kick what the update_tick != process_tick (line 5182) check is > supposed to be doing? And if so, why isn't that check working for your > case? Is it because the code is forgetting to increment a tick count? The fix is no band-aid, as you put it earlier. To answer your questions: 1) Yes, it gives wait_reading_... the possibility to exit the loop and return >= 1, if the data for our wait_proc has been read while we were running timers or filters. 2) Returning >= 1 is semantically correct, imho. We were waiting for data to become available. That data became available. Granted, it is not us directly who read it and passed it to the filter, but still, the caller expects us to signal if data become available... and that is exactly what happened. Returning anything else wouldn't make much sense, imho and probably (?) cause problems in this particular case... 3) update_tick != processed_tick ... see earlier in this mail for why this is not helping a bit with this particular case. By the way, I do have ideas for solutions that have no corner-cases, which is what you are asking for. My current solution will fail to detect a read-back if exactly 2**(bit depth of counter) has been read which is very (!) unlikely, but still. That's it... otherwise it will work reliably. But that requires that we change wait_reading_... to really only return -1, 0 or 1 and give up returning the real amount of bytes. Otherwise we either end up with the same corner-case all over again, just in a more complex solution or with a way too complex solution for this problem. No user is using the return value in such a way in-tree. And given that those changes are going only to master, it gives all out-of-tree users who might use that value differently, ample time to adjust -- or voice their concerns. Thanks for your thorough questions and review -- very much appreciated! So long, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu