From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date: Tue, 14 Nov 2017 13:54:35 -0800
Organization: UCLA Computer Science Department
To: Eli Zaretskii
Cc: ml_emacs-lists@binary-island.eu, emacs-devel@gnu.org
In-Reply-To: <83tvxwkexg.fsf@gnu.org>

On 11/14/2017 08:23 AM, Eli Zaretskii wrote:

> And also the problem you are talking about, because I'm not sure I
> understand it well enough.

I doubt whether anyone understands the problem well enough, which is why I have been asking questions about the proposed solution - which, as far as I can see, is more of a band-aid than a real fix. To help move this forward, I just reread the original bug report here:

https://lists.gnu.org/archive/html/emacs-devel/2017-10/msg00743.html

and I have some further questions that may help us understand what's going on. The bug report said:

> flyspell.el ... waits for output from its spellchecker process
> through accept-process-output and specifies that specific process as
> wait_proc. Now depending on timing (race),
> wait_reading_process_output can call the pending timers... which in
> turn can call accept-process-output again. This almost always leads
> to the spellchecker output being read back in full, so there is no
> more data left to be read. Thus the original accept-process-output,
> which called wait_reading_process_output, will wait for the data to
> become available forever since it has no way to know that those have
> already been read.

When this happens, it appears that the original accept-process-output acted by calling wait_reading_process_output (0, 0, 0, 0, Qnil, PROC, 0) where PROC is the ispell-process. First, is that correct? (If not, my remaining questions may be a wild goose chase....)

This meant the original wait_reading_process_output did the following: set wait = INFINITY, run the timers (which apparently call wait_reading_process_output recursively), then check whether update_tick != process_tick (line 5182 of process.c in commit 79108894dbcd642121466bb6af6c98c6a56e9233). Is update_tick equal to process_tick in the problematic call? I'll assume so, but please check this. (If not, my remaining questions may need to be changed.)

Next, the original wait_reading_process_output checks whether wait_proc->raw_status_new is nonzero (line 5210). Is it nonzero? For now, I'll assume it is zero. (If not, my remaining questions may need to be changed.)

Next, the original wait_reading_process_output checks whether (! EQ (wait_proc->status, Qrun) && ! connecting_status (wait_proc->status)) (line 5213). Does this check succeed? For now, I'll assume this check returns false. (If not, then we need to understand why.)

Next, the original wait_reading_process_output recomputes the input wait masks, sets check_delay = 0, check_write = true, no_avail = 0, timeout = timer_delay (line 5355), and so forth. This means it will call select with a nonzero timeout, even though we don't want it to do that: we want wait_reading_process_output to return 0, because it attempted to receive input but got none.

The changes you're proposing essentially kick the code so that it pretends that it read some bytes, even though it didn't (because the bytes were actually read and processed by a subroutine), causing it to exit the loop (and return nonzero instead of zero -- why?). But isn't this kick what the update_tick != process_tick (line 5182) check is supposed to be doing? And if so, why isn't that check working for your case? Is it because the code is forgetting to increment a tick count?

This sort of reasoning is what needs to be done for a change to such an intricate part of the Emacs code.
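
To make the question about the tick check concrete, here is a toy C model of the ordering described above: run the timers, compare update_tick with process_tick, then wait for input. It is only a sketch and is not taken from process.c. The names toy_process, toy_read_output, toy_wait and the bump_tick flag are hypothetical, and the model collapses the real select-with-timeout loop into a single pass. Whether the real nested read changes process_tick is exactly the open question, so the model simply takes that as a parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a process object; NOT the real struct Lisp_Process.  */
struct toy_process
{
  int pending_bytes;	/* output written by the child, not yet read */
};

/* Toy counterparts of the two counters named in the message.  */
static int process_tick;	/* bumped when some process's state changes */
static int update_tick;		/* process_tick as of the last handled change */

enum toy_result { GOT_INPUT, NOTICED_TICK_CHANGE, WOULD_BLOCK_FOREVER };

/* Read everything pending on P and return the byte count.
   BUMP_TICK is an assumption of this model: it says whether a
   successful read counts as a state change.  */
static int
toy_read_output (struct toy_process *p, bool bump_tick)
{
  int n = p->pending_bytes;
  p->pending_bytes = 0;
  if (n > 0 && bump_tick)
    process_tick++;
  return n;
}

/* One pass of the wait loop, in the order the message walks through.
   TIMER_READS_PROC models a timer that recursively waits on WAIT_PROC
   and drains its output before the outer call gets to look.  */
enum toy_result
toy_wait (struct toy_process *wait_proc, bool timer_reads_proc,
	  bool bump_tick)
{
  /* Step 1: run pending timers.  */
  if (timer_reads_proc)
    toy_read_output (wait_proc, bump_tick);

  /* Step 2: the update_tick != process_tick check.  */
  if (update_tick != process_tick)
    {
      update_tick = process_tick;
      return NOTICED_TICK_CHANGE;
    }

  /* Step 3: wait for input on WAIT_PROC only.  */
  if (wait_proc->pending_bytes > 0)
    {
      int n = toy_read_output (wait_proc, bump_tick);
      assert (n > 0);
      return GOT_INPUT;
    }

  /* Nothing left to read and nothing signalled a change.  */
  return WOULD_BLOCK_FOREVER;
}
```

In this model, calling toy_wait (&p, true, false) on a process with pending bytes returns WOULD_BLOCK_FOREVER, which corresponds to the reported hang, while toy_wait (&p, true, true) returns NOTICED_TICK_CHANGE because the tick comparison catches the nested read.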