From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches) Date: Thu, 26 Oct 2017 19:23:06 +0300 Message-ID: <831slp98ut.fsf@gnu.org> References: <83lgjz8eiy.fsf@gnu.org> Reply-To: Eli Zaretskii NNTP-Posting-Host: blaine.gmane.org X-Trace: blaine.gmane.org 1509035014 13874 195.159.176.226 (26 Oct 2017 16:23:34 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Thu, 26 Oct 2017 16:23:34 +0000 (UTC) Cc: emacs-devel@gnu.org To: Matthias Dahl Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Oct 26 18:23:29 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1e7kwB-00023l-LQ for ged-emacs-devel@m.gmane.org; Thu, 26 Oct 2017 18:23:19 +0200 Original-Received: from localhost ([::1]:53781 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e7kwJ-0007wd-2c for ged-emacs-devel@m.gmane.org; Thu, 26 Oct 2017 12:23:27 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:54066) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e7kwB-0007wG-6p for emacs-devel@gnu.org; Thu, 26 Oct 2017 12:23:20 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1e7kw8-0005SQ-Ik for emacs-devel@gnu.org; Thu, 26 Oct 2017 12:23:19 -0400 Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:59132) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e7kw8-0005SM-Et; Thu, 26 Oct 2017 12:23:16 -0400 Original-Received: from [176.228.60.248] (port=2190 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1e7kw7-0000dI-Rx; Thu, 26 Oct 2017 12:23:16 -0400 In-reply-to: (message from Matthias Dahl on Thu, 26 Oct 2017 16:07:31 +0200) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:219786 Archived-At: > Cc: emacs-devel@gnu.org > From: Matthias Dahl > Date: Thu, 26 Oct 2017 16:07:31 +0200 > > When committing, Magit prepares a COMMIT_MSG buffer and does some > process magic of its own which is pretty much irrelevant for this. > > At some point during that, while we are already in an instance of > wait_reading_process_output (due to sit_for), the post-command-hooks are > run. AFAIK, post-command-hooks cannot be run while we are in sit-for, but I guess this is not relevant to the rest of the description? > And here things get interesting. Eventually flyspell-post-command-hook > is run which executes flyspell-word synchronously. That basically does > write out a word that needs to be checked to the spellchecker process, > waits for the results from stdin via accept-process-output and goes on. > Of special note here is that it a) specifies a wait_proc (spellchecker > process) and no timeout or whatsoever. > > The output from the spellchecker is usually there instantaneously, so > that is actually unnoticeable, unless wait_reading_process_output, that > was invoked through that specific accept-process-output, decides to run > the timers. > > And here comes the catch: At this point, usually the spellchecker output > is already available but not yet read. When the timers run, one of them > calls accept-process-output again which will read the entire available > output of the spellchecker process. I understand that this timer calls accept-process-output with its argument nil, is that correct? If so, isn't that a bug for a timer to do that? Doing that runs the risk of eating up output from some subprocess for which the foreground Lisp program is waiting. So please point out the timer that does this, because I think that timer needs to be fixed. > The gist of it is: If we have an active wait_reading_process_output call > with a wait_proc set but no timeout that calls out to either timers or > filters, it is entirely possible that those directly or indirectly call > us again recursively, thus reading the output we are waiting for without > us ever noticing it, if no further output becomes available in addition > to what was read unnoticed... like it happens with flyspell. > > That is what my patches fix: They simply add a bytes read metric to each > process structure that we can check for change at strategically relevant > points and decide if we got some data back that went unnoticed and break > out from wait_reading_process_output. We already record the file descriptors on which we wait for process output, see compute_non_keyboard_wait_mask. Once wait_reading_process_output exits, it clears these records. So it should be possible for us to prevent accept-process-output calls issued by such runaway timers from waiting on the descriptors that are already "taken": if, when we set the bits in the pselect mask, we find that some of the descriptors are already watched by the same thread as the current thread, we could exclude them from the pselect mask we are setting up. Wouldn't that be a better solution? Because AFAIU, your solution just avoids an infinite wait, but doesn't avoid losing the process output, because it was read by the wrong Lisp code. Right? > But I also think that wait_reading_process_output violates its contract > and is buggy in this regard as well, since it should properly function > even if it calls out to filters or timers -- and it clearly does not and > I would wager more hangs seen in the wild that weren't debugged, could > be attributed to this very bug. I think the basic contract that is violated here is that the output from a subprocess is "stolen" by an unrelated call to accept-process-output, and we should prevent that if we can. > If there are any question marks left hanging over your head, please > don't hesitate to ask and I will try my best to clear them up -- but it > might end up being another longish mail, so be warned. ;) Well, I'd like to eyeball the timer which commits this crime.