From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date: Sat, 28 Oct 2017 12:28:07 +0300
Message-ID: <83tvyj62qg.fsf@gnu.org>
References: <83lgjz8eiy.fsf@gnu.org> <831slp98ut.fsf@gnu.org>
Reply-To: Eli Zaretskii
To: Matthias Dahl
Cc: emacs-devel@gnu.org
In-reply-to: (message from Matthias Dahl on Thu, 26 Oct 2017 20:56:03 +0200)

> Cc: emacs-devel@gnu.org
> From: Matthias Dahl
> Date: Thu, 26 Oct 2017 20:56:03 +0200
>
> > AFAIK, post-command-hooks cannot be run while we are in sit-for, but I
> > guess this is not relevant to the rest of the description?
>
> This probably comes from server.el (server-visit-files) because Magit
> uses emacsclient for some of its magic.

I see post-command-hook being run from server-visit-files, but I don't
see sit-for. The sit_for (not sit-for) in your backtrace is called
from read_char, something that happens every time Emacs becomes idle.

> I have attached a backtrace, taken during the hang. Unfortunately it is
> from an optimized build (would have needed to recompile just now, and I
> am a bit in a hurry) but it at least shows the callstack (more or less)
> nicely.

It lacks the Lisp backtrace ("xbacktrace" can produce it), and the
fact that most arguments are either optimized out or complex data
types shown as "..." makes the backtrace much less useful.

> > I understand that this timer calls accept-process-output with its
> > argument nil, is that correct? If so, isn't that a bug for a timer to
> > do that? Doing that runs the risk of eating up output from some
> > subprocess for which the foreground Lisp program is waiting.
>
> I haven't actually checked which timer it is, to be quite honest since I
> didn't think of it as a bug at all.
>
> Correct me if I am wrong, calling accept-process-output w/o arguments
> is expected to be quite harmless and can be useful. If you specify a
> specific process, you will most definitely wait at least as long as
> it takes for that process to produce any output.
>
> Nevertheless: If I am not completely mistaken, there is no data lost at
> all. It is read and passed to the filter function which was registered
> by the interested party -- otherwise the default filter will simply
> append it to the buffer it belongs to.
>
> The only thing that is lost is that it was ever read at all and thus
> an endless wait begins.

But if the wrong call to accept-process-output has read the process
output, it could also have processed it and delivered the results to
the wrong application, no?

> > So please point out the timer that does this, because I think that
> > timer needs to be fixed.
>
> If you still need that, I will do some digging and try to find it.

I think we need a thorough and detailed understanding of what's going
on in this case before we can discuss the solutions, yes. IME, trying
to fix a problem without a good understanding of what it is that we
are fixing tends to produce partial solutions at best, and new bugs at
worst.

So please reproduce this in an unoptimized build, and please also show
the Lisp backtrace in this scenario. Then let's take it from there.

> > We already record the file descriptors on which we wait for process
> > output, see compute_non_keyboard_wait_mask. Once
> > wait_reading_process_output exits, it clears these records. So it
> > should be possible for us to prevent accept-process-output calls
> > issued by such runaway timers from waiting on the descriptors that are
> > already "taken": if, when we set the bits in the pselect mask, we find
> > that some of the descriptors are already watched by the same thread as
> > the current thread, we could exclude them from the pselect mask we are
> > setting up. Wouldn't that be a better solution?
Because AFAIU, your > > solution just avoids an infinite wait, but doesn't avoid losing the > > process output, because it was read by the wrong Lisp code. Right? > > Hm... at the moment I don't see where data is lost with my solution. > Maybe I am being totally blind and making a fool out of myself but I > honestly don't see it. Maybe there is no loss, but I'm not really sure your proposal solves the root cause, and without detailed understanding of what's exactly going on, we have no way of discussing this rationally. > > Well, I'd like to eyeball the timer which commits this crime. > > If you still do, let me know and I will try to track it down... I do. I believe we must understand this situation very well before we reason about its solution.