From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date: Sat, 28 Oct 2017 12:28:07 +0300
Message-ID: <83tvyj62qg.fsf@gnu.org>
References: <e4e583ca-d4c4-f01f-70ee-8ced7bac05a7@binary-island.eu>
	<83lgjz8eiy.fsf@gnu.org>
	<cf13c40c-8465-c4d0-116d-7dd4dcbfe368@binary-island.eu>
	<831slp98ut.fsf@gnu.org>
	<f60200b1-c303-bbc5-9001-2dfc57ad995a@binary-island.eu>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
X-Trace: blaine.gmane.org 1509182915 24871 195.159.176.226 (28 Oct 2017 09:28:35 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sat, 28 Oct 2017 09:28:35 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Matthias Dahl <ml_emacs-lists@binary-island.eu>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Oct 28 11:28:28 2017
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1e8NPe-0004ru-Fa
	for ged-emacs-devel@m.gmane.org; Sat, 28 Oct 2017 11:28:18 +0200
Original-Received: from localhost ([::1]:60292 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1e8NPl-0006g7-NP
	for ged-emacs-devel@m.gmane.org; Sat, 28 Oct 2017 05:28:25 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:56554)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1e8NPd-0006fp-1B
	for emacs-devel@gnu.org; Sat, 28 Oct 2017 05:28:18 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1e8NPY-0005ZV-W2
	for emacs-devel@gnu.org; Sat, 28 Oct 2017 05:28:17 -0400
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:46270)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1e8NPY-0005ZD-TT; Sat, 28 Oct 2017 05:28:12 -0400
Original-Received: from [176.228.60.248] (port=4490 helo=home-c4e4a596f7)
	by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
	(Exim 4.82) (envelope-from <eliz@gnu.org>)
	id 1e8NPY-0002uy-Bc; Sat, 28 Oct 2017 05:28:12 -0400
In-reply-to: <f60200b1-c303-bbc5-9001-2dfc57ad995a@binary-island.eu> (message
	from Matthias Dahl on Thu, 26 Oct 2017 20:56:03 +0200)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:219801
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/219801>

> Cc: emacs-devel@gnu.org
> From: Matthias Dahl <ml_emacs-lists@binary-island.eu>
> Date: Thu, 26 Oct 2017 20:56:03 +0200
> 
> > AFAIK, post-command-hooks cannot be run while we are in sit-for, but I
> > guess this is not relevant to the rest of the description?
> 
> This probably comes from server.el (server-visit-files) because Magit
> uses emacsclient for some of its magic.

I see post-command-hook being run from server-visit-files, but I don't
see sit-for.  The sit_for (not sit-for) in your backtrace is called
from read_char, something that happens every time Emacs becomes idle.

> I have attached a backtrace, taken during the hang. Unfortunately it is
> from a optimized build (would have needed to recompile just now, and I
> am a bit in a hurry) but it at least shows the callstack (more or less)
> nicely.

It lacks the Lisp backtrace ("xbacktrace" can produce it), and the
fact that most arguments are either optimized out or complex data
types shown as "..." makes the backtrace much less useful.

> > I understand that this timer calls accept-process-output with its
> > argument nil, is that correct?  If so, isn't that a bug for a timer to
> > do that?  Doing that runs the risk of eating up output from some
> > subprocess for which the foreground Lisp program is waiting.
> 
> I haven't actually checked which timer it is, to be quite honest since I
> didn't think of it as a bug at all.
> 
> Correct me if I am wrong, calling accept-process-output w/o arguments
> is expected to be quite harmless and can be useful. If you specify a
> specific process, you will most definitely wait at least as long as
> it takes for that process to produce any output.
> 
> Nevertheless: If am not completely mistaken, there is no data lost at
> all. It is read and passed to the filter function which was registered
> by the interested party -- otherwise the default filter will simply
> append it to the buffer it belongs to.
> 
> The only thing that is lost is that it was ever read at all and thus
> an endless wait begins.

But if the wrong call to accept-process-output have read the process
output, it could have also processed it and delivered the results to
the wrong application, no?

> > So please point out the timer that does this, because I think that
> > timer needs to be fixed.
> 
> If you still need that, I will do some digging and try to find it.

I think we need a thorough and detailed understanding of what's going
on in this case, before we can discuss the solutions, yes.  IME,
trying to fix a problem without a good understanding what it is that
we are fixing tends to produce partial solutions at best, and new bugs
at worst.

So please reproduce this in an unoptimized build, and please also show
the Lisp backtrace in this scenario.  Then let's take it from there.

> > We already record the file descriptors on which we wait for process
> > output, see compute_non_keyboard_wait_mask.  Once
> > wait_reading_process_output exits, it clears these records.  So it
> > should be possible for us to prevent accept-process-output calls
> > issued by such runaway timers from waiting on the descriptors that are
> > already "taken": if, when we set the bits in the pselect mask, we find
> > that some of the descriptors are already watched by the same thread as
> > the current thread, we could exclude them from the pselect mask we are
> > setting up.  Wouldn't that be a better solution?  Because AFAIU, your
> > solution just avoids an infinite wait, but doesn't avoid losing the
> > process output, because it was read by the wrong Lisp code.  Right?
> 
> Hm... at the moment I don't see where data is lost with my solution.
> Maybe I am being totally blind and making a fool out of myself but I
> honestly don't see it.

Maybe there is no loss, but I'm not really sure your proposal solves
the root cause, and without detailed understanding of what's exactly
going on, we have no way of discussing this rationally.

> > Well, I'd like to eyeball the timer which commits this crime.
> 
> If you still do, let me know and I will try to track it down...

I do.  I believe we must understand this situation very well before we
reason about its solution.