From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Matthias Dahl Newsgroups: gmane.emacs.devel Subject: wait_reading_process_ouput hangs in certain cases (w/ patches) Date: Tue, 24 Oct 2017 20:52:20 +0200 Message-ID: NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------49FFBA510746A7D9CFEDE1A3" X-Trace: blaine.gmane.org 1508872771 13230 195.159.176.226 (24 Oct 2017 19:19:31 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Tue, 24 Oct 2017 19:19:31 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Oct 24 21:19:26 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1e74jL-0001uL-W8 for ged-emacs-devel@m.gmane.org; Tue, 24 Oct 2017 21:19:16 +0200 Original-Received: from localhost ([::1]:45204 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e74jT-0004pZ-Dp for ged-emacs-devel@m.gmane.org; Tue, 24 Oct 2017 15:19:23 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:35720) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1e74jJ-0004nt-Dr for emacs-devel@gnu.org; Tue, 24 Oct 2017 15:19:15 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1e74jF-0002ag-BN for emacs-devel@gnu.org; Tue, 24 Oct 2017 15:19:13 -0400 Original-Received: from ud19.udmedia.de ([194.117.254.59]:45304 helo=mail.ud19.udmedia.de) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1e74jE-0002Td-Sj for emacs-devel@gnu.org; Tue, 24 Oct 2017 15:19:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=binary-island.eu; h=to :from:subject:message-id:date:mime-version:content-type; s=k1; bh=i7M5N2+oydw1+KE+Er1w6voANkOoQn8IVtfb1rDEE6E=; b=iNQvlhE0xoY0 m6HYpfvgb84R6E3QNk1Ppcl4TjSWkH8lMIsICTnDQtlSFStxGyu9iGtRT5GTaJFi dHiJUoENBUg7JbK0AUtOixEEeSHTsJgOGnjSBjK7S1utcXQlEzJsed3xHMWvj5YS u3pceKRQIWcPhC6lEJ3TDraFy6Vs/p4= Original-Received: (qmail 24476 invoked from network); 24 Oct 2017 20:52:23 +0200 Original-Received: from unknown (HELO ?IPv6:2a02:810b:c540:234:36aa:25b9:ca8f:d05f?) (ud19?126p1@2a02:810b:c540:234:36aa:25b9:ca8f:d05f) by mail.ud19.udmedia.de with ESMTPSA (ECDHE-RSA-AES128-GCM-SHA256 encrypted, authenticated); 24 Oct 2017 20:52:23 +0200 Openpgp: id=1E87ADA02EFE759EFC20B2D1042F47D273AA780C Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 194.117.254.59 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:219729 Archived-At: This is a multi-part message in MIME format. --------------49FFBA510746A7D9CFEDE1A3 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Hello all, recursively calling accept-process-output with the first call waiting for the output of a specific process, can hang Emacs (as-in: it is waiting forever but can be canceled of course) even though it should not be the case since data was read. This is actually not all that uncommon. One example of this is a hang seen with Magit opening its COMMIT_MSG buffer, reported here [1]. I've myself run into this problem continuously which is why I started to debug it in the first place. The hang with Magit happens in flyspell.el which waits for output from its spellchecker process through accept-process-output and specifies that specific process as wait_proc. Now depending on timing (race), wait_reading_process_output can call the pending timers... which in turn can call accept-process-output again. This almost always leads to the spellchecker output being read back in full, so there is no more data left to be read. Thus the original accept-process-output, which called wait_reading_process_output, will wait for the data to become available forever since it has no way to know that those have already been read. Naturally one could argue that a timeout should be specified when calling accept-process-output. But nevertheless I still think this is a breach of contract. The caller expects accept-process-output to return as promised when data has been read from the specified process and it clearly isn't always the case and can hang forever, depending on timing and the specifics of the data being read. The attached patches fix this by introducing process output read accounting -- simply counting the bytes read per process. And using that data to strategically check in wait_reading_process_output for data being read while we handed over control to timers and/or filters. I haven't seen any ill side-effects in my tests and it clearly fixes the problem seen in [1] as well as I would wager quite a few others that were probably seen by user's of all kinds of setups that seemed unpredictable and mysterious and never debugged. As a side-note: I decided against an artificial metric and went with simply counting the bytes read, since this does come in handy when doing debugging and being able to see how much data was read from a process during specific time intervals. Also, this still leaves the possibility that wait_reading_process_output could wait forever while being called without wait_proc and a timeout set. This could be mitigated as well by some sort of a tick counter that only increases when no wait_proc was specified and data from processes were processed. I decided against implementing that for now since imho the chances of that happening are marginal, if at all present. OTOH, the semantics in that case are not all that clear and would further add complexity to an already rather unhealthy function. I am naturally open to other opinions and implementing this as well if requested. :-) Any suggestions and/or comments are very welcome, as always. Thanks for the patience to read this longish post. :-) So long, Matthias [1] https://github.com/magit/magit/issues/2915 -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu --------------49FFBA510746A7D9CFEDE1A3 Content-Type: text/x-patch; name="0001-Add-process-output-read-accounting.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0001-Add-process-output-read-accounting.patch" =46rom 6c24b8d7082222df28d2046bfe70ff0e22342f08 Mon Sep 17 00:00:00 2001 From: Matthias Dahl Date: Tue, 24 Oct 2017 15:55:53 +0200 Subject: [PATCH 1/2] Add process output read accounting This tracks the bytes read from a process's stdin which is not used anywhere yet but required for follow-up work. * src/process.c (read_process_output): Track bytes read from a process. * src/process.h (struct Lisp_Process): Add infd_num_bytes_read to track bytes read from a process. --- src/process.c | 2 ++ src/process.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/src/process.c b/src/process.c index fc46e74332..904ca60863 100644 --- a/src/process.c +++ b/src/process.c @@ -5900,6 +5900,8 @@ read_process_output (Lisp_Object proc, int channel)= /* Now set NBYTES how many bytes we must decode. */ nbytes +=3D carryover; =20 + p->infd_num_bytes_read +=3D nbytes; + odeactivate =3D Vdeactivate_mark; /* There's no good reason to let process filters change the current buffer, and many callers of accept-process-output, sit-for, and diff --git a/src/process.h b/src/process.h index 5a044f669f..f796719a51 100644 --- a/src/process.h +++ b/src/process.h @@ -129,6 +129,8 @@ struct Lisp_Process pid_t pid; /* Descriptor by which we read from this process. */ int infd; + /* Byte-count for process output read from `infd'. */ + unsigned long infd_num_bytes_read; /* Descriptor by which we write to this process. */ int outfd; /* Descriptors that were created for this process and that need --=20 2.14.3 --------------49FFBA510746A7D9CFEDE1A3 Content-Type: text/x-patch; name="0002-src-process.c-wait_reading_process_output-Fix-wait_p.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename*0="0002-src-process.c-wait_reading_process_output-Fix-wait_p.pa"; filename*1="tch" =46rom 57e9adc220312681588180aff2bae1eb07925ad5 Mon Sep 17 00:00:00 2001 From: Matthias Dahl Date: Tue, 24 Oct 2017 15:56:47 +0200 Subject: [PATCH 2/2] * src/process.c (wait_reading_process_output): Fix wait_proc hang. If called recursively (through timers or process filters by the means of accept-process-output), it is possible that the output of wait_proc has already been read by one of those recursive calls, leaving the original call hanging forever if no further output arrives through that fd and no timeout has been specified. Implement proper checks by taking advantage of the process output read accounting. --- src/process.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/src/process.c b/src/process.c index 904ca60863..a743aa973e 100644 --- a/src/process.c +++ b/src/process.c @@ -5003,6 +5003,8 @@ wait_reading_process_output (intmax_t time_limit, i= nt nsecs, int read_kbd, struct timespec got_output_end_time =3D invalid_timespec (); enum { MINIMUM =3D -1, TIMEOUT, INFINITY } wait; int got_some_output =3D -1; + unsigned long initial_wait_proc_num_bytes_read =3D (wait_proc) ? + wait_proc->infd_num_b= ytes_read : 0; #if defined HAVE_GETADDRINFO_A || defined HAVE_GNUTLS bool retry_for_async; #endif @@ -5161,6 +5163,17 @@ wait_reading_process_output (intmax_t time_limit, = int nsecs, int read_kbd, && requeued_events_pending_p ()) break; =20 + /* Timers could have called `accept-process-output', thus read= ing the output + of wait_proc while we (in the worst case) wait endlessly fo= r it to become + available later. So we need to check if data has been read = and break out + early if that is so since our job has been fulfilled. */ + if (wait_proc + && wait_proc->infd_num_bytes_read !=3D initial_wait_proc_n= um_bytes_read) + { + got_some_output =3D 1; + break; + } + /* This is so a breakpoint can be put here. */ if (!timespec_valid_p (timer_delay)) wait_reading_process_output_1 (); @@ -5606,7 +5619,15 @@ wait_reading_process_output (intmax_t time_limit, = int nsecs, int read_kbd, buffered-ahead character if we have one. */ =20 nread =3D read_process_output (proc, channel); - if ((!wait_proc || wait_proc =3D=3D XPROCESS (proc)) + + /* In case a filter was run that called `accept-process-ou= tput', it is + possible that the output from wait_proc was already rea= d, leaving us + waiting for it endlessly (if no timeout was specified).= Thus, we need + to check if data was already read. */ + if (wait_proc + && wait_proc->infd_num_bytes_read !=3D initial_wait_pr= oc_num_bytes_read) + got_some_output =3D 1; + else if ((!wait_proc || wait_proc =3D=3D XPROCESS (proc)) && got_some_output < nread) got_some_output =3D nread; if (nread > 0) --=20 2.14.3 --------------49FFBA510746A7D9CFEDE1A3--