unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* some accept-process-output races fixed; Tramp FIXMEs
@ 2019-01-15 18:33 Paul Eggert
  2019-01-16  8:32 ` Michael Albinus
  2019-01-16 13:25 ` Stefan Monnier
  0 siblings, 2 replies; 23+ messages in thread
From: Paul Eggert @ 2019-01-15 18:33 UTC (permalink / raw)
  To: Emacs development discussions; +Cc: Michael Albinus

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

I've been annoyed for months that test/src/process-tests.el occasionally 
failed on my work desktop, and finally got annoyed enough to track it 
down. It was a race condition bug, fixed in the attached patch. I looked 
for similar instances of the bug elsewhere in Emacs, and found what 
appear to be some instances in Tramp code, along with some mysterious 
"0.1"s that may be inadequate attempts to work around the bug. I 
installed the attached patch to try to document and fix this mess, 
including some FIXMEs about those "0.1"s; I hope Michael or someone else 
with Tramp expertise (I don't use Tramp) can take a look at those FIXMEs 
and at my other Tramp changes, and perhaps at all other Tramp uses of 
accept-process-output and of tramp-accept-process-output.

Now that I'm thinking about it, I should also mention that a single 
(accept-process-output P) is not guaranteed to slurp all the data from 
process P even if P has already exited. Just a word to the wise....


[-- Attachment #2: 0001-Fix-accept-process-output-process-live-p-confusion.patch --]
[-- Type: text/x-patch, Size: 7610 bytes --]

From 5a144e547d6bd6a718e76499705d81ef5934776d Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 15 Jan 2019 10:18:45 -0800
Subject: [PATCH] Fix accept-process-output/process-live-p confusion

* doc/lispref/processes.texi (Accepting Output):
Document the issue.
* lisp/net/tramp-adb.el (tramp-adb-parse-device-names):
* lisp/net/tramp-rclone.el (tramp-rclone-parse-device-names):
* lisp/net/tramp-smb.el (tramp-smb-wait-for-output):
* lisp/net/tramp.el (tramp-interrupt-process):
* test/src/process-tests.el (make-process/mix-stderr):
Fix code that uses accept-process-output and process-live-p.
Add FIXME comments as necessary.
* lisp/net/tramp-sudoedit.el (tramp-sudoedit-action-sudo):
* lisp/net/tramp.el (tramp-action-out-of-band):
Add FIXME comments as necessary.
---
 doc/lispref/processes.texi | 20 ++++++++++++++++++++
 lisp/net/tramp-adb.el      |  6 +++---
 lisp/net/tramp-rclone.el   |  6 +++---
 lisp/net/tramp-smb.el      | 19 +++++++++++--------
 lisp/net/tramp-sudoedit.el |  2 ++
 lisp/net/tramp.el          |  9 ++++++---
 test/src/process-tests.el  |  4 ++--
 7 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/doc/lispref/processes.texi b/doc/lispref/processes.texi
index 72b164c5d4..afda8aede8 100644
--- a/doc/lispref/processes.texi
+++ b/doc/lispref/processes.texi
@@ -1859,6 +1859,26 @@ Accepting Output
 arrived.
 @end defun
 
+If a connection from a process contains buffered data,
+@code{accept-process-output} can return non-@code{nil} even after the
+process has exited.  Therefore, although the following loop:
+
+@example
+;; This loop contains a bug.
+(while (process-live-p process)
+  (accept-process-output process))
+@end example
+
+@noindent
+will often work, it has a race condition and can miss some output if
+@code{process-live-p} returns @code{nil} while the connection still
+contains data.  Better is to write the loop like this:
+
+@example
+(while (or (accept-process-output process)
+           (process-live-p process)))
+@end example
+
 @node Processes and Threads
 @subsection Processes and Threads
 @cindex processes, threads
diff --git a/lisp/net/tramp-adb.el b/lisp/net/tramp-adb.el
index e2275bee2a..ca47601e4b 100644
--- a/lisp/net/tramp-adb.el
+++ b/lisp/net/tramp-adb.el
@@ -206,9 +206,9 @@ tramp-adb-parse-device-names
 	(tramp-message v 6 "%s" (mapconcat 'identity (process-command p) " "))
 	(process-put p 'adjust-window-size-function 'ignore)
 	(set-process-query-on-exit-flag p nil)
-	(while (process-live-p p)
-	  (accept-process-output p 0.1))
-	(accept-process-output p 0.1)
+	;; FIXME: Either remove " 0.1", or comment why it's needed.
+	(while (or (accept-process-output p 0.1)
+		   (process-live-p p)))
 	(tramp-message v 6 "\n%s" (buffer-string))
 	(goto-char (point-min))
 	(while (search-forward-regexp "^\\(\\S-+\\)[[:space:]]+device$" nil t)
diff --git a/lisp/net/tramp-rclone.el b/lisp/net/tramp-rclone.el
index 0aa110d9a4..7366057296 100644
--- a/lisp/net/tramp-rclone.el
+++ b/lisp/net/tramp-rclone.el
@@ -183,9 +183,9 @@ tramp-rclone-parse-device-names
 	  (tramp-message v 6 "%s" (mapconcat 'identity (process-command p) " "))
 	  (process-put p 'adjust-window-size-function 'ignore)
 	  (set-process-query-on-exit-flag p nil)
-	  (while (process-live-p p)
-	    (accept-process-output p 0.1))
-	  (accept-process-output p 0.1)
+	  ;; FIXME: Either remove " 0.1", or comment why it's needed.
+	  (while (or (accept-process-output p 0.1)
+	             (process-live-p p)))
 	  (tramp-message v 6 "\n%s" (buffer-string))
 	  (goto-char (point-min))
 	  (while (search-forward-regexp "^\\(\\S-+\\):$" nil t)
diff --git a/lisp/net/tramp-smb.el b/lisp/net/tramp-smb.el
index 0c42a5d1a5..abf3248a35 100644
--- a/lisp/net/tramp-smb.el
+++ b/lisp/net/tramp-smb.el
@@ -2038,10 +2038,10 @@ tramp-smb-wait-for-output
       ;; Algorithm: get waiting output.  See if last line contains
       ;; `tramp-smb-prompt' sentinel or `tramp-smb-errors' strings.
       ;; If not, wait a bit and again get waiting output.
-      (while (and (not found) (not err) (process-live-p p))
-
-	;; Accept pending output.
-	(tramp-accept-process-output p 0.1)
+      ;; FIXME: Either remove " 0.1", or comment why it's needed.
+      (while (and (not found) (not err)
+		  (or (tramp-accept-process-output p 0.1)
+		      (process-live-p p)))
 
 	;; Search for prompt.
 	(goto-char (point-min))
@@ -2052,10 +2052,13 @@ tramp-smb-wait-for-output
 	(setq err (re-search-forward tramp-smb-errors nil t)))
 
       ;; When the process is still alive, read pending output.
-      (while (and (not found) (process-live-p p))
-
-	;; Accept pending output.
-	(tramp-accept-process-output p 0.1)
+      ;; FIXME: This loop should be folded into the previous loop.
+      ;; Also, ERR should be set just once, after the combined
+      ;; loop has finished.
+      ;; FIXME: Either remove " 0.1", or comment why it's needed.
+      (while (and (not found)
+		  (or (tramp-accept-process-output p 0.1)
+		      (process-live-p p)))
 
 	;; Search for prompt.
 	(goto-char (point-min))
diff --git a/lisp/net/tramp-sudoedit.el b/lisp/net/tramp-sudoedit.el
index cab9b8d835..e1e5ab091a 100644
--- a/lisp/net/tramp-sudoedit.el
+++ b/lisp/net/tramp-sudoedit.el
@@ -746,6 +746,8 @@ tramp-sudoedit-handle-write-region
 (defun tramp-sudoedit-action-sudo (proc vec)
   "Check, whether a sudo process copy has finished."
   ;; There might be pending output for the exit status.
+  ;; FIXME: Either remove " 0.1", or comment why it's needed.
+  ;; FIXME: There's a race here.  Shouldn't the next two lines be interchanged?
   (tramp-accept-process-output proc 0.1)
   (when (not (process-live-p proc))
     ;; Delete narrowed region, it would be in the way reading a Lisp form.
diff --git a/lisp/net/tramp.el b/lisp/net/tramp.el
index 437e2d19b9..7632d656a0 100644
--- a/lisp/net/tramp.el
+++ b/lisp/net/tramp.el
@@ -3977,6 +3977,8 @@ tramp-action-process-alive
 (defun tramp-action-out-of-band (proc vec)
   "Check, whether an out-of-band copy has finished."
   ;; There might be pending output for the exit status.
+  ;; FIXME: Either remove " 0.1", or comment why it's needed.
+  ;; FIXME: Shouldn't the following line be wrapped inside (while ...)?
   (tramp-accept-process-output proc 0.1)
   (cond ((and (not (process-live-p proc))
 	      (zerop (process-exit-status proc)))
@@ -4821,9 +4823,10 @@ tramp-interrupt-process
 	;; Wait, until the process has disappeared.  If it doesn't,
 	;; fall back to the default implementation.
 	(with-timeout (1 (ignore))
-	  (while (process-live-p proc)
-	    ;; We cannot run `tramp-accept-process-output', it blocks timers.
-	    (accept-process-output proc 0.1))
+	  ;; We cannot run `tramp-accept-process-output', it blocks timers.
+	  ;; FIXME: Either remove " 0.1", or comment why it's needed.
+	  (while (or (accept-process-output proc 0.1)
+		     (process-live-p proc)))
 	  ;; Report success.
 	  proc)))))
 
diff --git a/test/src/process-tests.el b/test/src/process-tests.el
index 514bd04da4..5dbf441e8c 100644
--- a/test/src/process-tests.el
+++ b/test/src/process-tests.el
@@ -207,8 +207,8 @@ process-tests--mixable
                     :sentinel #'ignore
                     :noquery t
                     :connection-type 'pipe)))
-      (while (process-live-p process)
-        (accept-process-output process))
+      (while (or (accept-process-output process)
+		 (process-live-p process)))
       (should (eq (process-status process) 'exit))
       (should (eq (process-exit-status process) 0))
       (should (process-tests--mixable (string-to-list (buffer-string))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-15 18:33 some accept-process-output races fixed; Tramp FIXMEs Paul Eggert
@ 2019-01-16  8:32 ` Michael Albinus
  2019-01-16  8:38   ` Paul Eggert
  2019-01-16 13:03   ` Michael Albinus
  2019-01-16 13:25 ` Stefan Monnier
  1 sibling, 2 replies; 23+ messages in thread
From: Michael Albinus @ 2019-01-16  8:32 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Emacs development discussions

Paul Eggert <eggert@cs.ucla.edu> writes:

Hi Paul,

> I looked for similar instances of the bug elsewhere in Emacs,
> and found what appear to be some instances in Tramp code, along with
> some mysterious "0.1"s that may be inadequate attempts to work around
> the bug. I installed the attached patch to try to document and fix
> this mess, including some FIXMEs about those "0.1"s; I hope Michael or
> someone else with Tramp expertise (I don't use Tramp) can take a look
> at those FIXMEs and at my other Tramp changes, and perhaps at all
> other Tramp uses of accept-process-output and of
> tramp-accept-process-output.

Thanks for the patch. Yes, occasionally there are some problems with
accept-process-output in Tramp, hopefully your patch stabilizes
this. Due to these problems, test tramp-test43-asynchronous-requests has
been tagged as :unstable. I will observe how it goes on.

The 0.1 sec timeout in accept-process-output was not introduced for this
reason, IIRC. It shall prevent accept-process-output calls from waiting
forever for new process output which won't happen. Tramp was blocked by
this, w/o the 0.1 sec timeout.

I will go through your other FIXMEs, whether there could be arranged
something better.

Best regards, Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-16  8:32 ` Michael Albinus
@ 2019-01-16  8:38   ` Paul Eggert
  2019-01-16  8:49     ` Michael Albinus
  2019-01-16 13:03   ` Michael Albinus
  1 sibling, 1 reply; 23+ messages in thread
From: Paul Eggert @ 2019-01-16  8:38 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Emacs development discussions

Michael Albinus wrote:
> occasionally there are some problems with
> accept-process-output in Tramp, hopefully your patch stabilizes
> this.

In doing some code review at the C level, I found some related races inside 
accept-process-output and fixed them recently, in commit 
2019-01-16T07:51:45!eggert@cs.ucla.edu. These have to do with signals arriving 
at inopportune times, which should be quite unlikely but I thought I'd give you 
a heads-up anyway.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-16  8:38   ` Paul Eggert
@ 2019-01-16  8:49     ` Michael Albinus
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Albinus @ 2019-01-16  8:49 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Emacs development discussions

Paul Eggert <eggert@cs.ucla.edu> writes:

Hi Paul,

>> occasionally there are some problems with
>> accept-process-output in Tramp, hopefully your patch stabilizes
>> this.
>
> In doing some code review at the C level, I found some related races
> inside accept-process-output and fixed them recently, in commit
> 2019-01-16T07:51:45!eggert@cs.ucla.edu. These have to do with signals
> arriving at inopportune times, which should be quite unlikely but I
> thought I'd give you a heads-up anyway.

Thanks. I will re-enable tramp-test43-asynchronous-requests on EMBA,
because it is simpler to see failed test runs, and I won't disturb too
many people yet. After some weeks we shall know, whether the problem has
gone. Due to its nature, I couldn't hit it ever with debugging.

Best regards, Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-16  8:32 ` Michael Albinus
  2019-01-16  8:38   ` Paul Eggert
@ 2019-01-16 13:03   ` Michael Albinus
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Albinus @ 2019-01-16 13:03 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Emacs development discussions

Michael Albinus <michael.albinus@gmx.de> writes:

Hi Paul,

> I will go through your other FIXMEs, whether there could be arranged
> something better.

Done. By this, I could find a serious bug (tramp-accept-process-output
didn't return a proper result). Not all of your FIXMEs needed code
adaption, 'tho.

Thanks again, and best regards, Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-15 18:33 some accept-process-output races fixed; Tramp FIXMEs Paul Eggert
  2019-01-16  8:32 ` Michael Albinus
@ 2019-01-16 13:25 ` Stefan Monnier
  2019-01-16 18:39   ` Paul Eggert
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Monnier @ 2019-01-16 13:25 UTC (permalink / raw)
  To: emacs-devel

> Now that I'm thinking about it, I should also mention that a single
> (accept-process-output P) is not guaranteed to slurp all the data from
> process P even if P has already exited. Just a word to the wise....

So how can the code tell when it has read all the data?


        Stefan




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-16 13:25 ` Stefan Monnier
@ 2019-01-16 18:39   ` Paul Eggert
  2019-01-16 19:07     ` Stefan Monnier
  2019-01-20  9:57     ` Michael Albinus
  0 siblings, 2 replies; 23+ messages in thread
From: Paul Eggert @ 2019-01-16 18:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Michael Albinus, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1024 bytes --]

On 1/16/19 5:25 AM, Stefan Monnier wrote:
 > So how can the code tell when it has read all the data?

The code keeps calling accept-process-output until it gets nil.

Now that I think about it, if you just want all the data you needn’t 
call process-live-p. I installed the attached doc patch accordingly.

The Tramp code still contains several loops like this:

   (while (or (accept-process-output p 0.1)
              (process-live-p p)))

that suffer from race conditions. Consider the following sequence of events:

* accept-process-output times out after 0.1 seconds, and returns nil.
* P generates some data and then exits.
* process-live-p returns nil.

In this case the loop will exit and lose data. This bug is caused by the 
" 0.1" in that loop. I don’t know why the " 0.1" is there, but if the " 
0.1" has to be there then I suppose one way to fix the bug would be to 
enhance accept-process-output so that its caller can distinguish a 
timeout from a connection-closed.

[-- Attachment #2: 0001-doc-lispref-processes.texi-Accepting-Output-Simplify.patch --]
[-- Type: text/x-patch, Size: 1176 bytes --]

From b76e90d242e9ee0a733fc9d4b6e280eb6892a6d0 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 16 Jan 2019 10:31:21 -0800
Subject: [PATCH] * doc/lispref/processes.texi (Accepting Output): Simplify.

---
 doc/lispref/processes.texi | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/lispref/processes.texi b/doc/lispref/processes.texi
index afda8aede8..fd6686e882 100644
--- a/doc/lispref/processes.texi
+++ b/doc/lispref/processes.texi
@@ -1870,13 +1870,13 @@ Accepting Output
 @end example
 
 @noindent
-will often work, it has a race condition and can miss some output if
-@code{process-live-p} returns @code{nil} while the connection still
-contains data.  Better is to write the loop like this:
+will often read all output from @var{process}, it has a race condition
+and can miss some output if @code{process-live-p} returns @code{nil}
+while the connection still contains data.  Better is to write the loop
+like this:
 
 @example
-(while (or (accept-process-output process)
-           (process-live-p process)))
+(while (accept-process-output process))
 @end example
 
 @node Processes and Threads
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-16 18:39   ` Paul Eggert
@ 2019-01-16 19:07     ` Stefan Monnier
  2019-01-17  0:15       ` Paul Eggert
  2019-01-20  9:57     ` Michael Albinus
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Monnier @ 2019-01-16 19:07 UTC (permalink / raw)
  To: emacs-devel

>> So how can the code tell when it has read all the data?
> The code keeps calling accept-process-output until it gets nil.

You mean without a timeout, right?
With a timeout a value of nil doesn't guarantee you've had all the
data (if the process is still live).

I think we should make sure that

    (if (not (process-live-p p))
        (progn
          (accept-process-output p)
          ;; Here we should be sure that we processed all the output

whereas IIUC currently we need

    (if (not (process-live-p p))
        (progn
          (while (accept-process-output p))
          ;; Here we are presumably sure that we processed all the output

Right?


        Stefan




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-16 19:07     ` Stefan Monnier
@ 2019-01-17  0:15       ` Paul Eggert
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Eggert @ 2019-01-17  0:15 UTC (permalink / raw)
  To: emacs-devel

On 1/16/19 11:07 AM, Stefan Monnier wrote:
> I think we should make sure that
>
>      (if (not (process-live-p p))
>          (progn
>            (accept-process-output p)
>            ;; Here we should be sure that we processed all the output
>
> whereas IIUC currently we need
>
>      (if (not (process-live-p p))
>          (progn
>            (while (accept-process-output p))
>            ;; Here we are presumably sure that we processed all the output

Both those approaches should work now. In my earlier email I was worried 
about the slightly-different case where P exits just before 
(accept-process-output P DELAY) is called. In this case 
accept-process-output might not slurp all the data from P even if P has 
already exited. However, in the above scenarios, where Emacs "knows" 
that P has exited before (accept-process-output P) is called with no 
DELAY, no data should go missing on any sane POSIX platform (though 
admittedly I can't cite chapter and verse on this and I suspect POSIX 
does not strictly require it).

However, I'm still puzzled by Tramp's use of (while (or 
(accept-process-output p 0.1) (process-live-p p))) to get all P's 
output. Why not use (while (accept-process-output p)) instead, which is 
simpler and avoids the race condition I mentioned? Maybe something to do 
with timers? I'm at a loss there.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-16 18:39   ` Paul Eggert
  2019-01-16 19:07     ` Stefan Monnier
@ 2019-01-20  9:57     ` Michael Albinus
  2019-01-21 23:02       ` Daniel Colascione
                         ` (3 more replies)
  1 sibling, 4 replies; 23+ messages in thread
From: Michael Albinus @ 2019-01-20  9:57 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Stefan Monnier, emacs-devel

Paul Eggert <eggert@cs.ucla.edu> writes:

Hi Paul,

> The Tramp code still contains several loops like this:
>
>   (while (or (accept-process-output p 0.1)
>              (process-live-p p)))
>
> that suffer from race conditions. Consider the following sequence of events:
>
> * accept-process-output times out after 0.1 seconds, and returns nil.
> * P generates some data and then exits.
> * process-live-p returns nil.
>
> In this case the loop will exit and lose data. This bug is caused by
> the " 0.1" in that loop. I don’t know why the " 0.1" is there, but if
> the " 0.1" has to be there then I suppose one way to fix the bug would
> be to enhance accept-process-output so that its caller can distinguish
> a timeout from a connection-closed.

IIRC, the timeout is used because accept-process-output could be blocked
otherwise. Tramp has often the need to check whether there is still
output from the process, and the timeout is the only way I know that
accept-process-output returns in finite time.

At least this was the case 15+ years ago, and also supporting XEmacs. If
we could guarantee that (accept-process-output p) returns when either
output has arrived and or the process has finished, I could change the
code. This promise must be kept for all Emacs versions since 24, and for
all platforms, including w32.

Best regards, Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-20  9:57     ` Michael Albinus
@ 2019-01-21 23:02       ` Daniel Colascione
  2019-01-22 16:54         ` Eli Zaretskii
  2019-01-22  7:09       ` Paul Eggert
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Daniel Colascione @ 2019-01-21 23:02 UTC (permalink / raw)
  To: Michael Albinus, Paul Eggert; +Cc: Stefan Monnier, emacs-devel

On 1/20/19 4:57 AM, Michael Albinus wrote:
> Paul Eggert <eggert@cs.ucla.edu> writes:
> 
> Hi Paul,
> 
>> The Tramp code still contains several loops like this:
>>
>>    (while (or (accept-process-output p 0.1)
>>               (process-live-p p)))
>>
>> that suffer from race conditions. Consider the following sequence of events:
>>
>> * accept-process-output times out after 0.1 seconds, and returns nil.
>> * P generates some data and then exits.
>> * process-live-p returns nil.
>>
>> In this case the loop will exit and lose data. This bug is caused by
>> the " 0.1" in that loop. I don’t know why the " 0.1" is there, but if
>> the " 0.1" has to be there then I suppose one way to fix the bug would
>> be to enhance accept-process-output so that its caller can distinguish
>> a timeout from a connection-closed.
> 
> IIRC, the timeout is used because accept-process-output could be blocked
> otherwise. Tramp has often the need to check whether there is still
> output from the process, and the timeout is the only way I know that
> accept-process-output returns in finite time.
> 
> At least this was the case 15+ years ago, and also supporting XEmacs. If
> we could guarantee that (accept-process-output p) returns when either
> output has arrived and or the process has finished, I could change the
> code. This promise must be kept for all Emacs versions since 24, and for
> all platforms, including w32.

In general, timeout arguments that mean "I really want to just poll" 
should be exactly zero.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-20  9:57     ` Michael Albinus
  2019-01-21 23:02       ` Daniel Colascione
@ 2019-01-22  7:09       ` Paul Eggert
  2019-01-22 16:56       ` Eli Zaretskii
  2019-01-22 22:06       ` Stefan Monnier
  3 siblings, 0 replies; 23+ messages in thread
From: Paul Eggert @ 2019-01-22  7:09 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Stefan Monnier, emacs-devel

Michael Albinus wrote:
>>    (while (or (accept-process-output p 0.1)
>>               (process-live-p p)))
>>
>> that suffer from race conditions. Consider the following sequence of events:
>>
>> * accept-process-output times out after 0.1 seconds, and returns nil.
>> * P generates some data and then exits.
>> * process-live-p returns nil.
>>
>> In this case the loop will exit and lose data. This bug is caused by
>> the " 0.1" in that loop. I don’t know why the " 0.1" is there, but if
>> the " 0.1" has to be there then I suppose one way to fix the bug would
>> be to enhance accept-process-output so that its caller can distinguish
>> a timeout from a connection-closed.
> IIRC, the timeout is used because accept-process-output could be blocked
> otherwise. Tramp has often the need to check whether there is still
> output from the process, and the timeout is the only way I know that
> accept-process-output returns in finite time.

Why does it matter how long accept-process-output takes, when the call is in 
that loop? Even if accept-process-output returns in 0.1 seconds, the loop will 
just call it again if the process is live. And if the process is dead, 
accept-process-output cannot block.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-21 23:02       ` Daniel Colascione
@ 2019-01-22 16:54         ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2019-01-22 16:54 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: eggert, michael.albinus, monnier, emacs-devel

> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 21 Jan 2019 18:02:22 -0500
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> In general, timeout arguments that mean "I really want to just poll" 
> should be exactly zero.

accept-process-output supports such polling.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-20  9:57     ` Michael Albinus
  2019-01-21 23:02       ` Daniel Colascione
  2019-01-22  7:09       ` Paul Eggert
@ 2019-01-22 16:56       ` Eli Zaretskii
  2019-01-22 18:58         ` Michael Albinus
  2019-01-22 22:06       ` Stefan Monnier
  3 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2019-01-22 16:56 UTC (permalink / raw)
  To: Michael Albinus; +Cc: eggert, monnier, emacs-devel

> From: Michael Albinus <michael.albinus@gmx.de>
> Date: Sun, 20 Jan 2019 10:57:16 +0100
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> IIRC, the timeout is used because accept-process-output could be blocked
> otherwise. Tramp has often the need to check whether there is still
> output from the process, and the timeout is the only way I know that
> accept-process-output returns in finite time.
> 
> At least this was the case 15+ years ago, and also supporting XEmacs. If
> we could guarantee that (accept-process-output p) returns when either
> output has arrived and or the process has finished, I could change the
> code. This promise must be kept for all Emacs versions since 24, and for
> all platforms, including w32.

I don't know about XEmacs (is that still a concern in Tramp?), but in
Emacs you can poll a process for output with accept-process-output
like this:

  (accept-process-output PROC 0)

AFAICT, this worked in Emacs 24.1 as well.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-22 16:56       ` Eli Zaretskii
@ 2019-01-22 18:58         ` Michael Albinus
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Albinus @ 2019-01-22 18:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

Hi Eli,

>> At least this was the case 15+ years ago, and also supporting XEmacs. If
>> we could guarantee that (accept-process-output p) returns when either
>> output has arrived and or the process has finished, I could change the
>> code. This promise must be kept for all Emacs versions since 24, and for
>> all platforms, including w32.
>
> I don't know about XEmacs (is that still a concern in Tramp?), but in
> Emacs you can poll a process for output with accept-process-output
> like this:
>
>   (accept-process-output PROC 0)
>
> AFAICT, this worked in Emacs 24.1 as well.

XEmacs has gone for Tramp, two years ago. I will check whether this
works for Tramp, but it must wait a little bit. These days I'm concerned
about the smb method. Recent smbclient, with all combinations of
SMB1/SMB2/SMB3 protocol versions, client and server, is a disaster for
Tramp. I must puzzle this out, somehow.

After that, I will see what I can do with accept-process-output.
Avoiding timeouts promises speedup. Let's see.

Best regards, Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-20  9:57     ` Michael Albinus
                         ` (2 preceding siblings ...)
  2019-01-22 16:56       ` Eli Zaretskii
@ 2019-01-22 22:06       ` Stefan Monnier
  2019-01-22 22:45         ` Michael Albinus
  3 siblings, 1 reply; 23+ messages in thread
From: Stefan Monnier @ 2019-01-22 22:06 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Paul Eggert, emacs-devel

> IIRC, the timeout is used because accept-process-output could be blocked
> otherwise. Tramp has often the need to check whether there is still
> output from the process, and the timeout is the only way I know that
> accept-process-output returns in finite time.

The time out makes sense when you have *other* things to do (so you're
just polling, making sure those other things don't prevent Emacs from
processing the process/connection's output in a timely fashion).

> If we could guarantee that (accept-process-output p) returns when
> either output has arrived and or the process has finished, I could
> change the code.

This has been the promise of the docstring "for ever", AFAIK.
Which means that if you have nothing else to do until the process dies,
then you're better off using nil for the timeout.
I don't know if that promise was broken by bugs in the code, OTOH.


        Stefan



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-22 22:06       ` Stefan Monnier
@ 2019-01-22 22:45         ` Michael Albinus
  2019-01-23 16:18           ` Eli Zaretskii
  2019-01-23 16:47           ` Stefan Monnier
  0 siblings, 2 replies; 23+ messages in thread
From: Michael Albinus @ 2019-01-22 22:45 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Paul Eggert, emacs-devel

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>> If we could guarantee that (accept-process-output p) returns when
>> either output has arrived and or the process has finished, I could
>> change the code.
>
> This has been the promise of the docstring "for ever", AFAIK.
> Which means that if you have nothing else to do until the process dies,
> then you're better off using nil for the timeout.
> I don't know if that promise was broken by bugs in the code, OTOH.

At least the w32 version of accept-process-output was broken in the
past, see the comment in tramp-accept-process-output. But this was Emacs
22 time, so hopefully it is fixed now.

I would need to test with Emacs 24, whether this is fixed. No chance for
*me*; I'm lucky when I can test with a recent Emacs on a w32 system.

Well ... I'll test with Emacs 26, and if it passes the tests, I'll
assume it works also with Emacs 24 and 25.

>         Stefan

Best regards, Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-22 22:45         ` Michael Albinus
@ 2019-01-23 16:18           ` Eli Zaretskii
  2019-01-23 18:35             ` Michael Albinus
  2019-01-23 16:47           ` Stefan Monnier
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2019-01-23 16:18 UTC (permalink / raw)
  To: Michael Albinus; +Cc: eggert, monnier, emacs-devel

> From: Michael Albinus <michael.albinus@gmx.de>
> Date: Tue, 22 Jan 2019 23:45:26 +0100
> Cc: Paul Eggert <eggert@cs.ucla.edu>, emacs-devel@gnu.org
> 
> At least the w32 version of accept-process-output was broken in the
> past, see the comment in tramp-accept-process-output. But this was Emacs
> 22 time, so hopefully it is fixed now.

That comment doesn't tell much about the problem, so I couldn't verify
that the problem it mentioned is no longer relevant.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-22 22:45         ` Michael Albinus
  2019-01-23 16:18           ` Eli Zaretskii
@ 2019-01-23 16:47           ` Stefan Monnier
  2019-01-23 17:36             ` Paul Eggert
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Monnier @ 2019-01-23 16:47 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Paul Eggert, emacs-devel

>> This has been the promise of the docstring "for ever", AFAIK.
>> Which means that if you have nothing else to do until the process dies,
>> then you're better off using nil for the timeout.
>> I don't know if that promise was broken by bugs in the code, OTOH.
>
> At least the w32 version of accept-process-output was broken in the
> past, see the comment in tramp-accept-process-output.  But this was Emacs
> 22 time, so hopefully it is fixed now.

I suggest you use a nil timeout in Emacs≥27.  If this reveals there are
still remaining bugs, we should fix them rather than work around them.

If the bugs have all been fixed already, then maybe we could also use
a nil timeout in older Emacsen, but we've lived with 0.1 in them, so
I think it's OK to keep using a suboptimal value there.


        Stefan



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-23 16:47           ` Stefan Monnier
@ 2019-01-23 17:36             ` Paul Eggert
  2019-01-28 15:37               ` Michael Albinus
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Eggert @ 2019-01-23 17:36 UTC (permalink / raw)
  To: Stefan Monnier, Michael Albinus; +Cc: emacs-devel

On 1/23/19 8:47 AM, Stefan Monnier wrote:
> If the bugs have all been fixed already, then maybe we could also use
> a nil timeout in older Emacsen, but we've lived with 0.1 in them, so
> I think it's OK to keep using a suboptimal value there.

0.1 is not merely suboptimal (i.e., slower than it should be), it's also 
incorrect (as it can sometimes lose data due to races). Even if we must 
use 0.1 in older Emacs to work around more-serious bugs there, we should 
use nil in Emacs master.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-23 16:18           ` Eli Zaretskii
@ 2019-01-23 18:35             ` Michael Albinus
  2019-01-23 22:19               ` Daniel Pittman
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Albinus @ 2019-01-23 18:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, monnier, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

Hi Eli,

>> At least the w32 version of accept-process-output was broken in the
>> past, see the comment in tramp-accept-process-output. But this was Emacs
>> 22 time, so hopefully it is fixed now.
>
> That comment doesn't tell much about the problem, so I couldn't verify
> that the problem it mentioned is no longer relevant.

Archeology has uncovered
<http://lists.gnu.org/archive/html/tramp-devel/2006-12/msg00007.html>,
which relates to that comment. Sorry, I haven't anything else.

The respective commit in the Tramp repository was CVS 2.516 from
10-Dec-06, if that matters :-)

In that message thread, the OP has mentioned another problem after I've
added the timeout and the comment; this seems to be unrelated.

Best regards, Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-23 18:35             ` Michael Albinus
@ 2019-01-23 22:19               ` Daniel Pittman
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Pittman @ 2019-01-23 22:19 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Eli Zaretskii, eggert, monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]

For what it is worth, that explanation more or less mirrors my recollection
of why that wasn't a zero to poll, or infinite, timeout.  Working around
broken something; I would have said more likely MacOS-X of the period,
which I think still has a broken implementation of the `poll` syscall, but
I don't fully recall.

(That was still in the era I payed significant attention to TRAMP
development)



On Wed, Jan 23, 2019 at 1:38 PM Michael Albinus <michael.albinus@gmx.de>
wrote:

> Eli Zaretskii <eliz@gnu.org> writes:
>
> Hi Eli,
>
> >> At least the w32 version of accept-process-output was broken in the
> >> past, see the comment in tramp-accept-process-output. But this was Emacs
> >> 22 time, so hopefully it is fixed now.
> >
> > That comment doesn't tell much about the problem, so I couldn't verify
> > that the problem it mentioned is no longer relevant.
>
> Archeology has uncovered
> <http://lists.gnu.org/archive/html/tramp-devel/2006-12/msg00007.html>,
> which relates to that comment. Sorry, I haven't anything else.
>
> The respective commit in the Tramp repository was CVS 2.516 from
> 10-Dec-06, if that matters :-)
>
> In that message thread, the OP has mentioned another problem after I've
> added the timeout and the comment; this seems to be unrelated.
>
> Best regards, Michael.
>
>

[-- Attachment #2: Type: text/html, Size: 1959 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: some accept-process-output races fixed; Tramp FIXMEs
  2019-01-23 17:36             ` Paul Eggert
@ 2019-01-28 15:37               ` Michael Albinus
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Albinus @ 2019-01-28 15:37 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Stefan Monnier, emacs-devel

Paul Eggert <eggert@cs.ucla.edu> writes:

> 0.1 is not merely suboptimal (i.e., slower than it should be), it's
> also incorrect (as it can sometimes lose data due to races). Even if
> we must use 0.1 in older Emacs to work around more-serious bugs there,
> we should use nil in Emacs master.

I've adapted all calls of accept-process-output to use a timeout of
either nil or 0. Running Tramp's test suite does not show relevant
performance differences for most of the backends, except smb. Here we
have a performance boost, (411.009675 sec before, 54.109701 sec after
the change).

The change works on all Emacsen 24-27, tested on GNU/Linux, with the
different backends. So I've taken the risk to apply it w/o compatibility
code. Let's see what people report.

I plan to test it also on OpenBSD and MS Windows, once I find respective
test machines. But this will happen for Emacs 27 only, I believe.

Best regards, Michael.



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-01-28 15:37 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-01-15 18:33 some accept-process-output races fixed; Tramp FIXMEs Paul Eggert
2019-01-16  8:32 ` Michael Albinus
2019-01-16  8:38   ` Paul Eggert
2019-01-16  8:49     ` Michael Albinus
2019-01-16 13:03   ` Michael Albinus
2019-01-16 13:25 ` Stefan Monnier
2019-01-16 18:39   ` Paul Eggert
2019-01-16 19:07     ` Stefan Monnier
2019-01-17  0:15       ` Paul Eggert
2019-01-20  9:57     ` Michael Albinus
2019-01-21 23:02       ` Daniel Colascione
2019-01-22 16:54         ` Eli Zaretskii
2019-01-22  7:09       ` Paul Eggert
2019-01-22 16:56       ` Eli Zaretskii
2019-01-22 18:58         ` Michael Albinus
2019-01-22 22:06       ` Stefan Monnier
2019-01-22 22:45         ` Michael Albinus
2019-01-23 16:18           ` Eli Zaretskii
2019-01-23 18:35             ` Michael Albinus
2019-01-23 22:19               ` Daniel Pittman
2019-01-23 16:47           ` Stefan Monnier
2019-01-23 17:36             ` Paul Eggert
2019-01-28 15:37               ` Michael Albinus

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).