bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: João Távora @ 2020-12-08 11:44 UTC (permalink / raw)
  To: 45117, eliz

Hi Maintainers, Eli

When using the SLY Common Lisp IDE package for Emacs, a user recently
reported a strange intermittent error in SLY's asynchronous ElDoc
function.  That function produces documentation by querying a network
process, a common technique in many such IDEs.  The user found that by
reducing eldoc-idle-delay to 0.1 he could trigger the problem more
often.

The original report lives at
https://github.com/joaotavora/sly/issues/385.

It was triggered with Emacs 27.1, but I have also reproduced it with a
recent master build.

After analysing the problem, I came to the conclusion that, under certain
mysterious conditions, process-send-string, which is called from SLY's
`eldoc-documentation-function`, abruptly exits non-locally even
though no error or quit seems to have been signalled.

For now, reproducing this means installing SLY, which is easily done
with a git clone of its source at github.com/joaotavora/sly.git.  One
also needs a `lisp` executable pointing to something like SBCL (Steel
Bank Common Lisp).  Then

   emacs -Q -L . -l sly-autoloads -f sly

should bring up Emacs with SLY and nothing else.  To trigger the error
more easily, type

  M-: (setq eldoc-idle-delay 0.05)
  C-x C-f ~/path/to/sly/slynk/slynk.lisp ;; or some other lisp file

Now navigate around the file, triggering ElDoc and watching the
documentation appear in the echo area, until an "unexpected reply"
error shows up in the minibuffer.

Interestingly, this unexpected reply comes from the fact that the
network process that process-send-string is writing to has already
processed the request (presumably in full) and has answered.  Thus the
process's filter function has been run with that output.

Unfortunately, because process-send-string exited prematurely and
non-locally (at line 2 in the sly.el snippet below), the so-called
continuation lambda was never registered (line 3).  Thus when the
filter function runs (code not shown), it fails to find the
continuation and reports an unexpected reply.

1  (let ((id (cl-incf (sly-continuation-counter))))
2     (sly-send `(:emacs-rex ,form ,package ,thread ,id ,@extra-options))
3     (push (cons id continuation) (sly-rex-continuations))
4     (sly--refresh-mode-line))
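To make the failure mode concrete, here is a small C model using plain setjmp/longjmp; it is not Emacs or SLY code, and every name in it is hypothetical.  The request is delivered before the non-local exit, so the peer can answer, but the step that mirrors line 3 (registering the continuation) is skipped.  That is exactly the state in which the filter would report an unexpected reply.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical model of the snippet above; none of these names exist in SLY. */
static jmp_buf timer_catch;       /* a catch somewhere above the timer function */
static bool exit_during_send;     /* make the "send" exit non-locally */
static int delivered_id = -1;     /* request the peer has received */
static int registered_id = -1;    /* continuation slot (sly-rex-continuations) */

/* Line 2: the bytes reach the peer, then control may leave non-locally. */
static void model_send(int id)
{
    delivered_id = id;
    if (exit_during_send)
        longjmp(timer_catch, 1);
}

/* Lines 2-3: send first, register the continuation second. */
static void model_rex(int id)
{
    model_send(id);
    registered_id = id;           /* never reached after a longjmp */
}

/* Returns true if the request completed without a non-local exit. */
static bool run_request(int id, bool exit_in_send)
{
    exit_during_send = exit_in_send;
    delivered_id = registered_id = -1;
    if (setjmp(timer_catch) != 0)
        return false;             /* arrived here via longjmp */
    model_rex(id);
    return true;
}

/* What the filter sees when the reply for ID arrives. */
static bool unexpected_reply(int id)
{
    return delivered_id == id && registered_id != id;
}
```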

Reading process-send-string's docstring, I see no mention of this
possibility, only that output may be accepted between bunches of
strings.  While that could happen, I'm pretty sure the Lisp side
wouldn't yet have enough of the request to reply prematurely, so I
don't think that's the reason for the unexpected reply.

Moreover, reading the C source of process.c, I do see some concerns about
timers in code comments.  However, I cannot correlate those concerns with
this use case.

I'll soon be patching this in SLY with an unwind-protect that only adds
the continuation if process-send-string has exited locally.  That should
hide this hard-to-trigger problem, but the underlying problem remains.

Thanks for your attention,
João

PS: Recently, I've also seen similar "silent" non-local exits with
accept-process-output when called from filter functions, but I will
report these separately once I have better reproduction recipes.

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: Eli Zaretskii @ 2020-12-08 15:39 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

> From: João Távora <joaotavora@gmail.com>
> Date: Tue, 08 Dec 2020 11:44:39 +0000
> 
> When using SLY Common Lisp IDE package for Emacs, a user has recently
> reported a strange intermittent error in SLY's asynchronous ElDoc
> function.  That function produces documentation by querying a network
> process, a common technique in many such IDEs.  The user found that when
> reducing eldoc-idle-delay to 0.1 he could trigger the problem more
> often.
> 
> The original report lives at
> https://github.com/joaotavora/sly/issues/385.
> 
> It was triggered with Emacs 27.1, but I have also reproduced it with a
> recent master build.
> 
> After analysing the problem, I came to the conclusion that given certain
> mysterious conditions, process-send-string, which is called from SLY's
> `eldoc-documentation-function` will abruptly return non-locally even
> though no error or quit seems to have been signalled.

Can you elaborate on the evidence you found of this non-local exit?

And does "no error or quit" mean there's no trace of anything abnormal
in the *Messages* buffer?

One possible reason for non-local exit, besides signaling an error, is
stack overflow, although your description doesn't seem to indicate
that this is probable.

One other piece of information that could be relevant is that when
Emacs calls the timer function, it sets inhibit-quit non-nil.

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: João Távora @ 2020-12-08 15:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 45117

Eli Zaretskii <eliz@gnu.org> writes:

>> From: João Távora <joaotavora@gmail.com>
>> Date: Tue, 08 Dec 2020 11:44:39 +0000
>> 
>> When using SLY Common Lisp IDE package for Emacs, a user has recently
>> reported a strange intermittent error in SLY's asynchronous ElDoc
>> function.  That function produces documentation by querying a network
>> process, a common technique in many such IDEs.  The user found that when
>> reducing eldoc-idle-delay to 0.1 he could trigger the problem more
>> often.
>> 
>> The original report lives at
>> https://github.com/joaotavora/sly/issues/385.
>> 
>> It was triggered with Emacs 27.1, but I have also reproduced it with a
>> recent master build.
>> 
>> After analysing the problem, I came to the conclusion that given certain
>> mysterious conditions, process-send-string, which is called from SLY's
>> `eldoc-documentation-function` will abruptly return non-locally even
>> though no error or quit seems to have been signalled.
>
> Can you elaborate on the evidence you found of this non-local exit?

It was evidenced by M-x trace-function process-send-string RET and also
by substituting the snippet I posted earlier with:

    (let ((send-success nil))  ; set only if sly-send returns normally
      (unwind-protect
          (progn
            (sly-send `(:emacs-rex ,form ,package ,thread ,id ,@extra-options))
            (setq send-success t))
        (if send-success
            (push (cons id continuation) (sly-rex-continuations))
          (sly-message
           "[issue#385] likely `process-send-string' exited non-locally from timer.")
          (sly-log-event `(:issue-385-sly-send-fishiness :id ,id
                                                         :form ,form)
                         sly-dispatching-connection))))

Once in a while, the else branch of the if triggered.

Anyway, here's an observation:

    1 -> (process-send-string #<process sly-1> "00044d(:emacs-rex
    (slynk:autodoc (quote (\"defpackage\" \":slynk\" (\":use\" \":cl\"
    \":slynk-backend\" \":slynk-match\" \":slynk-rpc\") (\":export\"
    \"#:startup-multiprocessing\" \"#:start-server\" \"#:create-server\"
    \"#:stop-server\" \"#:restart-server\" \"#:ed-in-emacs\"
    \"#:inspect-in-emacs\" \"#:print-indentation-lossage\"
    \"#:invoke-sly-debugger\" \"#:slynk-debugger-hook\" \"#:emacs-inspect\"
    \"#:authenticate-client\" \"#:*loopback-interface*\"
    \"#:*buffer-readtable*\" \"#:process-requests\") (\":export\"
    \"#:*communication-style*\" \"#:*dont-close*\"
    \"#:*fasl-pathname-function*\" \"#:*log-events*\" \"#:*log-output*\"
    \"#:*configure-emacs-indentation*\" \"#:*readtable-alist*\"
    \"#:*global-debugger*\" \"#:*sly-db-quit-restart*\"
    \"#:*backtrace-printer-bindings*\"
    \"#:*default-worker-thread-bindings*\"
    \"#:*macroexpand-printer-bindings*\" \"#:*slynk-pprint-bindings*\"
    \"#:*string-elision-length*\" \"#:*inspector-verbose*\"
    \"#:*require-module*\" \"#:*eval-for-emacs-wrappers*\"
    \"#:*debugger-extra-options*\" \"#:*globally-redirect-io*\"
    \"#:*use-dedicated-output-stream*\" \"#:*dedicated-output-stream-port*\"
    slynk::%cursor-marker%))) :print-right-margin 80) \":slynk\" t 100) ") 1
    <- process-send-string: !non-local\ exit!

This one was pretty long, but it can be slower.  Also, it happened after
100 successful similar calls.

> And does "no error or quit" mean there's no trace of anything abnormal
> in the *Messages* buffer?

Yes, exactly.

> One possible reason for non-local exit, besides signaling an error, is
> stack overflow, although your description doesn't seem to indicate
> that this is probable.

Indeed.

> One other piece of information that could be relevant is that when
> Emacs calls the timer function, it sets inhibit-quit non-nil.

Right, and I added quit-flag to the message call I showed earlier,
and it came out nil:

   [sly] [issue#385] likely `process-send-string' exited non-locally from timer, also quit-flag = nil.

João

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: Eli Zaretskii @ 2020-12-08 17:01 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

> From: João Távora <joaotavora@gmail.com>
> Cc: bug-gnu-emacs@gnu.org
> Date: Tue, 08 Dec 2020 15:56:49 +0000
> 
> > Can you elaborate on the evidence you found of this non-local exit?
> 
> It was evidenced by M-x trace-function process-send-string RET and also
> by substituting the snippet I posted earlier with:
> 
>     (let ((send-success nil))  ; set only if sly-send returns normally
>       (unwind-protect
>           (progn
>             (sly-send `(:emacs-rex ,form ,package ,thread ,id ,@extra-options))
>             (setq send-success t))
>         (if send-success
>             (push (cons id continuation) (sly-rex-continuations))
>           (sly-message
>            "[issue#385] likely `process-send-string' exited non-locally from timer.")
>           (sly-log-event `(:issue-385-sly-send-fishiness :id ,id
>                                                          :form ,form)
>                          sly-dispatching-connection))))
> 
> Once in a while, the else branch of the if triggered.
> 
> Anyway, here's an observation:
> 
>     1 -> (process-send-string #<process sly-1> "00044d(:emacs-rex
>     (slynk:autodoc (quote (\"defpackage\" \":slynk\" (\":use\" \":cl\"
>     \":slynk-backend\" \":slynk-match\" \":slynk-rpc\") (\":export\"
>     \"#:startup-multiprocessing\" \"#:start-server\" \"#:create-server\"
>     \"#:stop-server\" \"#:restart-server\" \"#:ed-in-emacs\"
>     \"#:inspect-in-emacs\" \"#:print-indentation-lossage\"
>     \"#:invoke-sly-debugger\" \"#:slynk-debugger-hook\" \"#:emacs-inspect\"
>     \"#:authenticate-client\" \"#:*loopback-interface*\"
>     \"#:*buffer-readtable*\" \"#:process-requests\") (\":export\"
>     \"#:*communication-style*\" \"#:*dont-close*\"
>     \"#:*fasl-pathname-function*\" \"#:*log-events*\" \"#:*log-output*\"
>     \"#:*configure-emacs-indentation*\" \"#:*readtable-alist*\"
>     \"#:*global-debugger*\" \"#:*sly-db-quit-restart*\"
>     \"#:*backtrace-printer-bindings*\"
>     \"#:*default-worker-thread-bindings*\"
>     \"#:*macroexpand-printer-bindings*\" \"#:*slynk-pprint-bindings*\"
>     \"#:*string-elision-length*\" \"#:*inspector-verbose*\"
>     \"#:*require-module*\" \"#:*eval-for-emacs-wrappers*\"
>     \"#:*debugger-extra-options*\" \"#:*globally-redirect-io*\"
>     \"#:*use-dedicated-output-stream*\" \"#:*dedicated-output-stream-port*\"
>     slynk::%cursor-marker%))) :print-right-margin 80) \":slynk\" t 100) ") 1
>     <- process-send-string: !non-local\ exit!

Then my suggestion is to run this under GDB with a breakpoint on the
few places we call sys_longjmp, and when it breaks, show the C and
Lisp backtrace from there.

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: João Távora @ 2020-12-08 17:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 45117

On Tue, Dec 8, 2020 at 5:01 PM Eli Zaretskii <eliz@gnu.org> wrote:

> Then my suggestion is to run this under GDB with a breakpoint on the
> few places we call sys_longjmp, and when it breaks, show the C and
> Lisp backtrace from there.

Right.  I suspected you'd say that :-)  I wish I had this more oiled up, but
it should be faster to set up than last time.  And likely I can catch
that other accept-process-output thing in the act, too.

Thanks,
João

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: João Távora @ 2020-12-09 11:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 45117

[ We've been CC-ing bug-gnu-emacs@gnu.org for a while.  My fault, the
typical CC blunder.  Wonder how debbugs was dealing with that so
gracefully tho. ]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: João Távora <joaotavora@gmail.com>
>> Cc: bug-gnu-emacs@gnu.org
>> Date: Tue, 08 Dec 2020 15:56:49 +0000
>> 
>> > Can you elaborate on the evidence you found of this non-local exit?
>> 
>> It was evidenced by M-x trace-function process-send-string RET and also
>> by substituting the snippet I posted earlier with:
> Then my suggestion is to run this under GDB with a breakpoint on the
> few places we call sys_longjmp, and when it breaks, show the C and
> Lisp backtrace from there.

Right, so I got this set up: I compiled Emacs 27.1 with all debugging
flags.  I can reproduce it, even with GDB attached, great.  The
problem is the breakpoints.

If I set breakpoints at _all_ places where we call sys_longjmp(), I risk
tearing down my X, which I did a couple of times.

So I skip those "dangerous" breakpoints.  I'm guessing one of the
interesting loci to break is unwind_to_catch in eval.c.  Of course that
gets called every dang time a signal is thrown, so it's hard for me to
catch the precise situation, even if I set up nicely and then call M-x
redraw-display, and only then enable the breakpoint.

It breaks near immediately, and the `bt` output I get is always from
some other function that expectedly signalled an error as part of its
normal control flow.  (Yeah, maybe I shouldn't be using signals for
normal control flow, but that's another matter.)

So:

1. I have to find a way to set the unwind_to_catch() breakpoint
   conditional on some Elisp/near-elisp context, in this case something
   inside the Elisp function sly-net-send() or Fprocess_send_string.

   Do you think setting a silly global in Fprocess_send_string() and
   then checking that as the breakpoint condition would be a good idea?
   Where would I reset the flag?  Is there some C-version of
   "unwind-protect"?

2. The mysterious long jump may be coming from some other place.  I
   enabled a breakpoint on the sys_longjmp call in
   quit_throw_to_read_char(), but that's not it: I can reproduce the
   error and it doesn't break there.  Then there are a few in image.c
   that I don't think matter much, and the one in alloc.c, which
   freezes my X.

3. I could set up one of those "tracer" breakpoints that you once
   showed me (how did that go?).  They're going to make a LOT of noise,
   but at least we should be able to record the correct
   unwind_to_catch() context.

João

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: Eli Zaretskii @ 2020-12-09 15:33 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

> From: João Távora <joaotavora@gmail.com>
> Cc: 45117@debbugs.gnu.org
> Date: Wed, 09 Dec 2020 11:24:47 +0000
> 
> [ We've been CC-ing bug-gnu-emacs@gnu.org for a while.  My fault, the
> typical CC blunder.  Wonder how debbugs was dealing with that so
> gracefully tho. ]

It should deal with this just fine, as long as you keep the same
Subject line.

> If I set breakpoints at _all_ places where we call sys_longjmp(), I risk
> tearing down my X, which I did a couple of times.
> 
> So I skip those "dangerous" breakpoints.  I'm guessing one of the
> interesting loci to break is unwind_to_catch in eval.c.  Of course that
> gets called every dang time a signal is thrown, so it's hard for me to
> catch the precise situation, even if I set up nicely and then call M-x
> redraw-display, and only then enable the breakpoint.

AFAICT, the only relevant call to sys_longjmp is in eval.c.  That is,
if we think Emacs signals an error or otherwise throws to top-level.

> It breaks near immediately, and the `bt` output I get is always from
> some other function that expectedly signalled an error as part of its
> normal control flow.

One simple method of dealing with that is to make GDB continue
immediately after hitting the breakpoint:

  break eval.c:NNNN
  commands
  > bt
  > continue
  > end

(the ">" prompt is printed by GDB).  Then you will have a lot of
backtraces, but only the last one will be relevant.  This simple
method has a disadvantage that it slows down Emacs, and also produces
a lot of possibly uninteresting stuff.

> 1. I have to find a way to set the unwind_to_catch() breakpoint
>    conditional on some Elisp/near-elisp context, in this case something
>    inside the Elisp function sly-net-send() or Fprocess_send_string.
> 
>    Do you think setting a silly global in Fprocess_send_string() and
>    then checking that as the breakpoint condition would be a good idea?
>    Where would I reset the flag?  Is there some C-version of
>    "unwind-protect"?

The C version of unwind-protect is record_unwind_protect.

But I think it will be easier to use an existing variable that is
usually not touched.  For example, you could piggy-back
bidi-inhibit-bpa, which is normally nil.  On the C level, this is a
bool variable bidi_inhibit_bpa, which is normally zero.  So, you could
wrap the problematic Lisp fragment with

  (let ((bidi-inhibit-bpa t))
    ....
    )

and then make the breakpoint conditional on that:

  break eval.c:NNNN if bidi_inhibit_bpa != 0

The advantage of this is that when the let-form unwinds, the variable
will be automatically reset (again, if we believe the theory of
signal/throw that cause the non-local exit).
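For readers less familiar with the C side, here is a toy model of why the let-bound variable resets itself.  It assumes a simplified picture in which dynamic bindings and unwind-protect handlers (what record_unwind_protect registers) sit on one stack, and a throw calls unbind_to, which pops and undoes them, newest first, before jumping.  The names echo eval.c, but the data structure and function bodies are illustrative assumptions, not the real implementation.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy specpdl: only the names are borrowed from eval.c. */
typedef void (*cleanup_fn)(void);
struct specpdl_entry {
    bool *var;          /* non-NULL: a dynamic binding to restore */
    bool old_value;
    cleanup_fn cleanup; /* non-NULL: an unwind-protect handler */
};

#define SPECPDL_SIZE 16
static struct specpdl_entry specpdl[SPECPDL_SIZE];
static int specpdl_count;

static void specbind(bool *var, bool value)
{
    assert(specpdl_count < SPECPDL_SIZE);
    specpdl[specpdl_count++] = (struct specpdl_entry){ var, *var, NULL };
    *var = value;
}

static void record_unwind_protect(cleanup_fn fn)
{
    assert(specpdl_count < SPECPDL_SIZE);
    specpdl[specpdl_count++] = (struct specpdl_entry){ NULL, false, fn };
}

/* Pop entries down to COUNT, newest first, undoing each one. */
static void unbind_to(int count)
{
    while (specpdl_count > count) {
        struct specpdl_entry e = specpdl[--specpdl_count];
        if (e.var)
            *e.var = e.old_value;
        else
            e.cleanup();
    }
}

struct catch_tag { jmp_buf jb; int pdlcount; };

/* A throw unwinds the stack to the catch's depth, then jumps. */
static void throw_to(struct catch_tag *c)
{
    unbind_to(c->pdlcount);
    longjmp(c->jb, 1);
}

static bool inhibit_bpa;       /* stands in for bidi_inhibit_bpa */
static int cleanups_run;
static bool seen_in_cleanup;   /* flag value observed by the cleanup */
static void note_cleanup(void) { cleanups_run++; seen_in_cleanup = inhibit_bpa; }

/* (catch 'tag (let ((bidi-inhibit-bpa t)) (unwind-protect (throw 'tag nil) CLEANUP))) */
static bool run_with_throw(void)
{
    struct catch_tag c;
    c.pdlcount = specpdl_count;
    if (setjmp(c.jb) != 0)
        return false;
    specbind(&inhibit_bpa, true);
    record_unwind_protect(note_cleanup);
    throw_to(&c);
    return true;   /* not reached */
}
```

Because the cleanup was pushed after the binding, it runs first and still sees the flag set; the binding is undone afterwards, so the flag is back to false by the time control lands in the catch.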

HTH

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: João Távora @ 2020-12-10 15:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 45117

Eli Zaretskii <eliz@gnu.org> writes:

> AFAICT, the only relevant call to sys_longjmp is in eval.c.  That is,
> if we think Emacs signals an error or otherwise throws to top-level.

I thought that, but now I'm confused.  I'm uncertain about the possible
different ways of "exiting non-locally" from a function, which I define
as (foo) running and (bar) never running in (progn (foo) (bar)).  When
that happens, (foo) has exited non-locally.

As far as I know, Elisp has no CL-style TAGBODY or GO, right?  So indeed
I would expect the throw/catch/signal machinery at the C level to be the
only possible culprit in these situations.

>   break eval.c:NNNN
>   commands
>   > bt
>   > continue
>   > end
>
> (the ">" prompt is printed by GDB).  Then you will have a lot of
> backtraces, but only the last one will be relevant.  This simple
> method has a disadvantage that it slows down Emacs, and also produces
> a lot of possibly uninteresting stuff.

Thanks.  That's the "tracer" strategy I remember you telling me.  It was
useful in the past, not so much here.

>> 1. I have to find a way to set the unwind_to_catch() breakpoint
>>    conditional on some Elisp/near-elisp context, in this case something
>>    inside the Elisp function sly-net-send() or Fprocess_send_string.
>> 
>>    Do you think setting a silly global in Fprocess_send_string() and
>>    then checking that as the breakpoint condition would be a good idea?
>>    Where would I reset the flag?  Is there some C-version of
>>    "unwind-protect"?
>
> The C version of unwind-protect is record_unwind_protect.
>
> But I think it will be easier to use an existing variable that is
> usually not touched.  For example, you could piggy-back
> bidi-inhibit-bpa,

That's an excellent idea, and I've verified that it works.  But it
didn't help here, or rather, not in the way I had anticipated.  It did
help me determine that unwind_to_catch() doesn't seem to be the only
thing responsible for the non-local exit.

To be clear, I now have this that I put around the "suspicious" places:

   (cl-defmacro DEBUG-45117 ((message) &rest body)
     (declare (indent defun))
     (let ((var (cl-gensym)))
       `(let ((,var nil)
              (bidi-inhibit-bpa t)) ; for your conditional break trick
          (unwind-protect
              (prog1 (progn ,@body)
                (setq ,var t))
            (unless ,var
              (message ,message))))))

Here's how I use it in sly.el, in the code that's called from the idle
timer.

     (defun sly-net-send (sexp proc)
       "Send a SEXP to Lisp over the socket PROC.
     This is the lowest level of communication. The sexp will be READ and
     EVAL'd by Lisp."
       (DEBUG-45117 ("SOMETHING in SLY-NET-SEND bailed")
         (let* ((print-circle nil)
                (print-quoted nil)
                (payload (DEBUG-45117 ("ENCODE-CODING-STRING????")
                           (encode-coding-string
                            (concat (sly-prin1-to-string sexp) "\n")
                            'utf-8-unix)))
                (string (DEBUG-45117 ("LENGTH-ENCODING????")
                          (concat (sly-net-encode-length (length payload))
                                  payload))))
           (DEBUG-45117 ("PROCESS-SEND-STRING?????")
             (process-send-string proc string)))))
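Because each DEBUG-45117 form reports only when its own body did not finish, nesting them brackets where the exit happened: while unwinding, the innermost message is logged first and the enclosing ones follow, which is the order seen in the *Messages* excerpt that comes next.  A C sketch of that behaviour, with setjmp/longjmp standing in for a Lisp throw and all names hypothetical:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of nested DEBUG-45117 forms; not SLY or Emacs code. */
typedef void (*body_fn)(void);

static jmp_buf *handler;          /* innermost active "catch" */
static const char *log_lines[8];  /* what would appear in *Messages* */
static int log_len;

static void debug_45117(const char *message, body_fn body)
{
    jmp_buf here;
    jmp_buf *outer = handler;
    volatile bool completed = false;

    handler = &here;
    if (setjmp(here) == 0) {
        body();
        completed = true;         /* the (setq ,var t) in the macro */
    }
    handler = outer;              /* cleanup: runs on both paths */
    if (!completed) {
        if (log_len < 8)
            log_lines[log_len++] = message;
        longjmp(*handler, 1);     /* keep unwinding outward */
    }
}

static void body_that_exits(void) { longjmp(*handler, 1); }
static void encode_step(void) { debug_45117("ENCODE-CODING-STRING????", body_that_exits); }

/* Outermost catch; returns true if everything returned normally. */
static bool run_net_send(void)
{
    jmp_buf top;
    handler = &top;
    log_len = 0;
    if (setjmp(top) != 0)
        return false;
    debug_45117("SOMETHING in SLY-NET-SEND bailed", encode_step);
    return true;
}
```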

I then launch Emacs as I explained earlier:

   gdb -i=mi --args ~/Source/Emacs/emacs-27/src/emacs -Q   \
    -L ~/Source/Emacs/sly                                  \
    -l sly-autoloads                                       \
    -f sly                                                 \
    --eval "(setq eldoc-idle-delay 0.01)"                  \
    ~/Source/Emacs/sly/slynk/slynk.lisp                    

Then I make sure the breakpoints look more or less like this (a couple
more than the one you recommended):

    1       breakpoint     keep y   0x00005555557e2580 in terminate_due_to_signal at emacs.c:378
    2       breakpoint     keep y   0x000055555576f4f5 in x_error_quitter at xterm.c:10131
    3       breakpoint     keep y   0x00005555555aa32d in Fredraw_display at dispnew.c:3123
            breakpoint already hit 1 time
    6       breakpoint     keep y   0x0000555555966de5 in unwind_to_catch at eval.c:1178
            stop only if bidi_inhibit_bpa != 0
    7       breakpoint     keep y   0x000055555580b985 in quit_throw_to_read_char at keyboard.c:10970
            stop only if bidi_inhibit_bpa != 0
    10      breakpoint     keep y   0x0000555555963f1a in call_debugger at eval.c:283
            stop only if bidi_inhibit_bpa != 0

Then 'r' to run, and then the reproduction steps I described earlier:
basically just scrolling up and down in the slynk.lisp file.  After a
while, some of these start appearing in *Messages*:

     ENCODE-CODING-STRING????
     SOMETHING in SLY-NET-SEND bailed
     [sly] [issue#385] likely `process-send-string' exited non-locally from timer.

       ... more scrolling ... 

     SOMETHING in SLY-NET-SEND bailed
     [sly] [issue#385] likely `process-send-string' exited non-locally from timer. [2 times]

Note that ENCODE-CODING-STRING???? is missing from the second
observation!  In this last session I didn't capture the
"PROCESS-SEND-STRING???", but I'm pretty sure I have in the past.

It does seem, though, that contrary to my original expectation, this is
not exclusive to process-send-string: it happens in normal Elisp
execution from quickly firing idle timers.

Anyway.

1. Shouldn't all of these have triggered the breakpoint??  I'm setting
   the Elisp/C variable in the macro.  I tested the technique
   separately.

2. Are we sure that no other mechanisms other than throw/catch/signal
   can trigger a non-local exit (that unwind-protect can still somehow
   catch?).

Thanks for any insight you may have,
João

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: Eli Zaretskii @ 2020-12-10 15:23 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

> From: João Távora <joaotavora@gmail.com>
> Cc: 45117@debbugs.gnu.org
> Date: Thu, 10 Dec 2020 15:00:58 +0000
> 
>     6       breakpoint     keep y   0x0000555555966de5 in unwind_to_catch at eval.c:1178
>             stop only if bidi_inhibit_bpa != 0

You have put the breakpoint at the point where sys_longjmp is about to
be called, right?  But all the unwind forms are already done at that
point, so I guess bidi_inhibit_bpa is again zero, and the breakpoint
doesn't break.  So I suggest to move the breakpoint before the
do-while loop in unwind_to_catch:

  do
    {
      /* Unwind the specpdl stack, and then restore the proper set of
	 handlers.  */
      unbind_to (handlerlist->pdlcount, Qnil);
      last_time = handlerlist == catch;
      if (! last_time)
	handlerlist = handlerlist->next;
    }
  while (! last_time);
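Eli's reasoning can be checked against a toy model of unwind_to_catch (heavily simplified, under the assumption that it unbinds the specpdl before jumping; the real function also restores handlers and other state).  A probe placed before the unbinding loop still sees the let-bound marker set, while a probe placed just before the jump sees it already restored.  That is consistent with the conditional breakpoint at the sys_longjmp call never firing.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Toy model, not the real eval.c code. */
static bool bidi_inhibit_bpa;    /* the let-bound marker variable */
static bool saved[8];            /* old values, one per binding */
static int pdlcount;

static bool seen_before_loop;    /* what a breakpoint before the do-while sees */
static bool seen_at_longjmp;     /* what a breakpoint at sys_longjmp sees */

struct handler { jmp_buf jb; int pdlcount; };

static void specbind_marker(bool value)
{
    assert(pdlcount < 8);
    saved[pdlcount++] = bidi_inhibit_bpa;
    bidi_inhibit_bpa = value;
}

static void unbind_to(int count)
{
    while (pdlcount > count)
        bidi_inhibit_bpa = saved[--pdlcount];
}

static void unwind_to_catch_model(struct handler *c)
{
    seen_before_loop = bidi_inhibit_bpa;
    unbind_to(c->pdlcount);          /* the do-while loop's job */
    seen_at_longjmp = bidi_inhibit_bpa;
    longjmp(c->jb, 1);
}

/* (catch 'tag (let ((bidi-inhibit-bpa t)) (throw 'tag nil))) */
static bool catch_model(void)
{
    struct handler h;
    h.pdlcount = pdlcount;
    if (setjmp(h.jb) != 0)
        return false;
    specbind_marker(true);
    unwind_to_catch_model(&h);
    return true;   /* not reached */
}
```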

> 1. Shouldn't all of these have triggered the breakpoint??  I'm setting
>    the Elisp/C variable in the macro.  I tested the technique
>    separately.
> 
> 2. Are we sure that no other mechanisms other than throw/catch/signal
>    can trigger a non-local exit (that unwind-protect can still somehow
>    catch?).

Let's see if it works to move the breakpoint location, and take it
from there.

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
From: João Távora @ 2020-12-10 16:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 45117

Eli Zaretskii <eliz@gnu.org> writes:

>> From: João Távora <joaotavora@gmail.com>
>> Cc: 45117@debbugs.gnu.org
>> Date: Thu, 10 Dec 2020 15:00:58 +0000
>> 
>>     6       breakpoint     keep y   0x0000555555966de5 in unwind_to_catch at eval.c:1178
>>             stop only if bidi_inhibit_bpa != 0
>
> You have put the breakpoint at the point where sys_longjmp is about to
> be called, right?  But all the unwind forms are already done at that
> point, so I guess bidi_inhibit_bpa is again zero, and the breakpoint
> doesn't break.  So I suggest to move the breakpoint before the
> do-while loop in unwind_to_catch:
>

Yes! Good idea!  Though I don't understand why that breakpoint _did_ break
when I did

   (let ((bidi-inhibit-bpa t)) (error "test-error"))

Anyway, it seems process_quit_flag is being called and throwing (though
I don't see "Quit" in *Messages*).  And didn't you tell me that idle
timers run with inhibit-quit = t?  But inhibit-quit seems to be nil
(which I also confirmed from Elisp).  I looked in the SLY source and am
quite sure I'm not binding it to nil in that circumstance.
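The first backtrace shows Fthrow being called from process_quit_flag via maybe_quit, that is, a throw to a catch tag rather than a `quit' signal, which would be consistent with nothing appearing in *Messages*.  The toy C model below sketches that path under one stated assumption: that the tag is the one stored in throw-on-input (the variable while-no-input binds, and whose value arriving input is assumed to copy into quit-flag).  The real maybe_quit and process_quit_flag in eval.c handle more cases; every name here is illustrative.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the quit check; enums stand in for Lisp objects. */
enum lisp_value { NIL, INPUT_TAG };

static enum lisp_value quit_flag = NIL;      /* quit-flag */
static bool inhibit_quit = false;            /* inhibit-quit */
static enum lisp_value throw_on_input = NIL; /* throw-on-input */
static jmp_buf *input_catch;                 /* the catch for throw_on_input's tag */
static int quit_signals;                     /* stands in for quit () */

static void process_quit_flag_model(void)
{
    enum lisp_value flag = quit_flag;
    quit_flag = NIL;
    if (flag == throw_on_input && input_catch != NULL)
        longjmp(*input_catch, 1);   /* a throw: no error is signalled */
    quit_signals++;
}

/* Called from places like Ffuncall in the real code. */
static void maybe_quit_model(void)
{
    if (quit_flag != NIL && !inhibit_quit)
        process_quit_flag_model();
}

/* Run a body inside a catch for INPUT_TAG; returns true if it returned normally. */
static bool run_body(bool input_arrives, bool inhibit)
{
    jmp_buf here;
    inhibit_quit = inhibit;
    throw_on_input = INPUT_TAG;
    quit_flag = NIL;
    input_catch = &here;
    if (setjmp(here) != 0) {
        input_catch = NULL;
        return false;               /* exited non-locally, with no error */
    }
    if (input_arrives)
        quit_flag = throw_on_input; /* assumed effect of input arriving */
    maybe_quit_model();             /* the check the backtrace goes through */
    input_catch = NULL;
    return true;
}
```

With inhibit_quit true, the same input leaves quit_flag set and the body returns normally, which is why the value of inhibit-quit inside the timer matters here.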

Here are two backtraces, I'm going to try just setting inhibit-quit to
non-nil forcibly.

João

    (gdb) bt
    #0  unwind_to_catch (catch=0x555556ed0470, type=NONLOCAL_EXIT_THROW, value=XIL(0x30)) at eval.c:1167
    #1  0x0000555555966e9f in Fthrow (tag=XIL(0x2aaa9bb7f060), value=XIL(0x30)) at eval.c:1195
    #2  0x0000555555967e6f in process_quit_flag () at eval.c:1523
    #3  0x0000555555967ec0 in maybe_quit () at eval.c:1544
    #4  0x000055555596bbf0 in Ffuncall (nargs=3, args=0x7fffffff8a68) at eval.c:2767
    #5  0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556f0e8a4), vector=XIL(0x555556dc5da5), maxdepth=make_fixnum(9), args_template=make_fixnum(0), nargs=0, args=0x7fffffff8f70) at bytecode.c:633
    #6  0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556dc5e55), nargs=0, arg_vector=0x7fffffff8f70) at eval.c:2990
    #7  0x000055555596bd8e in Ffuncall (nargs=1, args=0x7fffffff8f68) at eval.c:2797
    #8  0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556ef8764), vector=XIL(0x555556dc0255), maxdepth=make_fixnum(8), args_template=make_fixnum(257), nargs=1, args=0x7fffffff9468) at bytecode.c:633
    #9  0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556dc02d5), nargs=1, arg_vector=0x7fffffff9460) at eval.c:2990
    #10 0x000055555596bd8e in Ffuncall (nargs=2, args=0x7fffffff9458) at eval.c:2797
    #11 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556ef6144), vector=XIL(0x555556dbf835), maxdepth=make_fixnum(21), args_template=make_fixnum(513), nargs=1, args=0x7fffffff9ef0) at bytecode.c:633
    #12 0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556d87a55), nargs=1, arg_vector=0x7fffffff9ee8) at eval.c:2990
    #13 0x000055555596bd8e in Ffuncall (nargs=2, args=0x7fffffff9ee0) at eval.c:2797
    #14 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556f09df4), vector=XIL(0x555556d1b4b5), maxdepth=make_fixnum(18), args_template=make_fixnum(1025), nargs=2, args=0x7fffffffa390) at bytecode.c:633
    #15 0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556d1b545), nargs=2, arg_vector=0x7fffffffa380) at eval.c:2990
    #16 0x000055555596c6f2 in apply_lambda (fun=XIL(0x555556d1b545), args=XIL(0x555557117b03), count=35) at eval.c:2927
    #17 0x000055555596a51c in eval_sub (form=XIL(0x555557117af3)) at eval.c:2319
    #18 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #19 0x000055555596cf42 in funcall_lambda (fun=XIL(0x555557117bd3), nargs=2, arg_vector=0x0) at eval.c:3061
    #20 0x000055555596c6f2 in apply_lambda (fun=XIL(0x555557117403), args=XIL(0x555557110623), count=33) at eval.c:2927
    #21 0x000055555596a76b in eval_sub (form=XIL(0x555557110613)) at eval.c:2349
    #22 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #23 0x0000555555969e80 in eval_sub (form=XIL(0x555557110333)) at eval.c:2227
    #24 0x00005555559643ec in Fif (args=XIL(0x555557110353)) at eval.c:417
    #25 0x0000555555969e80 in eval_sub (form=XIL(0x555557110363)) at eval.c:2227
    #26 0x0000555555964681 in Fprogn (body=XIL(0x555557110663)) at eval.c:462
    #27 0x000055555596456a in Fcond (args=XIL(0x55555710fc23)) at eval.c:442
    #28 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fc33)) at eval.c:2227
    #29 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #30 0x00005555559661f9 in FletX (args=XIL(0x55555710fc53)) at eval.c:919
    #31 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fc63)) at eval.c:2227
    #32 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #33 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fc73)) at eval.c:2227
    #34 0x00005555559643ec in Fif (args=XIL(0x55555710fca3)) at eval.c:417
    #35 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fc93)) at eval.c:2227
    #36 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #37 0x0000555555966771 in Flet (args=XIL(0x55555710fcd3)) at eval.c:987
    #38 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fce3)) at eval.c:2227
    #39 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #40 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fcf3)) at eval.c:2227
    #41 0x0000555555966f78 in Funwind_protect (args=XIL(0x55555710fd23)) at eval.c:1213
    #42 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fd13)) at eval.c:2227
    #43 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #44 0x0000555555966771 in Flet (args=XIL(0x55555710fd73)) at eval.c:987
    #45 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fd83)) at eval.c:2227
    #46 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #47 0x0000555555946222 in Fsave_excursion (args=XIL(0x55555710fda3)) at editfns.c:842
    #48 0x0000555555969e80 in eval_sub (form=XIL(0x55555710fd93)) at eval.c:2227
    #49 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
    #50 0x000055555596cf42 in funcall_lambda (fun=XIL(0x55555710fe53), nargs=0, arg_vector=0x0) at eval.c:3061
    #51 0x000055555596bea1 in Ffuncall (nargs=1, args=0x7fffffffb810) at eval.c:2809
    #52 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x7ffff1d77b2c), vector=XIL(0x7ffff1d778ed), maxdepth=make_fixnum(4), args_template=make_fixnum(0), nargs=0, args=0x7fffffffbcd0) at bytecode.c:633
    #53 0x000055555596ca8b in funcall_lambda (fun=XIL(0x7ffff1d778bd), nargs=0, arg_vector=0x7fffffffbcd0) at eval.c:2990
    #54 0x000055555596bd8e in Ffuncall (nargs=1, args=0x7fffffffbcc8) at eval.c:2797
    #55 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x7ffff1d77b6c), vector=XIL(0x7ffff1d77865), maxdepth=make_fixnum(1), args_template=make_fixnum(0), nargs=0, args=0x7fffffffc2f0) at bytecode.c:633
    #56 0x000055555596ca8b in funcall_lambda (fun=XIL(0x7ffff1d7783d), nargs=0, arg_vector=0x7fffffffc2f0) at eval.c:2990
    #57 0x000055555596bd8e in Ffuncall (nargs=1, args=0x7fffffffc2e8) at eval.c:2797
    #58 0x000055555596a8cf in Fapply (nargs=2, args=0x7fffffffc2e8) at eval.c:2378
    #59 0x000055555596c1b1 in funcall_subr (subr=0x555556180000 <Sapply>, numargs=2, args=0x7fffffffc2e8) at eval.c:2848
    #60 0x000055555596bd4a in Ffuncall (nargs=3, args=0x7fffffffc2e0) at eval.c:2795
    #61 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x7ffff20cbe64), vector=XIL(0x7ffff20cbd2d), maxdepth=make_fixnum(10), args_template=make_fixnum(257), nargs=1, args=0x7fffffffc830) at bytecode.c:633
    #62 0x000055555596ca8b in funcall_lambda (fun=XIL(0x7ffff20cbcfd), nargs=1, arg_vector=0x7fffffffc828) at eval.c:2990
    #63 0x000055555596bd8e in Ffuncall (nargs=2, args=0x7fffffffc820) at eval.c:2797
    #64 0x000055555596b549 in call1 (fn=XIL(0xd230), arg1=XIL(0x555557357de5)) at eval.c:2655
    #65 0x00005555557f9db0 in timer_check_2 (timers=XIL(0), idle_timers=XIL(0x5555572d5d73)) at keyboard.c:4336
    #66 0x00005555557f9f06 in timer_check () at keyboard.c:4398
    #67 0x00005555557f7b2a in readable_events (flags=1) at keyboard.c:3397
    #68 0x0000555555800421 in get_input_pending (flags=1) at keyboard.c:6806
    #69 0x000055555580a12a in detect_input_pending_run_timers (do_display=true) at keyboard.c:10367
    #70 0x0000555555a1161a in wait_reading_process_output (time_limit=60, nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at process.c:5707
    #71 0x00005555555b31e8 in sit_for (timeout=make_fixnum(60), reading=true, display_option=1) at dispnew.c:6056
    #72 0x00005555557f51e0 in read_char (commandflag=1, map=XIL(0x5555572d5e53), prev_event=XIL(0), used_mouse_menu=0x7fffffffd195, end_time=0x0) at keyboard.c:2738
    #73 0x0000555555807fc2 in read_key_sequence (keybuf=0x7fffffffd380, prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9553
    #74 0x00005555557f07c1 in command_loop_1 () at keyboard.c:1350
    #75 0x0000555555967843 in internal_condition_case (bfun=0x5555557f0323 <command_loop_1>, handlers=XIL(0x90), hfun=0x5555557ef8d3 <cmd_error>) at eval.c:1356
    #76 0x00005555557efee4 in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
    #77 0x0000555555966c26 in internal_catch (tag=XIL(0xd500), func=0x5555557efeb3 <command_loop_2>, arg=XIL(0)) at eval.c:1117
    #78 0x00005555557efe7e in command_loop () at keyboard.c:1070
    #79 0x00005555557ef39a in recursive_edit_1 () at keyboard.c:714
    #80 0x00005555557ef59a in Frecursive_edit () at keyboard.c:786
    #81 0x00005555557e4fc9 in main (argc=11, argv=0x7fffffffd808) at emacs.c:2062
     
    Lisp Backtrace:
    "sly-connection" (0xffff8f70)
    "sly-send" (0xffff9460)
    "sly-dispatch-event" (0xffff9ee8)
    "sly-eval-async" (0xffffa380)
    "sly-autodoc--async" (0xffffa5d0)
    "progn" (0xffffa7b0)
    "if" (0xffffa8f0)
    "cond" (0xffffaa60)
    "let*" (0xffffac00)
    "progn" (0xffffad30)
    "if" (0xffffae70)
    "let" (0xffffb050)
    "progn" (0xffffb180)
    "unwind-protect" (0xffffb2b0)
    "let" (0xffffb490)
    "save-excursion" (0xffffb5f0)
    "sly-autodoc" (0xffffb818)
    "eldoc-print-current-symbol-info" (0xffffbcd0)
    0xf1d77838 PVEC_COMPILED
    "apply" (0xffffc2e8)
    "timer-event-handler" (0xffffc828)
    (gdb)

Here's another one
    
   Thread 1 "emacs" hit Breakpoint 3, unwind_to_catch (catch=0x555556f35d10, type=NONLOCAL_EXIT_THROW, value=XIL(0x30)) at eval.c:1167
   1167	      unbind_to (handlerlist->pdlcount, Qnil);
   (gdb) bt
   #0  unwind_to_catch (catch=0x555556f35d10, type=NONLOCAL_EXIT_THROW, value=XIL(0x30)) at eval.c:1167
   #1  0x0000555555966e9f in Fthrow (tag=XIL(0x2aaa9bb7f060), value=XIL(0x30)) at eval.c:1195
   #2  0x0000555555967e6f in process_quit_flag () at eval.c:1523
   #3  0x0000555555967ec0 in maybe_quit () at eval.c:1544
   #4  0x00005555559ba8d4 in print_object (obj=XIL(0x5555572d1044), printcharfun=XIL(0), escapeflag=true) at print.c:1938
   #5  0x00005555559bbced in print_object (obj=XIL(0x5555571805e3), printcharfun=XIL(0), escapeflag=true) at print.c:2122
   #6  0x00005555559bbced in print_object (obj=XIL(0x555557197333), printcharfun=XIL(0), escapeflag=true) at print.c:2122
   #7  0x00005555559bbced in print_object (obj=XIL(0x555557197253), printcharfun=XIL(0), escapeflag=true) at print.c:2122
   #8  0x00005555559bbced in print_object (obj=XIL(0x555557196d83), printcharfun=XIL(0), escapeflag=true) at print.c:2122
   #9  0x00005555559b7956 in print (obj=XIL(0x555557196b33), printcharfun=XIL(0), escapeflag=true) at print.c:1147
   #10 0x00005555559b54e7 in Fprin1_to_string (object=XIL(0x555557196b33), noescape=XIL(0)) at print.c:685
   #11 0x000055555596c300 in funcall_subr (subr=0x555556182840 <Sprin1_to_string>, numargs=1, args=0x7fffffff8598) at eval.c:2870
   #12 0x000055555596bd4a in Ffuncall (nargs=2, args=0x7fffffff8590) at eval.c:2795
   #13 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556ec99b4), vector=XIL(0x555556dc5c85), maxdepth=make_fixnum(5), args_template=make_fixnum(257), nargs=1, args=0x7fffffff8a50) at bytecode.c:633
   #14 0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556dc5cc5), nargs=1, arg_vector=0x7fffffff8a48) at eval.c:2990
   #15 0x000055555596bd8e in Ffuncall (nargs=2, args=0x7fffffff8a40) at eval.c:2797
   #16 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556f01344), vector=XIL(0x555556e22ed5), maxdepth=make_fixnum(13), args_template=make_fixnum(514), nargs=2, args=0x7fffffff8f70) at bytecode.c:633
   #17 0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556e22f85), nargs=2, arg_vector=0x7fffffff8f60) at eval.c:2990
   #18 0x000055555596bd8e in Ffuncall (nargs=3, args=0x7fffffff8f58) at eval.c:2797
   #19 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556ed6724), vector=XIL(0x555556de7f15), maxdepth=make_fixnum(8), args_template=make_fixnum(257), nargs=1, args=0x7fffffff9468) at bytecode.c:633
   #20 0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556dc9335), nargs=1, arg_vector=0x7fffffff9460) at eval.c:2990
   #21 0x000055555596bd8e in Ffuncall (nargs=2, args=0x7fffffff9458) at eval.c:2797
   #22 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556ebfa04), vector=XIL(0x555556de74c5), maxdepth=make_fixnum(21), args_template=make_fixnum(513), nargs=1, args=0x7fffffff9ef0) at bytecode.c:633
   #23 0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556dc9285), nargs=1, arg_vector=0x7fffffff9ee8) at eval.c:2990
   #24 0x000055555596bd8e in Ffuncall (nargs=2, args=0x7fffffff9ee0) at eval.c:2797
   #25 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x555556ece464), vector=XIL(0x555556dc0015), maxdepth=make_fixnum(18), args_template=make_fixnum(1025), nargs=2, args=0x7fffffffa390) at bytecode.c:633
   #26 0x000055555596ca8b in funcall_lambda (fun=XIL(0x555556dc00a5), nargs=2, arg_vector=0x7fffffffa380) at eval.c:2990
   #27 0x000055555596c6f2 in apply_lambda (fun=XIL(0x555556dc00a5), args=XIL(0x55555711a753), count=35) at eval.c:2927
   #28 0x000055555596a51c in eval_sub (form=XIL(0x55555711a743)) at eval.c:2319
   #29 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #30 0x000055555596cf42 in funcall_lambda (fun=XIL(0x55555711a043), nargs=2, arg_vector=0x0) at eval.c:3061
   #31 0x000055555596c6f2 in apply_lambda (fun=XIL(0x55555711a053), args=XIL(0x555557113273), count=33) at eval.c:2927
   #32 0x000055555596a76b in eval_sub (form=XIL(0x555557113263)) at eval.c:2349
   #33 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #34 0x0000555555969e80 in eval_sub (form=XIL(0x555557112f83)) at eval.c:2227
   #35 0x00005555559643ec in Fif (args=XIL(0x555557112fa3)) at eval.c:417
   #36 0x0000555555969e80 in eval_sub (form=XIL(0x555557112fb3)) at eval.c:2227
   #37 0x0000555555964681 in Fprogn (body=XIL(0x5555571132b3)) at eval.c:462
   #38 0x000055555596456a in Fcond (args=XIL(0x555557112873)) at eval.c:442
   #39 0x0000555555969e80 in eval_sub (form=XIL(0x555557112883)) at eval.c:2227
   #40 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #41 0x00005555559661f9 in FletX (args=XIL(0x5555571128a3)) at eval.c:919
   #42 0x0000555555969e80 in eval_sub (form=XIL(0x5555571128b3)) at eval.c:2227
   #43 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #44 0x0000555555969e80 in eval_sub (form=XIL(0x5555571128c3)) at eval.c:2227
   #45 0x00005555559643ec in Fif (args=XIL(0x5555571128f3)) at eval.c:417
   #46 0x0000555555969e80 in eval_sub (form=XIL(0x5555571128e3)) at eval.c:2227
   #47 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #48 0x0000555555966771 in Flet (args=XIL(0x555557112923)) at eval.c:987
   #49 0x0000555555969e80 in eval_sub (form=XIL(0x555557112933)) at eval.c:2227
   #50 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #51 0x0000555555969e80 in eval_sub (form=XIL(0x555557112943)) at eval.c:2227
   #52 0x0000555555966f78 in Funwind_protect (args=XIL(0x555557112973)) at eval.c:1213
   #53 0x0000555555969e80 in eval_sub (form=XIL(0x555557112963)) at eval.c:2227
   #54 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #55 0x0000555555966771 in Flet (args=XIL(0x5555571129c3)) at eval.c:987
   #56 0x0000555555969e80 in eval_sub (form=XIL(0x5555571129d3)) at eval.c:2227
   #57 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #58 0x0000555555946222 in Fsave_excursion (args=XIL(0x5555571129f3)) at editfns.c:842
   #59 0x0000555555969e80 in eval_sub (form=XIL(0x5555571129e3)) at eval.c:2227
   #60 0x0000555555964681 in Fprogn (body=XIL(0)) at eval.c:462
   #61 0x000055555596cf42 in funcall_lambda (fun=XIL(0x555557112aa3), nargs=0, arg_vector=0x0) at eval.c:3061
   #62 0x000055555596bea1 in Ffuncall (nargs=1, args=0x7fffffffb810) at eval.c:2809
   #63 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x7ffff1d77b2c), vector=XIL(0x7ffff1d778ed), maxdepth=make_fixnum(4), args_template=make_fixnum(0), nargs=0, args=0x7fffffffbcd0) at bytecode.c:633
   #64 0x000055555596ca8b in funcall_lambda (fun=XIL(0x7ffff1d778bd), nargs=0, arg_vector=0x7fffffffbcd0) at eval.c:2990
   #65 0x000055555596bd8e in Ffuncall (nargs=1, args=0x7fffffffbcc8) at eval.c:2797
   #66 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x7ffff1d77b6c), vector=XIL(0x7ffff1d77865), maxdepth=make_fixnum(1), args_template=make_fixnum(0), nargs=0, args=0x7fffffffc2f0) at bytecode.c:633
   #67 0x000055555596ca8b in funcall_lambda (fun=XIL(0x7ffff1d7783d), nargs=0, arg_vector=0x7fffffffc2f0) at eval.c:2990
   #68 0x000055555596bd8e in Ffuncall (nargs=1, args=0x7fffffffc2e8) at eval.c:2797
   #69 0x000055555596a8cf in Fapply (nargs=2, args=0x7fffffffc2e8) at eval.c:2378
   #70 0x000055555596c1b1 in funcall_subr (subr=0x555556180000 <Sapply>, numargs=2, args=0x7fffffffc2e8) at eval.c:2848
   #71 0x000055555596bd4a in Ffuncall (nargs=3, args=0x7fffffffc2e0) at eval.c:2795
   #72 0x00005555559f6eb4 in exec_byte_code (bytestr=XIL(0x7ffff20cbe64), vector=XIL(0x7ffff20cbd2d), maxdepth=make_fixnum(10), args_template=make_fixnum(257), nargs=1, args=0x7fffffffc830) at bytecode.c:633
   #73 0x000055555596ca8b in funcall_lambda (fun=XIL(0x7ffff20cbcfd), nargs=1, arg_vector=0x7fffffffc828) at eval.c:2990
   #74 0x000055555596bd8e in Ffuncall (nargs=2, args=0x7fffffffc820) at eval.c:2797
   #75 0x000055555596b549 in call1 (fn=XIL(0xd230), arg1=XIL(0x555557356765)) at eval.c:2655
   #76 0x00005555557f9db0 in timer_check_2 (timers=XIL(0), idle_timers=XIL(0x555557181453)) at keyboard.c:4336
   #77 0x00005555557f9f06 in timer_check () at keyboard.c:4398
   #78 0x00005555557f7b2a in readable_events (flags=1) at keyboard.c:3397
   #79 0x0000555555800421 in get_input_pending (flags=1) at keyboard.c:6806
   #80 0x000055555580a12a in detect_input_pending_run_timers (do_display=true) at keyboard.c:10367
   #81 0x0000555555a1161a in wait_reading_process_output (time_limit=60, nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at process.c:5707
   #82 0x00005555555b31e8 in sit_for (timeout=make_fixnum(60), reading=true, display_option=1) at dispnew.c:6056
   #83 0x00005555557f51e0 in read_char (commandflag=1, map=XIL(0x5555571619d3), prev_event=XIL(0), used_mouse_menu=0x7fffffffd195, end_time=0x0) at keyboard.c:2738
   #84 0x0000555555807fc2 in read_key_sequence (keybuf=0x7fffffffd380, prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9553
   #85 0x00005555557f07c1 in command_loop_1 () at keyboard.c:1350
   #86 0x0000555555967843 in internal_condition_case (bfun=0x5555557f0323 <command_loop_1>, handlers=XIL(0x90), hfun=0x5555557ef8d3 <cmd_error>) at eval.c:1356
   #87 0x00005555557efee4 in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
   #88 0x0000555555966c26 in internal_catch (tag=XIL(0xd500), func=0x5555557efeb3 <command_loop_2>, arg=XIL(0)) at eval.c:1117
   #89 0x00005555557efe7e in command_loop () at keyboard.c:1070
   #90 0x00005555557ef39a in recursive_edit_1 () at keyboard.c:714
   #91 0x00005555557ef59a in Frecursive_edit () at keyboard.c:786
   #92 0x00005555557e4fc9 in main (argc=11, argv=0x7fffffffd808) at emacs.c:2062
    
   Lisp Backtrace:
   "prin1-to-string" (0xffff8598)
   "sly-prin1-to-string" (0xffff8a48)
   "sly-net-send" (0xffff8f60)
   "sly-send" (0xffff9460)
   "sly-dispatch-event" (0xffff9ee8)
   "sly-eval-async" (0xffffa380)
   "sly-autodoc--async" (0xffffa5d0)
   "progn" (0xffffa7b0)
   "if" (0xffffa8f0)
   "cond" (0xffffaa60)
   "let*" (0xffffac00)
   "progn" (0xffffad30)
   "if" (0xffffae70)
   "let" (0xffffb050)
   "progn" (0xffffb180)
   "unwind-protect" (0xffffb2b0)
   "let" (0xffffb490)
   "save-excursion" (0xffffb5f0)
   "sly-autodoc" (0xffffb818)
   "eldoc-print-current-symbol-info" (0xffffbcd0)
   0xf1d77838 PVEC_COMPILED
   "apply" (0xffffc2e8)
   "timer-event-handler" (0xffffc828)
   (gdb) 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 16:15               ` João Távora
@ 2020-12-10 16:29                 ` João Távora
  2020-12-10 17:20                   ` Dmitry Gutov
  2020-12-10 17:51                   ` Stefan Monnier
  2020-12-10 16:41                 ` Eli Zaretskii
  1 sibling, 2 replies; 40+ messages in thread
From: João Távora @ 2020-12-10 16:29 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: 45117

João Távora <joaotavora@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: João Távora <joaotavora@gmail.com>
>>> Cc: 45117@debbugs.gnu.org
>>> Date: Thu, 10 Dec 2020 15:00:58 +0000
>>> 
>>>     6       breakpoint     keep y   0x0000555555966de5 in unwind_to_catch at eval.c:1178
>>>             stop only if bidi_inhibit_bpa != 0
>>
>> You have put the breakpoint at the point where sys_longjmp is about to
>> be called, right?  But all the unwind forms are already done at that
>> point, so I guess bidi_inhibit_bpa is again zero, and the breakpoint
>> doesn't break.  So I suggest to move the breakpoint before the
>> do-while loop in unwind_to_catch:
>>
>
> Yes! Good idea! Though I don't understand why that breakpoint _did_ break
> when I did
>
>    (let ((bidi-inhibit-bpa t)) (error "test-error"))
>
> Anyway, it seems process_quit_flag is being called and throwing (though
> I don't see "Quit" in the *Messages*). And didn't you tell me that idle
> timers run with inhibit-quit = t?  But inhibit-quit seems to be nil,
> (which I also confirmed from Elisp.)  I looked in the sly source and am
> quite sure I'm not binding it to nil in that circumstance.

Aha! I found the culprit.  It is eldoc.el.  It seems to be a
longstanding policy to call Eldoc backends with `while-no-input`.  This
is obviously badly problematic for asynchronous backends.

Stefan, I think this change has to go.  Now that we have proper (or more
proper) async support in eldoc.el, we shouldn't need these tricks: just
use a timer or a process or sth.

Else we must super clearly document that the
eldoc-documentation-function can stop at any moment for whatever reason
and that it's probably a good idea to bullet-proof your async code with
inhibit-quit=t.

In the meantime I'll do that, forcing inhibit-quit back to t in
sly-autodoc.el.

That should fix it.  Thanks a lot, Eli!  Really amazing.

João

commit 12e922156c86a26fa4bb2cb9e7d2b3fd639e4707
Author: Stefan Monnier <monnier@iro.umontreal.ca>
Date:   Tue Dec 4 18:15:44 2018 -0500

    * lisp/emacs-lisp/eldoc.el: Let the user interrupt the search
    
    (eldoc-print-current-symbol-info): Use while-no-input and non-essential.

diff --git a/lisp/emacs-lisp/eldoc.el b/lisp/emacs-lisp/eldoc.el
--- a/lisp/emacs-lisp/eldoc.el
+++ b/lisp/emacs-lisp/eldoc.el
@@ -360,6 +365,4 @@
-    (and (or (eldoc-display-message-p)
-             ;; Erase the last message if we won't display a new one.
-             (when eldoc-last-message
-               (eldoc-message nil)
-               nil))
-	 (eldoc-message (funcall eldoc-documentation-function)))))
+        ;; Only keep looking for the info as long as the user hasn't
+        ;; requested our attention.  This also locally disables inhibit-quit.
+        (while-no-input
+          (eldoc-message (funcall eldoc-documentation-function)))))))
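The inhibit-quit=nil observation can be reproduced in isolation.  As far as can be told from subr.el in Emacs 27/28, `while-no-input` expands into `with-local-quit`, and that macro binds `inhibit-quit` to nil around its body.  This would undo the t binding that timer_check_2 sets up for timer functions.  The sketch below calls `with-local-quit` directly, so its result does not depend on whether input happens to be pending.  It also shows the workaround of rebinding `inhibit-quit` to t inside the body.  Both `my-` function names are hypothetical.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical check: what does the body of `with-local-quit' (which
;; `while-no-input' expands into) see when the caller already has
;; `inhibit-quit' bound to t, as timer functions do?

(defun my-inhibit-quit-inside-local-quit ()
  "Return the value of `inhibit-quit' seen inside `with-local-quit'."
  (let ((inhibit-quit t))               ; what timer_check_2 does via specbind
    (with-local-quit inhibit-quit)))

(defun my-inhibit-quit-with-workaround ()
  "Like `my-inhibit-quit-inside-local-quit', but rebind `inhibit-quit' to t."
  (let ((inhibit-quit t))
    (with-local-quit
      (let ((inhibit-quit t))           ; workaround: protect this region again
        inhibit-quit))))

(message "inside with-local-quit: %S" (my-inhibit-quit-inside-local-quit))
(message "with inner rebinding:   %S" (my-inhibit-quit-with-workaround))
```

Run with `emacs --batch -l FILE`, this should print nil for the first call and t for the second.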

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 16:15               ` João Távora
  2020-12-10 16:29                 ` João Távora
@ 2020-12-10 16:41                 ` Eli Zaretskii
  1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2020-12-10 16:41 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

> From: João Távora <joaotavora@gmail.com>
> Cc: 45117@debbugs.gnu.org
> Date: Thu, 10 Dec 2020 16:15:42 +0000
> 
> Anyway, it seems process_quit_flag is being called and throwing (though
> I don't see "Quit" in the *Messages*). And didn't you tell me that idle
> timers run with inhibit-quit = t?

Yes, see for yourself, this fragment from timer_check_2, which calls
timer-event-handler:

	      specbind (Qinhibit_quit, Qt);  <<<<<<<<<<<<<<<<<<<

	      call1 (Qtimer_event_handler, chosen_timer);
	      Vdeactivate_mark = old_deactivate_mark;
	      timers_run++;
	      unbind_to (count, Qnil);

> But inhibit-quit seems to be nil (which I also confirmed from
> Elisp.)

I guess some code called from the timer resets inhibit-quit to nil?
You could put a watchpoint on Vinhibit_quit and see who does that.

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 16:29                 ` João Távora
@ 2020-12-10 17:20                   ` Dmitry Gutov
  2020-12-10 17:51                   ` Stefan Monnier
  1 sibling, 0 replies; 40+ messages in thread
From: Dmitry Gutov @ 2020-12-10 17:20 UTC (permalink / raw)
  To: João Távora, Eli Zaretskii, Stefan Monnier; +Cc: 45117

On 10.12.2020 18:29, João Távora wrote:
> Stefan, I think this change has to go.  Now that we have proper (or more
> proper) async support in eldoc.el, we shouldn't need these tricks: just
> use a timer or a process or sth.

Not everybody uses the async support. This was a good change, and I'm 
taking advantage of it in at least one external package.

I wonder how hard it would be to fix the async support not to be 
hindered by it.

Also note that this form could be useful for the asynchronous route
as well: after the user has sent some new input, we don't really want to
process the "old" eldoc requests anymore; those responses should be ignored.
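Ignoring replies to "old" requests does not by itself require interrupting the sender.  A common alternative is a generation counter.  Every new request increments it, and each callback drops its result if the counter has moved on since the request was made.  The sketch below assumes a simple callback-based backend, which is simulated here.  All names (`my-eldoc-request`, `my-eldoc--generation`, ...) are hypothetical, and the approach is not taken from eldoc.el.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch: drop replies to documentation requests that have
;; been superseded by newer ones, instead of aborting the sender.

(defvar my-eldoc--generation 0
  "Incremented every time a new documentation request starts.")

(defvar my-eldoc--shown nil
  "Last documentation string \"displayed\" (stand-in for the echo area).")

(defvar my-eldoc--pending nil
  "Simulated backend queue of (SYMBOL . CALLBACK) pairs.")

(defun my-eldoc-request (symbol)
  "Start an asynchronous documentation request for SYMBOL.
Return the callback that the (simulated) backend will call with the
documentation string.  The callback ignores replies to stale requests."
  (let* ((gen (setq my-eldoc--generation (1+ my-eldoc--generation)))
         (callback (lambda (doc)
                     ;; Only display DOC if no newer request was made since.
                     (when (= gen my-eldoc--generation)
                       (setq my-eldoc--shown doc)))))
    ;; A real backend would send a request to a process here.
    (push (cons symbol callback) my-eldoc--pending)
    callback))

;; Two requests in quick succession; the older reply arrives last.
(let ((old (my-eldoc-request 'car))
      (new (my-eldoc-request 'cdr)))
  (funcall new "cdr: (cdr LIST)")
  (funcall old "car: (car LIST)")       ; stale, ignored
  (message "%s" my-eldoc--shown))       ; prints cdr: (cdr LIST)
```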

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 16:29                 ` João Távora
  2020-12-10 17:20                   ` Dmitry Gutov
@ 2020-12-10 17:51                   ` Stefan Monnier
  2020-12-10 18:05                     ` João Távora
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2020-12-10 17:51 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

> Aha! I found the culprit.  It is eldoc.el.  It seems to be a
> longstanding policy to call Eldoc backends with `while-no-input`.  This
> is obviously badly problematic for asynchronous backends.

Maybe "obviously" so from a pragmatic point of view, but in
a theoretical sense, I don't see why: for async backends, the
`while-no-input` should only apply to the "first chunk" of computation
which launches the async subprocess (or communication) and it seems OK
to abort this first chunk if the user hits a key while it's executing.

So maybe the better answer is to improve the implementation of
`while-no-input` so it doesn't abort here.

        Stefan

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 17:51                   ` Stefan Monnier
@ 2020-12-10 18:05                     ` João Távora
  2020-12-10 18:37                       ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: João Távora @ 2020-12-10 18:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 45117

On Thu, Dec 10, 2020 at 5:51 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> > Aha! I found the culprit.  It is eldoc.el.  It seems to be a
> > longstanding policy to call Eldoc backends with `while-no-input`.  This
> > is obviously badly problematic for asynchronous backends.
>
> Maybe "obviously" so from a pragmatic point of view, but in
> a theoretical sense, I don't see why:

I'm not sure.  Every such async system eventually boils down
to a point that sends something to the wire (step A) and a process
filter (step C) that runs later on and finds some suitable "continuation",
(a callback).  To find that suitable continuation it has to be added
to the continuation registry (step B).  I'd say steps A and B have to
be atomic, whatever the order.  If A comes first and you interrupt
between A and B, you get an unexpected reply.  If B comes first and you
interrupt between B and A, you get a stale leftover continuation.  Both
are inconsistent states, in my opinion.
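The A/B/C structure can be sketched with a minimal request-id registry.  It is much simpler than what jsonrpc.el or SLY actually do.  Everything here is hypothetical: the `my-rpc-` names are invented, and the "wire" is a list instead of a network process.

`my-rpc-send` performs A and B with `inhibit-quit` bound to t.  That defers a pending quit or `while-no-input` throw until both steps are done.  Calling `my-rpc--write` on its own stands in for an interruption right after A.  The reply to that request then finds no continuation, which corresponds to the "unexpected reply" error.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch of steps A (send), B (register continuation) and
;; C (dispatch reply).  The "wire" is a list, not a real process.

(defvar my-rpc--next-id 0
  "Last request id handed out.")

(defvar my-rpc--wire nil
  "Simulated connection: (ID . REQUEST) pairs written so far.")

(defvar my-rpc--continuations (make-hash-table)
  "Continuation registry mapping request ids to callbacks.")

(defun my-rpc--write (id request)
  "Step A: put REQUEST, tagged with ID, on the simulated wire."
  (push (cons id request) my-rpc--wire))

(defun my-rpc--register (id callback)
  "Step B: remember CALLBACK as the continuation for ID."
  (puthash id callback my-rpc--continuations))

(defun my-rpc-send (request callback)
  "Send REQUEST and register CALLBACK for its reply; return the id.
A and B run with `inhibit-quit' bound to t, so a quit or a
`while-no-input' throw arriving in between is deferred until both
are done."
  (let ((id (setq my-rpc--next-id (1+ my-rpc--next-id))))
    (let ((inhibit-quit t))
      (my-rpc--register id callback)
      (my-rpc--write id request))
    id))

(defun my-rpc-receive (id result)
  "Step C, as a process filter would: pass RESULT to ID's continuation.
Return the symbol `unexpected-reply' if nothing is registered for ID."
  (let ((callback (gethash id my-rpc--continuations)))
    (if (null callback)
        'unexpected-reply
      (remhash id my-rpc--continuations)
      (funcall callback result))))

;; Normal round trip: the callback receives the reply.
(let (answer)
  (my-rpc-receive (my-rpc-send '(:autodoc car) (lambda (r) (setq answer r)))
                  "(car LIST)")
  (message "%s" answer))                ; prints (car LIST)

;; Interruption right after A (simulated by running step A alone):
;; the reply arrives but has no continuation.
(let ((id (setq my-rpc--next-id (1+ my-rpc--next-id))))
  (my-rpc--write id '(:autodoc cdr))
  (message "%S" (my-rpc-receive id "(cdr LIST)")))  ; prints unexpected-reply
```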

There are workarounds for all of this, but I think this while-no-input
isn't conceptually sound there, for this reason.

Actually, thinking more about it. I don't think it's sound to have a
while-no-input at all under library control.  A programmer using
that library should be given a predictable evaluation model. At
any rate, this is a regression from 26.3, where things didn't work
like this.

João

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 18:05                     ` João Távora
@ 2020-12-10 18:37                       ` Stefan Monnier
  2020-12-10 18:48                         ` Eli Zaretskii
  2020-12-10 18:50                         ` João Távora
  0 siblings, 2 replies; 40+ messages in thread
From: Stefan Monnier @ 2020-12-10 18:37 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

> I'm not sure.  Every such async system eventually boils down
> to a point that sends something to the wire (step A) and a process
> filter (step C) that runs later on and finds some suitable "continuation",
> (a callback).

[ I'm not completely sure I understand your scenario above, so we may
  be miscommunicating.  ]

Right.  And `while-no-input` should only wrap the execution of A, so if
A doesn't complete, then presumably neither B nor C will want to be
executed, which seems OK.

IOW the only real problem is if A is interrupted before completing but
after starting to send something on the wire.  In that case, the
subprocess may be left hanging waiting for more data.  This can be handled
in two different ways: by inhibiting quit around the "sends" (I
generally recommend against inhibiting quit, so it's not the option
I favor) or by using an unwind-protect that "kills" the subprocess or
closes the pipe in case we're exiting before having sent all the data
(that's a good idea to do also in case a bug signals an error).
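The second option, an `unwind-protect` that cleans up when the send does not complete, could look roughly like the sketch below.  `my-send-all-or-abort` is a hypothetical name.  SEND-FN and ABORT-FN stand in for whatever really writes to the connection and tears it down; for a real process that might be `process-send-string` and `delete-process`.  The `done` flag distinguishes a normal return from a non-local exit, whether that exit is a quit, a `while-no-input` throw, or an error.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch: send a request in several chunks and, if we are
;; interrupted (quit, `while-no-input' throw, error) before the last chunk
;; went out, tear the connection down instead of leaving the peer waiting.

(defun my-send-all-or-abort (chunks send-fn abort-fn)
  "Call SEND-FN on each element of CHUNKS, in order.
If this exits non-locally before every chunk has been sent, call
ABORT-FN (e.g. something that calls `delete-process') on the way out."
  (let ((done nil))
    (unwind-protect
        (progn
          (dolist (chunk chunks)
            (funcall send-fn chunk))
          (setq done t))
      ;; Cleanup form: runs on normal and non-local exit alike,
      ;; but only aborts when the loop did not finish.
      (unless done
        (funcall abort-fn)))))

;; Simulate an interruption after the first chunk using catch/throw.
(let (sent aborted)
  (catch 'interrupted
    (my-send-all-or-abort '("part-1" "part-2")
                          (lambda (chunk)
                            (push chunk sent)
                            (throw 'interrupted nil))
                          (lambda () (setq aborted t))))
  (message "sent %S, aborted %S" (nreverse sent) aborted))
;; prints: sent ("part-1"), aborted t
```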

> Actually, thinking more about it. I don't think it's sound to have a
> while-no-input at all under library control.  A programmer using
> that library should be given a predictable evaluation model.  At
> any rate, this is a regression from 26.3, where things didn't work
> like this.

The exact same problem affects all normal Elisp code when the user hits
C-g, so I think the better path forward is to make sure it's "easy and
natural" to write code which reacts correctly when it's aborted at some
arbitrary time.  We usually get that via `unwind-protect`, but if it's
not enough we should develop better solutions rather than shy away from
`quit`.

But I had the impression that the original problem under discussion was
not just due to the difficulty of writing code that handles "random
aborts", but rather due to the fact that `while-no-input` sometimes caused
undesired random aborts even when the user didn't hit any key.
This would be a bug in `while-no-input`, for which we should investigate
a fix (it's likely due to some "innocuous" event being received which
`while-no-input` mistakes for user-input; could be an event linked to
some kind of notification service like dbus, file-notifications, ...).

        Stefan

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 18:37                       ` Stefan Monnier
@ 2020-12-10 18:48                         ` Eli Zaretskii
  2020-12-10 18:50                         ` João Távora
  1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2020-12-10 18:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: joaotavora, 45117

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  45117@debbugs.gnu.org
> Date: Thu, 10 Dec 2020 13:37:59 -0500
> 
> But I had the impression that the original problem under discussion was
> not just due to the difficulty of writing code that handles "random
> aborts", but rather due to the fact that `while-no-input` sometimes caused
> undesired random aborts even when the user didn't hit any key.
> This would be a bug in `while-no-input`, for which we should investigate
> a fix (it's likely due to some "innocuous" event being received which
> `while-no-input` mistakes for user-input; could be an event linked to
> some kind of notification service like dbus, file-notifications, ...).

Let me remind you that we nowadays have while-no-input-ignore-events.

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 18:37                       ` Stefan Monnier
  2020-12-10 18:48                         ` Eli Zaretskii
@ 2020-12-10 18:50                         ` João Távora
  2020-12-10 19:44                           ` Eli Zaretskii
  2020-12-10 19:46                           ` Stefan Monnier
  1 sibling, 2 replies; 40+ messages in thread
From: João Távora @ 2020-12-10 18:50 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 45117

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> I'm not sure.  Every such async system eventually boils down
>> to a point that sends something to the wire (step A) and a process
>> filter (step C) that runs later on and finds some suitable "continuation",
>> (a callback).
>
> [ I'm not completely sure I understand your scenario above, so we may
>   be miscommunicating.  ]
>
> Right.  And `while-no-input` should only wrap the execution of A, so if
> A doesn't complete, then presumably neither B nor C will want to be
> executed, which seems OK.

We are miscommunicating.  In these programs, B needs to be atomic with
A.  When you send things to an external process, only the most naive
external communication protocols reply immediately and synchronously
to the thing you just sent.  For those super simple things, like "cat"
and "grep", your model works.

But a complex multi-threaded program being talked to via the network
will process the requests it is fed through the wire at varying rates and
rhythms.  So you need a system on the Elisp side that decodes which async
request the process is responding to.  See jsonrpc.el's
jsonrpc-connection-receive, for instance.

> closes the pipe in case we're exiting before having sent all the data
> (that's a good idea to do also in case a bug signals an error).

Again, killing the subprocess assumes the trivial case of a Unix
utility.

>> Actually, thinking more about it. I don't think it's sound to have a
>> while-no-input at all under library control.  A programmer using
>> that library should be given a predictable evaluation model.  At
>> any rate, this is a regression from 26.3, where things didn't work
>> like this.
>
> The exact same problem affects all normal Elisp code when the user hits
> C-g, so I think the better path forward is to make sure it's "easy and
> natural" to write code which reacts correctly when it's aborted at some
> arbitrary time.  We usually get that via `unwind-protect`, but if it's
> not enough we should develop better solutions rather than shy away from
> `quit`.

I get what you're saying, but there's presumably a reason we bind
inhibit-quit to t in timers (Eli?), and it's that that code isn't
triggered by a direct action of the user.  Indeed, in idle timers, it's
triggered by the _inaction_ of the user.  So it makes no sense to also
use the allow-quitting model there, unless the programmer of the timer
function expects to do something very lengthy, whereupon he should
consciously turn it off, either via while-no-input or some other
mechanism.  Doing that for her in the library is violating the premise
of timer functions as one knows them.

> But I had the impression that the original problem under discussion was
> not just due to the difficulty of writing code that handles "random
> aborts", but rather due to the fact that `while-no-input` sometimes caused
> undesired random aborts even when the user didn't hit any key.
> This would be a bug in `while-no-input` which we should investigate
> a fix (it's likely due to some "innocuous" event being received which
> `while-no-input` mistakes for user-input; could be an event linked to
> some kind of notification service like dbus, file-notifications, ...).

Yes, there is that too.  While-no-input has all those Heisenbergian
effects to add to it.  But this was no heisenberg, I think.  I was
pressing C-n the whole time, so that's "input".

João

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 18:50                         ` João Távora
@ 2020-12-10 19:44                           ` Eli Zaretskii
  2020-12-10 19:47                             ` João Távora
  2020-12-10 19:46                           ` Stefan Monnier
  1 sibling, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2020-12-10 19:44 UTC (permalink / raw)
  To: João Távora; +Cc: monnier, 45117

> From: João Távora <joaotavora@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  45117@debbugs.gnu.org
> Date: Thu, 10 Dec 2020 18:50:33 +0000
> 
> there's presumably a reason we bind inhibit-quit to t in timers
> (Eli?)

The reason, quite obviously, is to prevent user's C-g from aborting
the timer function.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 18:50                         ` João Távora
  2020-12-10 19:44                           ` Eli Zaretskii
@ 2020-12-10 19:46                           ` Stefan Monnier
  2020-12-10 20:12                             ` João Távora
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2020-12-10 19:46 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

>> Right.  And `while-no-input` should only wrap the execution of A, so if
>> A doesn't complete, then presumably none of C nor B will want to be
>> executed, which seems OK.
>
> We are miscommunicating.  In these programs, B needs to be atomic with
> A.  When you send things into an external process, only the most naive
> external communication protocols reply immediately and synchronously
> to the thing you just sent.  For those super simple things, like "cat"
> and "grep", your model works.

No, I was not presuming such a simple model, actually.  I was really
thinking about "send data to the LSP server then get some answer
a second or more later".

>> closes the pipe in case we're exiting before having sent all the data
>> (that's a good idea to do also in case a bug signals an error).
> Again, this killing of the subprocess assumes the trivial case of a unix
> utility.

That's just for lack of a vocabulary to say abstractly what I meant.
I understand that in many cases you may want to keep the subprocess (and
pipe) open, in which case you'll have to do something else, but that
"something" will depend on lots of details of the circumstance.

>> The exact same problem affects all normal Elisp code when the user hits
>> C-g, so I think the better path forward is to make sure it's "easy and
>> natural" to write code which reacts correctly when it's aborted at some
>> arbitrary time.  We usually get that via `unwind-protect`, but if it's
>> not enough we should develop better solutions rather than shy away from
>> `quit`.
> I get what you're saying, but there's presumably a reason we bind
> inhibit-quit to t in timers (Eli?), and it's that that code isn't
> triggered by a direct action of the user.

Indeed, we bind inhibit-quit there because when the users hit C-g they
presumably have no idea whether a timer or process filter happens to be
running right now, so they don't actually mean "stop this timer" but
something entirely different (such as run the command `keyboard-quit`).

Note that in return we expect timers and process filters to run only for
a very short amount of time, so that we can still react to C-g promptly.

> Doing that for her in the library is violating the premise of timer
> functions as one knows them.

The contract is different for timer functions than it is for eldoc
functions, yes.  This is because the expectation is that eldoc functions
may run for a non-negligible amount of time.

Maybe we should change that so it's up to the individual eldoc function
to use `while-no-input` if it needs it, but I'm not sure we've reached
that conclusion yet ;-)

> Yes, there is that too.  While-no-input has all those Heisenbergian
> effects to add to it.  But this was no heisenberg, I think.  I was
> pressing C-n the whole time, so that's "input".

OK, so `while-no-input` did its job correctly in your case.  Good.
Now the next question is: given that the user has hit `C-n` how should
we make sure Emacs responds to it as soon as possible even though it's
currently in the middle of sending a command to an LSP subprocess?

Is this "sending" expected to never take a long time (in which case
maybe using `inhibit-quit` could be the better answer)?

What's the alternative: what could the Elisp code do to abort the
communication as quickly as possible (without leaving the subprocess in
an inconsistent state and without forcing a costly restart of that
subprocess)?  If the protocol doesn't offer any way to abort a command,
maybe it could stash the rest of the data to be sent on some list of
pending data so they'll be sent later asynchronously (and remember that
the answer to that command is probably to be ignored because the user
has moved on)?

        Stefan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 19:44                           ` Eli Zaretskii
@ 2020-12-10 19:47                             ` João Távora
  2020-12-10 19:55                               ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: João Távora @ 2020-12-10 19:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, 45117

On Thu, Dec 10, 2020 at 7:44 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: João Távora <joaotavora@gmail.com>
> > Cc: Eli Zaretskii <eliz@gnu.org>,  45117@debbugs.gnu.org
> > Date: Thu, 10 Dec 2020 18:50:33 +0000
> >
> > there's presumably a reason we bind inhibit-quit to t in timers
> > (Eli?)
>
> The reason, quite obviously, is to prevent user's C-g from aborting
> the timer function.

I agree, but playing devil's advocate, can you expand on the
rationale for that?  Why shouldn't timer functions be abortable?

João

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 19:47                             ` João Távora
@ 2020-12-10 19:55                               ` Eli Zaretskii
  2020-12-10 19:58                                 ` João Távora
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2020-12-10 19:55 UTC (permalink / raw)
  To: João Távora; +Cc: monnier, 45117

> From: João Távora <joaotavora@gmail.com>
> Date: Thu, 10 Dec 2020 19:47:08 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 45117@debbugs.gnu.org
> 
> > The reason, quite obviously, is to prevent user's C-g from aborting
> > the timer function.
> 
> I agree, but playing devil's advocate, can you expand on the
> rationale for that?  Why shouldn't timer functions be abortable?

I think that's the wrong question.  The right question is how probable
is it that the user presses C-g to abort a timer function that just
happens to run at this very moment.  I think the answer is "extremely
improbable".  It is much more probable that C-g was meant for
something else, some activity that is much more evident to the user.
Like getting out of the minibuffer after deciding that the command
does not need to be invoked after all, for example.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 19:55                               ` Eli Zaretskii
@ 2020-12-10 19:58                                 ` João Távora
  2020-12-10 20:14                                   ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: João Távora @ 2020-12-10 19:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, 45117

On Thu, Dec 10, 2020 at 7:55 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: João Távora <joaotavora@gmail.com>
> > Date: Thu, 10 Dec 2020 19:47:08 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 45117@debbugs.gnu.org
> >
> > > The reason, quite obviously, is to prevent user's C-g from aborting
> > > the timer function.
> >
> > I agree, but playing devil's advocate, can you expand on the
> > rationale for that?  Why shouldn't timer functions be abortable?
>
> I think that's the wrong question.  The right question is how probable
> is it that the user presses C-g to abort a timer function that just
> happens to run at this very moment.  I think the answer is "extremely
> improbable".  It is much more probable that C-g was meant for
> something else, some activity that is much more evident to the user.
> Like getting out of the minibuffer after deciding that the command
> does not need to be invoked after all, for example.

I see. Yes it makes sense.  But Stefan is arguing that some "special" timer
functions should be abortable by mere input. And that changes those
odds considerably. But at the same time, it doesn't change
the fact, as you well put it, that that input is _not_ meant for the
timer function.

João

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 19:46                           ` Stefan Monnier
@ 2020-12-10 20:12                             ` João Távora
  2020-12-10 20:43                               ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: João Távora @ 2020-12-10 20:12 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 45117

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> Right.  And `while-no-input` should only wrap the execution of A, so if
>>> A doesn't complete, then presumably none of C nor B will want to be
>>> executed, which seems OK.
>>
>> We are miscommunicating.  In these programs, B needs to be atomic with
>> A.  When you send things into an external process, only the most naive
>> external communication protocols reply immediately and synchronously
>> to the thing you just sent.  For those super simple things, like "cat"
>> and "grep", your model works.
>
> No, I was not presuming such a simple model, actually.  I was really
> thinking about "send data to the LSP server then get some answer
> a second or more later".

Right, so in LSP it's perfectly possible to send three requests in a
row, say reqX, reqY and reqZ and get three replies in a completely
different order repZ, repX, repY.  How do you match each reply to each
request?

>>> closes the pipe in case we're exiting before having sent all the data
>>> (that's a good idea to do also in case a bug signals an error).
>> Again, this killing of the subprocess assumes the trivial case of a unix
>> utility.
>
> That's just for lack of a vocabulary to say abstractly what I meant.
> I understand that in many cases you may want to keep the subprocess (and
> pipe) open, in which case you'll have to do something else, but that
> "something" will depend on lots of details of the circumstance.

process_send_string may send things in "bunches", I read in the
docstring, but it will not (and should not) be interrupted.  At least I
see no reason to.  When it returns, sending should be done.  Either that
or it should exit loudly with an error that one can catch, in which case
one should retry the whole thing.

>>> The exact same problem affects all normal Elisp code when the user hits
>>> C-g, so I think the better path forward is to make sure it's "easy and
>>> natural" to write code which reacts correctly when it's aborted at some
>>> arbitrary time.  We usually get that via `unwind-protect`, but if it's
>>> not enough we should develop better solutions rather than shy away from
>>> `quit`.
>> I get what you're saying, but there's presumably a reason we bind
>> inhibit-quit to t in timers (Eli?), and it's that that code isn't
>> triggered by a direct action of the user.
>
> Indeed, we bind inhibit-quit there because when the users hit C-g they
> presumably have no idea whether a timer or process filter happens to be
> running right now, so they don't actually mean "stop this timer" but
> something entirely different (such as run the command `keyboard-quit`).

I see, and you think it is different for "input something", because
that in ElDoc, would in principle invalidate the context of the
documentation request.  But that is not always so.  And I think it's too
eager of ElDoc to try to do that so early and so brutally.  It's better
to leave it to the callback handlers, which we have now.  That's a much
safer spot to know if the answer we just got still makes sense.  Or if
we're in a hurry, we let the backend know asap.

> Note that in return we expect timers and process filters to run only for
> a very short amount of time, so that we can still react to C-g
> promptly.

Fine, and so they should.  Much like Flymake stuff.  That's in the
contract :-)

> The contract is different for timer functions than it is for eldoc
> functions, yes.  This is because the expectation is that eldoc functions
> may run for a non-negligible amount of time.

Why do you have that expectation?  Any particular example in the wild?

> Maybe we should change that so it's up to the individual eldoc function
> to use `while-no-input` if it needs it, but I'm not sure we've reached
> that conclusion yet ;-)

It was, after all, the status quo after you changed it for 27.1.
Perhaps you had a rationale?

>> Yes, there is that too.  While-no-input has all those Heisenbergian
>> effects to add to it.  But this was no heisenberg, I think.  I was
>> pressing C-n the whole time, so that's "input".
>
> OK, so `while-no-input` did its job correctly in your case.  Good.
> Now the next question is: given that the user has hit `C-n` how should
> we make sure Emacs responds to it as soon as possible even though it's
> currently in the middle of sending a command to an LSP subprocess?
>
> Is this "sending" expected to never take a long time (in which case
> maybe using `inhibit-quit` could be the better answer)?

That's what I did, yes.  Yes, it's expected to be quick or fail fast.

> What's the alternative: what could the Elisp code do to abort the
> communication as quickly as possible (without leaving the subprocess in
> an inconsistent state and without forcing a costly restart of that
> subprocess)?  If the protocol doesn't offer any way to abort a command,
> maybe it could stash the rest of the data to be sent on some list of
> pending data so they'll be sent later asynchronously (and remember that
> the answer to that command is probably to be ignored because the user
> has moved on)?

The protocol could offer an optional abort() switch, yes.  ElDoc would
raise a flag and say: "hey backends, what you were doing is now
useless".  We'd see about the implementation, there is likely more than
one approach, but a dynamic variable accessed by an (eldoc-aborted-p)
seems easiest.  I personally don't know of many places where I would use
it, or where it would bring an advantage in terms of speed.  For
example, in responsive completion, I've been doing fine with discarding
loads and loads of carefully prepared, now invalid, completions.  Fine
in terms of speed/responsiveness.  But maybe one wishes to save power,
which is quite legitimate.
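Here is a minimal sketch of such a controlled signal. It assumes a flag that the ElDoc frontend raises and that backends poll only at points where stopping is safe. `my-eldoc-aborted-p` and the other `my-` names are hypothetical; eldoc.el defines no such function. Unlike the throw delivered by `while-no-input`, nothing here can unwind a backend from the middle of a call such as `process-send-string`.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch of a cooperative abort signal; not part of eldoc.el.

(defvar my-eldoc--aborted nil
  "Non-nil once the frontend has declared the current request useless.")

(defun my-eldoc-aborted-p ()
  "Return non-nil if backends should stop working on the current request."
  my-eldoc--aborted)

(defun my-eldoc-abort ()
  "Frontend side: raise the flag instead of throwing out of the backend."
  (setq my-eldoc--aborted t))

(defun my-backend-compute (items)
  "Square each of ITEMS, polling the abort flag between steps.
Return the list of results, or the symbol `aborted' if asked to stop."
  (catch 'my-backend-done
    (let (acc)
      (dolist (item items)
        ;; Check the flag only between whole steps, where stopping is safe.
        (when (my-eldoc-aborted-p)
          (throw 'my-backend-done 'aborted))
        (push (* item item) acc))
      (nreverse acc))))
```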

Bottom line is, in my opinion, this ElDoc-to-backend abort signal should
be controlled, it shouldn't be an unhandleable kill signal.  That's
asking for trouble.  I'd be very surprised if the SLIME people don't
start getting this too after they upgrade to 27.1.  And maybe the CIDER
and Elpy people too?  Don't know about Eglot, actually, but I think it's
possible yes. All depends on `eldoc-idle-delay`.  If it's a low value,
it's much much more likely.  Since we start with 0.5, we should be OK.

João

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 19:58                                 ` João Távora
@ 2020-12-10 20:14                                   ` Eli Zaretskii
  2020-12-10 20:15                                     ` João Távora
  2020-12-10 20:37                                     ` Dmitry Gutov
  0 siblings, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2020-12-10 20:14 UTC (permalink / raw)
  To: João Távora; +Cc: monnier, 45117

> From: João Távora <joaotavora@gmail.com>
> Date: Thu, 10 Dec 2020 19:58:12 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 45117@debbugs.gnu.org
> 
> > I think that's the wrong question.  The right question is how probable
> > is it that the user presses C-g to abort a timer function that just
> > happens to run at this very moment.  I think the answer is "extremely
> > improbable".  It is much more probable that C-g was meant for
> > something else, some activity that is much more evident to the user.
> > Like getting out of the minibuffer after deciding that the command
> > does not need to be invoked after all, for example.
> 
> I see. Yes it makes sense.  But Stefan is arguing that some "special" timer
> functions should be abortable by mere input.

That makes sense mainly for idle timers.  Or for timer functions that
take a lot of time to execute (something that generally shouldn't
happen in the first place).  But while-no-input cannot abort its
caller, so as long as the body of while-no-input can handle being
interrupted, that is okay.

> And that changes those odds considerably. But at the same time, it
> doesn't change the fact, as you well put it, that that input is
> _not_ meant for the timer function.

I think if a timer function should be interruptible by input, that
function should itself call while-no-input.  It is not the job of
outside code to interrupt bad timers by aborting them by these
measures.
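Here is a minimal sketch of what that could look like. It assumes a slow but side-effect-free lookup, `my-doc-lookup`, which is hypothetical like the other `my-` names. Per its documentation, `while-no-input` returns t if input arrives, nil if the user quits, and otherwise the value of its body. The caller can therefore tell an interrupted lookup from a finished one.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch: a timer function that opts in to being interrupted.

(defun my-doc-lookup (symbol)
  "Stand-in for a slow, side-effect-free documentation lookup of SYMBOL."
  (format "%s: (hypothetical documentation)" symbol))

(defun my-idle-doc-function (symbol)
  "Show documentation for SYMBOL unless the user types something first.
Only `my-doc-lookup' runs inside `while-no-input', so an interruption
cannot leave shared state (request tables, pipes) half-updated."
  (let ((doc (while-no-input (my-doc-lookup symbol))))
    (cond ((eq doc t) nil)            ; input arrived: drop the result quietly
          ((stringp doc) (message "%s" doc) doc))))

;; Possible installation (not run here):
;; (run-with-idle-timer 0.5 t (lambda () (my-idle-doc-function (symbol-at-point))))
```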

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 20:14                                   ` Eli Zaretskii
@ 2020-12-10 20:15                                     ` João Távora
  2020-12-10 20:37                                     ` Dmitry Gutov
  1 sibling, 0 replies; 40+ messages in thread
From: João Távora @ 2020-12-10 20:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, 45117

On Thu, Dec 10, 2020 at 8:14 PM Eli Zaretskii <eliz@gnu.org> wrote:

> I think if a timer function should be interruptible by input, that
> function should itself call while-no-input.  It is not the job of
> outside code to interrupt bad timers by aborting them by these
> measures.

I think that's my view, exactly.

João

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 20:14                                   ` Eli Zaretskii
  2020-12-10 20:15                                     ` João Távora
@ 2020-12-10 20:37                                     ` Dmitry Gutov
  1 sibling, 0 replies; 40+ messages in thread
From: Dmitry Gutov @ 2020-12-10 20:37 UTC (permalink / raw)
  To: Eli Zaretskii, João Távora; +Cc: monnier, 45117

On 10.12.2020 22:14, Eli Zaretskii wrote:
> That makes sense mainly for idle timers.

This applies to Eldoc, surely.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 20:12                             ` João Távora
@ 2020-12-10 20:43                               ` Stefan Monnier
  2020-12-10 20:55                                 ` Dmitry Gutov
                                                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Stefan Monnier @ 2020-12-10 20:43 UTC (permalink / raw)
  To: João Távora; +Cc: 45117

>> No, I was not presuming such a simple model, actually.  I was really
>> thinking about "send data to the LSP server then get some answer
>> a second or more later".
> Right, so in LSP it's perfectly possible to send three requests in a
> row, say reqX, reqY and reqZ and get three replies in a completely
> different order repZ, repX, repY.  How do you match each reply to each
> request?

I assume there's some "request-id" mechanism.  Not sure what this has to
do with this discussion, OTOH.

> process_send_string may send things in "bunches", I read in the
> docstring, but it will not (and should not) be interrupted.

Indeed, I believe it should not be aborted in the middle by
`while-no-input` (it would be a bug, because the `process-send-string`
API doesn't offer any way to know what has been or hasn't been sent in
that case).

>> Indeed, we bind inhibit-quit there because when the users hit C-g they
>> presumably have no idea whether a timer or process filter happens to be
>> running right now, so they don't actually mean "stop this timer" but
>> something entirely different (such as run the command `keyboard-quit`).
> I see, and you think it is different for "input something", because
> that in ElDoc, would in principle invalidate the context of the
> documentation request.

Indeed for eldoc we know that if there is user input, the current
request can be dropped on the floor because its result shouldn't be
displayed anyway.

In contrast in the general case of timers we don't know whether
user-input affects the usefulness of running the timer.

> But that is not always so.  And I think it's too eager of ElDoc to try
> to do that so early and so brutally.  It's better to leave it to the
> callback handlers, which we have now.  That's a much safer spot to
> know if the answer we just got still makes sense.  Or if we're in
> a hurry, we let the backend know asap.

You might be right: the result of the current request sent to the LSP
could still be useful for the next eldoc-idle-time cycle, indeed.

>> The contract is different for timer functions than it is for eldoc
>> functions, yes.  This is because the expectation is that eldoc functions
>> may run for a non-negligible amount of time.
> Why do you have that expectation?  Any particular example in the wild?

Good question.

> It was, after all, the status quo after you changed it for 27.1.
> Perhaps you had a rationale?

I probably did, but ... can't remember and wasn't clever enough to write
it in the commit message :-(
Maybe to accommodate those backends which needed async operation but had
to live within the confines of the previously limited eldoc API?

Maybe the maintainer of eldoc.el will prefer to undo this change,
then ;-) ?

        Stefan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 20:43                               ` Stefan Monnier
@ 2020-12-10 20:55                                 ` Dmitry Gutov
  2020-12-10 22:48                                   ` Stefan Monnier
  2020-12-10 21:16                                 ` João Távora
  2020-12-11  7:31                                 ` Eli Zaretskii
  2 siblings, 1 reply; 40+ messages in thread
From: Dmitry Gutov @ 2020-12-10 20:55 UTC (permalink / raw)
  To: Stefan Monnier, João Távora; +Cc: 45117

On 10.12.2020 22:43, Stefan Monnier wrote:
>> It was, after all, the status quo after you changed it for 27.1.
>> Perhaps you had a rationale?
> I probably did, but ... can't remember and wasn't clever enough to write
> it in the commit message:-(

Both you and Joao can search your email archive for the message titled

   Re: [Emacs-diffs] scratch/octave-eldoc-fixes 1ad0826 1/2: Prevent 
accept-process-output with quit inhibited in octave.el05.

sent on 05.12.2018, 01:19 EET (I think that's the timezone).

There was a small discussion there, with just 3 participants. Not sure 
why it wasn't conducted in public.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 20:43                               ` Stefan Monnier
  2020-12-10 20:55                                 ` Dmitry Gutov
@ 2020-12-10 21:16                                 ` João Távora
  2020-12-10 22:58                                   ` João Távora
  2020-12-11  7:31                                 ` Eli Zaretskii
  2 siblings, 1 reply; 40+ messages in thread
From: João Távora @ 2020-12-10 21:16 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 45117

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> No, I was not presuming such a simple model, actually.  I was really
>>> thinking about "send data to the LSP server then get some answer
>>> a second or more later".
>> Right, so in LSP it's perfectly possible to send three requests in a
>> row, say reqX, reqY and reqZ and get three replies in a completely
>> different order repZ, repX, repY.  How do you match each reply to each
>> request?
>
> I assume there's some "request-id" mechanism.  Not sure what this has to
> do with this discussion, OTOH.

Right, a request-id mechanism.  The request must be registered
atomically with process_send_string.  If you interrupt in between, you
have inconsistent requests (either registered request-id's for which no
actual request was fired, or requests which were fired for which no
request-id's were registered).  Client code can detect/prevent these
interruptions, but it's clumsy.  And may cost the dev many hours to
understand what is up.  It shouldn't be the default, IMO.
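Here is a minimal sketch of keeping the registration and the send together, along the lines of the `inhibit-quit` approach discussed earlier in this thread. It assumes a simple line-based wire format and a private request table; both, and all `my-req-` names, are hypothetical. With `inhibit-quit` bound to t, a throw requested by an enclosing `while-no-input` (or by C-g) is deferred until the binding is undone. By that point, the request has either been registered and sent, or been rolled back because sending signaled an error.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch: register a request and send it as one unit.

(defvar my-req--last-id 0
  "Last request id handed out (hypothetical).")

(defvar my-req--pending (make-hash-table)
  "Map from request id to the callback waiting for its reply.")

(defun my-req-send (process payload callback)
  "Send PAYLOAD to PROCESS and register CALLBACK for the reply.
Return the new request id.  While `inhibit-quit' is t, neither C-g
nor an enclosing `while-no-input' can throw between the `puthash'
and the `process-send-string'; a pending quit is acted upon only
once the binding is undone."
  (let* ((inhibit-quit t)
         (id (setq my-req--last-id (1+ my-req--last-id))))
    (puthash id callback my-req--pending)
    (condition-case err
        (process-send-string process (format "%d %s\n" id payload))
      (error
       ;; Sending failed loudly: undo the registration so no
       ;; continuation waits for a reply that will never come.
       (remhash id my-req--pending)
       (signal (car err) (cdr err))))
    id))
```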

>> process_send_string may send things in "bunches", I read in the
>> docstring, but it will not (and should not) be interrupted.
>
> Indeed, I believe it should not be aborted in the middle by
> `while-no-input` (it would be a bug, because the `process-send-string`
> API doesn't offer any way to know what has been or hasn't been sent in
> that case).

Agree.

>> But that is not always so.  And I think it's too eager of ElDoc to try
>> to do that so early and so brutally.  It's better to leave it to the
>> callback handlers, which we have now.  That's a much safer spot to
>> know if the answer we just got still makes sense.  Or if we're in
>> a hurry, we let the backend know asap.
>
> You might be right: the result of the current request sent to the LSP
> could still be useful for the next eldoc-idle-time cycle, indeed.

Yes, it's only a heuristic.

>>> The contract is different for timer functions than it is for eldoc
>>> functions, yes.  This is because the expectation is that eldoc functions
>>> may run for a non-negligible amount of time.
>> Why do you have that expectation?  Any particular example in the wild?
> Good question.

:-)

>> It was, after all, the status quo after you changed it for 27.1.
>> Perhaps you had a rationale?
>
> I probably did, but ... can't remember and wasn't clever enough to write
> it in the commit message :-(
> Maybe to accommodate those backends which needed async operation but had
> to live within the confines of the previously limited eldoc API?

Likely, yes.  But which of those were of the "blocking" type?  Because
even with the limited API, SLY/SLIME were just calling eldoc-message
from the process filter. Which is equivalent to what we have now in
terms of sync.  In fact, to keep backward compatibility, I haven't
touched this part of SLY at all.  Anyway, you must have done it for some
other slow synchronous, wait-for-the-retval backend.  That hypothetical
backend will hurt if we take back the while-no-input.  OTOH that
hypothetical backend can now upgrade to a much better API.

> Maybe the maintainer of eldoc.el will prefer to undo this change,
> then ;-) ?

Who's that? ;-) But OK, eldoc.el is now distributable independently so
we have good defense against this.

João

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 20:55                                 ` Dmitry Gutov
@ 2020-12-10 22:48                                   ` Stefan Monnier
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2020-12-10 22:48 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: João Távora, 45117

>>> It was, after all, the status quo after you changed it for 27.1.
>>> Perhaps you had a rationale?
>> I probably did, but ... can't remember and wasn't clever enough to write
>> it in the commit message:-(
>
> Both you and Joao can search your email archive for the message titled
>
>   Re: [Emacs-diffs] scratch/octave-eldoc-fixes 1ad0826 1/2: Prevent
>   accept-process-output with quit inhibited in octave.el05.
>
> sent on 05.12.2018, 01:19 EET (I think that's the timezone).

Hmm... looks like I already purged it.
But based on the title, we could replace the `while-no-input` of eldoc
with `while-no-input` inside octave.el's eldoc function, as in the
patch below.


        Stefan


diff --git a/lisp/progmodes/octave.el b/lisp/progmodes/octave.el
index c313ad1179..9fdeaa946c 100644
--- a/lisp/progmodes/octave.el
+++ b/lisp/progmodes/octave.el
@@ -1605,8 +1605,9 @@ octave-eldoc-cache
 
 (defun octave-eldoc-function-signatures (fn)
   (unless (equal fn (car octave-eldoc-cache))
-    (inferior-octave-send-list-and-digest
-     (list (format "print_usage ('%s');\n" fn)))
+    (while-no-input
+      (inferior-octave-send-list-and-digest
+       (list (format "print_usage ('%s');\n" fn))))
     (let (result)
       (dolist (line inferior-octave-output-list)
         ;; The help output has changed a few times in GNU Octave.

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 21:16                                 ` João Távora
@ 2020-12-10 22:58                                   ` João Távora
  0 siblings, 0 replies; 40+ messages in thread
From: João Távora @ 2020-12-10 22:58 UTC
  To: Stefan Monnier; +Cc: 45117

On Thu, Dec 10, 2020 at 9:16 PM João Távora <joaotavora@gmail.com> wrote:

> > Maybe the maintainer of eldoc.el will prefer to undo this change,
> > then ;-) ?
>
> Who's that? ;-) But OK, eldoc.el is now distributable independently so
> we have good defense against this.

Doh, I meant "against whatever third-party breakage this causes".
But in the meantime it seems you've already come up
with a good fix for the Octave backend, which is in Emacs anyway.

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-10 20:43                               ` Stefan Monnier
  2020-12-10 20:55                                 ` Dmitry Gutov
  2020-12-10 21:16                                 ` João Távora
@ 2020-12-11  7:31                                 ` Eli Zaretskii
  2020-12-11 14:31                                   ` Stefan Monnier
  2 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2020-12-11  7:31 UTC
  To: Stefan Monnier; +Cc: joaotavora, 45117

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  45117@debbugs.gnu.org
> Date: Thu, 10 Dec 2020 15:43:16 -0500
> 
> > process_send_string may send things in "bunches", I read in the
> > docstring, but it will not (and should not) be interrupted.
> 
> Indeed, I believe it should not be aborted in the middle by
> `while-no-input` (it would be a bug, because the `process-send-string`
> API doesn't offer any way to know what has been or hasn't been sent in
> that case).

If currently process-send-string isn't protected against
while-no-input, then perhaps we should add such a protection?

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-11  7:31                                 ` Eli Zaretskii
@ 2020-12-11 14:31                                   ` Stefan Monnier
  2020-12-11 14:40                                     ` Eli Zaretskii
  2020-12-11 14:41                                     ` João Távora
  0 siblings, 2 replies; 40+ messages in thread
From: Stefan Monnier @ 2020-12-11 14:31 UTC
  To: Eli Zaretskii; +Cc: joaotavora, 45117

> If currently process-send-string isn't protected against
> while-no-input, then perhaps we should add such a protection?

Not just "perhaps" but "definitely".  This said, I believe it already is
"protected" so there's nothing we need to do (at least until we get
a bug report showing I'm wrong).


        Stefan

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-11 14:31                                   ` Stefan Monnier
@ 2020-12-11 14:40                                     ` Eli Zaretskii
  2020-12-11 14:43                                       ` João Távora
  2020-12-11 14:41                                     ` João Távora
  1 sibling, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2020-12-11 14:40 UTC
  To: Stefan Monnier; +Cc: joaotavora, 45117

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: joaotavora@gmail.com,  45117@debbugs.gnu.org
> Date: Fri, 11 Dec 2020 09:31:41 -0500
> 
> > If currently process-send-string isn't protected against
> > while-no-input, then perhaps we should add such a protection?
> 
> Not just "perhaps" but "definitely".  This said, I believe it already is
> "protected" so there's nothing we need to do (at least until we get
> a bug report showing I'm wrong).

I thought this bug report was showing just that, wasn't it?

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-11 14:31                                   ` Stefan Monnier
  2020-12-11 14:40                                     ` Eli Zaretskii
@ 2020-12-11 14:41                                     ` João Távora
  2020-12-11 14:50                                       ` Stefan Monnier
  1 sibling, 1 reply; 40+ messages in thread
From: João Távora @ 2020-12-11 14:41 UTC
  To: Stefan Monnier; +Cc: 45117

On Fri, Dec 11, 2020 at 2:31 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> > If currently process-send-string isn't protected against
> > while-no-input, then perhaps we should add such a protection?
>
> Not just "perhaps" but "definitely".  This said, I believe it already is
> "protected" so there's nothing we need to do (at least until we get
> a bug report showing I'm wrong).

In this bug report, I did observe a non-local exit from
process-send-string once or twice, though rarely (see above).
My theory is that for strings that are sent in bunches,
when there's some output to accept, things can
come in that cause the quit.  But I didn't investigate.

João

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-11 14:40                                     ` Eli Zaretskii
@ 2020-12-11 14:43                                       ` João Távora
  0 siblings, 0 replies; 40+ messages in thread
From: João Távora @ 2020-12-11 14:43 UTC
  To: Eli Zaretskii; +Cc: Stefan Monnier, 45117

On Fri, Dec 11, 2020 at 2:40 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Cc: joaotavora@gmail.com,  45117@debbugs.gnu.org
> > Date: Fri, 11 Dec 2020 09:31:41 -0500
> >
> > > If currently process-send-string isn't protected against
> > > while-no-input, then perhaps we should add such a protection?
> >
> > Not just "perhaps" but "definitely".  This said, I believe it already is
> > "protected" so there's nothing we need to do (at least until we get
> > a bug report showing I'm wrong).
>
> I thought this bug report was showing just that, wasn't it?

Yes, the title says just that.  However, after digging into the
analysis, I found that only a small fraction of the unexpected
quits I was getting actually happened in process-send-string.

So the title is a bit exaggerated, but it does happen.  I think
it happens with larger strings, but I'm not sure.

João

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-11 14:41                                     ` João Távora
@ 2020-12-11 14:50                                       ` Stefan Monnier
  2020-12-13 23:19                                         ` João Távora
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2020-12-11 14:50 UTC
  To: João Távora; +Cc: 45117

> In this bug report, I did observe a non-local exit from
> process-send-string once or twice, though rarely (see above).
> My theory is that for strings that are sent in bunches,
> when there's some output to accept, things can
> come in that cause the quit.  But I didn't investigate.

Hmm... indeed it seems I was completely confused.

`process-send-string` is even a context-switch point, so it can
definitely do a non-local exit for all kinds of reasons (quit and
while-no-input being just some of the possibilities).

IOW, better disregard what I said in this thread, because I was talking
about things I don't know.


        Stefan

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-11 14:50                                       ` Stefan Monnier
@ 2020-12-13 23:19                                         ` João Távora
  2020-12-14  0:35                                           ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: João Távora @ 2020-12-13 23:19 UTC
  To: Stefan Monnier, 45117-done

Pushed the fix we agreed to (eldoc + octave).  Closing.

* bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer
  2020-12-13 23:19                                         ` João Távora
@ 2020-12-14  0:35                                           ` Stefan Monnier
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2020-12-14  0:35 UTC
  To: João Távora; +Cc: 45117-done

> Pushed the fix we agreed to (eldoc + octave).  Closing.

Thanks.  FWIW, after thinking about it some more, I think the
`while-no-input` in Octave is also suffering from the kind of race
conditions discussed here.


        Stefan

end of thread

Thread overview: 40+ messages
2020-12-08 11:44 bug#45117: 28.0.50; process-send-string mysteriously exiting non-locally when called from timer João Távora
2020-12-08 15:39 ` Eli Zaretskii
2020-12-08 15:56   ` João Távora
2020-12-08 17:01     ` Eli Zaretskii
2020-12-08 17:05       ` João Távora
2020-12-09 11:24       ` João Távora
2020-12-09 15:33         ` Eli Zaretskii
2020-12-10 15:00           ` João Távora
2020-12-10 15:23             ` Eli Zaretskii
2020-12-10 16:15               ` João Távora
2020-12-10 16:29                 ` João Távora
2020-12-10 17:20                   ` Dmitry Gutov
2020-12-10 17:51                   ` Stefan Monnier
2020-12-10 18:05                     ` João Távora
2020-12-10 18:37                       ` Stefan Monnier
2020-12-10 18:48                         ` Eli Zaretskii
2020-12-10 18:50                         ` João Távora
2020-12-10 19:44                           ` Eli Zaretskii
2020-12-10 19:47                             ` João Távora
2020-12-10 19:55                               ` Eli Zaretskii
2020-12-10 19:58                                 ` João Távora
2020-12-10 20:14                                   ` Eli Zaretskii
2020-12-10 20:15                                     ` João Távora
2020-12-10 20:37                                     ` Dmitry Gutov
2020-12-10 19:46                           ` Stefan Monnier
2020-12-10 20:12                             ` João Távora
2020-12-10 20:43                               ` Stefan Monnier
2020-12-10 20:55                                 ` Dmitry Gutov
2020-12-10 22:48                                   ` Stefan Monnier
2020-12-10 21:16                                 ` João Távora
2020-12-10 22:58                                   ` João Távora
2020-12-11  7:31                                 ` Eli Zaretskii
2020-12-11 14:31                                   ` Stefan Monnier
2020-12-11 14:40                                     ` Eli Zaretskii
2020-12-11 14:43                                       ` João Távora
2020-12-11 14:41                                     ` João Távora
2020-12-11 14:50                                       ` Stefan Monnier
2020-12-13 23:19                                         ` João Távora
2020-12-14  0:35                                           ` Stefan Monnier
2020-12-10 16:41                 ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git