Insight into the mystery hangs


* Insight into the mystery hangs
@ 2024-02-11 21:37 Eric S. Raymond
  2024-02-12 12:38 ` Alan Mackenzie
  2024-02-12 12:52 ` Eli Zaretskii
  0 siblings, 2 replies; 8+ messages in thread
From: Eric S. Raymond @ 2024-02-11 21:37 UTC
  To: emacs-devel

I finally beat Emacs into giving me a debug trace from one of the
mode-initialization hangs I described in previous email.  I know
what's going on with at least a subset of them now.

At the top of the stack trace was a call-process to "src status -a".
Turns out it was VC-mode doing this, trying to get the version-control
status of the file being visited. Both hangs I've seen were files I
keep under SRC control.

For those of you unfamiliar, SRC is a little version-control system I
wrote for single-file, single-developer projects - things like config
files that you want to track modifications on without mingling the
history with that of other files, even other files in the same
directory.  It's directly supported in VC because I wrote that
support.

When I saw that trace, I thought SRC was hanging and I had a serious
bug to trace. Turns out not - turns out src status was incautiously
recursing through every single file in my WWW directory looking for a
match to the path I was visiting. This was due to some recent
changes I made to make SRC behave more naturally in trees of
directories containing SRC-controlled files.

I made a two-line change to SRC to stop it from recursing into
directories that don't contain a .src, RCS, or SCCS directory (SRC can
work with all three of those - in particular, if you use it on an RCS
directory it behaves like RCS with a non-horrible UI).

With this change, src status on even very large directories is fast,
and the hang goes away.

However.  Emacs is not entirely off the hook here.  When I'm not under
deadline pressure I will file a bug with a title something like
"With debug-on-quit enabled, Emacs does not reliably raise a debug
trace on interrupt of call-process"

There's some kind of timing or window issue here. You have to get
lucky to get a debug trace - I didn't, previously, in dozens of tries.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



* Re: Insight into the mystery hangs
  2024-02-11 21:37 Insight into the mystery hangs Eric S. Raymond
@ 2024-02-12 12:38 ` Alan Mackenzie
  2024-02-12 12:52 ` Eli Zaretskii
  1 sibling, 0 replies; 8+ messages in thread
From: Alan Mackenzie @ 2024-02-12 12:38 UTC
  To: Eric S. Raymond; +Cc: emacs-devel

Hello, Eric.

On Sun, Feb 11, 2024 at 16:37:37 -0500, Eric S. Raymond wrote:

[ .... ]

> However.  Emacs is not entirely off the hook here.  When I'm not under
> deadline pressure I will file a bug with a title something like
> "With debug-on-quit enabled, Emacs does not reliably raise a debug
> trace on interrupt of call-process"

> There's some kind of timing or window issue here. You have to get
> lucky to get a debug trace - I didn't, previously, in dozens of tries.

More likely a frivolous condition-case.  There are around 1712
occurrences of condition-case in the Emacs Lisp code, and not all of
them should be there.

A condition-case is an extremely useful construct for keeping things
neat and tidy, for preventing users ever being troubled by irritating
things like error messages and backtraces.  Typically, it doesn't even
leave any evidence of its being called.  You've likely been thwarted by
one of these.

Curse condition-case!
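
A minimal sketch of the effect described above, and of one way to see past
it.  The function names are hypothetical; `debug-on-error' and
`debug-on-signal' are standard variables.  Whether this helps in the
call-process case from the original report is an assumption that has not
been checked here.

```elisp
(defun my/quiet-failure ()
  "Signal an error and swallow it; return nil and leave no trace.
Hypothetical example function."
  (condition-case nil
      (error "argh")
    ;; The handler discards the error: no message, no backtrace.
    (error nil)))

(defun my/debug-caught-errors ()
  "Enter the debugger on errors even when a handler would catch them.
Hypothetical helper.  `debug-on-signal' only has an effect together
with `debug-on-error' (or `debug-on-quit' for quits)."
  (interactive)
  (setq debug-on-error t
        debug-on-signal t))
```

With both variables set, evaluating (my/quiet-failure) should land in the
debugger instead of quietly returning nil.  Expect noise: plenty of code
signals and catches errors as a matter of routine.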

> -- 
> 		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Insight into the mystery hangs
  2024-02-11 21:37 Insight into the mystery hangs Eric S. Raymond
  2024-02-12 12:38 ` Alan Mackenzie
@ 2024-02-12 12:52 ` Eli Zaretskii
  2024-02-12 18:26   ` Eric S. Raymond
  2024-02-13  7:52   ` Kévin Le Gouguec
  1 sibling, 2 replies; 8+ messages in thread
From: Eli Zaretskii @ 2024-02-12 12:52 UTC
  To: Eric S. Raymond; +Cc: emacs-devel

> From: "Eric S. Raymond" <esr@thyrsus.com>
> Date: Sun, 11 Feb 2024 16:37:37 -0500 (EST)
> 
> However.  Emacs is not entirely off the hook here.  When I'm not under
> deadline pressure I will file a bug with a title something like
> "With debug-on-quit enabled, Emacs does not reliably raise a debug
> trace on interrupt of call-process"

Isn't that call issued from the mode-line display?  If so, that is
done from redisplay, and redisplay cannot enter debugger, so it
catches all errors.  If you want to produce Lisp backtraces from Lisp
code called by redisplay, you need to use the facilities documented in
the node "Debugging Redisplay" in the ELisp Reference manual.
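
As a rough sketch of what that node describes, assuming Emacs 29 or later
(where the variable `backtrace-on-redisplay-error' exists) and assuming
the backtrace buffer is named *Redisplay-trace* as the manual states;
verify both against your own build:

```elisp
;; Ask redisplay to record a backtrace when Lisp code it calls signals
;; an error.  Assumes Emacs 29+; older releases lack this variable.
(setq backtrace-on-redisplay-error t)

;; After reproducing the problem, show the collected backtrace, if any.
(when (get-buffer "*Redisplay-trace*")
  (display-buffer "*Redisplay-trace*"))
```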




* Re: Insight into the mystery hangs
  2024-02-12 12:52 ` Eli Zaretskii
@ 2024-02-12 18:26   ` Eric S. Raymond
  2024-02-12 19:22     ` Eli Zaretskii
  2024-02-13  7:52   ` Kévin Le Gouguec
  1 sibling, 1 reply; 8+ messages in thread
From: Eric S. Raymond @ 2024-02-12 18:26 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org>:
> > From: "Eric S. Raymond" <esr@thyrsus.com>
> > Date: Sun, 11 Feb 2024 16:37:37 -0500 (EST)
> > 
> > However.  Emacs is not entirely off the hook here.  When I'm not under
> > deadline pressure I will file a bug with a title something like
> > "With debug-on-quit enabled, Emacs does not reliably raise a debug
> > trace on interrupt of call-process"
> 
> Isn't that call issued from the mode-line display?  If so, that is
> done from redisplay, and redisplay cannot enter debugger, so it
> catches all errors.  If you want to produce Lisp backtraces from Lisp
> code called by redisplay, you need to use the facilities documented in
> the node "Debugging Redisplay" in the ELisp Reference manual.

1. Thinking about it, I can see why redisplay can't be allowed to enter
the debugger. Infinite regress...

2. I don't know if that subprocess is called from the modeline
code. Probably, but I'd have to dig into vc.el to check. I won't have
time for that for a few days yet.

3. Assuming that it is called from the modeline code, the question
shifts from "Why did I have so much trouble generating a debug trace?"
to "How could I get one at all?" There's some kind of timing issue,
I think.

Just to make this saga more interesting, after I turned in my last
report I was dismayed to find that the hang on mode initialization
wasn't *entirely* banished.  Fortunately I had a test case that would
reproduce it reliably.  

Some bisecting revealed that SRC had a *real* hang bug (not a mere
pseudo-hang due to a long-running command).  It seems I omitted a loop
break while I was performing what I thought was a safe refactor. Last
September...

644 cases in my test suite and none of them caught it, nor did I
encounter it in heavy production use between then and yesterday. The
trigger is some strange corner case in parsing RCS masters.  I repaired
the code, but...

...this directs my attention to the fact that Emacs makes it
generally difficult to notice and diagnose ill-behaved subprocesses,
with problems behind the mode-line being the extreme case.  Alas, I
don't know what to do about this other than windmill my arms at the
dev group.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


* Re: Insight into the mystery hangs
  2024-02-12 18:26   ` Eric S. Raymond
@ 2024-02-12 19:22     ` Eli Zaretskii
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Zaretskii @ 2024-02-12 19:22 UTC
  To: esr; +Cc: emacs-devel

> Date: Mon, 12 Feb 2024 13:26:16 -0500
> From: "Eric S. Raymond" <esr@thyrsus.com>
> Cc: emacs-devel@gnu.org
> 
> Eli Zaretskii <eliz@gnu.org>:
> > Isn't that call issued from the mode-line display?  If so, that is
> > done from redisplay, and redisplay cannot enter debugger, so it
> > catches all errors.  If you want to produce Lisp backtraces from Lisp
> > code called by redisplay, you need to use the facilities documented in
> > the node "Debugging Redisplay" in the ELisp Reference manual.
> 
> 1. Thinking about it, I can see why redisplay can't be allowed to enter
> the debugger. Infinite regress...
> 
> 2. I don't know if that subprocess is called from the modeline
> code. Probably, but I'd have to dig into vc.el to check. I won't have
> time for that for a few days yet.
> 
> 3. Assuming that it is called from the modeline code, the question
> shifts from "Why did I have so much trouble generating a debug trace?"
> to "How could I get one at all?" There's some kind of timing issue,
> I think.

If you can eventually get Emacs to produce a backtrace, the
facilities described in the node "Debugging Redisplay", mentioned
above, should allow you to have it saved in a special buffer that you
can then examine.

> ...this directs my attention to the fact that Emacs makes it
> generally difficult to notice and diagnose ill-behaved subprocesses,
> with problems behind the mode-line being the extreme case.  Alas, I
> don't know what to do about this other than windmill my arms at the
> dev group.

Not sure I understand what kind of features you would like to have to
make this easier.  To detect the fact that a sub-process takes most of
the time you could use profiler.el -- the profile should show that a
large portion of time is spent inside call-process or its ilk.
Alternatively, you could use 'top' or a similar tool to see what the
Emacs process and its subprocesses are doing.
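
A minimal sketch of the profiler.el route, assuming the slowdown can be
reproduced by visiting a file.  `my/profile-find-file' is a hypothetical
helper name; `profiler-start', `profiler-report' and `profiler-stop' are
the commands profiler.el provides.

```elisp
(require 'profiler)

(defun my/profile-find-file (file)
  "Visit FILE while the CPU profiler runs, then show the report.
Hypothetical helper, not part of Emacs."
  (interactive "fProfile visiting file: ")
  (profiler-start 'cpu)
  (unwind-protect
      (find-file file)
    ;; These cleanup forms also run if the visit is interrupted with
    ;; C-g, so a hung visit still yields a report.
    (profiler-report)
    (profiler-stop)))
```

In the report, expand the heaviest entries and look for call-process or
one of its callers near the top.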




* Re: Insight into the mystery hangs
  2024-02-12 12:52 ` Eli Zaretskii
  2024-02-12 18:26   ` Eric S. Raymond
@ 2024-02-13  7:52   ` Kévin Le Gouguec
  2024-02-13 12:56     ` Eli Zaretskii
  1 sibling, 1 reply; 8+ messages in thread
From: Kévin Le Gouguec @ 2024-02-13  7:52 UTC
  To: Eli Zaretskii; +Cc: Eric S. Raymond, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: "Eric S. Raymond" <esr@thyrsus.com>
>> Date: Sun, 11 Feb 2024 16:37:37 -0500 (EST)
>> 
>> However.  Emacs is not entirely off the hook here.  When I'm not under
>> deadline pressure I will file a bug with a title something like
>> "With debug-on-quit enabled, Emacs does not reliably raise a debug
>> trace on interrupt of call-process"
>
> Isn't that call issued from the mode-line display?  If so, that is
> done from redisplay, and redisplay cannot enter debugger, so it
> catches all errors.  If you want to produce Lisp backtraces from Lisp
> code called by redisplay, you need to use the facilities documented in
> the node "Debugging Redisplay" in the ELisp Reference manual.

Tangential, and less about debugging errors during redisplay than about
merely noticing them: I do periodically forget that these errors are
caught, so every couple of months I find myself frowning at code
(font-locking, usually) that misbehaves yet fails to throw an error for
about five minutes, before remembering to check *Messages* for any
"Error during redisplay".

Couple of questions:

1. "For Science", I just did…

    (setq mode-line-format '(:eval (error "argh")))

   … and I am surprised that this *Messages* diagnostic…

    > Error during redisplay: (eval (error "argh") t) signaled (error "argh")

   … never seems to make its way to the echo area?  It's a small thing
   (I'll probably have *Messages* open when debugging so I'll notice
   eventually) but it is somewhat surprising that these warnings can
   pile up in *Messages* yet the echo area remains blank.

2. It's neat that…

    (setq mode-line-format '("foo " (:eval (error "argh")) " bar"))

   … will be "robust" enough to display "foo bar" in the mode-line;
   combined with the lack of echo-area reporting it does make it hard to
   ever know something went wrong though.  Might it make sense to (have
   an option to) substitute a signaling :eval form with some "[redisplay
   error]" placeholder?
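
For comparison, the placeholder idea can be approximated today at the
level of an individual :eval form, by catching the error in Lisp before
redisplay ever sees it.  A minimal sketch; `my/mode-line-guard' is a
hypothetical helper, not an existing Emacs function:

```elisp
(defun my/mode-line-guard (thunk)
  "Call THUNK and return its value, or a placeholder string if it signals.
Hypothetical helper, not part of Emacs."
  (condition-case err
      (funcall thunk)
    (error (format "[redisplay error: %s]" (error-message-string err)))))

;; Same experiment as above, but the failing form is wrapped, so the
;; mode line shows the placeholder between "foo " and " bar".
(setq mode-line-format
      '("foo "
        (:eval (my/mode-line-guard (lambda () (error "argh"))))
        " bar"))
```

This only covers forms that opt in by being wrapped; it does nothing for
errors raised elsewhere in redisplay.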


* Re: Insight into the mystery hangs
  2024-02-13  7:52   ` Kévin Le Gouguec
@ 2024-02-13 12:56     ` Eli Zaretskii
  2024-02-13 23:05       ` Kévin Le Gouguec
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2024-02-13 12:56 UTC
  To: Kévin Le Gouguec; +Cc: esr, emacs-devel

> From: Kévin Le Gouguec <kevin.legouguec@gmail.com>
> Cc: "Eric S. Raymond" <esr@thyrsus.com>,  emacs-devel@gnu.org
> Date: Tue, 13 Feb 2024 08:52:06 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Isn't that call issued from the mode-line display?  If so, that is
> > done from redisplay, and redisplay cannot enter debugger, so it
> > catches all errors.  If you want to produce Lisp backtraces from Lisp
> > code called by redisplay, you need to use the facilities documented in
> > the node "Debugging Redisplay" in the ELisp Reference manual.
> 
> Tangential, and less about debugging errors during redisplay than about
> merely noticing them: I do periodically forget that these errors are
> caught, so every couple of months I find myself frowning at code
> (font-locking, usually) that misbehaves yet fails to throw an error for
> about five minutes, before remembering to check *Messages* for any
> "Error during redisplay".

Yes, something to keep in mind.

> 1. "For Science", I just did…
> 
>     (setq mode-line-format '(:eval (error "argh")))
> 
>    … and I am surprised that this *Messages* diagnostic…
> 
>     > Error during redisplay: (eval (error "argh") t) signaled (error "argh")
> 
>    … never seems to make its way to the echo area?  It's a small thing
>    (I'll probably have *Messages* open when debugging so I'll notice
>    eventually) but it is somewhat surprising that these warnings can
>    pile up in *Messages* yet the echo area remains blank.

Question: what happens when a message is shown in the echo-area?
Answer: Emacs triggers redisplay.
Question: what happens in redisplay which was triggered by error
message shown in the echo-area because of error during redisplay?
Answer: the same error during redisplay will happen again, as part of
the new redisplay cycle.
Question: what will this do to Emacs?
Answer: a never-ending sequence of errors, each displaying an error message,
which causes another error, which displays the same error message,
which causes the same error, etc. etc., ad nauseam.
Question: what will the user see?
Answer: a never-ending sequence of rapidly blinking windows, and an
Emacs session that is unusable.

> 2. It's neat that…
> 
>     (setq mode-line-format '("foo " (:eval (error "argh")) " bar"))
> 
>    … will be "robust" enough to display "foo bar" in the mode-line;
>    combined with the lack of echo-area reporting it does make it hard to
>    ever know something went wrong though.

Where ignoring the problematic result can help us recover (at a price
of botching the trouble-making Lisp), we do that.

>    Might it make sense to (have an option to) substitute a signaling
>    :eval form with some "[redisplay error]" placeholder?

You mean, display the error on the mode line or something?  We
actually do that in some simple cases, but the problem is that the
mode line has too little real-estate to show enough information
without catastrophically wiping everything else.  But if you want to
work on a feature whereby a redisplay error adds some button on the
mode line which, if pressed, will show the *Messages* buffer with the
details, feel free.  Just be aware of the following gotcha: changing
how the mode line looks will also trigger a kind of redisplay.


* Re: Insight into the mystery hangs
  2024-02-13 12:56     ` Eli Zaretskii
@ 2024-02-13 23:05       ` Kévin Le Gouguec
  0 siblings, 0 replies; 8+ messages in thread
From: Kévin Le Gouguec @ 2024-02-13 23:05 UTC
  To: Eli Zaretskii; +Cc: esr, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Kévin Le Gouguec <kevin.legouguec@gmail.com>
>> Cc: "Eric S. Raymond" <esr@thyrsus.com>,  emacs-devel@gnu.org
>> Date: Tue, 13 Feb 2024 08:52:06 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Isn't that call issued from the mode-line display?  If so, that is
>> > done from redisplay, and redisplay cannot enter debugger, so it
>> > catches all errors.  If you want to produce Lisp backtraces from Lisp
>> > code called by redisplay, you need to use the facilities documented in
>> > the node "Debugging Redisplay" in the ELisp Reference manual.
>> 
>> Tangential, and less about debugging errors during redisplay than about
>> merely noticing them: I do periodically forget that these errors are
>> caught, so every couple of months I find myself frowning at code
>> (font-locking, usually) that misbehaves yet fails to throw an error for
>> about five minutes, before remembering to check *Messages* for any
>> "Error during redisplay".
>
> Yes, something to keep in mind.
>
>> 1. "For Science", I just did…
>> 
>>     (setq mode-line-format '(:eval (error "argh")))
>> 
>>    … and I am surprised that this *Messages* diagnostic…
>> 
>>     > Error during redisplay: (eval (error "argh") t) signaled (error "argh")
>> 
>>    … never seems to make its way to the echo area?  It's a small thing
>>    (I'll probably have *Messages* open when debugging so I'll notice
>>    eventually) but it is somewhat surprising that these warnings can
>>    pile up in *Messages* yet the echo area remains blank.
>
> Question: what happens when a message is shown in the echo-area?
> Answer: Emacs triggers redisplay.

Ah, yes; it would, wouldn't it 🥲

> Question: what happens in redisplay which was triggered by error
> message shown in the echo-area because of error during redisplay?
> Answer: the same error during redisplay will happen again, as part of
> the new redisplay cycle.
> Question: what will this do to Emacs?
> Answer: a never-ending sequence of errors, each displaying an error message,
> which causes another error, which displays the same error message,
> which causes the same error, etc. etc., ad nauseam.
> Question: what will the user see?
> Answer: a never-ending sequence of rapidly blinking windows, and an
> Emacs session that is unusable.

>> 2. It's neat that…
>> 
>>     (setq mode-line-format '("foo " (:eval (error "argh")) " bar"))
>> 
>>    … will be "robust" enough to display "foo bar" in the mode-line;
>>    combined with the lack of echo-area reporting it does make it hard to
>>    ever know something went wrong though.
>
> Where ignoring the problematic result can help us recover (at a price
> of botching the trouble-making Lisp), we do that.
>
>>    Might it make sense to (have an option to) substitute a signaling
>>    :eval form with some "[redisplay error]" placeholder?
>
> You mean, display the error on the mode line or something?  We
> actually do that in some simple cases, but the problem is that the
> mode line has too little real-estate to show enough information
> without catastrophically wiping everything else.  But if you want to
> work on a feature whereby a redisplay error adds some button on the
> mode line which, if pressed, will show the *Messages* buffer with the
> details, feel free.

Right, I think this is where my mind was going with this (taking
observations 1 and 2 into account).

>                      Just be aware of the following gotcha: changing
> how the mode line looks will also trigger a kind of redisplay.

Thanks for taking the time to spell all of this out.  I guess at the
moment I can't think of anything smarter than e.g.

* a boolean set by redisplay when it catches an error,

* a mode-line construct that shows a button when the boolean is set,

* the button offering a mouse binding to clear the boolean, and another
  to visit *Messages*.

Of course (a) the construct might be "late" by one redisplay cycle, if
it is processed before the error is raised in the same cycle; (b)
redisplay might reset the boolean immediately after the user clears it,
if the cause of the original error persists; and (c) the boolean would
remain set indefinitely after the cause has been fixed, until the user
manually clears it.
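
A rough sketch of the Lisp side of those three pieces.  Everything named
my/... is hypothetical, and in particular nothing in Emacs sets
`my/redisplay-error-seen' today; the proposal assumes redisplay itself
would set it when it catches an error.

```elisp
(defvar my/redisplay-error-seen nil
  "Non-nil once an error during redisplay has been caught.
Hypothetical: under the proposal, redisplay would set this.")

(defvar my/redisplay-error-map
  (let ((map (make-sparse-keymap)))
    ;; mouse-1 on the indicator visits *Messages*.
    (define-key map [mode-line mouse-1]
      (lambda () (interactive) (display-buffer (messages-buffer))))
    ;; mouse-3 clears the flag and refreshes every mode line.
    (define-key map [mode-line mouse-3]
      (lambda () (interactive)
        (setq my/redisplay-error-seen nil)
        (force-mode-line-update t)))
    map)
  "Mouse bindings for the hypothetical redisplay-error indicator.")

(defvar my/redisplay-error-indicator
  '(:eval (when my/redisplay-error-seen
            (propertize "[redisplay error]"
                        'face 'error
                        'mouse-face 'mode-line-highlight
                        'help-echo "mouse-1: show *Messages*, mouse-3: dismiss"
                        'local-map my/redisplay-error-map)))
  "Mode-line construct that shows a button while the flag is set.")

(add-to-list 'mode-line-misc-info my/redisplay-error-indicator t)
```

To try the button without any redisplay support, set the flag by hand
with (setq my/redisplay-error-seen t) and call (force-mode-line-update t).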

FWIW I wouldn't see this as much worse than the current situation, but
then again, as my tangent's introduction implies, I don't experience
redisplay errors that often (… or at least I don't notice them 😉).

Again, thanks for your reply; appreciate the food for thought.


