unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
       [not found]                   ` <835ye8fweu.fsf@gnu.org>
@ 2022-12-19  1:56                     ` Po Lu
  2022-12-19 14:36                       ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Po Lu @ 2022-12-19  1:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> But AFAIK it is _always_ safe for unblock_input to signal.  When do
> you think it isn't?

In almost all of the helper functions where it is called:
tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.

Otherwise, why does it not call maybe_quit after reading pending events?
If it's safe to signal from unblock_input, then there should be no point
in delaying quitting until the next time maybe_quit is called?

> ??? w32_read_socket runs in the Lisp (a.k.a. "main") thread.  So it is
> safe for any code it calls it to signal errors and do anything else
> it's safe to do for the Lisp machine.

It can run from unblock_input.  Basically any code where it is unsafe to
signal (and there is a lot) calling unblock_input is put at risk.  Think
of all the callers of xmalloc without input blocked.

> You cannot request that.  note_mouse_highlight examines properties,
> and that can always signal, because properties are Lisp creatures and
> can invoke Lisp.

Could you please explain how?

> We gradually removed all those unsafe parts, and nowadays we only do
> the minimum in such contexts.  If unsafe processing of input is still
> with us, we should move to safer techniques.  That this unsafe code
> exists for such a long time is therefore not a justification for it to
> continue existing.
>
> Also, I think this unsafe processing of events only happens with
> GTK/PGTK; other X configurations call XTread_socket and
> handle_one_xevent from keyboard.c, in a "safe" context.

Non-local exits from handle_one_xevent are simply not safe.  That
function is not only called from XTread_socket, it is also called from
x_dispatch_event, which is called from various toolkit menus, including
inside the menu widget's own event loop.

> I didn't say we shouldn't process mouse movements.  I said that this
> processing should be limited to generating an Emacs input event and
> queuing it, will all the pertinent information for further processing.
> For example, note_mouse_highlight does just three things:
>
>   . redisplays portion of the window in a special face
>   . changes the way the cursor is drawn
>   . shows help-echo
>
> All of that can be done given an input event read by terminals
> read_socket_hook inside keyboard.c, provided that the information
> about the mouse move is stored in the input event.  There's
> absolutely no reason to produce the above 3 effects right where we are
> fed the raw X or GTK event from the window-system or the toolkit.

How would working with the mouse highlight work inside a popup menu,
then?  Keyboard events are not being read inside popup_get_selection.

> AFAIU, these two events will both be queued, and will both be
> processed, so there will be not "suck mouse highlight".

No, clear_mouse_face is absolutely safe to call synchronously, from the
likes of x_mouse_leave and the XI_Leave handlers.

In fact, it can be called from places where it is impossible to handle
keyboard events: popup dialogs and menus.

> So I still don't understand what is it that I'm missing that makes you
> say this safe processing of window-system events is impossible.

Just that I'm not confident in that.  I can't explain why but the mouse
highlight code is already tricky now, and something tells me moving it
to keyboard.c will make it even trickier.

And besides, note_mouse_highlight should not signal, regardless of
whether or not doing so is safe.  Otherwise, users' work could be
interrupted by simple mouse movement.

> Anyway, this bug report is not the proper place to discuss this.
> Please start a discussion on emacs-devel, and let's pick this up
> there.

Done.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-19  1:56                     ` bug#60144: 30.0.50; PGTK Emacs crashes after signal Po Lu
@ 2022-12-19 14:36                       ` Eli Zaretskii
  2022-12-20  1:39                         ` Po Lu
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2022-12-19 14:36 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Mon, 19 Dec 2022 09:56:40 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > But AFAIK it is _always_ safe for unblock_input to signal.  When do
> > you think it isn't?
> 
> In almost all of the helper functions where it is called:
> tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.

I don't think I understand why.  Let's take tear_down_x_back_buffer,
for example: why isn't it safe to call unblock_input from it?
tear_down_x_back_buffer is called:

  . from x_set_inhibit_double_buffering, which is a frame parameter
    handler, so it is only called from the Lisp thread
  . from x_free_frame_resources, which AFAIU is only called from the
    Lisp thread, when we delete a frame

So what is the problem here? what am I missing?

> Otherwise, why does it not call maybe_quit after reading pending events?
> If it's safe to signal from unblock_input, then there should be no point
> in delaying quitting until the next time maybe_quit is called?

Sorry, I don't see the relevance.  Even if we don't immediately call
maybe_quit, it doesn't mean we couldn't if we wanted or needed to.  It
isn't evidence that unblock_input cannot signal or is more generally
unsafe to call.

> > ??? w32_read_socket runs in the Lisp (a.k.a. "main") thread.  So it is
> > safe for any code it calls it to signal errors and do anything else
> > it's safe to do for the Lisp machine.
> 
> It can run from unblock_input.

But unblock_input is only called from the Lisp thread, at least in the
w32 build.  And it should only be called from the Lisp thread.  It
cannot be safely called from any other thread.

> Basically any code where it is unsafe to
> signal (and there is a lot) calling unblock_input is put at risk.  Think
> of all the callers of xmalloc without input blocked.

Why would that be the problem?  Did you read the commentary around
line 710 in alloc.c?

> > You cannot request that.  note_mouse_highlight examines properties,
> > and that can always signal, because properties are Lisp creatures and
> > can invoke Lisp.
> 
> Could you please explain how?

Which part is unclear?

> > We gradually removed all those unsafe parts, and nowadays we only do
> > the minimum in such contexts.  If unsafe processing of input is still
> > with us, we should move to safer techniques.  That this unsafe code
> > exists for such a long time is therefore not a justification for it to
> > continue existing.
> >
> > Also, I think this unsafe processing of events only happens with
> > GTK/PGTK; other X configurations call XTread_socket and
> > handle_one_xevent from keyboard.c, in a "safe" context.
> 
> Non-local exits from handle_one_xevent are simply not safe.  That
> function is not only called from XTread_socket, it is also called from
> x_dispatch_event, which is called from various toolkit menus, including
> inside the menu widget's own event loop.

These calls will all have to go, then.  Maybe we should use some
simplified, stripped down version of handle_one_xevent, which doesn't
invoke processing that can access Lisp data and the global state.  Why
would a toolkit menu widget need to call note_mouse_highlight, which
is _our_ function that implements _our_ features, about which GTK (or
any other toolkit) knows absolutely nothing??

The fact that we call handle_one_xevent from places other than the
read_socket_hook is IMO a grave mistake, and we should strive to fix
as many of such mistakes as possible.  Because making these calls safe
is impossible, and by allowing them, perhaps even increasing their
numbers as time goes by, we make Emacs more and more fragile on the
platforms where we do that.

> > I didn't say we shouldn't process mouse movements.  I said that this
> > processing should be limited to generating an Emacs input event and
> > queuing it, will all the pertinent information for further processing.
> > For example, note_mouse_highlight does just three things:
> >
> >   . redisplays portion of the window in a special face
> >   . changes the way the cursor is drawn
> >   . shows help-echo
> >
> > All of that can be done given an input event read by terminals
> > read_socket_hook inside keyboard.c, provided that the information
> > about the mouse move is stored in the input event.  There's
> > absolutely no reason to produce the above 3 effects right where we are
> > fed the raw X or GTK event from the window-system or the toolkit.
> 
> How would working with the mouse highlight work inside a popup menu,
> then?  Keyboard events are not being read inside popup_get_selection.

If it's impossible to do safely, it will not work.  It already doesn't
work in popup menus on w32, for similar reasons.  That's a minor
inconvenience to users, and a much smaller catastrophe than making
these unsafe calls.

> > AFAIU, these two events will both be queued, and will both be
> > processed, so there will be not "suck mouse highlight".
> 
> No, clear_mouse_face is absolutely safe to call synchronously, from the
> likes of x_mouse_leave and the XI_Leave handlers.
> 
> In fact, it can be called from places where it is impossible to handle
> keyboard events: popup dialogs and menus.

I don't see how this is related, sorry.  You asked:

> For example, if an XI_Motion event arrives and is queued, and
> then a subsequent XI_Leave event arrives before that event has a chance
> to be processed ``in our own safe context'', note_mouse_highlight will
> be called after the mouse has left the frame, leading to stuck mouse
> highlight.

And I responded saying that there should be no "mouse stuck" because
both XI_Motion and XI_Leave will be queued and processed in the order
they arrived.  What does this have to do with the safety of calling
clear_mouse_face, or lack thereof?

I'm saying that there's no need to process these events immediately,
when the event calls into the Lisp machine.  We can, and in many cases
already do, queue the event and process it a bit later, and in a
different, safer, context.

> > So I still don't understand what is it that I'm missing that makes you
> > say this safe processing of window-system events is impossible.
> 
> Just that I'm not confident in that.  I can't explain why but the mouse
> highlight code is already tricky now, and something tells me moving it
> to keyboard.c will make it even trickier.

But it's already "in keyboard.c"!  See the part that begins with

  /* Try generating a mouse motion event.  */
  else if (some_mouse_moved ())

which ends with

      if (!NILP (x) && NILP (obj))
	obj = make_lispy_movement (f, bar_window, part, x, y, t);

Then we process these "lispy movements" as input.

> And besides, note_mouse_highlight should not signal, regardless of
> whether or not doing so is safe.

Not normally, no.  But if some Lisp program sets bad text properties
or overlays, it could signal.  We can never assume in Emacs that some
Lisp machine code never signals.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-19 14:36                       ` Eli Zaretskii
@ 2022-12-20  1:39                         ` Po Lu
  2022-12-20 15:16                           ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Po Lu @ 2022-12-20  1:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Mon, 19 Dec 2022 09:56:40 +0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > But AFAIK it is _always_ safe for unblock_input to signal.  When do
>> > you think it isn't?
>> 
>> In almost all of the helper functions where it is called:
>> tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.
>
> I don't think I understand why.  Let's take tear_down_x_back_buffer,
> for example: why isn't it safe to call unblock_input from it?
> tear_down_x_back_buffer is called:
>
>   . from x_set_inhibit_double_buffering, which is a frame parameter
>     handler, so it is only called from the Lisp thread
>   . from x_free_frame_resources, which AFAIU is only called from the
>     Lisp thread, when we delete a frame
>
> So what is the problem here? what am I missing?

If unblock_input signals inside, then the double buffering state could
be left inconsistent.  For example, the back buffer picture could be
freed, but the back buffer itself may be left present, which can lead to
crashes later on.  That it is called from the only thread present does
not make it safe.

More generally, C code does not expect unblock_input to signal!  See
this piece of call_process in callproc.c:

	{
	  if (!specpdl_ref_valid_p (tempfile_index))
	    record_deleted_pid (pid, Qnil);
	  else
	    {
	      eassert (1 < nargs);
	      record_deleted_pid (pid, args[1]);
	      clear_unwind_protect (tempfile_index);
	    }
	  synch_process_pid = 0;
	}
    }

  unblock_child_signal (&oldset);
  unblock_input (); <=======

  if (pid < 0)
    report_file_errno (CHILD_SETUP_ERROR_DESC, Qnil, child_errno);

  /* Close our file descriptors, except for callproc_fd[CALLPROC_PIPEREAD]
     since we will use that to read input from.  */
  for (i = 0; i < CALLPROC_FDS; i++)
    if (i != CALLPROC_PIPEREAD && 0 <= callproc_fd[i])
      {
	emacs_close (callproc_fd[i]);
	callproc_fd[i] = -1;
      }
  emacs_close (filefd);
  clear_unwind_protect (specpdl_ref_add (count, -1));

If unblock_input signals, filefd will not be closed.

Or this piece of x_create_bitmap_mask in image.c:

  pixmap = image_bitmap_pixmap (f, id);
  width = x_bitmap_width (f, id);
  height = x_bitmap_height (f, id);

  block_input ();
  ximg = XGetImage (FRAME_X_DISPLAY (f), pixmap, 0, 0, width, height,
		    ~0, ZPixmap);

  if (!ximg)
    {
      unblock_input ();
      return;
    }

  result = x_create_x_image_and_pixmap (f, width, height, 1, &mask_img, &mask);

  unblock_input (); <======================
  if (!result)
    {
      XDestroyImage (ximg);
      return;
    }

If unblock_input signals, then ximg (which is potentially a lot of data)
leaks.

> But unblock_input is only called from the Lisp thread, at least in the
> w32 build.  And it should only be called from the Lisp thread.  It
> cannot be safely called from any other thread.

Being called from the Lisp thread does not make it safe to signal!
Consider the following example code:

  int fd;
  char *buffer;

  fd = emacs_open (...);

  if (fd == -1)
    return;

  buffer = xmalloc (100);

  block_input ();
  do_something_with_buffer_and_fd (fd, buffer);
  unblock_input ();

  /* more activity.  */
  close (fd);
  xfree (buffer);

Notice how fd and bufferare not closed if unblock_input signals.  Now
look at ftcrfont_open:

  block_input ();

  pat = ftfont_entity_pattern (entity, pixel_size);
  FcConfigSubstitute (NULL, pat, FcMatchPattern);
  FcDefaultSubstitute (pat);
  match = FcFontMatch (NULL, pat, &result);
  ftfont_fix_match (pat, match);

  FcPatternDestroy (pat);
  font_face = cairo_ft_font_face_create_for_pattern (match);
  if (!font_face
      || cairo_font_face_status (font_face) != CAIRO_STATUS_SUCCESS)
    {
      unblock_input ();
      FcPatternDestroy (match);
      return Qnil;
    }
  cairo_matrix_t font_matrix, ctm;
  cairo_matrix_init_scale (&font_matrix, pixel_size, pixel_size);
  cairo_matrix_init_identity (&ctm);

#ifdef HAVE_PGTK
  cairo_font_options_t *options = xsettings_get_font_options ();
#else
  cairo_font_options_t *options = cairo_font_options_create ();
#endif
#ifdef USE_BE_CAIRO
  if (be_use_subpixel_antialiasing ())
    cairo_font_options_set_antialias (options, CAIRO_ANTIALIAS_SUBPIXEL);
#endif
  cairo_scaled_font_t *scaled_font
    = cairo_scaled_font_create (font_face, &font_matrix, &ctm, options);
  cairo_font_face_destroy (font_face);
  cairo_font_options_destroy (options);
  unblock_input ();

If the last unblock_input signals, match (allocated by FcFontMatch)
leaks.

>> > You cannot request that.  note_mouse_highlight examines properties,
>> > and that can always signal, because properties are Lisp creatures and
>> > can invoke Lisp.
>> 
>> Could you please explain how?
>
> Which part is unclear?

Which properties involve Lisp execution, and why the evaluation of that
Lisp cannot be put inside internal_catch_all.

> These calls will all have to go, then.  Maybe we should use some
> simplified, stripped down version of handle_one_xevent, which doesn't
> invoke processing that can access Lisp data and the global state.  Why
> would a toolkit menu widget need to call note_mouse_highlight, which
> is _our_ function that implements _our_ features, about which GTK (or
> any other toolkit) knows absolutely nothing??

The toolkit menu mostly demands that Emacs set the cursor to the
appropriate cursor for whatever the pointer is under.  The X server can
also demand Emacs expose parts of the frame at any time, due to window
configuration changes.

> If it's impossible to do safely, it will not work.  It already doesn't
> work in popup menus on w32, for similar reasons.  That's a minor
> inconvenience to users, and a much smaller catastrophe than making
> these unsafe calls.

But it worked since the beginning of 2006 or so, when xmenu.c was made
to call handle_one_xevent.  Anyway, the "minor inconvenience" can be
avoided by simply wrapping note_mouse_highlight in internal_catch_all.
Why do you think that would be too harsh?

> I don't see how this is related, sorry.  You asked:
>
>> For example, if an XI_Motion event arrives and is queued, and
>> then a subsequent XI_Leave event arrives before that event has a chance
>> to be processed ``in our own safe context'', note_mouse_highlight will
>> be called after the mouse has left the frame, leading to stuck mouse
>> highlight.
>
> And I responded saying that there should be no "mouse stuck" because
> both XI_Motion and XI_Leave will be queued and processed in the order
> they arrived.  What does this have to do with the safety of calling
> clear_mouse_face, or lack thereof?
>
> I'm saying that there's no need to process these events immediately,
> when the event calls into the Lisp machine.  We can, and in many cases
> already do, queue the event and process it a bit later, and in a
> different, safer, context.

But then as a result the mouse face will not be dismissed when entering
a menu.  And we will also lose the valuable development feature of using
mouse-high to determine how wedged a stuck Emacs really is.

> But it's already "in keyboard.c"!  See the part that begins with
>
>   /* Try generating a mouse motion event.  */
>   else if (some_mouse_moved ())
>
> which ends with
>
>       if (!NILP (x) && NILP (obj))
> 	obj = make_lispy_movement (f, bar_window, part, x, y, t);
>
> Then we process these "lispy movements" as input.

That isn't related to mouse-highlight, and only happens when track-mouse
is non-nil.  track-mouse is nil because it requires calling
XTmouse_position on all mouse movement, which is a very expensive
operation (with 15 km between me and the X server, it takes 90
milliseconds!)

> Not normally, no.  But if some Lisp program sets bad text properties
> or overlays, it could signal.  We can never assume in Emacs that some
> Lisp machine code never signals.

Then the simple solution would be to catch those signals inside
note_mouse_highlight.  What could be the problem with that?  We could
even log the signals to *Messages*, like with-demoted-errors.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-20  1:39                         ` Po Lu
@ 2022-12-20 15:16                           ` Eli Zaretskii
  2022-12-21  1:01                             ` Po Lu
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2022-12-20 15:16 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Tue, 20 Dec 2022 09:39:13 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Po Lu <luangruo@yahoo.com>
> >> Cc: emacs-devel@gnu.org
> >> Date: Mon, 19 Dec 2022 09:56:40 +0800
> >> 
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> 
> >> > But AFAIK it is _always_ safe for unblock_input to signal.  When do
> >> > you think it isn't?
> >> 
> >> In almost all of the helper functions where it is called:
> >> tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.
> >
> > I don't think I understand why.  Let's take tear_down_x_back_buffer,
> > for example: why isn't it safe to call unblock_input from it?
> > tear_down_x_back_buffer is called:
> >
> >   . from x_set_inhibit_double_buffering, which is a frame parameter
> >     handler, so it is only called from the Lisp thread
> >   . from x_free_frame_resources, which AFAIU is only called from the
> >     Lisp thread, when we delete a frame
> >
> > So what is the problem here? what am I missing?
> 
> If unblock_input signals inside, then the double buffering state could
> be left inconsistent.  For example, the back buffer picture could be
> freed, but the back buffer itself may be left present, which can lead to
> crashes later on.  That it is called from the only thread present does
> not make it safe.
> 
> More generally, C code does not expect unblock_input to signal!

That is definitely incorrect, not in the general form you are
presenting it.  unblock_input processes input events, and those can
legitimately signal errors.  Other places in C call other functions
that can signal.  C code which cannot allow control to escape it via
non-local exits must use the record_unwind_protect machinery to set up
suitable unwinders.

I very much hope that all your work on new X features and in
particular on the new input events and XInput2 was not based on the
above wrong assumption -- that unblock_input can never signal (or
should never signal).

But we moved far away from the original issue, and are in fact
discussing a very far (and much less important, from my POV) tangent.
Let's not lose our focus, okay?  The original issue was whether we can
have code which calls handle_one_xevent and other similar functions,
which can call unblock_input and thus can signal an error, from
threads other than the main thread, or from callbacks we install that
are then called by toolkits from a non-main thread.  By contrast, the
past several messages were discussing whether unblock_input is allowed
to signal even if called from the main thread.  And that is an
entirely different issue.

It is a different issue because my point is simple: calls to
handle_one_xevent etc. from non-main threads _cannot_ be made safe in
Emacs, no matter what you do.  So it's an absolute no-no.  By
contrast, if we call unblock_input from the main thread, and the code
which calls it doesn't expect signals, all we need is either to
restructure the caller, or use record_unwind_protect.  Thus, even if
you are absolutely right about all of the cases you show below -- and
I'm not convinced you are right in all of them, but even if you are --
even if all of them are indeed unsafe in the face of Lisp errors, all
it means that we must examine them one by one and make them safe by
one of the above mentioned techniques.

So I would like to return to the main issue: calls to
handle_one_xevent from non-main threads.  That must not happen in
Emacs, or else the corresponding build will be fragile and can crash
given the opportune conditions.  If you agree with that, let's discuss
how to make those configurations safer.  If you don't agree, please
explain why.

> Being called from the Lisp thread does not make it safe to signal!

Not by itself, it doesn't.  But it definitely makes it fixable, so
that it _can_ be safe.  By contrast, calling the Lisp machine from
another thread _cannot_ be fixed.  It's a time bomb that will
definitely go off.

> If the last unblock_input signals, match (allocated by FcFontMatch)
> leaks.

Yes, code which doesn't expect errors might leak descriptors or memory
or other resources.  But resource leaks are very rarely fatal, unless
you leak a lot of them: modern machines have enough memory and file
descriptors to let us leak them for many days before disaster
strikes.  By contrast, a single error signaled from the wrong thread
will immediately crash Emacs because it longjmp's to the wrong stack!

> >> > You cannot request that.  note_mouse_highlight examines properties,
> >> > and that can always signal, because properties are Lisp creatures and
> >> > can invoke Lisp.
> >> 
> >> Could you please explain how?
> >
> > Which part is unclear?
> 
> Which properties involve Lisp execution, and why the evaluation of that
> Lisp cannot be put inside internal_catch_all.

You've just seen that: we examine text properties and overlays in a
buffer whose restriction could have changed behind our backs, what
with today's proliferation of calls into Lisp from very deep and
low-level functions.  We call Lisp in keyboard.c and in xdisp.c and in
insdel.c and in window.c.

> > These calls will all have to go, then.  Maybe we should use some
> > simplified, stripped down version of handle_one_xevent, which doesn't
> > invoke processing that can access Lisp data and the global state.  Why
> > would a toolkit menu widget need to call note_mouse_highlight, which
> > is _our_ function that implements _our_ features, about which GTK (or
> > any other toolkit) knows absolutely nothing??
> 
> The toolkit menu mostly demands that Emacs set the cursor to the
> appropriate cursor for whatever the pointer is under.

I cannot parse this: "set the cursor to the appropriate cursor"?

> The X server can also demand Emacs expose parts of the frame at any
> time, due to window configuration changes.

We already handle this, by setting a flag in the SIGIO handler, and
then calling expose_frame when it is safe.  What are the problems you
see here if we queue these events instead of processing them
immediately?

> > If it's impossible to do safely, it will not work.  It already doesn't
> > work in popup menus on w32, for similar reasons.  That's a minor
> > inconvenience to users, and a much smaller catastrophe than making
> > these unsafe calls.
> 
> But it worked since the beginning of 2006 or so, when xmenu.c was made
> to call handle_one_xevent.

"Worked" in the sense that we maybe didn't hear about problems, or
heard too few of them?  That doesn't mean the problems don't exist,
just that people don't come immediately running here and complain.

Besides, PGTK is very young, definitely not around since 2006.  Give
it some time.

> Anyway, the "minor inconvenience" can be avoided by simply wrapping
> note_mouse_highlight in internal_catch_all.  Why do you think that
> would be too harsh?

You cannot use that machinery reliably in a non-main thread anyway.
For starters, you have no idea what kind of stack do those threads
get.  They aren't our threads, so we don't know anything about them.
Using internal_catch_all would mean we'd need to run unwinders, and
there's no guarantee they could be run on those other threads.  For
example, they cannot safely access the state of the Lisp machine,
because they run on other threads asynchronically.

Besides, note_mouse_highlight is called from many places that don't
need any internal_catch_all.  If some caller cannot allow non-local
exits, it should call internal_catch_all by itself.  But that's
another tangent.

> > And I responded saying that there should be no "mouse stuck" because
> > both XI_Motion and XI_Leave will be queued and processed in the order
> > they arrived.  What does this have to do with the safety of calling
> > clear_mouse_face, or lack thereof?
> >
> > I'm saying that there's no need to process these events immediately,
> > when the event calls into the Lisp machine.  We can, and in many cases
> > already do, queue the event and process it a bit later, and in a
> > different, safer, context.
> 
> But then as a result the mouse face will not be dismissed when entering
> a menu.

Again, I don't understand: we don't display mouse-face on menus, only
on buffer text, on the mode line, and on the tool bar (if we implement
it in Emacs, not use the toolkit's tool bar).

> And we will also lose the valuable development feature of using
> mouse-high to determine how wedged a stuck Emacs really is.

Now you lost me.  What is that feature, and how is it related?

> > But it's already "in keyboard.c"!  See the part that begins with
> >
> >   /* Try generating a mouse motion event.  */
> >   else if (some_mouse_moved ())
> >
> > which ends with
> >
> >       if (!NILP (x) && NILP (obj))
> > 	obj = make_lispy_movement (f, bar_window, part, x, y, t);
> >
> > Then we process these "lispy movements" as input.
> 
> That isn't related to mouse-highlight, and only happens when track-mouse
> is non-nil.

Those are details.  My point is that we do process _some_ mouse events
in the main loop, and it works.

> > Not normally, no.  But if some Lisp program sets bad text properties
> > or overlays, it could signal.  We can never assume in Emacs that some
> > Lisp machine code never signals.
> 
> Then the simple solution would be to catch those signals inside
> note_mouse_highlight.  What could be the problem with that?

It's wrong, that's the problem.

And again, this is a tangent.  The main issue is that this code cannot
be called from a non-main thread, period.  Let's focus on that and
solve those problems first.  You made me very worried by telling we do
this nonchalantly and for a long time.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-20 15:16                           ` Eli Zaretskii
@ 2022-12-21  1:01                             ` Po Lu
  2022-12-21 13:06                               ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Po Lu @ 2022-12-21  1:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> That is definitely incorrect, not in the general form you are
> presenting it.  unblock_input processes input events, and those can
> legitimately signal errors.  Other places in C call other functions
> that can signal.  C code which cannot allow control to escape it via
> non-local exits must use the record_unwind_protect machinery to set up
> suitable unwinders.
>
> I very much hope that all your work on new X features and in
> particular on the new input events and XInput2 was not based on the
> above wrong assumption -- that unblock_input can never signal (or
> should never signal).
>
> But we moved far away from the original issue, and are in fact
> discussing a very far (and much less important, from my POV) tangent.
> Let's not lose our focus, okay?  The original issue was whether we can
> have code which calls handle_one_xevent and other similar functions,
> which can call unblock_input and thus can signal an error, from
> threads other than the main thread, or from callbacks we install that
> are then called by toolkits from a non-main thread.  By contrast, the
> past several messages were discussing whether unblock_input is allowed
> to signal even if called from the main thread.  And that is an
> entirely different issue.
>
> It is a different issue because my point is simple: calls to
> handle_one_xevent etc. from non-main threads _cannot_ be made safe in
> Emacs, no matter what you do.  So it's an absolute no-no.  By
> contrast, if we call unblock_input from the main thread, and the code
> which calls it doesn't expect signals, all we need is either to
> restructure the caller, or use record_unwind_protect.  Thus, even if
> you are absolutely right about all of the cases you show below -- and
> I'm not convinced you are right in all of them, but even if you are --
> even if all of them are indeed unsafe in the face of Lisp errors, all
> it means that we must examine them one by one and make them safe by
> one of the above mentioned techniques.
>
> So I would like to return to the main issue: calls to
> handle_one_xevent from non-main threads.  That must not happen in
> Emacs, or else the corresponding build will be fragile and can crash
> given the opportune conditions.  If you agree with that, let's discuss
> how to make those configurations safer.  If you don't agree, please
> explain why.
>
>> Being called from the Lisp thread does not make it safe to signal!
>
> Not by itself, it doesn't.  But it definitely makes it fixable, so
> that it _can_ be safe.  By contrast, calling the Lisp machine from
> another thread _cannot_ be fixed.  It's a time bomb that will
> definitely go off.
>
>> If the last unblock_input signals, match (allocated by FcFontMatch)
>> leaks.
>
> Yes, code which doesn't expect errors might leak descriptors or memory
> or other resources.  But resource leaks are very rarely fatal, unless
> you leak a lot of them: modern machines have enough memory and file
> descriptors to let us leak them for many days before disaster
> strikes.  By contrast, a single error signaled from the wrong thread
> will immediately crash Emacs because it longjmp's to the wrong stack!
>
>> >> > You cannot request that.  note_mouse_highlight examines properties,
>> >> > and that can always signal, because properties are Lisp creatures and
>> >> > can invoke Lisp.
>> >> 
>> >> Could you please explain how?
>> >
>> > Which part is unclear?
>> 
>> Which properties involve Lisp execution, and why the evaluation of that
>> Lisp cannot be put inside internal_catch_all.
>
> You've just seen that: we examine text properties and overlays in a
> buffer whose restriction could have changed behind our backs, what
> with today's proliferation of calls into Lisp from very deep and
> low-level functions.  We call Lisp in keyboard.c and in xdisp.c and in
> insdel.c and in window.c.
>
>> > These calls will all have to go, then.  Maybe we should use some
>> > simplified, stripped down version of handle_one_xevent, which doesn't
>> > invoke processing that can access Lisp data and the global state.  Why
>> > would a toolkit menu widget need to call note_mouse_highlight, which
>> > is _our_ function that implements _our_ features, about which GTK (or
>> > any other toolkit) knows absolutely nothing??
>> 
>> The toolkit menu mostly demands that Emacs set the cursor to the
>> appropriate cursor for whatever the pointer is under.
>
> I cannot parse this: "set the cursor to the appropriate cursor"?
>
>> The X server can also demand Emacs expose parts of the frame at any
>> time, due to window configuration changes.
>
> We already handle this, by setting a flag in the SIGIO handler, and
> then calling expose_frame when it is safe.  What are the problems you
> see here if we queue these events instead of processing them
> immediately?
>
>> > If it's impossible to do safely, it will not work.  It already doesn't
>> > work in popup menus on w32, for similar reasons.  That's a minor
>> > inconvenience to users, and a much smaller catastrophe than making
>> > these unsafe calls.
>> 
>> But it worked since the beginning of 2006 or so, when xmenu.c was made
>> to call handle_one_xevent.
>
> "Worked" in the sense that we maybe didn't hear about problems, or
> heard too few of them?  That doesn't mean the problems don't exist,
> just that people don't come immediately running here and complain.
>
> Besides, PGTK is very young, definitely not around since 2006.  Give
> it some time.
>
>> Anyway, the "minor inconvenience" can be avoided by simply wrapping
>> note_mouse_highlight in internal_catch_all.  Why do you think that
>> would be too harsh?
>
> You cannot use that machinery reliably in a non-main thread anyway.
> For starters, you have no idea what kind of stack do those threads
> get.  They aren't our threads, so we don't know anything about them.
> Using internal_catch_all would mean we'd need to run unwinders, and
> there's no guarantee they could be run on those other threads.  For
> example, they cannot safely access the state of the Lisp machine,
> because they run on other threads asynchronically.
>
> Besides, note_mouse_highlight is called from many places that don't
> need any internal_catch_all.  If some caller cannot allow non-local
> exits, it should call internal_catch_all by itself.  But that's
> another tangent.
>
>> > And I responded saying that there should be no "mouse stuck" because
>> > both XI_Motion and XI_Leave will be queued and processed in the order
>> > they arrived.  What does this have to do with the safety of calling
>> > clear_mouse_face, or lack thereof?
>> >
>> > I'm saying that there's no need to process these events immediately,
>> > when the event calls into the Lisp machine.  We can, and in many cases
>> > already do, queue the event and process it a bit later, and in a
>> > different, safer, context.
>> 
>> But then as a result the mouse face will not be dismissed when entering
>> a menu.
>
> Again, I don't understand: we don't display mouse-face on menus, only
> on buffer text, on the mode line, and on the tool bar (if we implement
> it in Emacs, not use the toolkit's tool bar).
>
>> And we will also lose the valuable development feature of using
>> mouse-high to determine how wedged a stuck Emacs really is.
>
> Now you lost me.  What is that feature, and how is it related?
>
>> > But it's already "in keyboard.c"!  See the part that begins with
>> >
>> >   /* Try generating a mouse motion event.  */
>> >   else if (some_mouse_moved ())
>> >
>> > which ends with
>> >
>> >       if (!NILP (x) && NILP (obj))
>> > 	obj = make_lispy_movement (f, bar_window, part, x, y, t);
>> >
>> > Then we process these "lispy movements" as input.
>> 
>> That isn't related to mouse-highlight, and only happens when track-mouse
>> is non-nil.
>
> Those are details.  My point is that we do process _some_ mouse events
> in the main loop, and it works.
>
>> > Not normally, no.  But if some Lisp program sets bad text properties
>> > or overlays, it could signal.  We can never assume in Emacs that some
>> > Lisp machine code never signals.
>> 
>> Then the simple solution would be to catch those signals inside
>> note_mouse_highlight.  What could be the problem with that?
>
> It's wrong, that's the problem.
>
> And again, this is a tangent.  The main issue is that this code cannot
> be called from a non-main thread, period.  Let's focus on that and
> solve those problems first.  You made me very worried by telling we do
> this nonchalantly and for a long time.

There is some kind of misunderstanding here.  You seem to be talking
about calling Lisp from another thread, which is a big no-no in my book
as well.  However, the problem is nowhere near as drastic.

Under the GTK builds, there is only a single thread.  The event loop
runs from the main thread.  Those calls to note_mouse_highlight *are*
being done from the main thread.  The problem is that it is unsafe to
signal in the main thread when handle_one_xevent is being called by
GLib, because GLib does the equivalent of this:

  static bool inside_callback;

  assert (!inside_callback);
  inside_callback = true;
  [call handle_one_xevent]
  inside_callback = false;

If handle_one_xevent signals, then inside_callback will never be set to
false.  As a result, the next time GLib enters its own event dispatch
code, it will abort, leading to the crash seen here.

No second thread is involved!



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-21  1:01                             ` Po Lu
@ 2022-12-21 13:06                               ` Eli Zaretskii
  2022-12-22  1:28                                 ` Po Lu
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2022-12-21 13:06 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Wed, 21 Dec 2022 09:01:01 +0800
> 
> There is some kind of misunderstanding here.  You seem to be talking
> about calling Lisp from another thread, which is a big no-no in my book
> as well.  However, the problem is nowhere near as drastic.
> 
> Under the GTK builds, there is only a single thread.  The event loop
> runs from the main thread.  Those calls to note_mouse_highlight *are*
> being done from the main thread.  The problem is that it is unsafe to
> signal in the main thread when handle_one_xevent is being called by
> GLib, because GLib does the equivalent of this:
> 
>   static bool inside_callback;
> 
>   assert (!inside_callback);
>   inside_callback = true;
>   [call handle_one_xevent]
>   inside_callback = false;
> 
> If handle_one_xevent signals, then inside_callback will never be set to
> false.  As a result, the next time GLib enters its own event dispatch
> code, it will abort, leading to the crash seen here.

OK, so then I ask again: why not have the function called by GTK just
enqueue events and return, and then let read_socket_hook read the
enqueued events and process them as they are read, in keyboard.c?



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-21 13:06                               ` Eli Zaretskii
@ 2022-12-22  1:28                                 ` Po Lu
  2022-12-22  9:04                                   ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Po Lu @ 2022-12-22  1:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Wed, 21 Dec 2022 09:01:01 +0800
>> 
>> There is some kind of misunderstanding here.  You seem to be talking
>> about calling Lisp from another thread, which is a big no-no in my book
>> as well.  However, the problem is nowhere near as drastic.
>> 
>> Under the GTK builds, there is only a single thread.  The event loop
>> runs from the main thread.  Those calls to note_mouse_highlight *are*
>> being done from the main thread.  The problem is that it is unsafe to
>> signal in the main thread when handle_one_xevent is being called by
>> GLib, because GLib does the equivalent of this:
>> 
>>   static bool inside_callback;
>> 
>>   assert (!inside_callback);
>>   inside_callback = true;
>>   [call handle_one_xevent]
>>   inside_callback = false;
>> 
>> If handle_one_xevent signals, then inside_callback will never be set to
>> false.  As a result, the next time GLib enters its own event dispatch
>> code, it will abort, leading to the crash seen here.
>
> OK, so then I ask again: why not have the function called by GTK just
> enqueue events and return, and then let read_socket_hook read the
> enqueued events and process them as they are read, in keyboard.c?

After the function returns, it is not guaranteed that Emacs will be able
to enter read_socket_hook soon.

(Sorry if this mail reaches you multiple times: the mail server is
having issues.)



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-22  1:28                                 ` Po Lu
@ 2022-12-22  9:04                                   ` Eli Zaretskii
  2022-12-22  9:57                                     ` Po Lu
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2022-12-22  9:04 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 22 Dec 2022 09:28:58 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Under the GTK builds, there is only a single thread.  The event loop
> >> runs from the main thread.  Those calls to note_mouse_highlight *are*
> >> being done from the main thread.  The problem is that it is unsafe to
> >> signal in the main thread when handle_one_xevent is being called by
> >> GLib, because GLib does the equivalent of this:
> >> 
> >>   static bool inside_callback;
> >> 
> >>   assert (!inside_callback);
> >>   inside_callback = true;
> >>   [call handle_one_xevent]
> >>   inside_callback = false;
> >> 
> >> If handle_one_xevent signals, then inside_callback will never be set to
> >> false.  As a result, the next time GLib enters its own event dispatch
> >> code, it will abort, leading to the crash seen here.
> >
> > OK, so then I ask again: why not have the function called by GTK just
> > enqueue events and return, and then let read_socket_hook read the
> > enqueued events and process them as they are read, in keyboard.c?
> 
> After the function returns, it is not guaranteed that Emacs will be able
> to enter read_socket_hook soon.

Why is that a problem?  If the event we queue includes all the data
needed for processing it, why does it matter how soon it is processed?

And what could delay the processing significantly in this scenario?
Since these callbacks are called from the main thread, it means Emacs
is not busy, and should return to the main loop very soon.  Right?



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-22  9:04                                   ` Eli Zaretskii
@ 2022-12-22  9:57                                     ` Po Lu
  0 siblings, 0 replies; 9+ messages in thread
From: Po Lu @ 2022-12-22  9:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Why is that a problem?  If the event we queue includes all the data
> needed for processing it, why does it matter how soon it is processed?

GLib might be in the middle of doing something itself, such as waiting
for file I/O, when it calls handle_one_xevent.

> And what could delay the processing significantly in this scenario?
> Since these callbacks are called from the main thread, it means Emacs
> is not busy, and should return to the main loop very soon.  Right?

Right, for some definition of "soon".

I tried moving the processing to pgtk_read_socket.  But on the PGTK
build, GTK apparently expects changes to the mouse pointer to be applied
inside the call to the motion handler function, so now the cursor image
is not updated during mouse motion.

And for whatever reason, gdk_display_flush leads to a deadlock somewhere
in GDK.  I will investigate that later.



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-12-22  9:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAGk_8X+jaz1kOcxx3j4kY3NN2UzWBX7_8+2eX3h7SR9gt-Cz4w@mail.gmail.com>
     [not found] ` <87edsxfop0.fsf@yahoo.com>
     [not found]   ` <83359dgt7v.fsf@gnu.org>
     [not found]     ` <87a63lfcz7.fsf@yahoo.com>
     [not found]       ` <83y1r5f6lh.fsf@gnu.org>
     [not found]         ` <875ye9f385.fsf@yahoo.com>
     [not found]           ` <83fsddey48.fsf@gnu.org>
     [not found]             ` <871qowgbay.fsf@yahoo.com>
     [not found]               ` <83bko0gact.fsf@gnu.org>
     [not found]                 ` <87wn6oesfx.fsf@yahoo.com>
     [not found]                   ` <835ye8fweu.fsf@gnu.org>
2022-12-19  1:56                     ` bug#60144: 30.0.50; PGTK Emacs crashes after signal Po Lu
2022-12-19 14:36                       ` Eli Zaretskii
2022-12-20  1:39                         ` Po Lu
2022-12-20 15:16                           ` Eli Zaretskii
2022-12-21  1:01                             ` Po Lu
2022-12-21 13:06                               ` Eli Zaretskii
2022-12-22  1:28                                 ` Po Lu
2022-12-22  9:04                                   ` Eli Zaretskii
2022-12-22  9:57                                     ` Po Lu

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).