all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#60144: 30.0.50; PGTK Emacs crashes after signal
@ 2022-12-17  3:39 Karl Otness via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-12-18  2:08 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 20+ messages in thread
From: Karl Otness via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-12-17  3:39 UTC (permalink / raw)
  To: 60144

Hello, I have been having issues with unpredictable crashes running
Emacs master with PGTK on Wayland. This looks somewhat similar to
bug#59452.

Like that bug, it seems to be caused by an Emacs signal happening in a
GTK callback. It works its way to get_char_property_and_overlay
(textprop.c:644), signals, which longjmps out of the GLib/GObject
signal handling (g_signal_emit) leading to memory corruption and a
segfault.

Backtraces below. The segfault happens after continuing. Seems like
after continuing it reenters g_signal_emit and follows a corrupted
pointer in a linked list of signals to dispatch.

Unfortunately I don't have a good recipe for reliably reproducing it.
I've only seen it happen in buffers with eglot enabled (so far C++
buffers) when clicking around, typing, messing with the eglot menu,
etc.

This is for an Emacs from recent master.
Version: 30.0.50
Commit: 1568123196cd8b57ed64e284b7deb058026be713

Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --with-pgtk --with-native-compilation
 --without-sound --with-harfbuzz --without-m17n-flt --without-xft
 --with-libotf --with-cairo --with-modules --without-gconf
 --without-gsettings --with-gameuser=:games --without-imagemagick
 --with-dumping=pdumper --with-sqlite3 --with-json --with-tree-sitter
 '--program-transform-name=s/^ctags$/ctags.emacs/' 'CFLAGS=-g -ggdb -O3
 -pipe -fno-plt -fstack-protector-all -fstack-clash-protection
 -fcf-protection=full -fPIE -D_FORTIFY_SOURCE=3 -march=native
 -mtune=native' 'LDFLAGS=-pie
 -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now,-z,noexecstack''

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM HARFBUZZ JPEG JSON LCMS2
LIBOTF LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
PGTK PNG RSVG SECCOMP SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XIM GTK3 ZLIB

Let me know if there's anything else I can gather that might be
helpful.

Thanks,
Karl

Here's the backtrace for the signal out of the event handling. From
GDB with a breakpoint on Fsignal and a condition
'$_any_caller_is("g_signal_emit", 20)'

> #0  Fsignal (error_symbol=error_symbol@entry=0x2f40, data=0x55bed53eeac3) at eval.c:1681
> #1  0x000055bece1213bf in xsignal (data=<optimized out>, error_symbol=0x2f40) at emacs/src/lisp.h:4558
> #2  xsignal1 (error_symbol=error_symbol@entry=0x2f40, arg=arg@entry=0x82) at eval.c:1878
> #3  0x000055bece1253e3 in get_char_property_and_overlay (position=0x82, prop=0x5a90, object=0x7f105451a265, overlay=0x0) at textprop.c:644
> #4  0x000055bece156110 in string_buffer_position_lim (string=string@entry=0x55bed52e8b24, from=from@entry=32, to=to@entry=1032, back_p=back_p@entry=false) at xdisp.c:6246
> #5  0x000055bece1561fa in string_buffer_position (string=0x55bed52e8b24, around_charpos=32) at xdisp.c:6284
> #6  0x000055bece1aaddb in note_mouse_highlight (f=f@entry=0x55bed1a839e8, x=<optimized out>, y=<optimized out>) at xdisp.c:35339
> #7  0x000055bece4039ac in note_mouse_movement (event=0x55bed1ce6030, frame=0x55bed1a839e8) at pgtkterm.c:5821
> #8  motion_notify_event (widget=widget@entry=0x55bed2024130, event=0x55bed1ce6030, user_data=<optimized out>) at pgtkterm.c:5905
> #9  0x00007f105c684fd8 in _gtk_marshal_BOOLEAN__BOXED (closure=0x55bed1ef9f40, return_value=0x7ffebc5ae480, n_param_values=<optimized out>, param_values=0x7ffebc5ae4e0, invocation_hint=<optimized out>, marshal_data=<optimized out>)
>     at gtk/gtkmarshalers.c:84
> #10 0x00007f105c095210 in g_closure_invoke (closure=0x55bed1ef9f40, return_value=0x7ffebc5ae480, n_param_values=2, param_values=0x7ffebc5ae4e0, invocation_hint=0x7ffebc5ae460) at ../glib/gobject/gclosure.c:832
> #11 0x00007f105c0c2ea8 in signal_emit_unlocked_R.isra.0
>     (node=<optimized out>, detail=detail@entry=0, instance=instance@entry=0x55bed2024130, emission_return=emission_return@entry=0x7ffebc5ae5f0, instance_and_params=instance_and_params@entry=0x7ffebc5ae4e0)
>     at ../glib/gobject/gsignal.c:3796
> #12 0x00007f105c0b2980 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7ffebc5ae6a0) at ../glib/gobject/gsignal.c:3559
> #13 0x00007f105c0b3204 in g_signal_emit (instance=instance@entry=0x55bed2024130, signal_id=<optimized out>, detail=detail@entry=0) at ../glib/gobject/gsignal.c:3606
> #14 0x00007f105c9447f5 in gtk_widget_event_internal.part.0.lto_priv.0 (widget=0x55bed2024130, event=0x55bed1ce6030) at ../gtk/gtk/gtkwidget.c:7812
> #15 0x00007f105c7e20db in propagate_event_up (topmost=<optimized out>, event=<optimized out>, widget=0x55bed2024130) at ../gtk/gtk/gtkmain.c:2588
> #16 propagate_event (widget=widget@entry=0x55bed2024130, event=event@entry=0x55bed1ce6030, captured=captured@entry=0, topmost=topmost@entry=0x0) at ../gtk/gtk/gtkmain.c:2691
> #17 0x00007f105c7e2212 in gtk_propagate_event (widget=widget@entry=0x55bed2024130, event=event@entry=0x55bed1ce6030) at ../gtk/gtk/gtkmain.c:2725
> #18 0x00007f105c7e2fbb in gtk_main_do_event (event=<optimized out>) at ../gtk/gtk/gtkmain.c:1921
> #19 gtk_main_do_event (event=<optimized out>) at ../gtk/gtk/gtkmain.c:1691
> #20 0x00007f105c542cd3 in _gdk_event_emit (event=0x55bed1ce6030) at ../gtk/gdk/gdkevents.c:73
> #21 _gdk_event_emit (event=0x55bed1ce6030) at ../gtk/gdk/gdkevents.c:67
> #22 0x00007f105c576d48 in gdk_event_source_dispatch (base=<optimized out>, callback=<optimized out>, data=<optimized out>) at ../gtk/gdk/wayland/gdkeventsource.c:124
> #23 0x00007f105bf9787b in g_main_dispatch (context=0x55bed0cc5940) at ../glib/glib/gmain.c:3444
> #24 g_main_context_dispatch (context=0x55bed0cc5940) at ../glib/glib/gmain.c:4162
> #25 0x000055bece3feea9 in pgtk_read_socket (terminal=<optimized out>, hold_quit=0x7ffebc5ae9f0) at pgtkterm.c:3839
> #26 pgtk_read_socket (terminal=<optimized out>, hold_quit=0x7ffebc5ae9f0) at pgtkterm.c:3818
> #27 0x000055bece251ae1 in gobble_input () at keyboard.c:7417
> #28 0x000055bece254901 in handle_async_input () at keyboard.c:7648
> #29 process_pending_signals () at keyboard.c:7662
> #30 unblock_input_to (level=0) at keyboard.c:7677
> #31 unblock_input_to (level=<optimized out>) at keyboard.c:7671
> #32 unblock_input () at keyboard.c:7696
> #33 timer_check () at keyboard.c:4742
> #34 0x000055bece254bcd in readable_events (flags=1) at keyboard.c:3524
> #35 0x000055bece25a624 in get_input_pending (flags=1) at keyboard.c:7367
> #36 detect_input_pending_run_timers (do_display=do_display@entry=true) at keyboard.c:10897
> #37 0x000055bece38962f in wait_reading_process_output
>     (time_limit=time_limit@entry=0, nsecs=nsecs@entry=0, read_kbd=read_kbd@entry=-1, do_display=<optimized out>, wait_for_cell=wait_for_cell@entry=0x0, wait_proc=wait_proc@entry=0x0, just_wait_proc=<optimized out>) at process.c:5779
> #38 0x000055bece25271c in kbd_buffer_get_event (end_time=0x0, used_mouse_menu=0x7ffebc5af64b, kbp=<synthetic pointer>) at keyboard.c:4003
> #39 read_event_from_main_queue (used_mouse_menu=0x7ffebc5af64b, local_getcjmp=0x7ffebc5af3c0, end_time=0x0) at keyboard.c:2270
> #40 read_decoded_event_from_main_queue (end_time=0x0, local_getcjmp=0x7ffebc5af3c0, prev_event=0x0, used_mouse_menu=0x7ffebc5af64b) at keyboard.c:2334
> #41 0x000055bece25b904 in read_char (commandflag=1, map=0x55bed51362e3, prev_event=0x0, used_mouse_menu=0x7ffebc5af64b, end_time=0x0) at keyboard.c:2964
> #42 0x000055bece2600b7 in read_key_sequence (keybuf=<optimized out>, prevent_redisplay=false, fix_current_buffer=<optimized out>, can_return_switch_frame=<optimized out>, dont_downcase_last=<optimized out>, prompt=<optimized out>)
>     at keyboard.c:10074
> #43 0x000055bece262141 in command_loop_1 () at keyboard.c:1376
> #44 0x000055bece3055bf in internal_condition_case (bfun=bfun@entry=0x55bece261f70 <command_loop_1>, handlers=handlers@entry=0x90, hfun=hfun@entry=0x55bece248c70 <cmd_error>) at eval.c:1474
> #45 0x000055bece24682f in command_loop_2 (handlers=handlers@entry=0x90) at keyboard.c:1125
> #46 0x000055bece3054e5 in internal_catch (tag=tag@entry=0xfb10, func=func@entry=0x55bece2467f0 <command_loop_2>, arg=arg@entry=0x90) at eval.c:1197
> #47 0x000055bece2467bb in command_loop () at keyboard.c:1103
> #48 0x000055bece24ee1d in recursive_edit_1 () at keyboard.c:712
> #49 0x000055bece24f269 in Frecursive_edit () at keyboard.c:795
> #50 0x000055bece128b15 in main (argc=<optimized out>, argv=0x7ffebc5afc88) at emacs.c:2529

and the stack trace after the longjmp (unwinds all the way to
internal_condition_case):

> #0  0x000055bece305577 in internal_condition_case
>     (bfun=bfun@entry=0x55bece261f70 <command_loop_1>, handlers=handlers@entry=0x90, hfun=hfun@entry=0x55bece248c70 <cmd_error>) at eval.c:1465
> #1  0x000055bece24682f in command_loop_2 (handlers=handlers@entry=0x90) at keyboard.c:1125
> #2  0x000055bece3054e5 in internal_catch
>     (tag=tag@entry=0xfb10, func=func@entry=0x55bece2467f0 <command_loop_2>, arg=arg@entry=0x90) at eval.c:1197
> #3  0x000055bece2467bb in command_loop () at keyboard.c:1103
> #4  0x000055bece24ee1d in recursive_edit_1 () at keyboard.c:712
> #5  0x000055bece24f269 in Frecursive_edit () at keyboard.c:795
> #6  0x000055bece128b15 in main (argc=<optimized out>, argv=0x7ffebc5afc88) at emacs.c:2529





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-17  3:39 bug#60144: 30.0.50; PGTK Emacs crashes after signal Karl Otness via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-12-18  2:08 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-12-18  5:45   ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-12-18  2:08 UTC (permalink / raw)
  To: Karl Otness; +Cc: 60144

Karl Otness <karl@karlotness.com> writes:

>> #0  Fsignal (error_symbol=error_symbol@entry=0x2f40, data=0x55bed53eeac3) at eval.c:1681
>> #1  0x000055bece1213bf in xsignal (data=<optimized out>, error_symbol=0x2f40) at emacs/src/lisp.h:4558
>> #2  xsignal1 (error_symbol=error_symbol@entry=0x2f40, arg=arg@entry=0x82) at eval.c:1878
>> #3  0x000055bece1253e3 in get_char_property_and_overlay (position=0x82, prop=0x5a90, object=0x7f105451a265, overlay=0x0) at textprop.c:644
>> #4  0x000055bece156110 in string_buffer_position_lim (string=string@entry=0x55bed52e8b24, from=from@entry=32, to=to@entry=1032, back_p=back_p@entry=false) at xdisp.c:6246
>> #5  0x000055bece1561fa in string_buffer_position (string=0x55bed52e8b24, around_charpos=32) at xdisp.c:6284
>> #6  0x000055bece1aaddb in note_mouse_highlight (f=f@entry=0x55bed1a839e8, x=<optimized out>, y=<optimized out>) at xdisp.c:35339
>> #7  0x000055bece4039ac in note_mouse_movement (event=0x55bed1ce6030, frame=0x55bed1a839e8) at pgtkterm.c:5821
>> #8  motion_notify_event (widget=widget@entry=0x55bed2024130, event=0x55bed1ce6030, user_data=<optimized out>) at pgtkterm.c:5905
>> #9  0x00007f105c684fd8 in _gtk_marshal_BOOLEAN__BOXED (closure=0x55bed1ef9f40, return_value=0x7ffebc5ae480, n_param_values=<optimized out>, param_values=0x7ffebc5ae4e0, invocation_hint=<optimized out>, marshal_data=<optimized out>)
>>     at gtk/gtkmarshalers.c:84
>> #10 0x00007f105c095210 in g_closure_invoke (closure=0x55bed1ef9f40, return_value=0x7ffebc5ae480, n_param_values=2, param_values=0x7ffebc5ae4e0, invocation_hint=0x7ffebc5ae460) at ../glib/gobject/gclosure.c:832
>> #11 0x00007f105c0c2ea8 in signal_emit_unlocked_R.isra.0
>>     (node=<optimized out>, detail=detail@entry=0, instance=instance@entry=0x55bed2024130, emission_return=emission_return@entry=0x7ffebc5ae5f0, instance_and_params=instance_and_params@entry=0x7ffebc5ae4e0)
>>     at ../glib/gobject/gsignal.c:3796
>> #12 0x00007f105c0b2980 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7ffebc5ae6a0) at ../glib/gobject/gsignal.c:3559
>> #13 0x00007f105c0b3204 in g_signal_emit (instance=instance@entry=0x55bed2024130, signal_id=<optimized out>, detail=detail@entry=0) at ../glib/gobject/gsignal.c:3606
>> #14 0x00007f105c9447f5 in gtk_widget_event_internal.part.0.lto_priv.0 (widget=0x55bed2024130, event=0x55bed1ce6030) at ../gtk/gtk/gtkwidget.c:7812
>> #15 0x00007f105c7e20db in propagate_event_up (topmost=<optimized out>, event=<optimized out>, widget=0x55bed2024130) at ../gtk/gtk/gtkmain.c:2588
>> #16 propagate_event (widget=widget@entry=0x55bed2024130, event=event@entry=0x55bed1ce6030, captured=captured@entry=0, topmost=topmost@entry=0x0) at ../gtk/gtk/gtkmain.c:2691
>> #17 0x00007f105c7e2212 in gtk_propagate_event (widget=widget@entry=0x55bed2024130, event=event@entry=0x55bed1ce6030) at ../gtk/gtk/gtkmain.c:2725
>> #18 0x00007f105c7e2fbb in gtk_main_do_event (event=<optimized out>) at ../gtk/gtk/gtkmain.c:1921
>> #19 gtk_main_do_event (event=<optimized out>) at ../gtk/gtk/gtkmain.c:1691
>> #20 0x00007f105c542cd3 in _gdk_event_emit (event=0x55bed1ce6030) at ../gtk/gdk/gdkevents.c:73
>> #21 _gdk_event_emit (event=0x55bed1ce6030) at ../gtk/gdk/gdkevents.c:67

Thanks.  This sounds awfully like another bug that was fixed last month.
Would someone please take a look at this? note_mouse_highlight should
never signal.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18  2:08 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-12-18  5:45   ` Eli Zaretskii
  2022-12-18  6:22     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-18  5:45 UTC (permalink / raw)
  To: Po Lu; +Cc: 60144, karl

> Cc: 60144@debbugs.gnu.org
> Date: Sun, 18 Dec 2022 10:08:59 +0800
> From:  Po Lu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> Karl Otness <karl@karlotness.com> writes:
> 
> >> #0  Fsignal (error_symbol=error_symbol@entry=0x2f40, data=0x55bed53eeac3) at eval.c:1681
> >> #1  0x000055bece1213bf in xsignal (data=<optimized out>, error_symbol=0x2f40) at emacs/src/lisp.h:4558
> >> #2  xsignal1 (error_symbol=error_symbol@entry=0x2f40, arg=arg@entry=0x82) at eval.c:1878
> >> #3  0x000055bece1253e3 in get_char_property_and_overlay (position=0x82, prop=0x5a90, object=0x7f105451a265, overlay=0x0) at textprop.c:644
> >> #4  0x000055bece156110 in string_buffer_position_lim (string=string@entry=0x55bed52e8b24, from=from@entry=32, to=to@entry=1032, back_p=back_p@entry=false) at xdisp.c:6246
> >> #5  0x000055bece1561fa in string_buffer_position (string=0x55bed52e8b24, around_charpos=32) at xdisp.c:6284
> >> #6  0x000055bece1aaddb in note_mouse_highlight (f=f@entry=0x55bed1a839e8, x=<optimized out>, y=<optimized out>) at xdisp.c:35339
> >> #7  0x000055bece4039ac in note_mouse_movement (event=0x55bed1ce6030, frame=0x55bed1a839e8) at pgtkterm.c:5821
> >> #8  motion_notify_event (widget=widget@entry=0x55bed2024130, event=0x55bed1ce6030, user_data=<optimized out>) at pgtkterm.c:5905
> >> #9  0x00007f105c684fd8 in _gtk_marshal_BOOLEAN__BOXED (closure=0x55bed1ef9f40, return_value=0x7ffebc5ae480, n_param_values=<optimized out>, param_values=0x7ffebc5ae4e0, invocation_hint=<optimized out>, marshal_data=<optimized out>)
> >>     at gtk/gtkmarshalers.c:84
> >> #10 0x00007f105c095210 in g_closure_invoke (closure=0x55bed1ef9f40, return_value=0x7ffebc5ae480, n_param_values=2, param_values=0x7ffebc5ae4e0, invocation_hint=0x7ffebc5ae460) at ../glib/gobject/gclosure.c:832
> >> #11 0x00007f105c0c2ea8 in signal_emit_unlocked_R.isra.0
> >>     (node=<optimized out>, detail=detail@entry=0, instance=instance@entry=0x55bed2024130, emission_return=emission_return@entry=0x7ffebc5ae5f0, instance_and_params=instance_and_params@entry=0x7ffebc5ae4e0)
> >>     at ../glib/gobject/gsignal.c:3796
> >> #12 0x00007f105c0b2980 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7ffebc5ae6a0) at ../glib/gobject/gsignal.c:3559
> >> #13 0x00007f105c0b3204 in g_signal_emit (instance=instance@entry=0x55bed2024130, signal_id=<optimized out>, detail=detail@entry=0) at ../glib/gobject/gsignal.c:3606
> >> #14 0x00007f105c9447f5 in gtk_widget_event_internal.part.0.lto_priv.0 (widget=0x55bed2024130, event=0x55bed1ce6030) at ../gtk/gtk/gtkwidget.c:7812
> >> #15 0x00007f105c7e20db in propagate_event_up (topmost=<optimized out>, event=<optimized out>, widget=0x55bed2024130) at ../gtk/gtk/gtkmain.c:2588
> >> #16 propagate_event (widget=widget@entry=0x55bed2024130, event=event@entry=0x55bed1ce6030, captured=captured@entry=0, topmost=topmost@entry=0x0) at ../gtk/gtk/gtkmain.c:2691
> >> #17 0x00007f105c7e2212 in gtk_propagate_event (widget=widget@entry=0x55bed2024130, event=event@entry=0x55bed1ce6030) at ../gtk/gtk/gtkmain.c:2725
> >> #18 0x00007f105c7e2fbb in gtk_main_do_event (event=<optimized out>) at ../gtk/gtk/gtkmain.c:1921
> >> #19 gtk_main_do_event (event=<optimized out>) at ../gtk/gtk/gtkmain.c:1691
> >> #20 0x00007f105c542cd3 in _gdk_event_emit (event=0x55bed1ce6030) at ../gtk/gdk/gdkevents.c:73
> >> #21 _gdk_event_emit (event=0x55bed1ce6030) at ../gtk/gdk/gdkevents.c:67
> 
> Thanks.  This sounds awfully like another bug that was fixed last month.
> Would someone please take a look at this? note_mouse_highlight should
> never signal.

You cannot require that from note_mouse_highlight, since it looks up
text and overlay properties, and those can signal an error if the
position is outside the valid/reachable range of buffer positions.

Do you understand why note_mouse_highlight was called in this
scenario?  The backtrace seems strange: why should GTK care about our
mouse highlight?





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18  5:45   ` Eli Zaretskii
@ 2022-12-18  6:22     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-12-18  8:39       ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-12-18  6:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 60144, karl

Eli Zaretskii <eliz@gnu.org> writes:

> You cannot require that from note_mouse_highlight, since it looks up
> text and overlay properties, and those can signal an error if the
> position is outside the valid/reachable range of buffer positions.

How about simply wrapping those calls in
internal_catch_all/internal_condition_case?

> Do you understand why note_mouse_highlight was called in this
> scenario?  The backtrace seems strange: why should GTK care about our
> mouse highlight?

What happens here is that Emacs is reading input through GTK, either
inside xg_select or the read_socket_hook.  GTK then detects some mouse
motion and calls the motion event handler for the frame's widget, which
in turn calls note_mouse_movement.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18  6:22     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-12-18  8:39       ` Eli Zaretskii
  2022-12-18  9:52         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-18  8:39 UTC (permalink / raw)
  To: Po Lu; +Cc: 60144, karl

> From: Po Lu <luangruo@yahoo.com>
> Cc: karl@karlotness.com,  60144@debbugs.gnu.org
> Date: Sun, 18 Dec 2022 14:22:04 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > You cannot require that from note_mouse_highlight, since it looks up
> > text and overlay properties, and those can signal an error if the
> > position is outside the valid/reachable range of buffer positions.
> 
> How about simply wrapping those calls in
> internal_catch_all/internal_condition_case?

This is too drastic, IMO: it would deprive us of valuable diagnostics
when note_mouse_highlight is called.  Like in this case, for example:
having the code signal an error allowed me to find a real bug.  We
were asking string_buffer_position_lim to check properties and
overlays for positions outside the BEGV..ZV interval, which can never
do anything useful, even if it isn't called in the PGTK context.  I've
now fixed that on the release branch.

In general, note_mouse_highlight should never examine invalid buffer
or string positions.  If it does, it's a bug that needs to be fixed.

> > Do you understand why note_mouse_highlight was called in this
> > scenario?  The backtrace seems strange: why should GTK care about our
> > mouse highlight?
> 
> What happens here is that Emacs is reading input through GTK, either
> inside xg_select or the read_socket_hook.  GTK then detects some mouse
> motion and calls the motion event handler for the frame's widget, which
> in turn calls note_mouse_movement.

Why this fragile architecture of reading input events?  Calling
functions of our Lisp machine from context where those functions
cannot signal an error is very dangerous, and cannot work well in
Emacs.  Why cannot we have the reads through GTK only deliver events
to us, which we enqueue to our own event queue, and then we could
process that queue in the safe context of the Lisp machine, as (AFAIK)
we do on other platforms?





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18  8:39       ` Eli Zaretskii
@ 2022-12-18  9:52         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-12-18 11:43           ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-12-18  9:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 60144, karl

Eli Zaretskii <eliz@gnu.org> writes:

>> > Do you understand why note_mouse_highlight was called in this
>> > scenario?  The backtrace seems strange: why should GTK care about our
>> > mouse highlight?
>> 
>> What happens here is that Emacs is reading input through GTK, either
>> inside xg_select or the read_socket_hook.  GTK then detects some mouse
>> motion and calls the motion event handler for the frame's widget, which
>> in turn calls note_mouse_movement.
>
> Why this fragile architecture of reading input events?  Calling
> functions of our Lisp machine from context where those functions
> cannot signal an error is very dangerous, and cannot work well in
> Emacs.  Why cannot we have the reads through GTK only deliver events
> to us, which we enqueue to our own event queue, and then we could
> process that queue in the safe context of the Lisp machine, as (AFAIK)
> we do on other platforms?

No, signalling there is equally unsafe on the other platforms, where
note_mouse_highlight is called from the same place(s): read_socket_hook,
event_handler_gdk, et cetera.  Just look at the callers of
x_note_mouse_movement in xterm.c, or [EmacsView mouseMoved:] in
nsterm.m.

But I see you already fixed the bug.  Thanks.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18  9:52         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-12-18 11:43           ` Eli Zaretskii
  2022-12-18 12:12             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-18 11:43 UTC (permalink / raw)
  To: Po Lu; +Cc: 60144, karl

> From: Po Lu <luangruo@yahoo.com>
> Cc: karl@karlotness.com,  60144@debbugs.gnu.org
> Date: Sun, 18 Dec 2022 17:52:42 +0800
> 
> > Why this fragile architecture of reading input events?  Calling
> > functions of our Lisp machine from context where those functions
> > cannot signal an error is very dangerous, and cannot work well in
> > Emacs.  Why cannot we have the reads through GTK only deliver events
> > to us, which we enqueue to our own event queue, and then we could
> > process that queue in the safe context of the Lisp machine, as (AFAIK)
> > we do on other platforms?
> 
> No, signalling there is equally unsafe on the other platforms, where
> note_mouse_highlight is called from the same place(s): read_socket_hook,
> event_handler_gdk, et cetera.  Just look at the callers of
> x_note_mouse_movement in xterm.c, or [EmacsView mouseMoved:] in
> nsterm.m.

Sorry, I'm afraid I don't see the danger on other platforms.  Please
explain.  AFAIK, read_socket_hook is called from keyboard.c code which
reads input, and that code has no problem signaling an error.  What am
I missing?





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18 11:43           ` Eli Zaretskii
@ 2022-12-18 12:12             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-12-18 12:33               ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-12-18 12:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 60144, karl

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: karl@karlotness.com,  60144@debbugs.gnu.org
>> Date: Sun, 18 Dec 2022 17:52:42 +0800
>> 
>> > Why this fragile architecture of reading input events?  Calling
>> > functions of our Lisp machine from context where those functions
>> > cannot signal an error is very dangerous, and cannot work well in
>> > Emacs.  Why cannot we have the reads through GTK only deliver events
>> > to us, which we enqueue to our own event queue, and then we could
>> > process that queue in the safe context of the Lisp machine, as (AFAIK)
>> > we do on other platforms?
>> 
>> No, signalling there is equally unsafe on the other platforms, where
>> note_mouse_highlight is called from the same place(s): read_socket_hook,
>> event_handler_gdk, et cetera.  Just look at the callers of
>> x_note_mouse_movement in xterm.c, or [EmacsView mouseMoved:] in
>> nsterm.m.
>
> Sorry, I'm afraid I don't see the danger on other platforms.  Please
> explain.  AFAIK, read_socket_hook is called from keyboard.c code which
> reads input, and that code has no problem signaling an error.  What am
> I missing?

That code has problems signalling errors, unless it is okay for
unblock_input to signal.

On the regular X build with GTK, handle_one_xevent is called from
event_handler_gdk, which is called by GDK when it detects an event.
handle_one_xevent can also be called from x_dispatch_event inside a
popup menu, and during drag-and-drop.  On NS, [EmacsView mouseMoved:] is
called by the system from Objective-C.

Out of all of those places, the only place where it is safe to signal is
inside the drag-and-drop event loop.  Signalling out of the rest will
either lead to catastrophic blowups (if it happens inside
event_handler_gdk or [EmacsView mouseMoved:]), or to grabs never being
released and resource leaks inside a menu.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18 12:12             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-12-18 12:33               ` Eli Zaretskii
  2022-12-18 13:45                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-18 12:33 UTC (permalink / raw)
  To: Po Lu; +Cc: 60144, karl

> From: Po Lu <luangruo@yahoo.com>
> Cc: karl@karlotness.com,  60144@debbugs.gnu.org
> Date: Sun, 18 Dec 2022 20:12:53 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> No, signalling there is equally unsafe on the other platforms, where
> >> note_mouse_highlight is called from the same place(s): read_socket_hook,
> >> event_handler_gdk, et cetera.  Just look at the callers of
> >> x_note_mouse_movement in xterm.c, or [EmacsView mouseMoved:] in
> >> nsterm.m.
> >
> > Sorry, I'm afraid I don't see the danger on other platforms.  Please
> > explain.  AFAIK, read_socket_hook is called from keyboard.c code which
> > reads input, and that code has no problem signaling an error.  What am
> > I missing?
> 
> That code has problems signalling errors, unless it is okay for
> unblock_input to signal.

I don't understand this part.  Why and how is unblock_input part of
the picture?

> On the regular X build with GTK, handle_one_xevent is called from
> event_handler_gdk, which is called by GDK when it detects an event.
> handle_one_xevent can also be called from x_dispatch_event inside a
> popup menu, and during drag-and-drop.  On NS, [EmacsView mouseMoved:] is
> called by the system from Objective-C.

So in the X/GTK build we have the same problem as with PGTK?  If so,
why not change that as well, to work as I described, i.e. enqueue
events to our own event queue, which we will then read and process in
safe context?

AFAIU, w32 already works like that.  Does it not?

(As for NS, I know it does some very dangerous stuff.)

> Out of all of those places, the only place where it is safe to signal is
> inside the drag-and-drop event loop.  Signalling out of the rest will
> either lead to catastrophic blowups (if it happens inside
> event_handler_gdk or [EmacsView mouseMoved:]), or to grabs never being
> released and resource leaks inside a menu.

Yes, understood.  But it just tells me that we need to change the
architecture so that the events delivered by the window-system are not
processed in callbacks we install to be called by the window-system,
they should be processed in our own safe context.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18 12:33               ` Eli Zaretskii
@ 2022-12-18 13:45                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-12-18 17:34                   ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-12-18 13:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 60144, karl

Eli Zaretskii <eliz@gnu.org> writes:

>> That code has problems signalling errors, unless it is okay for
>> unblock_input to signal.
>
> I don't understand this part.  Why and how is unblock_input part of
> the picture?

Because unblock_input can call process_pending_signals, and in doing so
handle_async_input, which calls gobble_input (and thus the
read_socket_hook.)  As a result, it is not safe for any read_socket_hook
to signal as long as it is not ok for unblock_input to signal as well.

> So in the X/GTK build we have the same problem as with PGTK?  If so,
> why not change that as well, to work as I described, i.e. enqueue
> events to our own event queue, which we will then read and process in
> safe context?
>
> AFAIU, w32 already works like that.  Does it not?

It doesn't, see how w32_note_mouse_movement is called from
w32_read_socket.

> Yes, understood.  But it just tells me that we need to change the
> architecture so that the events delivered by the window-system are not
> processed in callbacks we install to be called by the window-system,
> they should be processed in our own safe context.

The problem is note_mouse_highlight is simply not supposed to signal.
It is a function called directly while handling async input as far back
as Emacs 19, much like expose_frame.  (IIRC back then there was a
slightly different implementation in each of the *term.c files.)

Moving note_mouse_highlight out of handle_one_xevent would lead to other
bugs, since mouse movement must be processed in order wrt to other X
events.  For example, if an XI_Motion event arrives and is queued, and
then a subsequent XI_Leave event arrives before that event has a chance
to be processed ``in our own safe context'', note_mouse_highlight will
be called after the mouse has left the frame, leading to stuck mouse
highlight.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18 13:45                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-12-18 17:34                   ` Eli Zaretskii
  2022-12-19  1:56                     ` Po Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-18 17:34 UTC (permalink / raw)
  To: Po Lu; +Cc: 60144, karl

> From: Po Lu <luangruo@yahoo.com>
> Cc: karl@karlotness.com,  60144@debbugs.gnu.org
> Date: Sun, 18 Dec 2022 21:45:38 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> That code has problems signalling errors, unless it is okay for
> >> unblock_input to signal.
> >
> > I don't understand this part.  Why and how is unblock_input part of
> > the picture?
> 
> Because unblock_input can call process_pending_signals, and in doing so
> handle_async_input, which calls gobble_input (and thus the
> read_socket_hook.)  As a result, it is not safe for any read_socket_hook
> to signal as long as it is not ok for unblock_input to signal as well.

But AFAIK it is _always_ safe for unblock_input to signal.  When do
you think it isn't?

> > So in the X/GTK build we have the same problem as with PGTK?  If so,
> > why not change that as well, to work as I described, i.e. enqueue
> > events to our own event queue, which we will then read and process in
> > safe context?
> >
> > AFAIU, w32 already works like that.  Does it not?
> 
> It doesn't, see how w32_note_mouse_movement is called from
> w32_read_socket.

??? w32_read_socket runs in the Lisp (a.k.a. "main") thread.  So it is
safe for any code it calls it to signal errors and do anything else
it's safe to do for the Lisp machine.

> > Yes, understood.  But it just tells me that we need to change the
> > architecture so that the events delivered by the window-system are not
> > processed in callbacks we install to be called by the window-system,
> > they should be processed in our own safe context.
> 
> The problem is note_mouse_highlight is simply not supposed to signal.

You cannot request that.  note_mouse_highlight examines properties,
and that can always signal, because properties are Lisp creatures and
can invoke Lisp.

> It is a function called directly while handling async input as far back
> as Emacs 19, much like expose_frame.  (IIRC back then there was a
> slightly different implementation in each of the *term.c files.)

We did a lot of dangerous stuff in Emacs 19, including (oh horror!)
reading input and doing redisplay inside signal handlers.  We
gradually removed all those unsafe parts, and nowadays we only do the
minimum in such contexts.  If unsafe processing of input is still with
us, we should move to safer techniques.  That this unsafe code exists
for such a long time is therefore not a justification for it to
continue existing.

Also, I think this unsafe processing of events only happens with
GTK/PGTK; other X configurations call XTread_socket and
handle_one_xevent from keyboard.c, in a "safe" context.

> Moving note_mouse_highlight out of handle_one_xevent would lead to other
> bugs, since mouse movement must be processed in order wrt to other X
> events.

I didn't say we shouldn't process mouse movements.  I said that this
processing should be limited to generating an Emacs input event and
queuing it, will all the pertinent information for further processing.
For example, note_mouse_highlight does just three things:

  . redisplays portion of the window in a special face
  . changes the way the cursor is drawn
  . shows help-echo

All of that can be done given an input event read by terminals
read_socket_hook inside keyboard.c, provided that the information
about the mouse move is stored in the input event.  There's
absolutely no reason to produce the above 3 effects right where we are
fed the raw X or GTK event from the window-system or the toolkit.

> For example, if an XI_Motion event arrives and is queued, and
> then a subsequent XI_Leave event arrives before that event has a chance
> to be processed ``in our own safe context'', note_mouse_highlight will
> be called after the mouse has left the frame, leading to stuck mouse
> highlight.

AFAIU, these two events will both be queued, and will both be
processed, so there will be not "suck mouse highlight".

So I still don't understand what is it that I'm missing that makes you
say this safe processing of window-system events is impossible.

Anyway, this bug report is not the proper place to discuss this.
Please start a discussion on emacs-devel, and let's pick this up
there.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-18 17:34                   ` Eli Zaretskii
@ 2022-12-19  1:56                     ` Po Lu
  2022-12-19 14:36                       ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu @ 2022-12-19  1:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> But AFAIK it is _always_ safe for unblock_input to signal.  When do
> you think it isn't?

In almost all of the helper functions where it is called:
tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.

Otherwise, why does it not call maybe_quit after reading pending events?
If it's safe to signal from unblock_input, then there should be no point
in delaying quitting until the next time maybe_quit is called?

> ??? w32_read_socket runs in the Lisp (a.k.a. "main") thread.  So it is
> safe for any code it calls it to signal errors and do anything else
> it's safe to do for the Lisp machine.

It can run from unblock_input.  Basically any code where it is unsafe to
signal (and there is a lot) calling unblock_input is put at risk.  Think
of all the callers of xmalloc without input blocked.

> You cannot request that.  note_mouse_highlight examines properties,
> and that can always signal, because properties are Lisp creatures and
> can invoke Lisp.

Could you please explain how?

> We gradually removed all those unsafe parts, and nowadays we only do
> the minimum in such contexts.  If unsafe processing of input is still
> with us, we should move to safer techniques.  That this unsafe code
> exists for such a long time is therefore not a justification for it to
> continue existing.
>
> Also, I think this unsafe processing of events only happens with
> GTK/PGTK; other X configurations call XTread_socket and
> handle_one_xevent from keyboard.c, in a "safe" context.

Non-local exits from handle_one_xevent are simply not safe.  That
function is not only called from XTread_socket, it is also called from
x_dispatch_event, which is called from various toolkit menus, including
inside the menu widget's own event loop.

> I didn't say we shouldn't process mouse movements.  I said that this
> processing should be limited to generating an Emacs input event and
> queuing it, will all the pertinent information for further processing.
> For example, note_mouse_highlight does just three things:
>
>   . redisplays portion of the window in a special face
>   . changes the way the cursor is drawn
>   . shows help-echo
>
> All of that can be done given an input event read by terminals
> read_socket_hook inside keyboard.c, provided that the information
> about the mouse move is stored in the input event.  There's
> absolutely no reason to produce the above 3 effects right where we are
> fed the raw X or GTK event from the window-system or the toolkit.

How would working with the mouse highlight work inside a popup menu,
then?  Keyboard events are not being read inside popup_get_selection.

> AFAIU, these two events will both be queued, and will both be
> processed, so there will be not "suck mouse highlight".

No, clear_mouse_face is absolutely safe to call synchronously, from the
likes of x_mouse_leave and the XI_Leave handlers.

In fact, it can be called from places where it is impossible to handle
keyboard events: popup dialogs and menus.

> So I still don't understand what is it that I'm missing that makes you
> say this safe processing of window-system events is impossible.

Just that I'm not confident in that.  I can't explain why but the mouse
highlight code is already tricky now, and something tells me moving it
to keyboard.c will make it even trickier.

And besides, note_mouse_highlight should not signal, regardless of
whether or not doing so is safe.  Otherwise, users' work could be
interrupted by simple mouse movement.

> Anyway, this bug report is not the proper place to discuss this.
> Please start a discussion on emacs-devel, and let's pick this up
> there.

Done.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-19  1:56                     ` Po Lu
@ 2022-12-19 14:36                       ` Eli Zaretskii
  2022-12-20  1:39                         ` Po Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-19 14:36 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Mon, 19 Dec 2022 09:56:40 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > But AFAIK it is _always_ safe for unblock_input to signal.  When do
> > you think it isn't?
> 
> In almost all of the helper functions where it is called:
> tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.

I don't think I understand why.  Let's take tear_down_x_back_buffer,
for example: why isn't it safe to call unblock_input from it?
tear_down_x_back_buffer is called:

  . from x_set_inhibit_double_buffering, which is a frame parameter
    handler, so it is only called from the Lisp thread
  . from x_free_frame_resources, which AFAIU is only called from the
    Lisp thread, when we delete a frame

So what is the problem here? what am I missing?

> Otherwise, why does it not call maybe_quit after reading pending events?
> If it's safe to signal from unblock_input, then there should be no point
> in delaying quitting until the next time maybe_quit is called?

Sorry, I don't see the relevance.  Even if we don't immediately call
maybe_quit, it doesn't mean we couldn't if we wanted or needed to.  It
isn't evidence that unblock_input cannot signal or is more generally
unsafe to call.

> > ??? w32_read_socket runs in the Lisp (a.k.a. "main") thread.  So it is
> > safe for any code it calls it to signal errors and do anything else
> > it's safe to do for the Lisp machine.
> 
> It can run from unblock_input.

But unblock_input is only called from the Lisp thread, at least in the
w32 build.  And it should only be called from the Lisp thread.  It
cannot be safely called from any other thread.

> Basically any code where it is unsafe to
> signal (and there is a lot) calling unblock_input is put at risk.  Think
> of all the callers of xmalloc without input blocked.

Why would that be the problem?  Did you read the commentary around
line 710 in alloc.c?

> > You cannot request that.  note_mouse_highlight examines properties,
> > and that can always signal, because properties are Lisp creatures and
> > can invoke Lisp.
> 
> Could you please explain how?

Which part is unclear?

> > We gradually removed all those unsafe parts, and nowadays we only do
> > the minimum in such contexts.  If unsafe processing of input is still
> > with us, we should move to safer techniques.  That this unsafe code
> > exists for such a long time is therefore not a justification for it to
> > continue existing.
> >
> > Also, I think this unsafe processing of events only happens with
> > GTK/PGTK; other X configurations call XTread_socket and
> > handle_one_xevent from keyboard.c, in a "safe" context.
> 
> Non-local exits from handle_one_xevent are simply not safe.  That
> function is not only called from XTread_socket, it is also called from
> x_dispatch_event, which is called from various toolkit menus, including
> inside the menu widget's own event loop.

These calls will all have to go, then.  Maybe we should use some
simplified, stripped down version of handle_one_xevent, which doesn't
invoke processing that can access Lisp data and the global state.  Why
would a toolkit menu widget need to call note_mouse_highlight, which
is _our_ function that implements _our_ features, about which GTK (or
any other toolkit) knows absolutely nothing??

The fact that we call handle_one_xevent from places other than the
read_socket_hook is IMO a grave mistake, and we should strive to fix
as many of such mistakes as possible.  Because making these calls safe
is impossible, and by allowing them, perhaps even increasing their
numbers as time goes by, we make Emacs more and more fragile on the
platforms where we do that.

> > I didn't say we shouldn't process mouse movements.  I said that this
> > processing should be limited to generating an Emacs input event and
> > queuing it, will all the pertinent information for further processing.
> > For example, note_mouse_highlight does just three things:
> >
> >   . redisplays portion of the window in a special face
> >   . changes the way the cursor is drawn
> >   . shows help-echo
> >
> > All of that can be done given an input event read by terminals
> > read_socket_hook inside keyboard.c, provided that the information
> > about the mouse move is stored in the input event.  There's
> > absolutely no reason to produce the above 3 effects right where we are
> > fed the raw X or GTK event from the window-system or the toolkit.
> 
> How would working with the mouse highlight work inside a popup menu,
> then?  Keyboard events are not being read inside popup_get_selection.

If it's impossible to do safely, it will not work.  It already doesn't
work in popup menus on w32, for similar reasons.  That's a minor
inconvenience to users, and a much smaller catastrophe than making
these unsafe calls.

> > AFAIU, these two events will both be queued, and will both be
> > processed, so there will be not "suck mouse highlight".
> 
> No, clear_mouse_face is absolutely safe to call synchronously, from the
> likes of x_mouse_leave and the XI_Leave handlers.
> 
> In fact, it can be called from places where it is impossible to handle
> keyboard events: popup dialogs and menus.

I don't see how this is related, sorry.  You asked:

> For example, if an XI_Motion event arrives and is queued, and
> then a subsequent XI_Leave event arrives before that event has a chance
> to be processed ``in our own safe context'', note_mouse_highlight will
> be called after the mouse has left the frame, leading to stuck mouse
> highlight.

And I responded saying that there should be no "mouse stuck" because
both XI_Motion and XI_Leave will be queued and processed in the order
they arrived.  What does this have to do with the safety of calling
clear_mouse_face, or lack thereof?

I'm saying that there's no need to process these events immediately,
when the event calls into the Lisp machine.  We can, and in many cases
already do, queue the event and process it a bit later, and in a
different, safer, context.

> > So I still don't understand what is it that I'm missing that makes you
> > say this safe processing of window-system events is impossible.
> 
> Just that I'm not confident in that.  I can't explain why but the mouse
> highlight code is already tricky now, and something tells me moving it
> to keyboard.c will make it even trickier.

But it's already "in keyboard.c"!  See the part that begins with

  /* Try generating a mouse motion event.  */
  else if (some_mouse_moved ())

which ends with

      if (!NILP (x) && NILP (obj))
	obj = make_lispy_movement (f, bar_window, part, x, y, t);

Then we process these "lispy movements" as input.

> And besides, note_mouse_highlight should not signal, regardless of
> whether or not doing so is safe.

Not normally, no.  But if some Lisp program sets bad text properties
or overlays, it could signal.  We can never assume in Emacs that some
Lisp machine code never signals.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-19 14:36                       ` Eli Zaretskii
@ 2022-12-20  1:39                         ` Po Lu
  2022-12-20 15:16                           ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu @ 2022-12-20  1:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Mon, 19 Dec 2022 09:56:40 +0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > But AFAIK it is _always_ safe for unblock_input to signal.  When do
>> > you think it isn't?
>> 
>> In almost all of the helper functions where it is called:
>> tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.
>
> I don't think I understand why.  Let's take tear_down_x_back_buffer,
> for example: why isn't it safe to call unblock_input from it?
> tear_down_x_back_buffer is called:
>
>   . from x_set_inhibit_double_buffering, which is a frame parameter
>     handler, so it is only called from the Lisp thread
>   . from x_free_frame_resources, which AFAIU is only called from the
>     Lisp thread, when we delete a frame
>
> So what is the problem here? what am I missing?

If unblock_input signals inside, then the double buffering state could
be left inconsistent.  For example, the back buffer picture could be
freed, but the back buffer itself may be left present, which can lead to
crashes later on.  That it is called from the only thread present does
not make it safe.

More generally, C code does not expect unblock_input to signal!  See
this piece of call_process in callproc.c:

	{
	  if (!specpdl_ref_valid_p (tempfile_index))
	    record_deleted_pid (pid, Qnil);
	  else
	    {
	      eassert (1 < nargs);
	      record_deleted_pid (pid, args[1]);
	      clear_unwind_protect (tempfile_index);
	    }
	  synch_process_pid = 0;
	}
    }

  unblock_child_signal (&oldset);
  unblock_input (); <=======

  if (pid < 0)
    report_file_errno (CHILD_SETUP_ERROR_DESC, Qnil, child_errno);

  /* Close our file descriptors, except for callproc_fd[CALLPROC_PIPEREAD]
     since we will use that to read input from.  */
  for (i = 0; i < CALLPROC_FDS; i++)
    if (i != CALLPROC_PIPEREAD && 0 <= callproc_fd[i])
      {
	emacs_close (callproc_fd[i]);
	callproc_fd[i] = -1;
      }
  emacs_close (filefd);
  clear_unwind_protect (specpdl_ref_add (count, -1));

If unblock_input signals, filefd will not be closed.

Or this piece of x_create_bitmap_mask in image.c:

  pixmap = image_bitmap_pixmap (f, id);
  width = x_bitmap_width (f, id);
  height = x_bitmap_height (f, id);

  block_input ();
  ximg = XGetImage (FRAME_X_DISPLAY (f), pixmap, 0, 0, width, height,
		    ~0, ZPixmap);

  if (!ximg)
    {
      unblock_input ();
      return;
    }

  result = x_create_x_image_and_pixmap (f, width, height, 1, &mask_img, &mask);

  unblock_input (); <======================
  if (!result)
    {
      XDestroyImage (ximg);
      return;
    }

If unblock_input signals, then ximg (which is potentially a lot of data)
leaks.

> But unblock_input is only called from the Lisp thread, at least in the
> w32 build.  And it should only be called from the Lisp thread.  It
> cannot be safely called from any other thread.

Being called from the Lisp thread does not make it safe to signal!
Consider the following example code:

  int fd;
  char *buffer;

  fd = emacs_open (...);

  if (fd == -1)
    return;

  buffer = xmalloc (100);

  block_input ();
  do_something_with_buffer_and_fd (fd, buffer);
  unblock_input ();

  /* more activity.  */
  close (fd);
  xfree (buffer);

Notice how fd and bufferare not closed if unblock_input signals.  Now
look at ftcrfont_open:

  block_input ();

  pat = ftfont_entity_pattern (entity, pixel_size);
  FcConfigSubstitute (NULL, pat, FcMatchPattern);
  FcDefaultSubstitute (pat);
  match = FcFontMatch (NULL, pat, &result);
  ftfont_fix_match (pat, match);

  FcPatternDestroy (pat);
  font_face = cairo_ft_font_face_create_for_pattern (match);
  if (!font_face
      || cairo_font_face_status (font_face) != CAIRO_STATUS_SUCCESS)
    {
      unblock_input ();
      FcPatternDestroy (match);
      return Qnil;
    }
  cairo_matrix_t font_matrix, ctm;
  cairo_matrix_init_scale (&font_matrix, pixel_size, pixel_size);
  cairo_matrix_init_identity (&ctm);

#ifdef HAVE_PGTK
  cairo_font_options_t *options = xsettings_get_font_options ();
#else
  cairo_font_options_t *options = cairo_font_options_create ();
#endif
#ifdef USE_BE_CAIRO
  if (be_use_subpixel_antialiasing ())
    cairo_font_options_set_antialias (options, CAIRO_ANTIALIAS_SUBPIXEL);
#endif
  cairo_scaled_font_t *scaled_font
    = cairo_scaled_font_create (font_face, &font_matrix, &ctm, options);
  cairo_font_face_destroy (font_face);
  cairo_font_options_destroy (options);
  unblock_input ();

If the last unblock_input signals, match (allocated by FcFontMatch)
leaks.

>> > You cannot request that.  note_mouse_highlight examines properties,
>> > and that can always signal, because properties are Lisp creatures and
>> > can invoke Lisp.
>> 
>> Could you please explain how?
>
> Which part is unclear?

Which properties involve Lisp execution, and why the evaluation of that
Lisp cannot be put inside internal_catch_all.

> These calls will all have to go, then.  Maybe we should use some
> simplified, stripped down version of handle_one_xevent, which doesn't
> invoke processing that can access Lisp data and the global state.  Why
> would a toolkit menu widget need to call note_mouse_highlight, which
> is _our_ function that implements _our_ features, about which GTK (or
> any other toolkit) knows absolutely nothing??

The toolkit menu mostly demands that Emacs set the cursor to the
appropriate cursor for whatever the pointer is under.  The X server can
also demand Emacs expose parts of the frame at any time, due to window
configuration changes.

> If it's impossible to do safely, it will not work.  It already doesn't
> work in popup menus on w32, for similar reasons.  That's a minor
> inconvenience to users, and a much smaller catastrophe than making
> these unsafe calls.

But it worked since the beginning of 2006 or so, when xmenu.c was made
to call handle_one_xevent.  Anyway, the "minor inconvenience" can be
avoided by simply wrapping note_mouse_highlight in internal_catch_all.
Why do you think that would be too harsh?

> I don't see how this is related, sorry.  You asked:
>
>> For example, if an XI_Motion event arrives and is queued, and
>> then a subsequent XI_Leave event arrives before that event has a chance
>> to be processed ``in our own safe context'', note_mouse_highlight will
>> be called after the mouse has left the frame, leading to stuck mouse
>> highlight.
>
> And I responded saying that there should be no "mouse stuck" because
> both XI_Motion and XI_Leave will be queued and processed in the order
> they arrived.  What does this have to do with the safety of calling
> clear_mouse_face, or lack thereof?
>
> I'm saying that there's no need to process these events immediately,
> when the event calls into the Lisp machine.  We can, and in many cases
> already do, queue the event and process it a bit later, and in a
> different, safer, context.

But then as a result the mouse face will not be dismissed when entering
a menu.  And we will also lose the valuable development feature of using
mouse-high to determine how wedged a stuck Emacs really is.

> But it's already "in keyboard.c"!  See the part that begins with
>
>   /* Try generating a mouse motion event.  */
>   else if (some_mouse_moved ())
>
> which ends with
>
>       if (!NILP (x) && NILP (obj))
> 	obj = make_lispy_movement (f, bar_window, part, x, y, t);
>
> Then we process these "lispy movements" as input.

That isn't related to mouse-highlight, and only happens when track-mouse
is non-nil.  track-mouse is nil because it requires calling
XTmouse_position on all mouse movement, which is a very expensive
operation (with 15 km between me and the X server, it takes 90
milliseconds!)

> Not normally, no.  But if some Lisp program sets bad text properties
> or overlays, it could signal.  We can never assume in Emacs that some
> Lisp machine code never signals.

Then the simple solution would be to catch those signals inside
note_mouse_highlight.  What could be the problem with that?  We could
even log the signals to *Messages*, like with-demoted-errors.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-20  1:39                         ` Po Lu
@ 2022-12-20 15:16                           ` Eli Zaretskii
  2022-12-21  1:01                             ` Po Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-20 15:16 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Tue, 20 Dec 2022 09:39:13 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Po Lu <luangruo@yahoo.com>
> >> Cc: emacs-devel@gnu.org
> >> Date: Mon, 19 Dec 2022 09:56:40 +0800
> >> 
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> 
> >> > But AFAIK it is _always_ safe for unblock_input to signal.  When do
> >> > you think it isn't?
> >> 
> >> In almost all of the helper functions where it is called:
> >> tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.
> >
> > I don't think I understand why.  Let's take tear_down_x_back_buffer,
> > for example: why isn't it safe to call unblock_input from it?
> > tear_down_x_back_buffer is called:
> >
> >   . from x_set_inhibit_double_buffering, which is a frame parameter
> >     handler, so it is only called from the Lisp thread
> >   . from x_free_frame_resources, which AFAIU is only called from the
> >     Lisp thread, when we delete a frame
> >
> > So what is the problem here? what am I missing?
> 
> If unblock_input signals inside, then the double buffering state could
> be left inconsistent.  For example, the back buffer picture could be
> freed, but the back buffer itself may be left present, which can lead to
> crashes later on.  That it is called from the only thread present does
> not make it safe.
> 
> More generally, C code does not expect unblock_input to signal!

That is definitely incorrect, not in the general form you are
presenting it.  unblock_input processes input events, and those can
legitimately signal errors.  Other places in C call other functions
that can signal.  C code which cannot allow control to escape it via
non-local exits must use the record_unwind_protect machinery to set up
suitable unwinders.

I very much hope that all your work on new X features and in
particular on the new input events and XInput2 was not based on the
above wrong assumption -- that unblock_input can never signal (or
should never signal).

But we moved far away from the original issue, and are in fact
discussing a very far (and much less important, from my POV) tangent.
Let's not lose our focus, okay?  The original issue was whether we can
have code which calls handle_one_xevent and other similar functions,
which can call unblock_input and thus can signal an error, from
threads other than the main thread, or from callbacks we install that
are then called by toolkits from a non-main thread.  By contrast, the
past several messages were discussing whether unblock_input is allowed
to signal even if called from the main thread.  And that is an
entirely different issue.

It is a different issue because my point is simple: calls to
handle_one_xevent etc. from non-main threads _cannot_ be made safe in
Emacs, no matter what you do.  So it's an absolute no-no.  By
contrast, if we call unblock_input from the main thread, and the code
which calls it doesn't expect signals, all we need is either to
restructure the caller, or use record_unwind_protect.  Thus, even if
you are absolutely right about all of the cases you show below -- and
I'm not convinced you are right in all of them, but even if you are --
even if all of them are indeed unsafe in the face of Lisp errors, all
it means that we must examine them one by one and make them safe by
one of the above mentioned techniques.

So I would like to return to the main issue: calls to
handle_one_xevent from non-main threads.  That must not happen in
Emacs, or else the corresponding build will be fragile and can crash
given the opportune conditions.  If you agree with that, let's discuss
how to make those configurations safer.  If you don't agree, please
explain why.

> Being called from the Lisp thread does not make it safe to signal!

Not by itself, it doesn't.  But it definitely makes it fixable, so
that it _can_ be safe.  By contrast, calling the Lisp machine from
another thread _cannot_ be fixed.  It's a time bomb that will
definitely go off.

> If the last unblock_input signals, match (allocated by FcFontMatch)
> leaks.

Yes, code which doesn't expect errors might leak descriptors or memory
or other resources.  But resource leaks are very rarely fatal, unless
you leak a lot of them: modern machines have enough memory and file
descriptors to let us leak them for many days before disaster
strikes.  By contrast, a single error signaled from the wrong thread
will immediately crash Emacs because it longjmp's to the wrong stack!

> >> > You cannot request that.  note_mouse_highlight examines properties,
> >> > and that can always signal, because properties are Lisp creatures and
> >> > can invoke Lisp.
> >> 
> >> Could you please explain how?
> >
> > Which part is unclear?
> 
> Which properties involve Lisp execution, and why the evaluation of that
> Lisp cannot be put inside internal_catch_all.

You've just seen that: we examine text properties and overlays in a
buffer whose restriction could have changed behind our backs, what
with today's proliferation of calls into Lisp from very deep and
low-level functions.  We call Lisp in keyboard.c and in xdisp.c and in
insdel.c and in window.c.

> > These calls will all have to go, then.  Maybe we should use some
> > simplified, stripped down version of handle_one_xevent, which doesn't
> > invoke processing that can access Lisp data and the global state.  Why
> > would a toolkit menu widget need to call note_mouse_highlight, which
> > is _our_ function that implements _our_ features, about which GTK (or
> > any other toolkit) knows absolutely nothing??
> 
> The toolkit menu mostly demands that Emacs set the cursor to the
> appropriate cursor for whatever the pointer is under.

I cannot parse this: "set the cursor to the appropriate cursor"?

> The X server can also demand Emacs expose parts of the frame at any
> time, due to window configuration changes.

We already handle this, by setting a flag in the SIGIO handler, and
then calling expose_frame when it is safe.  What are the problems you
see here if we queue these events instead of processing them
immediately?

> > If it's impossible to do safely, it will not work.  It already doesn't
> > work in popup menus on w32, for similar reasons.  That's a minor
> > inconvenience to users, and a much smaller catastrophe than making
> > these unsafe calls.
> 
> But it worked since the beginning of 2006 or so, when xmenu.c was made
> to call handle_one_xevent.

"Worked" in the sense that we maybe didn't hear about problems, or
heard too few of them?  That doesn't mean the problems don't exist,
just that people don't come immediately running here and complain.

Besides, PGTK is very young, definitely not around since 2006.  Give
it some time.

> Anyway, the "minor inconvenience" can be avoided by simply wrapping
> note_mouse_highlight in internal_catch_all.  Why do you think that
> would be too harsh?

You cannot use that machinery reliably in a non-main thread anyway.
For starters, you have no idea what kind of stack do those threads
get.  They aren't our threads, so we don't know anything about them.
Using internal_catch_all would mean we'd need to run unwinders, and
there's no guarantee they could be run on those other threads.  For
example, they cannot safely access the state of the Lisp machine,
because they run on other threads asynchronically.

Besides, note_mouse_highlight is called from many places that don't
need any internal_catch_all.  If some caller cannot allow non-local
exits, it should call internal_catch_all by itself.  But that's
another tangent.

> > And I responded saying that there should be no "mouse stuck" because
> > both XI_Motion and XI_Leave will be queued and processed in the order
> > they arrived.  What does this have to do with the safety of calling
> > clear_mouse_face, or lack thereof?
> >
> > I'm saying that there's no need to process these events immediately,
> > when the event calls into the Lisp machine.  We can, and in many cases
> > already do, queue the event and process it a bit later, and in a
> > different, safer, context.
> 
> But then as a result the mouse face will not be dismissed when entering
> a menu.

Again, I don't understand: we don't display mouse-face on menus, only
on buffer text, on the mode line, and on the tool bar (if we implement
it in Emacs, not use the toolkit's tool bar).

> And we will also lose the valuable development feature of using
> mouse-high to determine how wedged a stuck Emacs really is.

Now you lost me.  What is that feature, and how is it related?

> > But it's already "in keyboard.c"!  See the part that begins with
> >
> >   /* Try generating a mouse motion event.  */
> >   else if (some_mouse_moved ())
> >
> > which ends with
> >
> >       if (!NILP (x) && NILP (obj))
> > 	obj = make_lispy_movement (f, bar_window, part, x, y, t);
> >
> > Then we process these "lispy movements" as input.
> 
> That isn't related to mouse-highlight, and only happens when track-mouse
> is non-nil.

Those are details.  My point is that we do process _some_ mouse events
in the main loop, and it works.

> > Not normally, no.  But if some Lisp program sets bad text properties
> > or overlays, it could signal.  We can never assume in Emacs that some
> > Lisp machine code never signals.
> 
> Then the simple solution would be to catch those signals inside
> note_mouse_highlight.  What could be the problem with that?

It's wrong, that's the problem.

And again, this is a tangent.  The main issue is that this code cannot
be called from a non-main thread, period.  Let's focus on that and
solve those problems first.  You made me very worried by telling we do
this nonchalantly and for a long time.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-20 15:16                           ` Eli Zaretskii
@ 2022-12-21  1:01                             ` Po Lu
  2022-12-21 13:06                               ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu @ 2022-12-21  1:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> That is definitely incorrect, not in the general form you are
> presenting it.  unblock_input processes input events, and those can
> legitimately signal errors.  Other places in C call other functions
> that can signal.  C code which cannot allow control to escape it via
> non-local exits must use the record_unwind_protect machinery to set up
> suitable unwinders.
>
> I very much hope that all your work on new X features and in
> particular on the new input events and XInput2 was not based on the
> above wrong assumption -- that unblock_input can never signal (or
> should never signal).
>
> But we moved far away from the original issue, and are in fact
> discussing a very far (and much less important, from my POV) tangent.
> Let's not lose our focus, okay?  The original issue was whether we can
> have code which calls handle_one_xevent and other similar functions,
> which can call unblock_input and thus can signal an error, from
> threads other than the main thread, or from callbacks we install that
> are then called by toolkits from a non-main thread.  By contrast, the
> past several messages were discussing whether unblock_input is allowed
> to signal even if called from the main thread.  And that is an
> entirely different issue.
>
> It is a different issue because my point is simple: calls to
> handle_one_xevent etc. from non-main threads _cannot_ be made safe in
> Emacs, no matter what you do.  So it's an absolute no-no.  By
> contrast, if we call unblock_input from the main thread, and the code
> which calls it doesn't expect signals, all we need is either to
> restructure the caller, or use record_unwind_protect.  Thus, even if
> you are absolutely right about all of the cases you show below -- and
> I'm not convinced you are right in all of them, but even if you are --
> even if all of them are indeed unsafe in the face of Lisp errors, all
> it means that we must examine them one by one and make them safe by
> one of the above mentioned techniques.
>
> So I would like to return to the main issue: calls to
> handle_one_xevent from non-main threads.  That must not happen in
> Emacs, or else the corresponding build will be fragile and can crash
> given the opportune conditions.  If you agree with that, let's discuss
> how to make those configurations safer.  If you don't agree, please
> explain why.
>
>> Being called from the Lisp thread does not make it safe to signal!
>
> Not by itself, it doesn't.  But it definitely makes it fixable, so
> that it _can_ be safe.  By contrast, calling the Lisp machine from
> another thread _cannot_ be fixed.  It's a time bomb that will
> definitely go off.
>
>> If the last unblock_input signals, match (allocated by FcFontMatch)
>> leaks.
>
> Yes, code which doesn't expect errors might leak descriptors or memory
> or other resources.  But resource leaks are very rarely fatal, unless
> you leak a lot of them: modern machines have enough memory and file
> descriptors to let us leak them for many days before disaster
> strikes.  By contrast, a single error signaled from the wrong thread
> will immediately crash Emacs because it longjmp's to the wrong stack!
>
>> >> > You cannot request that.  note_mouse_highlight examines properties,
>> >> > and that can always signal, because properties are Lisp creatures and
>> >> > can invoke Lisp.
>> >> 
>> >> Could you please explain how?
>> >
>> > Which part is unclear?
>> 
>> Which properties involve Lisp execution, and why the evaluation of that
>> Lisp cannot be put inside internal_catch_all.
>
> You've just seen that: we examine text properties and overlays in a
> buffer whose restriction could have changed behind our backs, what
> with today's proliferation of calls into Lisp from very deep and
> low-level functions.  We call Lisp in keyboard.c and in xdisp.c and in
> insdel.c and in window.c.
>
>> > These calls will all have to go, then.  Maybe we should use some
>> > simplified, stripped down version of handle_one_xevent, which doesn't
>> > invoke processing that can access Lisp data and the global state.  Why
>> > would a toolkit menu widget need to call note_mouse_highlight, which
>> > is _our_ function that implements _our_ features, about which GTK (or
>> > any other toolkit) knows absolutely nothing??
>> 
>> The toolkit menu mostly demands that Emacs set the cursor to the
>> appropriate cursor for whatever the pointer is under.
>
> I cannot parse this: "set the cursor to the appropriate cursor"?
>
>> The X server can also demand Emacs expose parts of the frame at any
>> time, due to window configuration changes.
>
> We already handle this, by setting a flag in the SIGIO handler, and
> then calling expose_frame when it is safe.  What are the problems you
> see here if we queue these events instead of processing them
> immediately?
>
>> > If it's impossible to do safely, it will not work.  It already doesn't
>> > work in popup menus on w32, for similar reasons.  That's a minor
>> > inconvenience to users, and a much smaller catastrophe than making
>> > these unsafe calls.
>> 
>> But it worked since the beginning of 2006 or so, when xmenu.c was made
>> to call handle_one_xevent.
>
> "Worked" in the sense that we maybe didn't hear about problems, or
> heard too few of them?  That doesn't mean the problems don't exist,
> just that people don't come immediately running here and complain.
>
> Besides, PGTK is very young, definitely not around since 2006.  Give
> it some time.
>
>> Anyway, the "minor inconvenience" can be avoided by simply wrapping
>> note_mouse_highlight in internal_catch_all.  Why do you think that
>> would be too harsh?
>
> You cannot use that machinery reliably in a non-main thread anyway.
> For starters, you have no idea what kind of stack do those threads
> get.  They aren't our threads, so we don't know anything about them.
> Using internal_catch_all would mean we'd need to run unwinders, and
> there's no guarantee they could be run on those other threads.  For
> example, they cannot safely access the state of the Lisp machine,
> because they run on other threads asynchronically.
>
> Besides, note_mouse_highlight is called from many places that don't
> need any internal_catch_all.  If some caller cannot allow non-local
> exits, it should call internal_catch_all by itself.  But that's
> another tangent.
>
>> > And I responded saying that there should be no "mouse stuck" because
>> > both XI_Motion and XI_Leave will be queued and processed in the order
>> > they arrived.  What does this have to do with the safety of calling
>> > clear_mouse_face, or lack thereof?
>> >
>> > I'm saying that there's no need to process these events immediately,
>> > when the event calls into the Lisp machine.  We can, and in many cases
>> > already do, queue the event and process it a bit later, and in a
>> > different, safer, context.
>> 
>> But then as a result the mouse face will not be dismissed when entering
>> a menu.
>
> Again, I don't understand: we don't display mouse-face on menus, only
> on buffer text, on the mode line, and on the tool bar (if we implement
> it in Emacs, not use the toolkit's tool bar).
>
>> And we will also lose the valuable development feature of using
>> mouse-high to determine how wedged a stuck Emacs really is.
>
> Now you lost me.  What is that feature, and how is it related?
>
>> > But it's already "in keyboard.c"!  See the part that begins with
>> >
>> >   /* Try generating a mouse motion event.  */
>> >   else if (some_mouse_moved ())
>> >
>> > which ends with
>> >
>> >       if (!NILP (x) && NILP (obj))
>> > 	obj = make_lispy_movement (f, bar_window, part, x, y, t);
>> >
>> > Then we process these "lispy movements" as input.
>> 
>> That isn't related to mouse-highlight, and only happens when track-mouse
>> is non-nil.
>
> Those are details.  My point is that we do process _some_ mouse events
> in the main loop, and it works.
>
>> > Not normally, no.  But if some Lisp program sets bad text properties
>> > or overlays, it could signal.  We can never assume in Emacs that some
>> > Lisp machine code never signals.
>> 
>> Then the simple solution would be to catch those signals inside
>> note_mouse_highlight.  What could be the problem with that?
>
> It's wrong, that's the problem.
>
> And again, this is a tangent.  The main issue is that this code cannot
> be called from a non-main thread, period.  Let's focus on that and
> solve those problems first.  You made me very worried by telling we do
> this nonchalantly and for a long time.

There is some kind of misunderstanding here.  You seem to be talking
about calling Lisp from another thread, which is a big no-no in my book
as well.  However, the problem is nowhere near as drastic.

Under the GTK builds, there is only a single thread.  The event loop
runs from the main thread.  Those calls to note_mouse_highlight *are*
being done from the main thread.  The problem is that it is unsafe to
signal in the main thread when handle_one_xevent is being called by
GLib, because GLib does the equivalent of this:

  static bool inside_callback;

  assert (!inside_callback);
  inside_callback = true;
  [call handle_one_xevent]
  inside_callback = false;

If handle_one_xevent signals, then inside_callback will never be set to
false.  As a result, the next time GLib enters its own event dispatch
code, it will abort, leading to the crash seen here.

No second thread is involved!



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-21  1:01                             ` Po Lu
@ 2022-12-21 13:06                               ` Eli Zaretskii
  2022-12-22  1:28                                 ` Po Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-21 13:06 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Wed, 21 Dec 2022 09:01:01 +0800
> 
> There is some kind of misunderstanding here.  You seem to be talking
> about calling Lisp from another thread, which is a big no-no in my book
> as well.  However, the problem is nowhere near as drastic.
> 
> Under the GTK builds, there is only a single thread.  The event loop
> runs from the main thread.  Those calls to note_mouse_highlight *are*
> being done from the main thread.  The problem is that it is unsafe to
> signal in the main thread when handle_one_xevent is being called by
> GLib, because GLib does the equivalent of this:
> 
>   static bool inside_callback;
> 
>   assert (!inside_callback);
>   inside_callback = true;
>   [call handle_one_xevent]
>   inside_callback = false;
> 
> If handle_one_xevent signals, then inside_callback will never be set to
> false.  As a result, the next time GLib enters its own event dispatch
> code, it will abort, leading to the crash seen here.

OK, so then I ask again: why not have the function called by GTK just
enqueue events and return, and then let read_socket_hook read the
enqueued events and process them as they are read, in keyboard.c?



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-21 13:06                               ` Eli Zaretskii
@ 2022-12-22  1:28                                 ` Po Lu
  2022-12-22  9:04                                   ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Po Lu @ 2022-12-22  1:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Wed, 21 Dec 2022 09:01:01 +0800
>> 
>> There is some kind of misunderstanding here.  You seem to be talking
>> about calling Lisp from another thread, which is a big no-no in my book
>> as well.  However, the problem is nowhere near as drastic.
>> 
>> Under the GTK builds, there is only a single thread.  The event loop
>> runs from the main thread.  Those calls to note_mouse_highlight *are*
>> being done from the main thread.  The problem is that it is unsafe to
>> signal in the main thread when handle_one_xevent is being called by
>> GLib, because GLib does the equivalent of this:
>> 
>>   static bool inside_callback;
>> 
>>   assert (!inside_callback);
>>   inside_callback = true;
>>   [call handle_one_xevent]
>>   inside_callback = false;
>> 
>> If handle_one_xevent signals, then inside_callback will never be set to
>> false.  As a result, the next time GLib enters its own event dispatch
>> code, it will abort, leading to the crash seen here.
>
> OK, so then I ask again: why not have the function called by GTK just
> enqueue events and return, and then let read_socket_hook read the
> enqueued events and process them as they are read, in keyboard.c?

After the function returns, it is not guaranteed that Emacs will be able
to enter read_socket_hook soon.

(Sorry if this mail reaches you multiple times: the mail server is
having issues.)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-22  1:28                                 ` Po Lu
@ 2022-12-22  9:04                                   ` Eli Zaretskii
  2022-12-22  9:57                                     ` Po Lu
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-12-22  9:04 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Thu, 22 Dec 2022 09:28:58 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Under the GTK builds, there is only a single thread.  The event loop
> >> runs from the main thread.  Those calls to note_mouse_highlight *are*
> >> being done from the main thread.  The problem is that it is unsafe to
> >> signal in the main thread when handle_one_xevent is being called by
> >> GLib, because GLib does the equivalent of this:
> >> 
> >>   static bool inside_callback;
> >> 
> >>   assert (!inside_callback);
> >>   inside_callback = true;
> >>   [call handle_one_xevent]
> >>   inside_callback = false;
> >> 
> >> If handle_one_xevent signals, then inside_callback will never be set to
> >> false.  As a result, the next time GLib enters its own event dispatch
> >> code, it will abort, leading to the crash seen here.
> >
> > OK, so then I ask again: why not have the function called by GTK just
> > enqueue events and return, and then let read_socket_hook read the
> > enqueued events and process them as they are read, in keyboard.c?
> 
> After the function returns, it is not guaranteed that Emacs will be able
> to enter read_socket_hook soon.

Why is that a problem?  If the event we queue includes all the data
needed for processing it, why does it matter how soon it is processed?

And what could delay the processing significantly in this scenario?
Since these callbacks are called from the main thread, it means Emacs
is not busy, and should return to the main loop very soon.  Right?



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
  2022-12-22  9:04                                   ` Eli Zaretskii
@ 2022-12-22  9:57                                     ` Po Lu
  0 siblings, 0 replies; 20+ messages in thread
From: Po Lu @ 2022-12-22  9:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Why is that a problem?  If the event we queue includes all the data
> needed for processing it, why does it matter how soon it is processed?

GLib might be in the middle of doing something itself, such as waiting
for file I/O, when it calls handle_one_xevent.

> And what could delay the processing significantly in this scenario?
> Since these callbacks are called from the main thread, it means Emacs
> is not busy, and should return to the main loop very soon.  Right?

Right, for some definition of "soon".

I tried moving the processing to pgtk_read_socket.  But on the PGTK
build, GTK apparently expects changes to the mouse pointer to be applied
inside the call to the motion handler function, so now the cursor image
is not updated during mouse motion.

And for whatever reason, gdk_display_flush leads to a deadlock somewhere
in GDK.  I will investigate that later.



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-12-22  9:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-17  3:39 bug#60144: 30.0.50; PGTK Emacs crashes after signal Karl Otness via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-12-18  2:08 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-12-18  5:45   ` Eli Zaretskii
2022-12-18  6:22     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-12-18  8:39       ` Eli Zaretskii
2022-12-18  9:52         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-12-18 11:43           ` Eli Zaretskii
2022-12-18 12:12             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-12-18 12:33               ` Eli Zaretskii
2022-12-18 13:45                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-12-18 17:34                   ` Eli Zaretskii
2022-12-19  1:56                     ` Po Lu
2022-12-19 14:36                       ` Eli Zaretskii
2022-12-20  1:39                         ` Po Lu
2022-12-20 15:16                           ` Eli Zaretskii
2022-12-21  1:01                             ` Po Lu
2022-12-21 13:06                               ` Eli Zaretskii
2022-12-22  1:28                                 ` Po Lu
2022-12-22  9:04                                   ` Eli Zaretskii
2022-12-22  9:57                                     ` Po Lu

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.