unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
@ 2015-08-30 12:51 Pip Cet
  2015-08-30 15:01 ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-08-30 12:51 UTC (permalink / raw)
  To: 21380

[-- Attachment #1: Type: text/plain, Size: 8930 bytes --]

It appears it's unsafe to schedule an immediate timer in
window-configuration-change-hook:

  (add-hook 'window-configuration-change-hook (lambda () (run-with-timer
                                                          0 nil
'exwm-layout--refresh)))

I saw the following segfault:
#0  0x0000000000547ec0 in XSETCAR (c=0, n=51872517) at lisp.h:1188
#1  0x00000000005efdb3 in concat (nargs=1, args=0x7fffffffd868,
target_type=Lisp_Cons, last_special=false) at fns.c:747
#2  0x00000000005ef0f3 in Fcopy_sequence (arg=53371635) at fns.c:510
#3  0x0000000000557252 in timer_check () at keyboard.c:4569
#4  0x0000000000639f1b in wait_reading_process_output (time_limit=30,
nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=0, wait_proc=0x0,
just_wait_proc=0) at process.c:4611
#5  0x00000000004233b2 in sit_for (timeout=122, reading=true,
display_option=1) at dispnew.c:5756
#6  0x0000000000553942 in read_char (commandflag=1, map=53372867,
prev_event=0, used_mouse_menu=0x7fffffffe06f, end_time=0x0) at
keyboard.c:2775
#7  0x00000000005602ef in read_key_sequence (keybuf=0x7fffffffe220,
bufsize=30, prompt=0, dont_downcase_last=false,
can_return_switch_frame=true, fix_current_buffer=true,
prevent_redisplay=false) at keyboard.c:9139
#8  0x00000000005507b1 in command_loop_1 () at keyboard.c:1406
#9  0x00000000005e77e4 in internal_condition_case (bfun=0x550386
<command_loop_1>, handlers=18912, hfun=0x54fb70 <cmd_error>) at eval.c:1293
#10 0x000000000055008d in command_loop_2 (ignore=0) at keyboard.c:1138
#11 0x00000000005e6fa6 in internal_catch (tag=45264, func=0x550064
<command_loop_2>, arg=0) at eval.c:1057
#12 0x000000000055002f in command_loop () at keyboard.c:1117
#13 0x000000000054f738 in recursive_edit_1 () at keyboard.c:723
#14 0x000000000054f8cc in Frecursive_edit () at keyboard.c:794
#15 0x000000000054d706 in main (argc=5, argv=0x7fffffffe6a8) at emacs.c:1643

Somehow, the argument to Fcopy_sequence was changed while concat was
underway. Further investigation indicates that
window-configuration-change-hook was called in the middle of concat:

Breakpoint 8, run_window_configuration_change_hook (f=0x1718660) at
window.c:3141
3141      ptrdiff_t count = SPECPDL_INDEX ();
#0  run_window_configuration_change_hook (f=0x1718660) at window.c:3141
#1  0x0000000000425882 in adjust_frame_size (f=0x1718660, new_width=824,
new_height=516, inhibit=5, pretend=false, parameter=13152) at frame.c:599
#2  0x0000000000422c01 in change_frame_size_1 (f=0x1718660, new_width=824,
new_height=516, pretend=false, delay=false, safe=false, pixelwise=true) at
dispnew.c:5507
#3  0x0000000000422c57 in change_frame_size (f=0x1718660, new_width=824,
new_height=516, pretend=false, delay=false, safe=false, pixelwise=true) at
dispnew.c:5539
#4  0x0000000000422a2f in do_pending_window_change (safe=false) at
dispnew.c:5465
#5  0x0000000000536c22 in xg_frame_resized (f=0x1718660, pixelwidth=842,
pixelheight=518) at gtkutil.c:924
#6  0x0000000000518a9f in handle_one_xevent (dpyinfo=0x15b07d0,
event=0x7fffffff7210, finish=0xbfaacc, hold_quit=0x7fffffff74a0) at
xterm.c:8294
#7  0x0000000000516a69 in event_handler_gdk (gxev=0x7fffffff7210,
ev=0x568b410, data=0x0) at xterm.c:7294
#8  0x00007ffff6769661 in gdk_event_apply_filters
(xevent=xevent@entry=0x7fffffff7210,
event=event@entry=0x568b410, window=window@entry=0x0) at
/tmp/buildd/gtk+3.0-3.16.6/./gdk/x11/gdkeventsource.c:81
#9  0x00007ffff6769929 in gdk_event_source_translate_event
(xevent=0x7fffffff7210, event_source=0x14e4090) at
/tmp/buildd/gtk+3.0-3.16.6/./gdk/x11/gdkeventsource.c:195
#10 _gdk_x11_display_queue_events (display=0x1506050) at
/tmp/buildd/gtk+3.0-3.16.6/./gdk/x11/gdkeventsource.c:338
#11 0x00007ffff673cae9 in gdk_display_get_event
(display=display@entry=0x1506050)
at /tmp/buildd/gtk+3.0-3.16.6/./gdk/gdkdisplay.c:340
#12 0x00007ffff67696e2 in gdk_event_source_dispatch (source=<optimized
out>, callback=<optimized out>, user_data=<optimized out>) at
/tmp/buildd/gtk+3.0-3.16.6/./gdk/x11/gdkeventsource.c:360
#13 0x00007ffff50bac3d in g_main_dispatch (context=0x14f1f80) at
/tmp/buildd/glib2.0-2.44.1/./glib/gmain.c:3122
#14 g_main_context_dispatch (context=context@entry=0x14f1f80) at
/tmp/buildd/glib2.0-2.44.1/./glib/gmain.c:3737
#15 0x00007ffff50baf20 in g_main_context_iterate
(context=context@entry=0x14f1f80,
block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at
/tmp/buildd/glib2.0-2.44.1/./glib/gmain.c:3808
#16 0x00007ffff50bafcc in g_main_context_iteration (context=0x14f1f80,
context@entry=0x0, may_block=may_block@entry=1) at
/tmp/buildd/glib2.0-2.44.1/./glib/gmain.c:3869
#17 0x00007ffff6be1ff5 in gtk_main_iteration () at
/tmp/buildd/gtk+3.0-3.16.6/./gtk/gtkmain.c:1320
#18 0x0000000000519220 in XTread_socket (terminal=0x11ecdd0,
hold_quit=0x7fffffff74a0) at xterm.c:8644
#19 0x000000000055bbe2 in gobble_input () at keyboard.c:6893
#20 0x000000000055bfcc in handle_async_input () at keyboard.c:7145
#21 0x000000000055bfeb in process_pending_signals () at keyboard.c:7159
#22 0x00000000005c3d3a in Fmake_list (length=0, init=0) at alloc.c:2676
#23 0x00000000005ef6f8 in concat (nargs=1, args=0x7fffffff76e8,
target_type=Lisp_Cons, last_special=false) at fns.c:642
#24 0x00000000005ef0f3 in Fcopy_sequence (arg=49369891) at fns.c:510
#25 0x0000000000557252 in timer_check () at keyboard.c:4569
#26 0x0000000000555106 in readable_events (flags=1) at keyboard.c:3422
#27 0x000000000055ba2b in get_input_pending (flags=1) at keyboard.c:6808
#28 0x0000000000556a49 in swallow_events (do_display=false) at
keyboard.c:4320
#29 0x000000000063af9c in wait_reading_process_output (time_limit=1,
nsecs=0, read_kbd=0, do_display=false, wait_for_cell=0,
wait_proc=0x2fd6078, just_wait_proc=0) at process.c:4992
#30 0x0000000000638f6c in Faccept_process_output (process=50159736,
seconds=6, millisec=0, just_this_one=0) at process.c:4241

I'm not sure whether the bug is in my code (and, if so, how to fix
it—if scheduling an immediate timer isn't safe, what could possibly
be?), in Fmake_list, in QUIT, or in the X or GTK code, but at least
one of them is buggy.

M-x report-emacs-bug information:

I have been unable, so far, to reproduce this bug reliably from `emacs
-Q'.

In GNU Emacs 25.0.50.52 (x86_64-unknown-linux-gnu, GTK+ Version 3.16.6)
 of 2015-08-30
Repository revision: c6af816affb36d512f806725518e6e5f2353b197
Windowing system distributor `The X.Org Foundation', version 11.0.11702000
System Description:    Debian GNU/Linux unstable (sid)

Configured using:
 `configure 'CFLAGS=-O0 -g3''

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND DBUS GCONF GSETTINGS NOTIFY LIBSELINUX
GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Text

Minor modes in effect:
  diff-auto-refine-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message dired format-spec
rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util help-fns help-mode cl-loaddefs pcase cl-lib mail-prsvr
mail-utils vc-git diff-mode easymenu easy-mmode time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt
fringe tabulated-list newcomment elisp-mode lisp-mode prog-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote dbusbind inotify dynamic-setting
system-font-setting font-render-setting move-toolbar gtk x-toolkit x
multi-tty make-network-process emacs)

Memory information:
((conses 16 84908 4447)
 (symbols 48 19432 0)
 (miscs 40 43 147)
 (strings 32 14795 4332)
 (string-bytes 1 417071)
 (vectors 16 11731)
 (vector-slots 8 419664 4375)
 (floats 8 138 302)
 (intervals 56 225 13)
 (buffers 976 12)
 (heap 1024 19426 1811))

[-- Attachment #2: Type: text/html, Size: 9398 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 12:51 bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook Pip Cet
@ 2015-08-30 15:01 ` Eli Zaretskii
  2015-08-30 15:24   ` Pip Cet
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-08-30 15:01 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Sun, 30 Aug 2015 12:51:26 +0000
> From: Pip Cet <pipcet@gmail.com>
> 
> It appears it's unsafe to schedule an immediate timer in
> window-configuration-change-hook:
> 
> (add-hook 'window-configuration-change-hook (lambda () (run-with-timer
> 0 nil 'exwm-layout--refresh)))
> 
> I saw the following segfault:
> #0 0x0000000000547ec0 in XSETCAR (c=0, n=51872517) at lisp.h:1188
> #1 0x00000000005efdb3 in concat (nargs=1, args=0x7fffffffd868,
> target_type=Lisp_Cons, last_special=false) at fns.c:747
> #2 0x00000000005ef0f3 in Fcopy_sequence (arg=53371635) at fns.c:510
> #3 0x0000000000557252 in timer_check () at keyboard.c:4569
> #4 0x0000000000639f1b in wait_reading_process_output (time_limit=30, nsecs=0,
> read_kbd=-1, do_display=true, wait_for_cell=0, wait_proc=0x0, just_wait_proc=0)
> at process.c:4611
> #5 0x00000000004233b2 in sit_for (timeout=122, reading=true, display_option=1)
> at dispnew.c:5756
> #6 0x0000000000553942 in read_char (commandflag=1, map=53372867, prev_event=0,
> used_mouse_menu=0x7fffffffe06f, end_time=0x0) at keyboard.c:2775
> #7 0x00000000005602ef in read_key_sequence (keybuf=0x7fffffffe220, bufsize=30,
> prompt=0, dont_downcase_last=false, can_return_switch_frame=true,
> fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9139
> #8 0x00000000005507b1 in command_loop_1 () at keyboard.c:1406
> #9 0x00000000005e77e4 in internal_condition_case (bfun=0x550386
> <command_loop_1>, handlers=18912, hfun=0x54fb70 <cmd_error>) at eval.c:1293
> #10 0x000000000055008d in command_loop_2 (ignore=0) at keyboard.c:1138
> #11 0x00000000005e6fa6 in internal_catch (tag=45264, func=0x550064
> <command_loop_2>, arg=0) at eval.c:1057
> #12 0x000000000055002f in command_loop () at keyboard.c:1117
> #13 0x000000000054f738 in recursive_edit_1 () at keyboard.c:723
> #14 0x000000000054f8cc in Frecursive_edit () at keyboard.c:794
> #15 0x000000000054d706 in main (argc=5, argv=0x7fffffffe6a8) at emacs.c:1643
> 
> Somehow, the argument to Fcopy_sequence was changed while concat was
> underway.

How do you see that?

> Further investigation indicates that
> window-configuration-change-hook was called in the middle of concat:

Did you understand how this fact is related to the segfault?

Thanks.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 15:01 ` Eli Zaretskii
@ 2015-08-30 15:24   ` Pip Cet
  2015-08-30 15:27     ` Pip Cet
  2015-08-30 16:39     ` Eli Zaretskii
  0 siblings, 2 replies; 50+ messages in thread
From: Pip Cet @ 2015-08-30 15:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 1287 bytes --]

On Sun, Aug 30, 2015 at 3:01 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Sun, 30 Aug 2015 12:51:26 +0000
> > From: Pip Cet <pipcet@gmail.com>
> > Somehow, the argument to Fcopy_sequence was changed while concat was
> > underway.
>
> How do you see that?
>

I originally concluded it was the only way to trigger the bug, but I just
managed to trigger it again and have it open in a GDB session:

#1  0x00000000005efdb3 in concat (nargs=1, args=0x7fffffff76e8,
target_type=Lisp_Cons, last_special=false) at fns.c:747
747                     XSETCAR (tail, elt);
(gdb) p result_len
$22 = 4
(gdb) p debug_print(Flength(args[0]))
5
$23 = void
(gdb)


> > Further investigation indicates that
> > window-configuration-change-hook was called in the middle of concat:
>
> Did you understand how this fact is related to the segfault?
>

I _think_ I do.

1. concat called with args[0] == Vtimer_list
2. concat stores result_len (=4)
3. concat calls make_list (4)
4. make_list interrupted by QUIT
5. see stack trace
6. window-configuration-change-hook modifies Vtimer_list, which now has
length 5
7. control returns to concat
8. concat tries to write 5 elements into a 4-element list, which causes the
segfault because `tail' is unexpectedly NULL.

Does that make sense to you?

Thanks,
Pip

[-- Attachment #2: Type: text/html, Size: 2146 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 15:24   ` Pip Cet
@ 2015-08-30 15:27     ` Pip Cet
  2015-08-30 16:24       ` Pip Cet
  2015-08-30 16:39     ` Eli Zaretskii
  1 sibling, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-08-30 15:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 1784 bytes --]

I forgot to make clear that I verified with gdb that args[0] ==
Vtimer_list. And if there's anything else you would like me to debug,
please let me know. It's very unfortunate I can't reproduce it with emacs
-Q and I realize that makes it impossible for you to deal with this bug
except through information I provide.

Thanks for trying anyway,
Pip

On Sun, Aug 30, 2015 at 3:24 PM, Pip Cet <pipcet@gmail.com> wrote:

>
>
> On Sun, Aug 30, 2015 at 3:01 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> > Date: Sun, 30 Aug 2015 12:51:26 +0000
>> > From: Pip Cet <pipcet@gmail.com>
>> > Somehow, the argument to Fcopy_sequence was changed while concat was
>> > underway.
>>
>> How do you see that?
>>
>
> I originally concluded it was the only way to trigger the bug, but I just
> managed to trigger it again and have it open in a GDB session:
>
> #1  0x00000000005efdb3 in concat (nargs=1, args=0x7fffffff76e8,
> target_type=Lisp_Cons, last_special=false) at fns.c:747
> 747                     XSETCAR (tail, elt);
> (gdb) p result_len
> $22 = 4
> (gdb) p debug_print(Flength(args[0]))
> 5
> $23 = void
> (gdb)
>
>
>> > Further investigation indicates that
>> > window-configuration-change-hook was called in the middle of concat:
>>
>> Did you understand how this fact is related to the segfault?
>>
>
> I _think_ I do.
>
> 1. concat called with args[0] == Vtimer_list
> 2. concat stores result_len (=4)
> 3. concat calls make_list (4)
> 4. make_list interrupted by QUIT
> 5. see stack trace
> 6. window-configuration-change-hook modifies Vtimer_list, which now has
> length 5
> 7. control returns to concat
> 8. concat tries to write 5 elements into a 4-element list, which causes
> the segfault because `tail' is unexpectedly NULL.
>
> Does that make sense to you?
>
> Thanks,
> Pip
>

[-- Attachment #2: Type: text/html, Size: 3006 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 15:27     ` Pip Cet
@ 2015-08-30 16:24       ` Pip Cet
  2015-08-30 18:10         ` martin rudalics
  0 siblings, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-08-30 16:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 2880 bytes --]

I think the problem is the call to do_pending_window_change in
xg_frame_resized in gtkutil.c: the commit message (commit 3477e27021db)
says:

Author:     Martin Rudalics <rudalics@gmx.at>
AuthorDate: Sun Jul 27 15:21:30 2014 +0200
Commit:     Martin Rudalics <rudalics@gmx.at>
CommitDate: Sun Jul 27 15:21:30 2014 +0200

    Complete pixelwise frame/window resizing, add horizontal scrollbar
support.
    [...]
    * gtkutil.c (xg_frame_resized): Don't call
    do_pending_window_change.

but the diff is:

@@ -883,6 +884,8 @@ xg_frame_resized (struct frame *f, int pixelwidth, int
pixelheight)
       change_frame_size (f, width, height, 0, 1, 0, 1);
       SET_FRAME_GARBAGED (f);
       cancel_mouse_face (f);
+
+      do_pending_window_change (0);
     }
 }

And my current understanding is this bug would not occur if that call were
removed. The same issue applies to the change to x_set_window_size, but I'm
not certain about removing either call.

On Sun, Aug 30, 2015 at 3:27 PM, Pip Cet <pipcet@gmail.com> wrote:

> I forgot to make clear that I verified with gdb that args[0] ==
> Vtimer_list. And if there's anything else you would like me to debug,
> please let me know. It's very unfortunate I can't reproduce it with emacs
> -Q and I realize that makes it impossible for you to deal with this bug
> except through information I provide.
>
> Thanks for trying anyway,
> Pip
>
> On Sun, Aug 30, 2015 at 3:24 PM, Pip Cet <pipcet@gmail.com> wrote:
>
>>
>>
>> On Sun, Aug 30, 2015 at 3:01 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>>> > Date: Sun, 30 Aug 2015 12:51:26 +0000
>>> > From: Pip Cet <pipcet@gmail.com>
>>> > Somehow, the argument to Fcopy_sequence was changed while concat was
>>> > underway.
>>>
>>> How do you see that?
>>>
>>
>> I originally concluded it was the only way to trigger the bug, but I just
>> managed to trigger it again and have it open in a GDB session:
>>
>> #1  0x00000000005efdb3 in concat (nargs=1, args=0x7fffffff76e8,
>> target_type=Lisp_Cons, last_special=false) at fns.c:747
>> 747                     XSETCAR (tail, elt);
>> (gdb) p result_len
>> $22 = 4
>> (gdb) p debug_print(Flength(args[0]))
>> 5
>> $23 = void
>> (gdb)
>>
>>
>>> > Further investigation indicates that
>>> > window-configuration-change-hook was called in the middle of concat:
>>>
>>> Did you understand how this fact is related to the segfault?
>>>
>>
>> I _think_ I do.
>>
>> 1. concat called with args[0] == Vtimer_list
>> 2. concat stores result_len (=4)
>> 3. concat calls make_list (4)
>> 4. make_list interrupted by QUIT
>> 5. see stack trace
>> 6. window-configuration-change-hook modifies Vtimer_list, which now has
>> length 5
>> 7. control returns to concat
>> 8. concat tries to write 5 elements into a 4-element list, which causes
>> the segfault because `tail' is unexpectedly NULL.
>>
>> Does that make sense to you?
>>
>> Thanks,
>> Pip
>>
>
>

[-- Attachment #2: Type: text/html, Size: 4623 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 15:24   ` Pip Cet
  2015-08-30 15:27     ` Pip Cet
@ 2015-08-30 16:39     ` Eli Zaretskii
  2015-08-30 16:42       ` Pip Cet
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-08-30 16:39 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Sun, 30 Aug 2015 15:24:26 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
>     > Further investigation indicates that
>     > window-configuration-change-hook was called in the middle of concat:
>     
>     Did you understand how this fact is related to the segfault?
>     
> 
> I _think_ I do.
> 
> 1. concat called with args[0] == Vtimer_list
> 2. concat stores result_len (=4)
> 3. concat calls make_list (4)
> 4. make_list interrupted by QUIT
> 5. see stack trace
> 6. window-configuration-change-hook modifies Vtimer_list, which now has length
> 5
> 7. control returns to concat
> 8. concat tries to write 5 elements into a 4-element list, which causes the
> segfault because `tail' is unexpectedly NULL.
> 
> Does that make sense to you?

It does, but there's one additional factor that was supposed to
prevent such problems: the first thing timer_check does is copy
Vtimer_list to a local variable; then it works on that copy.  So
whatever happens in the meantime to Vtimer_list should not have
affected concat, because concat is called on a copy.

Which part of this doesn't work, and why?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 16:39     ` Eli Zaretskii
@ 2015-08-30 16:42       ` Pip Cet
  2015-08-30 19:44         ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-08-30 16:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 1330 bytes --]

On Sun, Aug 30, 2015 at 4:39 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Sun, 30 Aug 2015 15:24:26 +0000
> > From: Pip Cet <pipcet@gmail.com>
> > Cc: 21380@debbugs.gnu.org
> >
> >     > Further investigation indicates that
> >     > window-configuration-change-hook was called in the middle of
> concat:
> >
> >     Did you understand how this fact is related to the segfault?
> >
> >
> > I _think_ I do.
> >
> > 1. concat called with args[0] == Vtimer_list
> > 2. concat stores result_len (=4)
> > 3. concat calls make_list (4)
> > 4. make_list interrupted by QUIT
> > 5. see stack trace
> > 6. window-configuration-change-hook modifies Vtimer_list, which now has
> length
> > 5
> > 7. control returns to concat
> > 8. concat tries to write 5 elements into a 4-element list, which causes
> the
> > segfault because `tail' is unexpectedly NULL.
> >
> > Does that make sense to you?
>
> It does, but there's one additional factor that was supposed to
> prevent such problems: the first thing timer_check does is copy
> Vtimer_list to a local variable; then it works on that copy.  So
> whatever happens in the meantime to Vtimer_list should not have
> affected concat, because concat is called on a copy.
>

I'm not sure I understand. This issue is happening while the temporary copy
is being created, not afterwards, IIUC.

[-- Attachment #2: Type: text/html, Size: 1940 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 16:24       ` Pip Cet
@ 2015-08-30 18:10         ` martin rudalics
  2015-08-30 18:20           ` Pip Cet
  2015-08-30 18:59           ` Pip Cet
  0 siblings, 2 replies; 50+ messages in thread
From: martin rudalics @ 2015-08-30 18:10 UTC (permalink / raw)
  To: Pip Cet, Eli Zaretskii; +Cc: 21380

 > I think the problem is the call to do_pending_window_change in
 > xg_frame_resized in gtkutil.c: the commit message (commit 3477e27021db)
 > says:
 >
 > Author:     Martin Rudalics <rudalics@gmx.at>
 > AuthorDate: Sun Jul 27 15:21:30 2014 +0200
 > Commit:     Martin Rudalics <rudalics@gmx.at>
 > CommitDate: Sun Jul 27 15:21:30 2014 +0200
 >
 >      Complete pixelwise frame/window resizing, add horizontal scrollbar
 > support.
 >      [...]
 >      * gtkutil.c (xg_frame_resized): Don't call
 >      do_pending_window_change.
 >
 > but the diff is:
 >
 > @@ -883,6 +884,8 @@ xg_frame_resized (struct frame *f, int pixelwidth, int
 > pixelheight)
 >         change_frame_size (f, width, height, 0, 1, 0, 1);
 >         SET_FRAME_GARBAGED (f);
 >         cancel_mouse_face (f);
 > +
 > +      do_pending_window_change (0);
 >       }
 >   }

Remarkable.  I don't remember why I added them.  And obviously I have no
idea why I wrote the ChangeLog entry in reverse.  Just as if I read diff
output in the wrong direction.

In my understanding the do_pending_window_change call is not needed and
usually should be a noop.  But I have no idea why this particular call
of do_pending_window_change would run ‘window-configuration-change-hook’
and subsequently cause the havoc you describe.  The last
change_frame_size should have just happened three lines before.

 > And my current understanding is this bug would not occur if that call were
 > removed. The same issue applies to the change to x_set_window_size, but I'm
 > not certain about removing either call.

Maybe.  But the cause of the SEGFAULT must be elsewhere.  I have no
idea how

4. make_list interrupted by QUIT

could happen "while the temporary copy is being created" when
timer_check has set Vinhibit_quit to t.

martin






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 18:10         ` martin rudalics
@ 2015-08-30 18:20           ` Pip Cet
  2015-08-30 19:50             ` Eli Zaretskii
  2015-08-30 18:59           ` Pip Cet
  1 sibling, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-08-30 18:20 UTC (permalink / raw)
  To: martin rudalics; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 1195 bytes --]

On Sun, Aug 30, 2015 at 6:10 PM, martin rudalics <rudalics@gmx.at> wrote:

> Remarkable.  I don't remember why I added them.  And obviously I have no
> idea why I wrote the ChangeLog entry in reverse.  Just as if I read diff
> output in the wrong direction.
>

I thought it might have been ANTINEWS month and I missed it :-)


> > And my current understanding is this bug would not occur if that call
> were
> > removed. The same issue applies to the change to x_set_window_size, but
> I'm
> > not certain about removing either call.
>
> Maybe.  But the cause of the SEGFAULT must be elsewhere.  I have no
> idea how
>
> 4. make_list interrupted by QUIT
>
> could happen "while the temporary copy is being created" when
> timer_check has set Vinhibit_quit to t.
>

inhibit_quit inhibits process_quit_flag(), but not
process_pending_signals(), if I'm reading this correctly:

#define QUIT                        \
  do {                            \
    if (!NILP (Vquit_flag) && NILP (Vinhibit_quit))    \
      process_quit_flag ();                \
    else if (pending_signals)                \
      process_pending_signals ();            \
  } while (false)

Maybe it should.

Regards,
Pip

[-- Attachment #2: Type: text/html, Size: 2038 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 18:10         ` martin rudalics
  2015-08-30 18:20           ` Pip Cet
@ 2015-08-30 18:59           ` Pip Cet
  2015-08-31  9:20             ` martin rudalics
  1 sibling, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-08-30 18:59 UTC (permalink / raw)
  To: martin rudalics; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 1227 bytes --]

On Sun, Aug 30, 2015 at 6:10 PM, martin rudalics <rudalics@gmx.at> wrote:

> In my understanding the do_pending_window_change call is not needed and
> usually should be a noop.


May I ask why? I don't understand this code very well.


> But I have no idea why this particular call
> of do_pending_window_change would run ‘window-configuration-change-hook’
> and subsequently cause the havoc you describe.  The last
> change_frame_size should have just happened three lines before.
>

But that had delay == true, so change_frame_size_1 never called
adjust_frame_size, right?


> > And my current understanding is this bug would not occur if that call
> were
> > removed.


...but possibly that wouldn't work because of other things being called
from GTK event handlers.

Just thinking out loud for the rest of the Email:
I'm somewhat hesitant to mention this idea, but wouldn't it be best for GTK
events to generate special input events (like we already do for
asynchronous frame switches?), and let the command loop handle those? I've
just hit what appears to be another bug caused by asynchronous frame
destruction by GTK (I'm creating and destroying many Emacs frames in my
test code).

[-- Attachment #2: Type: text/html, Size: 1890 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 16:42       ` Pip Cet
@ 2015-08-30 19:44         ` Eli Zaretskii
  2015-08-30 20:56           ` Pip Cet
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-08-30 19:44 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Sun, 30 Aug 2015 16:42:21 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
>     > Does that make sense to you?
>     
>     It does, but there's one additional factor that was supposed to
>     prevent such problems: the first thing timer_check does is copy
>     Vtimer_list to a local variable; then it works on that copy. So
>     whatever happens in the meantime to Vtimer_list should not have
>     affected concat, because concat is called on a copy.
> 
> I'm not sure I understand. This issue is happening while the temporary copy is
> being created, not afterwards, IIUC.

Then all we have to do is block QUIT during that copy, no?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 18:20           ` Pip Cet
@ 2015-08-30 19:50             ` Eli Zaretskii
  0 siblings, 0 replies; 50+ messages in thread
From: Eli Zaretskii @ 2015-08-30 19:50 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Sun, 30 Aug 2015 18:20:42 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, 21380@debbugs.gnu.org
> 
>     Maybe. But the cause of the SEGFAULT must be elsewhere. I have no
>     idea how
>     
>     4. make_list interrupted by QUIT
>     
>     could happen "while the temporary copy is being created" when
>     timer_check has set Vinhibit_quit to t.
>     
> 
> inhibit_quit inhibits process_quit_flag(), but not process_pending_signals(),
> if I'm reading this correctly:
> 
> #define QUIT \
> do { \
> if (!NILP (Vquit_flag) && NILP (Vinhibit_quit)) \
> process_quit_flag (); \
> else if (pending_signals) \
> process_pending_signals (); \
> } while (false)
> 
> Maybe it should.

Does block_input/unblock_input around the copying solve the problem?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 19:44         ` Eli Zaretskii
@ 2015-08-30 20:56           ` Pip Cet
  2015-08-30 21:13             ` Pip Cet
  2015-08-31 14:31             ` Eli Zaretskii
  0 siblings, 2 replies; 50+ messages in thread
From: Pip Cet @ 2015-08-30 20:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 2252 bytes --]

On Sun, Aug 30, 2015 at 7:44 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > I'm not sure I understand. This issue is happening while the temporary
> copy is
> > being created, not afterwards, IIUC.
>
> Then all we have to do is block QUIT during that copy, no?


Yes for this particular segfault.

No* for similar segfaults that I think pose equally severe problems: if any
other function calls concat/copy-sequence on data that is modified by
window-configuration-change-hook, it should* still be possible to produce
the segfault. So it wouldn't even be safe for
window-configuration-change-hook to add a timer to the timer list, because
the outer frame might be in the middle of creating a copy of the timer list
for some Lisp code that hasn't blocked input. (As in my example below)

I really don't think QUIT should run any Lisp hooks, to be honest. From the
point of view of the Lisp user (and the not-totally-careful C user, as in
this case), that might make them happen at any time, and all kinds of race
conditions would occur instead. Maybe this is a matter for emacs-devel?

If I'm wrong and QUIT should be able to run Lisp hooks, concat needs to be
fixed not to rely on its argument's size being unchanged after the
make_sequence call.

*-I have been able to verify this segfault by interrupting the program at
just the right time and resizing its X window then:
 - gdb --args emacs -Q
 - evaluate the following code:

(run-with-timer 0 1 #'ignore) ;; so the timer list isn't empty
(add-hook 'window-configuration-change-hook (lambda () (run-with-timer 0
nil #'ignore)))

(while t
  (copy-sequence timer-list)
  (accept-process-output nil 0)
  )

 - interrupt and set a breakpoint with "b Fmake_sequence if
!input_blocked_p()".
 - wait for breakpoint to be hit, verify that concat's args[0] is equal to
Vtimer_list.
 - resize the X window
 - continue in gdb
 - segfault

As far as I can tell, that should be reproducible. Also as far as I can
tell, it's merely a matter of luck that an X resize doesn't happen at the
point where I interrupted the program to artificially trigger the segfault.
However, I admit that it is a separate issue, less likely to occur in
practice, and I'll open another bug for it if that's the way you prefer
things.

[-- Attachment #2: Type: text/html, Size: 3050 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 20:56           ` Pip Cet
@ 2015-08-30 21:13             ` Pip Cet
  2015-08-31 14:31             ` Eli Zaretskii
  1 sibling, 0 replies; 50+ messages in thread
From: Pip Cet @ 2015-08-30 21:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 452 bytes --]

On Sun, Aug 30, 2015 at 8:56 PM, Pip Cet <pipcet@gmail.com> wrote:

> - interrupt and set a breakpoint with "b Fmake_sequence if
> !input_blocked_p()".
>

Fmake_list, sorry. I should also make it clear that I've tested this with
block_input/unblock_input where you suggested it (thus the
!input_blocked_p() in order not to catch that case). Please let me know if
you can't reproduce it or want me to send a gdb log or anything else.

Thanks again,
Pip

[-- Attachment #2: Type: text/html, Size: 856 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 18:59           ` Pip Cet
@ 2015-08-31  9:20             ` martin rudalics
  0 siblings, 0 replies; 50+ messages in thread
From: martin rudalics @ 2015-08-31  9:20 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

 >> But I have no idea why this particular call
 >> of do_pending_window_change would run ‘window-configuration-change-hook’
 >> and subsequently cause the havoc you describe.  The last
 >> change_frame_size should have just happened three lines before.
 >
 > But that had delay == true, so change_frame_size_1 never called
 > adjust_frame_size, right?

Indeed.  I removed the do_pending_window_change calls now.  I don't
recall why I thought that I needed them, maybe to get those border
clearing calls in synch with the resizing of the root window.  In any
case, judging from my ChangeLog entries, I must have had some sort of
premonition that this was a bad idea.

martin






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-30 20:56           ` Pip Cet
  2015-08-30 21:13             ` Pip Cet
@ 2015-08-31 14:31             ` Eli Zaretskii
  2015-09-01 10:20               ` Pip Cet
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-08-31 14:31 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Sun, 30 Aug 2015 20:56:33 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
>     Then all we have to do is block QUIT during that copy, no?
> 
> Yes for this particular segfault.

Can you show a patch that fixes the original segfault in your use
case?  I'm afraid I lost track, what with all the different scenarios
and potential solutions being thrown at this.  We should install the
fix, assuming it's clean.

> No* for similar segfaults that I think pose equally severe problems: if any other function calls concat/copy-sequence on data that is modified by window-configuration-change-hook, it should* still be possible to produce the segfault.

Emacs gives you enough rope to hang yourself; there's nothing new
here.  We should strive to protect the Emacs internals so that they
won't cause segfaults, but in user code any bets are off, and "don't
do that" is a valid response to whoever does such things.

> So it wouldn't even be safe for window-configuration-change-hook to add a timer to the timer list, because the outer frame might be in the middle of creating a copy of the timer list for some Lisp code that hasn't blocked input. (As in my example below)

Futzing with timers from within some hooks is indeed fundamentally
dangerous.  But we should still try to minimize the probability of a
crash, especially when it's Emacs itself who makes the offending copy,
because people do dangerous things all the time, and expect them to
work.  In this case, blocking input should do, I think.

> I really don't think QUIT should run any Lisp hooks, to be honest.

I don't think this limitation could fly.  It will disable a lot of
useful use patterns, and the outcry will be loud and clear.

> If I'm wrong and QUIT should be able to run Lisp hooks, concat needs to be fixed not to rely on its argument's size being unchanged after the make_sequence call.

That can't do any harm, so let's do it, too.

> *-I have been able to verify this segfault by interrupting the program at just the right time and resizing its X window then:
>  - gdb --args emacs -Q
>  - evaluate the following code:
> 
> (run-with-timer 0 1 #'ignore) ;; so the timer list isn't empty
> (add-hook 'window-configuration-change-hook (lambda () (run-with-timer 0 nil #'ignore)))

> (while t
>   (copy-sequence timer-list)
>   (accept-process-output nil 0)
>   )
> 
>  - interrupt and set a breakpoint with "b Fmake_sequence if !input_blocked_p()".
>  - wait for breakpoint to be hit, verify that concat's args[0] is equal to Vtimer_list.
>  - resize the X window
>  - continue in gdb
>  - segfault
> 
> As far as I can tell, that should be reproducible. Also as far as I can tell, it's merely a matter of luck that an X resize doesn't happen at the point where I interrupted the program to artificially trigger the segfault. However, I admit that it is a separate issue, less likely to occur in practice, and I'll open another bug for it if that's the way you prefer things.

But if input is blocked, as it would be in the case of copying
timer-list inside timer_check, the X events will not be acted upon,
and the problem will not happen, right?

IOW, the above situation is a case of a user shooting herself in the
foot by having that particular function in the hook and that
particular code that copies timer-list (which is an internal variable
unwise users should not touch).  Am I right?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-08-31 14:31             ` Eli Zaretskii
@ 2015-09-01 10:20               ` Pip Cet
  2015-09-01 15:03                 ` Eli Zaretskii
  2015-09-01 15:14                 ` Pip Cet
  0 siblings, 2 replies; 50+ messages in thread
From: Pip Cet @ 2015-09-01 10:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380


[-- Attachment #1.1: Type: text/plain, Size: 4072 bytes --]

On Mon, Aug 31, 2015 at 2:31 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Yes for this particular segfault.
>
> Can you show a patch that fixes the original segfault in your use
> case?


Attached. Note that either one of those changes should work. I'll test this
patch some more using my original code and see whether it blows up.

I'm afraid I lost track, what with all the different scenarios
> and potential solutions being thrown at this.  We should install the
> fix, assuming it's clean.
>

I think we should fix three things:
 - concat shouldn't rely on its argument remaining unchanged in length
 - the timer list copy should happen with block_input/unblock_input wrapped
around it
 - we shouldn't call do_pending_window_change from QUIT [already installed.
Thanks, martin!]

Any one of these is enough to prevent the original segfault. All but the
second also prevent the bizarre-elisp-induced segfault I came up with
later. And I'm perfectly happy for today with the number of hooks called
from QUIT reduced by one, rather than insisting on reducing them to zero
right away.

> No* for similar segfaults that I think pose equally severe problems: if
> any other function calls concat/copy-sequence on data that is modified by
> window-configuration-change-hook, it should* still be possible to produce
> the segfault.
>
> Emacs gives you enough rope to hang yourself; there's nothing new
> here.  We should strive to protect the Emacs internals so that they
> won't cause segfaults, but in user code any bets are off, and "don't
> do that" is a valid response to whoever does such things.
>

It's always good to know what the philosophy is behind the way the code
works, so thank you for that, really.

> So it wouldn't even be safe for window-configuration-change-hook to add a
> timer to the timer list, because the outer frame might be in the middle of
> creating a copy of the timer list for some Lisp code that hasn't blocked
> input. (As in my example below)
>
> Futzing with timers from within some hooks is indeed fundamentally
> dangerous.


Well, doing anything from window-configuration-change-hook is dangerous. My
idea was to schedule an immediate timer from it to get out of the danger
zone to do the actual work, but that backfired...


> But we should still try to minimize the probability of a
> crash, especially when it's Emacs itself who makes the offending copy,
> because people do dangerous things all the time, and expect them to
> work.  In this case, blocking input should do, I think.
>

I agree.

> I really don't think QUIT should run any Lisp hooks, to be honest.
>
> I don't think this limitation could fly.  It will disable a lot of
> useful use patterns, and the outcry will be loud and clear.
>

Okay.

> If I'm wrong and QUIT should be able to run Lisp hooks, concat needs to
> be fixed not to rely on its argument's size being unchanged after the
> make_sequence call.
>
> That can't do any harm, so let's do it, too.
>

Cool.


> > As far as I can tell, that should be reproducible. Also as far as I can
> tell, it's merely a matter of luck that an X resize doesn't happen at the
> point where I interrupted the program to artificially trigger the segfault.
> However, I admit that it is a separate issue, less likely to occur in
> practice, and I'll open another bug for it if that's the way you prefer
> things.
>
> But if input is blocked, as it would be in the case of copying
> timer-list inside timer_check, the X events will not be acted upon,
> and the problem will not happen, right?
>

Indeed, that relies on bizarre elisp code deliberately doing silly things...


> IOW, the above situation is a case of a user shooting herself in the
> foot by having that particular function in the hook and that
> particular code that copies timer-list (which is an internal variable
> unwise users should not touch).  Am I right?
>

I think you are. I'm not sure whether the timer code in timer.el does
anything to the timer list that might count as dangerous, but that's
possibly the only legitimate Lisp user of timer-list.

[-- Attachment #1.2: Type: text/html, Size: 6044 bytes --]

[-- Attachment #2: 0001-Fix-potential-race-conditions-bug-23380.patch --]
[-- Type: text/x-patch, Size: 1381 bytes --]

From 7e188ca7b0e51bf7ffda413cebc53da9ebfe0cd6 Mon Sep 17 00:00:00 2001
From: Philip <pipcet@gmail.com>
Date: Tue, 1 Sep 2015 10:12:31 +0000
Subject: [PATCH] Fix potential race conditions (bug #23380)

        * keyboard.c (timer_check): Call `block_input' around the creation
	of the temporary timer list copy.

	* fns.c (concat): Don't assume argument size remains unchanged
	after call to `Fmake_list'.
---
 src/fns.c      | 3 +++
 src/keyboard.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/src/fns.c b/src/fns.c
index 26a98ab..43ff550 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -744,6 +744,9 @@ concat (ptrdiff_t nargs, Lisp_Object *args,
 	    /* Store this element into the result.  */
 	    if (toindex < 0)
 	      {
+                if (EQ (tail, Qnil))
+                  break;
+
 		XSETCAR (tail, elt);
 		prev = tail;
 		tail = XCDR (tail);
diff --git a/src/keyboard.c b/src/keyboard.c
index dab32b1..4b7d675 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -4560,6 +4560,7 @@ timer_check (void)
 
   Lisp_Object tem = Vinhibit_quit;
   Vinhibit_quit = Qt;
+  block_input ();
 
   /* We use copies of the timers' lists to allow a timer to add itself
      again, without locking up Emacs if the newly added timer is
@@ -4573,6 +4574,7 @@ timer_check (void)
   else
     idle_timers = Qnil;
 
+  unblock_input ();
   Vinhibit_quit = tem;
 
   do
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 10:20               ` Pip Cet
@ 2015-09-01 15:03                 ` Eli Zaretskii
  2015-09-01 15:22                   ` Pip Cet
  2015-09-01 15:14                 ` Pip Cet
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-01 15:03 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Tue, 1 Sep 2015 10:20:11 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
>     Can you show a patch that fixes the original segfault in your use
>     case?
> 
> Attached.

Hmm... isn't that a kludge?  Or am I missing something?  I thought you
intended to recalculate the length on each iteration?

> I think we should fix three things:
> - concat shouldn't rely on its argument remaining unchanged in length
> - the timer list copy should happen with block_input/unblock_input wrapped
> around it
> - we shouldn't call do_pending_window_change from QUIT [already installed.
> Thanks, martin!]
> 
> Any one of these is enough to prevent the original segfault. All but the second
> also prevent the bizarre-elisp-induced segfault I came up with later.

I think we should make all of these changes.

> I'm not sure whether the timer code in timer.el does anything to the
> timer list that might count as dangerous, but that's possibly the
> only legitimate Lisp user of timer-list.

Indeed.  And if it does something unsafe, we should fix that.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 10:20               ` Pip Cet
  2015-09-01 15:03                 ` Eli Zaretskii
@ 2015-09-01 15:14                 ` Pip Cet
  2015-09-01 16:04                   ` Eli Zaretskii
  1 sibling, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-09-01 15:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380


[-- Attachment #1.1: Type: text/plain, Size: 4740 bytes --]

No segfault*.

(* - well, one segfault. But I attribute that to extraordinarily bizarre
actions even by my standards: attempting to display an unprintable ASCII
control character in the echo area. It seems we never make sure that
non-standard faces are cached when redisplaying the echo area. Usually this
is fine because propertized strings never end up in the echo area (I
hope)...)

Revised patch attached (use NILP, fix bug number in changelog entry).


On Tue, Sep 1, 2015 at 10:20 AM, Pip Cet <pipcet@gmail.com> wrote:

> On Mon, Aug 31, 2015 at 2:31 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> > Yes for this particular segfault.
>>
>> Can you show a patch that fixes the original segfault in your use
>> case?
>
>
> Attached. Note that either one of those changes should work. I'll test
> this patch some more using my original code and see whether it blows up.
>
> I'm afraid I lost track, what with all the different scenarios
>> and potential solutions being thrown at this.  We should install the
>> fix, assuming it's clean.
>>
>
> I think we should fix three things:
>  - concat shouldn't rely on its argument remaining unchanged in length
>  - the timer list copy should happen with block_input/unblock_input
> wrapped around it
>  - we shouldn't call do_pending_window_change from QUIT [already
> installed. Thanks, martin!]
>
> Any one of these is enough to prevent the original segfault. All but the
> second also prevent the bizarre-elisp-induced segfault I came up with
> later. And I'm perfectly happy for today with the number of hooks called
> from QUIT reduced by one, rather than insisting on reducing them to zero
> right away.
>
> > No* for similar segfaults that I think pose equally severe problems: if
>> any other function calls concat/copy-sequence on data that is modified by
>> window-configuration-change-hook, it should* still be possible to produce
>> the segfault.
>>
>> Emacs gives you enough rope to hang yourself; there's nothing new
>> here.  We should strive to protect the Emacs internals so that they
>> won't cause segfaults, but in user code any bets are off, and "don't
>> do that" is a valid response to whoever does such things.
>>
>
> It's always good to know what the philosophy is behind the way the code
> works, so thank you for that, really.
>
> > So it wouldn't even be safe for window-configuration-change-hook to add
>> a timer to the timer list, because the outer frame might be in the middle
>> of creating a copy of the timer list for some Lisp code that hasn't blocked
>> input. (As in my example below)
>>
>> Futzing with timers from within some hooks is indeed fundamentally
>> dangerous.
>
>
> Well, doing anything from window-configuration-change-hook is dangerous.
> My idea was to schedule an immediate timer from it to get out of the danger
> zone to do the actual work, but that backfired...
>
>
>> But we should still try to minimize the probability of a
>> crash, especially when it's Emacs itself who makes the offending copy,
>> because people do dangerous things all the time, and expect them to
>> work.  In this case, blocking input should do, I think.
>>
>
> I agree.
>
> > I really don't think QUIT should run any Lisp hooks, to be honest.
>>
>> I don't think this limitation could fly.  It will disable a lot of
>> useful use patterns, and the outcry will be loud and clear.
>>
>
> Okay.
>
> > If I'm wrong and QUIT should be able to run Lisp hooks, concat needs to
>> be fixed not to rely on its argument's size being unchanged after the
>> make_sequence call.
>>
>> That can't do any harm, so let's do it, too.
>>
>
> Cool.
>
>
>> > As far as I can tell, that should be reproducible. Also as far as I can
>> tell, it's merely a matter of luck that an X resize doesn't happen at the
>> point where I interrupted the program to artificially trigger the segfault.
>> However, I admit that it is a separate issue, less likely to occur in
>> practice, and I'll open another bug for it if that's the way you prefer
>> things.
>>
>> But if input is blocked, as it would be in the case of copying
>> timer-list inside timer_check, the X events will not be acted upon,
>> and the problem will not happen, right?
>>
>
> Indeed, that relies on bizarre elisp code deliberately doing silly
> things...
>
>
>> IOW, the above situation is a case of a user shooting herself in the
>> foot by having that particular function in the hook and that
>> particular code that copies timer-list (which is an internal variable
>> unwise users should not touch).  Am I right?
>>
>
> I think you are. I'm not sure whether the timer code in timer.el does
> anything to the timer list that might count as dangerous, but that's
> possibly the only legitimate Lisp user of timer-list.
>

[-- Attachment #1.2: Type: text/html, Size: 7133 bytes --]

[-- Attachment #2: 0001-Fix-potential-race-conditions-Bug-21380.patch --]
[-- Type: text/x-patch, Size: 1376 bytes --]

From 7dc8c9d7a4997781c04001371ac384db24c37e59 Mon Sep 17 00:00:00 2001
From: Philip <pipcet@gmail.com>
Date: Tue, 1 Sep 2015 10:12:31 +0000
Subject: [PATCH] Fix potential race conditions (Bug#21380)

        * keyboard.c (timer_check): Call `block_input' around the creation
	of the temporary timer list copy.

	* fns.c (concat): Don't assume argument size remains unchanged
	after call to `Fmake_list'.
---
 src/fns.c      | 3 +++
 src/keyboard.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/src/fns.c b/src/fns.c
index 26a98ab..15d9e64 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -744,6 +744,9 @@ concat (ptrdiff_t nargs, Lisp_Object *args,
 	    /* Store this element into the result.  */
 	    if (toindex < 0)
 	      {
+                if (NILP (tail))
+                  break;
+
 		XSETCAR (tail, elt);
 		prev = tail;
 		tail = XCDR (tail);
diff --git a/src/keyboard.c b/src/keyboard.c
index dab32b1..4b7d675 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -4560,6 +4560,7 @@ timer_check (void)
 
   Lisp_Object tem = Vinhibit_quit;
   Vinhibit_quit = Qt;
+  block_input ();
 
   /* We use copies of the timers' lists to allow a timer to add itself
      again, without locking up Emacs if the newly added timer is
@@ -4573,6 +4574,7 @@ timer_check (void)
   else
     idle_timers = Qnil;
 
+  unblock_input ();
   Vinhibit_quit = tem;
 
   do
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 15:03                 ` Eli Zaretskii
@ 2015-09-01 15:22                   ` Pip Cet
  2015-09-01 16:01                     ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-09-01 15:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 922 bytes --]

On Tue, Sep 1, 2015 at 3:03 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Tue, 1 Sep 2015 10:20:11 +0000
> > From: Pip Cet <pipcet@gmail.com>
> > Cc: 21380@debbugs.gnu.org
> >
> >     Can you show a patch that fixes the original segfault in your use
> >     case?
> >
> > Attached.
>
> Hmm... isn't that a kludge?
>

It is. It replaces segfaults by incorrect results.

Or am I missing something?  I thought you
> intended to recalculate the length on each iteration?
>

If you can think of a good way of doing that, I'd be grateful. I can't,
because Flength calls QUIT, too, so there's no guarantee its results are
still valid when it's done.

All we could do, as far as I can see, is add an extra call to Flength()
which will slow things down and sometimes but not always make the risky
thing the user is attempting work. Other than that, a non-segfault with an
incorrect result is all we can give the user, I fear.

[-- Attachment #2: Type: text/html, Size: 1705 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 15:22                   ` Pip Cet
@ 2015-09-01 16:01                     ` Eli Zaretskii
  2015-09-01 16:02                       ` Pip Cet
  2015-09-02  7:02                       ` martin rudalics
  0 siblings, 2 replies; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-01 16:01 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Tue, 1 Sep 2015 15:22:58 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
>     Or am I missing something? I thought you
>     intended to recalculate the length on each iteration?
> 
> If you can think of a good way of doing that, I'd be grateful. I can't, because
> Flength calls QUIT, too, so there's no guarantee its results are still valid
> when it's done.
> 
> All we could do, as far as I can see, is add an extra call to Flength() which
> will slow things down and sometimes but not always make the risky thing the
> user is attempting work. Other than that, a non-segfault with an incorrect
> result is all we can give the user, I fear.

So maybe we should introduce a special copy_sequence_no_quit function
that never calls QUIT, and then use it for copying the timer lists.
Timer lists are never too long, so that shouldn't be a problem.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 16:01                     ` Eli Zaretskii
@ 2015-09-01 16:02                       ` Pip Cet
  2015-09-01 16:23                         ` Eli Zaretskii
  2015-09-02  7:02                       ` martin rudalics
  1 sibling, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-09-01 16:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 330 bytes --]

On Tue, Sep 1, 2015 at 4:01 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> So maybe we should introduce a special copy_sequence_no_quit function
> that never calls QUIT, and then use it for copying the timer lists.
>

How would that be different from calling block_input () and binding
Vinhibit_quit around the call to copy_sequence?

[-- Attachment #2: Type: text/html, Size: 665 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 15:14                 ` Pip Cet
@ 2015-09-01 16:04                   ` Eli Zaretskii
  2015-09-01 16:56                     ` Pip Cet
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-01 16:04 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Tue, 1 Sep 2015 15:14:09 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
> (* - well, one segfault. But I attribute that to extraordinarily bizarre
> actions even by my standards: attempting to display an unprintable ASCII
> control character in the echo area.

Is this reproducible?  If so, please submit a separate bug report with
a recipe.

> It seems we never make sure that
> non-standard faces are cached when redisplaying the echo area.

I don't see how this can happen, since the face cache is per-frame,
not per-window or buffer.

> Usually this is fine because propertized strings never end up in the
> echo area (I hope)...)

The echo area is a normal buffer, so any face can be used in it.  See,
for example, the message printed by info.el after "i SOMETHING RET".





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 16:02                       ` Pip Cet
@ 2015-09-01 16:23                         ` Eli Zaretskii
  0 siblings, 0 replies; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-01 16:23 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Tue, 1 Sep 2015 16:02:59 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
> On Tue, Sep 1, 2015 at 4:01 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>     So maybe we should introduce a special copy_sequence_no_quit function
>     that never calls QUIT, and then use it for copying the timer lists.
> 
> How would that be different from calling block_input () and binding
> Vinhibit_quit around the call to copy_sequence?

That only prevents us from reading new events from the X socket, but
what if some signal that is already pending invokes some Lisp?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 16:04                   ` Eli Zaretskii
@ 2015-09-01 16:56                     ` Pip Cet
  2015-09-01 17:19                       ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-09-01 16:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]

On Tue, Sep 1, 2015 at 4:04 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Tue, 1 Sep 2015 15:14:09 +0000
> > From: Pip Cet <pipcet@gmail.com>
> > Cc: 21380@debbugs.gnu.org
> >
> > (* - well, one segfault. But I attribute that to extraordinarily bizarre
> > actions even by my standards: attempting to display an unprintable ASCII
> > control character in the echo area.
>
> Is this reproducible?  If so, please submit a separate bug report with
> a recipe.
>

#21394.

>
> > Usually this is fine because propertized strings never end up in the
> > echo area (I hope)...)
>
> The echo area is a normal buffer, so any face can be used in it.  See,
> for example, the message printed by info.el after "i SOMETHING RET".
>

Thanks for explaining! You're right, then, it's a bug.

> That only prevents us from reading new events from the X socket, but
> what if some signal that is already pending invokes some Lisp?

I don't understand. How can we call "some signal that is already pending"
(I'm not sure what that means. A unix signal? Or just something that sets
pending_signals to a true value? Or an atimer?) when input_blocked_p () is
true? The only thing gobble_input () does in that case is
store_user_signal_events (), which doesn't call any Lisp.

The only code path that I see that's potentially dangerous is that atimers
appear to be executed even if input is blocked.

[-- Attachment #2: Type: text/html, Size: 2259 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 16:56                     ` Pip Cet
@ 2015-09-01 17:19                       ` Eli Zaretskii
  2015-09-01 20:48                         ` Pip Cet
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-01 17:19 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Tue, 1 Sep 2015 16:56:04 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
> > That only prevents us from reading new events from the X socket, but
> > what if some signal that is already pending invokes some Lisp?
> 
> I don't understand. How can we call "some signal that is already pending" (I'm
> not sure what that means. A unix signal? Or just something that sets
> pending_signals to a true value? Or an atimer?)

I meant atimers, sorry for being unclear.  See process_pending_signals.

> The only code path that I see that's potentially dangerous is that atimers
> appear to be executed even if input is blocked.

Yes, that's exactly what bothered me.  Not calling QUIT prevents that.

Alternatively, we could turn off atimers (by calling turn_on_atimers)
while Fcopy_sequence runs.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 17:19                       ` Eli Zaretskii
@ 2015-09-01 20:48                         ` Pip Cet
  2015-09-02 15:08                           ` Eli Zaretskii
  2020-09-07 17:07                           ` Lars Ingebrigtsen
  0 siblings, 2 replies; 50+ messages in thread
From: Pip Cet @ 2015-09-01 20:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380


[-- Attachment #1.1: Type: text/plain, Size: 846 bytes --]

On Tue, Sep 1, 2015 at 5:19 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > The only code path that I see that's potentially dangerous is that
> atimers
> > appear to be executed even if input is blocked.
>
> Yes, that's exactly what bothered me.  Not calling QUIT prevents that.
>
> Alternatively, we could turn off atimers (by calling turn_on_atimers)
> while Fcopy_sequence runs.
>

I think that would be a better solution. I've done a quick grep for the
current atimers and at first glance they appear to be okay, but obviously
that's no guarantee for the future. It might be worth thinking about
block_input_and_atimers ().

I think it's safe to assume that Lisp timers are only checked if atimers
are enabled. If it isn't, I think the best way forward is to write
block_input_and_atimers () and lock atimers with a counter just like input
is.

[-- Attachment #1.2: Type: text/html, Size: 1301 bytes --]

[-- Attachment #2: 0001-Fix-potential-race-conditions-Bug-21380.patch --]
[-- Type: text/x-patch, Size: 1516 bytes --]

From 678bdba55e4a07e3baebad204c9fe5c55c99b3d3 Mon Sep 17 00:00:00 2001
From: Philip <pipcet@gmail.com>
Date: Tue, 1 Sep 2015 20:42:44 +0000
Subject: [PATCH] Fix potential race conditions (Bug#21380)

        * keyboard.c (timer_check): Call `block_input' and turn off
	atimers around the creation of the temporary timer list copy.

	* fns.c (concat): Don't assume argument size remains unchanged
	after call to `Fmake_list'.  Return incorrect results (but don't
	segfault) in that case.
---
 src/fns.c      | 3 +++
 src/keyboard.c | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/src/fns.c b/src/fns.c
index 26a98ab..15d9e64 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -744,6 +744,9 @@ concat (ptrdiff_t nargs, Lisp_Object *args,
 	    /* Store this element into the result.  */
 	    if (toindex < 0)
 	      {
+                if (NILP (tail))
+                  break;
+
 		XSETCAR (tail, elt);
 		prev = tail;
 		tail = XCDR (tail);
diff --git a/src/keyboard.c b/src/keyboard.c
index dab32b1..4ce830d 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -4560,6 +4560,8 @@ timer_check (void)
 
   Lisp_Object tem = Vinhibit_quit;
   Vinhibit_quit = Qt;
+  block_input ();
+  turn_on_atimers (false);
 
   /* We use copies of the timers' lists to allow a timer to add itself
      again, without locking up Emacs if the newly added timer is
@@ -4573,6 +4575,8 @@ timer_check (void)
   else
     idle_timers = Qnil;
 
+  turn_on_atimers (true);
+  unblock_input ();
   Vinhibit_quit = tem;
 
   do
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 16:01                     ` Eli Zaretskii
  2015-09-01 16:02                       ` Pip Cet
@ 2015-09-02  7:02                       ` martin rudalics
  2015-09-02 14:32                         ` Eli Zaretskii
  2015-09-03 15:36                         ` Stefan Monnier
  1 sibling, 2 replies; 50+ messages in thread
From: martin rudalics @ 2015-09-02  7:02 UTC (permalink / raw)
  To: Eli Zaretskii, Pip Cet; +Cc: 21380

 > So maybe we should introduce a special copy_sequence_no_quit function
 > that never calls QUIT, and then use it for copying the timer lists.

Yes.

 > Timer lists are never too long, so that shouldn't be a problem.

Why do you think that copy_sequence_no_quit would be slower than
Fcopy_sequence?  The problem of copy_sequence_no_quit would be with
circularity, not with length.

martin





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-02  7:02                       ` martin rudalics
@ 2015-09-02 14:32                         ` Eli Zaretskii
  2015-09-03 15:36                         ` Stefan Monnier
  1 sibling, 0 replies; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-02 14:32 UTC (permalink / raw)
  To: martin rudalics; +Cc: 21380, pipcet

> Date: Wed, 02 Sep 2015 09:02:31 +0200
> From: martin rudalics <rudalics@gmx.at>
> CC: 21380@debbugs.gnu.org
> 
>  > So maybe we should introduce a special copy_sequence_no_quit function
>  > that never calls QUIT, and then use it for copying the timer lists.
> 
> Yes.

I think we prefer the alternative that just blocks input and atimers
instead.

>  > Timer lists are never too long, so that shouldn't be a problem.
> 
> Why do you think that copy_sequence_no_quit would be slower than
> Fcopy_sequence?  The problem of copy_sequence_no_quit would be with
> circularity, not with length.

Sorry for being unclear.  I meant that the call to QUIT in
Fcopy_sequence is to allow the user to interrupt a too long copy, so I
think it won't be missed with timer lists, which are normally short.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 20:48                         ` Pip Cet
@ 2015-09-02 15:08                           ` Eli Zaretskii
  2015-09-02 16:09                             ` Pip Cet
  2020-09-07 17:07                           ` Lars Ingebrigtsen
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-02 15:08 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Tue, 1 Sep 2015 20:48:18 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
>     Alternatively, we could turn off atimers (by calling turn_on_atimers)
>     while Fcopy_sequence runs.
>     
> 
> I think that would be a better solution. I've done a quick grep for the current
> atimers and at first glance they appear to be okay, but obviously that's no
> guarantee for the future. It might be worth thinking about
> block_input_and_atimers ().
> 
> I think it's safe to assume that Lisp timers are only checked if atimers are
> enabled.

Those are two completely separate and independent features, so no,
it's not safe to make that assumption.  Not sure why you need to
assume that, though.

> If it isn't, I think the best way forward is to write
> block_input_and_atimers () and lock atimers with a counter just like input is.

Not sure I follow you.  Are you saying that just calling block_input
followed by turn_on_atimers is somehow not enough to prevent some Lisp
from changing Vtimer_list under our feet?

> --- a/src/fns.c
> +++ b/src/fns.c
> @@ -744,6 +744,9 @@ concat (ptrdiff_t nargs, Lisp_Object *args,
>  	    /* Store this element into the result.  */
>  	    if (toindex < 0)
>  	      {
> +                if (NILP (tail))
> +                  break;
> +

Is this part still needed?  If so, can you explain in what situation
it is needed?

Thanks.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-02 15:08                           ` Eli Zaretskii
@ 2015-09-02 16:09                             ` Pip Cet
  2015-09-02 19:13                               ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-09-02 16:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 2175 bytes --]

On Wed, Sep 2, 2015 at 3:08 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Tue, 1 Sep 2015 20:48:18 +0000
> > From: Pip Cet <pipcet@gmail.com>
> > Cc: 21380@debbugs.gnu.org
> >
> >     Alternatively, we could turn off atimers (by calling turn_on_atimers)
> >     while Fcopy_sequence runs.
> >
> >
> > I think that would be a better solution. I've done a quick grep for the
> current
> > atimers and at first glance they appear to be okay, but obviously that's
> no
> > guarantee for the future. It might be worth thinking about
> > block_input_and_atimers ().
> >
> > I think it's safe to assume that Lisp timers are only checked if atimers
> are
> > enabled.
>
> Those are two completely separate and independent features, so no,
> it's not safe to make that assumption.  Not sure why you need to
> assume that, though.
>

So we can call turn_on_atimers (true) without potentially enabling atimers
in a critical section.

My assumption was that the reason we have both Lisp timers and atimers is
that atimers run strictly more often than Lisp timers.


> > If it isn't, I think the best way forward is to write
> > block_input_and_atimers () and lock atimers with a counter just like
> input is.
>
> Not sure I follow you.  Are you saying that just calling block_input
> followed by turn_on_atimers is somehow not enough to prevent some Lisp
> from changing Vtimer_list under our feet?
>

I'm not saying that, no, but if another function disables atimers, then
runs Lisp timers, then does something critical that needs atimers to be
disabled, it might break.

> --- a/src/fns.c
> > +++ b/src/fns.c
> > @@ -744,6 +744,9 @@ concat (ptrdiff_t nargs, Lisp_Object *args,
> >           /* Store this element into the result.  */
> >           if (toindex < 0)
> >             {
> > +                if (NILP (tail))
> > +                  break;
> > +
>
> Is this part still needed?


As far as I know, the other two fixes are sufficient. It's needed in case
someone calls copy_sequence on a list that's messed with by code run from a
hook from QUIT, and merely succeeds in avoiding a segfault and producing
incorrect results instead, so I'm not at all sure it should go in.

[-- Attachment #2: Type: text/html, Size: 3219 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-02 16:09                             ` Pip Cet
@ 2015-09-02 19:13                               ` Eli Zaretskii
  2015-09-02 22:08                                 ` Pip Cet
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-02 19:13 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> Date: Wed, 2 Sep 2015 16:09:53 +0000
> From: Pip Cet <pipcet@gmail.com>
> Cc: 21380@debbugs.gnu.org
> 
>     > I think it's safe to assume that Lisp timers are only checked if atimers
>     are
>     > enabled.
>     
>     Those are two completely separate and independent features, so no,
>     it's not safe to make that assumption. Not sure why you need to
>     assume that, though.
> 
> So we can call turn_on_atimers (true) without potentially enabling atimers in a
> critical section.

My confusion just grew a notch: one of these "atimers" is actually
Lisp timers, right?  If not, I'm afraid I don't see what you mean.

> My assumption was that the reason we have both Lisp timers and atimers is that
> atimers run strictly more often than Lisp timers.

They can be more accurate, but I see no reason why they should run
more often.

>     > If it isn't, I think the best way forward is to write
>     > block_input_and_atimers () and lock atimers with a counter just like
>     input is.
>     
>     Not sure I follow you. Are you saying that just calling block_input
>     followed by turn_on_atimers is somehow not enough to prevent some Lisp
>     from changing Vtimer_list under our feet?
>     
> 
> I'm not saying that, no, but if another function disables atimers, then runs
> Lisp timers, then does something critical that needs atimers to be disabled, it
> might break.

We didn't need to disable atimers until now, except when manipulating
the atimers themselves.  The function we are discussing, which copies
Lisp timers, is the first one in need of this.  So I don't yet see the
need for a counter, but I don't object to one, either.

>     > --- a/src/fns.c
>     > +++ b/src/fns.c
>     > @@ -744,6 +744,9 @@ concat (ptrdiff_t nargs, Lisp_Object *args,
>     > /* Store this element into the result. */
>     > if (toindex < 0)
>     > {
>     > + if (NILP (tail))
>     > + break;
>     > +
>     
>     Is this part still needed?
> 
> As far as I know, the other two fixes are sufficient. It's needed in case
> someone calls copy_sequence on a list that's messed with by code run from a
> hook from QUIT, and merely succeeds in avoiding a segfault and producing
> incorrect results instead, so I'm not at all sure it should go in.

Right.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-02 19:13                               ` Eli Zaretskii
@ 2015-09-02 22:08                                 ` Pip Cet
  0 siblings, 0 replies; 50+ messages in thread
From: Pip Cet @ 2015-09-02 22:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 2256 bytes --]

On Wed, Sep 2, 2015 at 7:13 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Wed, 2 Sep 2015 16:09:53 +0000
> > From: Pip Cet <pipcet@gmail.com>
> > Cc: 21380@debbugs.gnu.org
> >
> >     > I think it's safe to assume that Lisp timers are only checked if
> atimers
> >     are
> >     > enabled.
> >
> >     Those are two completely separate and independent features, so no,
> >     it's not safe to make that assumption. Not sure why you need to
> >     assume that, though.
> >
> > So we can call turn_on_atimers (true) without potentially enabling
> atimers in a
> > critical section.
>
> My confusion just grew a notch: one of these "atimers" is actually
> Lisp timers, right?


No, I don't think so.

If not, I'm afraid I don't see what you mean.
>

See below, those were two attempts of mine to describe the same thing.

> My assumption was that the reason we have both Lisp timers and atimers is
> that
> > atimers run strictly more often than Lisp timers.
>
> They can be more accurate, but I see no reason why they should run
> more often.
>

Sorry for being unclear. I should have said something like "have strictly
more opportunities to run than Lisp timers".


> >     > If it isn't, I think the best way forward is to write
> >     > block_input_and_atimers () and lock atimers with a counter just
> like
> >     input is.
> >
> >     Not sure I follow you. Are you saying that just calling block_input
> >     followed by turn_on_atimers is somehow not enough to prevent some
> Lisp
> >     from changing Vtimer_list under our feet?
> >
> >
> > I'm not saying that, no, but if another function disables atimers, then
> runs
> > Lisp timers, then does something critical that needs atimers to be
> disabled, it
> > might break.
>
> We didn't need to disable atimers until now, except when manipulating
> the atimers themselves.
>


> The function we are discussing, which copies
> Lisp timers, is the first one in need of this.  So I don't yet see the
> need for a counter, but I don't object to one, either.
>

That's how I feel about disabling atimers at all. I think it's only for
future atimer code that does something dangerous. Maybe I'm missing
something obvious, but there isn't currently any call path from the atimers
to Lisp code.

[-- Attachment #2: Type: text/html, Size: 3641 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-02  7:02                       ` martin rudalics
  2015-09-02 14:32                         ` Eli Zaretskii
@ 2015-09-03 15:36                         ` Stefan Monnier
  2015-09-05  7:38                           ` Eli Zaretskii
  2015-09-05 16:59                           ` Pip Cet
  1 sibling, 2 replies; 50+ messages in thread
From: Stefan Monnier @ 2015-09-03 15:36 UTC (permalink / raw)
  To: martin rudalics; +Cc: 21380, Pip Cet

>> So maybe we should introduce a special copy_sequence_no_quit function
>> that never calls QUIT, and then use it for copying the timer lists.

That'd be OK, yes.  This said, maybe an even better solution would be to
avoid the copy altogether.

AFAICT these lists are only ever side-effected by timer.el's
timer--activate, which has a special `reuse-cell' argument just to be
able to do that.

I'm not completely sure why we do it this way, but my naive
understanding is the following:
- For historical reasons of limited resources, timer.el tries hard to
  avoid allocating cons cells.
- Then many years later we found a problem with this cell-reuse and
  circumvented it by copying the whole list all the time.
- So we end up working hard to avoid allocating a couple cells on one
  side, only to end up allocating many more on the other.

Maybe we should go back to bugs #12447 and #12326 and see if just
removing the "reuse-cell" code (and the Fcopy_sequence(s)) fixes the
problem as well.


        Stefan





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-03 15:36                         ` Stefan Monnier
@ 2015-09-05  7:38                           ` Eli Zaretskii
  2015-09-05 15:18                             ` Stefan Monnier
  2022-04-29 12:52                             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2015-09-05 16:59                           ` Pip Cet
  1 sibling, 2 replies; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-05  7:38 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: pipcet, 21380

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: Eli Zaretskii <eliz@gnu.org>, Pip Cet <pipcet@gmail.com>,
>         21380@debbugs.gnu.org
> Date: Thu, 03 Sep 2015 11:36:44 -0400
> 
> >> So maybe we should introduce a special copy_sequence_no_quit function
> >> that never calls QUIT, and then use it for copying the timer lists.
> 
> That'd be OK, yes.

I believe the currently preferred solution is to block input and
atimers while copying the timer lists to local copies.  Are you okay
with that?

> This said, maybe an even better solution would be to avoid the copy
> altogether.
> 
> AFAICT these lists are only ever side-effected by timer.el's
> timer--activate, which has a special `reuse-cell' argument just to be
> able to do that.
> 
> I'm not completely sure why we do it this way, but my naive
> understanding is the following:
> - For historical reasons of limited resources, timer.el tries hard to
>   avoid allocating cons cells.
> - Then many years later we found a problem with this cell-reuse and
>   circumvented it by copying the whole list all the time.
> - So we end up working hard to avoid allocating a couple cells on one
>   side, only to end up allocating many more on the other.
> 
> Maybe we should go back to bugs #12447 and #12326 and see if just
> removing the "reuse-cell" code (and the Fcopy_sequence(s)) fixes the
> problem as well.

I'm not sure I understand this plan.  Are you saying that consing a
new list in timer--activate, instead of reusing an existing cell, will
avoid the need to wok on a copy of the timer's list when invoking the
timer callbacks?  If so, I'm probably missing something here, because
timer--activate will update the timer list variable anyway, and we
have the same problem, whereby the list changes under our feet, back
again.

IOW, if some Lisp run by a timer callback ends up doing (directly or
indirectly) something like

  (setq timer-list (cons my-new-timer timer-list))

doesn't it mean the value of Vtimer_list in C seen by timer_check
changes as well that very moment?

Not to mention the fact that with timers firing every several tens of
ms, something we've seen while discussing these bugs, allocating a
couple of cells each time might cause a lot of consing per second,
which in turn causes GC, which slows down everything.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-05  7:38                           ` Eli Zaretskii
@ 2015-09-05 15:18                             ` Stefan Monnier
  2015-09-05 15:27                               ` Eli Zaretskii
  2022-04-29 12:52                             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 50+ messages in thread
From: Stefan Monnier @ 2015-09-05 15:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 21380

> I'm not sure I understand this plan.  Are you saying that consing a
> new list in timer--activate, instead of reusing an existing cell, will
> avoid the need to wok on a copy of the timer's list when invoking the
> timer callbacks?

That's the idea, yes.

> If so, I'm probably missing something here, because
> timer--activate will update the timer list variable anyway,

This is OK: by this time, we've already read this timer list variable so
changing it won't affect us.

> doesn't it mean the value of Vtimer_list in C seen by timer_check
> changes as well that very moment?

This setq would happen after we've read the variable, so it won't affect
timer_check.

> Not to mention the fact that with timers firing every several tens of
> ms, something we've seen while discussing these bugs, allocating a
> couple of cells each time might cause a lot of consing per second,
> which in turn causes GC, which slows down everything.

How could it be worse to allocate cells when we activate a timer than
copying the whole list every time we check the timers?


        Stefan





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-05 15:18                             ` Stefan Monnier
@ 2015-09-05 15:27                               ` Eli Zaretskii
  2015-09-06 22:11                                 ` Stefan Monnier
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2015-09-05 15:27 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: pipcet, 21380

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: rudalics@gmx.at,  pipcet@gmail.com,  21380@debbugs.gnu.org
> Date: Sat, 05 Sep 2015 11:18:54 -0400
> 
> > I'm not sure I understand this plan.  Are you saying that consing a
> > new list in timer--activate, instead of reusing an existing cell, will
> > avoid the need to wok on a copy of the timer's list when invoking the
> > timer callbacks?
> 
> That's the idea, yes.
> 
> > If so, I'm probably missing something here, because
> > timer--activate will update the timer list variable anyway,
> 
> This is OK: by this time, we've already read this timer list variable so
> changing it won't affect us.

What do you mean by "have read the variable"?  We are "reading" it one
member at a time, as timer_check goes about its business.

> > Not to mention the fact that with timers firing every several tens of
> > ms, something we've seen while discussing these bugs, allocating a
> > couple of cells each time might cause a lot of consing per second,
> > which in turn causes GC, which slows down everything.
> 
> How could it be worse to allocate cells when we activate a timer than
> copying the whole list every time we check the timers?

Twice worse, I'd say (assuming "a couple" really means 2).

But this is not the important issue right now.  Right now, I don't
understand how your proposal will solve this and related bugs.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-03 15:36                         ` Stefan Monnier
  2015-09-05  7:38                           ` Eli Zaretskii
@ 2015-09-05 16:59                           ` Pip Cet
  2015-09-06 22:22                             ` Stefan Monnier
  1 sibling, 1 reply; 50+ messages in thread
From: Pip Cet @ 2015-09-05 16:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 2178 bytes --]

I think that's a good idea, but there's one corner case that, I think, we
should think about: a timer deleting the next timer from the list. If we
keep the code in timer.el that says:

(defun cancel-timer (timer)
  "Remove TIMER from the list of active timers."
  (timer--check timer)
  (setq timer-list (delq timer timer-list))
  (setq timer-idle-list (delq timer timer-idle-list))
  nil)

then that delq might, in some circumstances, modify the list we're working
on. I'm not sure whether it's okay, performance-wise, to replace delq by
remq here. Is cancelling one timer from another timer's code something that
should ever have well-defined effects (I strongly suspect the answer is
"Don't do that")? I confess I do not know whether delq is guaranteed only
to modify the list in ways that "make sense" (the current implementation
does, but it also doesn't appear to QUIT at all, so maybe it should be
modified anyway...).

On Thu, Sep 3, 2015 at 3:36 PM, Stefan Monnier <monnier@iro.umontreal.ca>
wrote:

> AFAICT these lists are only ever side-effected by timer.el's
> timer--activate, which has a special `reuse-cell' argument just to be
> able to do that.
>

I think delq also side-effects them. And, of course, "(setq timer-list
nil)", which is what I tend to do when I've added a silly timer that does
something bizarre and makes Emacs unusable :-)

Maybe we should go back to bugs #12447 and #12326 and see if just
> removing the "reuse-cell" code (and the Fcopy_sequence(s)) fixes the
> problem as well.
>

I haven't tested this, but I think it would. I still think the underlying
problem here is not having a well-defined rule for when QUIT is allowed to
call Lisp code. (Eli correctly objected to my initial idea that the rule
should be "never", but I still think it should be "much less often than we
currently do". I've performed some very boring and somewhat tedious
semi-automated call chain analysis to get a better idea of what the current
state of things is, and so far the results appear consistent with my idea
that QUIT shouldn't call hooks, but should be able to call isolated
functions implemented in Lisp, such as those used for relative font sizing).

[-- Attachment #2: Type: text/html, Size: 2894 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-05 15:27                               ` Eli Zaretskii
@ 2015-09-06 22:11                                 ` Stefan Monnier
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Monnier @ 2015-09-06 22:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, 21380

> What do you mean by "have read the variable"?  We are "reading" it one
> member at a time, as timer_check goes about its business.

The Vtimer_list variable is read once and for all at the beginning of
timer_check.  After that, this variable is unused until we finish
processing all the timers in the list.  This processing uses the list
that was contained in this variable, but it doesn't use the variable
itself.  So modifying the list during the loop can be problematic
(hence the use of copy-sequence to avoid interference), whereas
modifying the variable isn't.

>> How could it be worse to allocate cells when we activate a timer than
>> copying the whole list every time we check the timers?
> Twice worse, I'd say (assuming "a couple" really means 2).

What kind of scenario are you imagining.
My reasoning I have in mind is this:

- Every time Emacs is idle, it runs check_timers.
- We don't run timers every time Emacs is idle (because it becomes
  un-idle before it gets a chance to run those timers).
- Most commands don't activate timers (but they end by running check_timers).
- Only some timers end up calling activate-timer.

So, I get the impression that check_timers should be run (much?) more
often than activate-timer.

> But this is not the important issue right now.  Right now, I don't
> understand how your proposal will solve this and related bugs.

Neither do I.  But if replacing Fcopy_sequence by a new
copy_sequence_without_QUIT fixes the bug, then I don't see why replacing
it with a nop wouldn't fix it just as well.


        Stefan





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-05 16:59                           ` Pip Cet
@ 2015-09-06 22:22                             ` Stefan Monnier
  2015-09-08 15:55                               ` Pip Cet
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier @ 2015-09-06 22:22 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380

> I haven't tested this, but I think it would. I still think the underlying
> problem here is not having a well-defined rule for when QUIT is allowed to
> call Lisp code.

Indeed.  Tho see below.

> (Eli correctly objected to my initial idea that the rule should be
> "never",

Hmm... "never" does sound enticing, tho clearly debug-on-quit
challenges the idea.

> but I still think it should be "much less often than we currently do".

OTOH, as soon as it's not "never", then it tends to means "be prepared:
this can run arbitrary Elisp code".

So maybe the issue is not just "when" but also "what": we could limit
the kind of Elisp code that's run by QUIT.  This is a lot more
difficult, tho.

> that QUIT shouldn't call hooks, but should be able to call isolated
> functions implemented in Lisp, such as those used for relative font sizing).

And then someone wants to add an advice on those functions
(e.g. debug-on-entry, trace-function, you name it), and suddenly your
carefully coded function ends up doing all kinds of other things.


        Stefan





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-06 22:22                             ` Stefan Monnier
@ 2015-09-08 15:55                               ` Pip Cet
  0 siblings, 0 replies; 50+ messages in thread
From: Pip Cet @ 2015-09-08 15:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 21380

[-- Attachment #1: Type: text/plain, Size: 3066 bytes --]

On Sun, Sep 6, 2015 at 10:22 PM, Stefan Monnier <monnier@iro.umontreal.ca>
wrote:

> > (Eli correctly objected to my initial idea that the rule should be
> > "never",
>
> Hmm... "never" does sound enticing, tho clearly debug-on-quit
> challenges the idea.
>

I think I've narrowed down the situations in which we actually run elisp
from QUIT:

1. delete_frame (it has a Qnoelisp argument, but runs elisp anyway)
2. decoding C strings from X events
3. some of the new window resizing code (resize_frame_windows,
frame_windows_min_size, sanitize_window_sizes)
4. mouse highlighting
5. x_kill_gs_process runs a lot of stuff, probably including elisp
6. the GTK toolbar/menubar callbacks still appear to be running
window-configuration-change-hook, unless I'm missing something

3, 5, 6, and probably 1 can be fixed by running those handlers from the
command loop. (2) could be fixed by keeping the untranslated string around
until we actually need it decoded, but that's more difficult (in
particular, we need to recognize the quit character even when other input
events are not handled). (4) would probably involve adding a noelisp
argument to some of the face handling code and delaying mouseover updates
in those (rare, hopefully) cases in which the face isn't actually loaded
yet. (The easy way to solve (4) would be not to do mouseover faces until we
get back to the command loop, but I fear that would be unacceptable for
most users).

(This list is valid for my build configuration only, I've not even started
to look at the windows/OS X code).

> but I still think it should be "much less often than we currently do".
>
> OTOH, as soon as it's not "never", then it tends to means "be prepared:
> this can run arbitrary Elisp code".
>
> So maybe the issue is not just "when" but also "what": we could limit
> the kind of Elisp code that's run by QUIT.  This is a lot more
> difficult, tho.
>

Well, we could limit some of the cases to side-effect-free functions that
don't use global variables, but I think that's too limiting. We can go the
other way, however, and declare such pure functions safe, in addition to
some others we'd have to check by hand.

>
> > that QUIT shouldn't call hooks, but should be able to call isolated
> > functions implemented in Lisp, such as those used for relative font
> sizing).
>
> And then someone wants to add an advice on those functions
> (e.g. debug-on-entry, trace-function, you name it), and suddenly your
> carefully coded function ends up doing all kinds of other things.
>

I think debug-on-entry and trace-function are important enough that we
should make them work, even if our policy for arbitrary advice is "don't do
that".

The only problem is data shared between ordinary Lisp and QUIT-run Lisp,
and even then there are many safe cases (for example, anything that doesn't
return is safe, and so is updating a list by consing together a new copy
first). nreverse and delq seem safe. nconc and anything using equal, not so
much. (I'm not sure why assoc_no_quit is called that, since it calls Fequal
which QUITs).

[-- Attachment #2: Type: text/html, Size: 4082 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-01 20:48                         ` Pip Cet
  2015-09-02 15:08                           ` Eli Zaretskii
@ 2020-09-07 17:07                           ` Lars Ingebrigtsen
  2020-09-07 17:47                             ` Pip Cet
  2020-09-07 19:09                             ` Eli Zaretskii
  1 sibling, 2 replies; 50+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-07 17:07 UTC (permalink / raw)
  To: Pip Cet; +Cc: 21380, Stefan Monnier

Pip Cet <pipcet@gmail.com> writes:

>         * keyboard.c (timer_check): Call `block_input' and turn off
> 	atimers around the creation of the temporary timer list copy.
>
> 	* fns.c (concat): Don't assume argument size remains unchanged
> 	after call to `Fmake_list'.  Return incorrect results (but don't
> 	segfault) in that case.

Skimming this thread, it seemed like the first part fixes a segfault
(and the second part was probably not needed), but not even the first
part was applied?  (I just had a peek at timer_check to check.)

The discussion instead turned to the idea of avoiding the

  timers = Fcopy_sequence (Vtimer_list);

completely, and just working directly off of Vtimer_list.  Which seems
like a good idea, but there may be unknown unknowns in that scenario?

In any case, the timer_check change seems safe, and fixes a (somewhat
obscure) segfault, so does anybody object to putting that in?  (We can
also add a FIXME to the code here mentioning the possibility of not
copying the list.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2020-09-07 17:07                           ` Lars Ingebrigtsen
@ 2020-09-07 17:47                             ` Pip Cet
  2020-09-07 19:09                             ` Eli Zaretskii
  1 sibling, 0 replies; 50+ messages in thread
From: Pip Cet @ 2020-09-07 17:47 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 21380, Stefan Monnier

Hello Lars, and congratulations on your new role,

On Mon, Sep 7, 2020 at 5:07 PM Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Pip Cet <pipcet@gmail.com> writes:
> In any case, the timer_check change seems safe, and fixes a (somewhat
> obscure) segfault, so does anybody object to putting that in?  (We can
> also add a FIXME to the code here mentioning the possibility of not
> copying the list.)

Sounds good. I must confess I've forgotten the details by now, but if
there are no objections I'll prepare a patch.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2020-09-07 17:07                           ` Lars Ingebrigtsen
  2020-09-07 17:47                             ` Pip Cet
@ 2020-09-07 19:09                             ` Eli Zaretskii
  2020-09-08  9:57                               ` Lars Ingebrigtsen
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2020-09-07 19:09 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 21380, monnier, pipcet

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  21380@debbugs.gnu.org, Stefan Monnier
>  <monnier@iro.umontreal.ca>
> Date: Mon, 07 Sep 2020 19:07:18 +0200
> 
> In any case, the timer_check change seems safe, and fixes a (somewhat
> obscure) segfault, so does anybody object to putting that in?

Which part?  There was more than one patch in the discussion.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2020-09-07 19:09                             ` Eli Zaretskii
@ 2020-09-08  9:57                               ` Lars Ingebrigtsen
  2022-04-29 12:14                                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 50+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-08  9:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380, monnier, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> In any case, the timer_check change seems safe, and fixes a (somewhat
>> obscure) segfault, so does anybody object to putting that in?
>
> Which part?  There was more than one patch in the discussion.

This bit, but sounds like Pip is preparing a new patch:

diff --git a/src/keyboard.c b/src/keyboard.c
index dab32b1..4ce830d 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -4560,6 +4560,8 @@ timer_check (void)
 
   Lisp_Object tem = Vinhibit_quit;
   Vinhibit_quit = Qt;
+  block_input ();
+  turn_on_atimers (false);
 
   /* We use copies of the timers' lists to allow a timer to add itself
      again, without locking up Emacs if the newly added timer is
@@ -4573,6 +4575,8 @@ timer_check (void)
   else
     idle_timers = Qnil;
 
+  turn_on_atimers (true);
+  unblock_input ();
   Vinhibit_quit = tem;
 
   do


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2020-09-08  9:57                               ` Lars Ingebrigtsen
@ 2022-04-29 12:14                                 ` Lars Ingebrigtsen
  0 siblings, 0 replies; 50+ messages in thread
From: Lars Ingebrigtsen @ 2022-04-29 12:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 21380, monnier, pipcet

Lars Ingebrigtsen <larsi@gnus.org> writes:

>> Which part?  There was more than one patch in the discussion.
>
> This bit, but sounds like Pip is preparing a new patch:

I went ahead and applied it to Emacs 29, and that should (if I
understand this thread correctly) fix the reported issue, so I'm closing
this bug report.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2015-09-05  7:38                           ` Eli Zaretskii
  2015-09-05 15:18                             ` Stefan Monnier
@ 2022-04-29 12:52                             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-04-29 13:40                               ` Eli Zaretskii
  1 sibling, 1 reply; 50+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-04-29 12:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rudalics, Stefan Monnier, pipcet, 21380

Eli Zaretskii <eliz@gnu.org> writes:

> I believe the currently preferred solution is to block input and
> atimers while copying the timer lists to local copies.  Are you okay
> with that?

Please forgive my ignorance, but why do atimers have to be blocked in
situations where Lisp code isn't supposed to run?  I don't think we ever
run Lisp inside an atimer, and AFAICT atimers are active in most of the
code where Lisp shouldn't run.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2022-04-29 12:52                             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-04-29 13:40                               ` Eli Zaretskii
  2022-04-29 13:44                                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2022-04-29 13:40 UTC (permalink / raw)
  To: Po Lu; +Cc: monnier, pipcet, 21380

> From: Po Lu <luangruo@yahoo.com>
> Cc: Stefan Monnier <monnier@IRO.UMontreal.CA>,  rudalics@gmx.at,
>   pipcet@gmail.com,  21380@debbugs.gnu.org
> Date: Fri, 29 Apr 2022 20:52:02 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I believe the currently preferred solution is to block input and
> > atimers while copying the timer lists to local copies.  Are you okay
> > with that?
> 
> Please forgive my ignorance, but why do atimers have to be blocked in
> situations where Lisp code isn't supposed to run?  I don't think we ever
> run Lisp inside an atimer, and AFAICT atimers are active in most of the
> code where Lisp shouldn't run.

First, I see no guarantee that atimers cannot run Lisp, ever.  And
second, even if they don't run Lisp, they could run code which
directly or indirectly modifies Vtimer_list, and we cannot allow that
in this case.

IOW, the problem is not that Lisp isn't supposed to run in this
situation, the problem is that we examine and modify Vtimer_list, so
we cannot allow any code that could potentially modify that behind our
backs while we are at that.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2022-04-29 13:40                               ` Eli Zaretskii
@ 2022-04-29 13:44                                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-04-29 15:02                                   ` Pip Cet
  0 siblings, 1 reply; 50+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-04-29 13:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rudalics, monnier, pipcet, 21380

Eli Zaretskii <eliz@gnu.org> writes:

> First, I see no guarantee that atimers cannot run Lisp, ever.  And
> second, even if they don't run Lisp, they could run code which
> directly or indirectly modifies Vtimer_list, and we cannot allow that
> in this case.
>
> IOW, the problem is not that Lisp isn't supposed to run in this
> situation, the problem is that we examine and modify Vtimer_list, so
> we cannot allow any code that could potentially modify that behind our
> backs while we are at that.

Hmm, okay, thanks for explaining.  I guess some code called by the input
polling atimer eventually ends up modifying `timer-list' when input
polling is enabled.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook
  2022-04-29 13:44                                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-04-29 15:02                                   ` Pip Cet
  0 siblings, 0 replies; 50+ messages in thread
From: Pip Cet @ 2022-04-29 15:02 UTC (permalink / raw)
  To: Po Lu; +Cc: Stefan Monnier, 21380

On Fri, Apr 29, 2022 at 1:44 PM Po Lu <luangruo@yahoo.com> wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
> > IOW, the problem is not that Lisp isn't supposed to run in this
> > situation, the problem is that we examine and modify Vtimer_list, so
> > we cannot allow any code that could potentially modify that behind our
> > backs while we are at that.
>
> Hmm, okay, thanks for explaining.  I guess some code called by the input
> polling atimer eventually ends up modifying `timer-list' when input
> polling is enabled.

Thanks for looking into this! I'm not sure I thought about the way
unblock_input will sometimes read asynchronous input, which might make
unlikely segfaults like this one more likely if we're not careful.

But I must confess I no longer remember the details here clearly.

Pip





^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-04-29 15:02 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-30 12:51 bug#21380: 25.0.50; GTK-induced segfault when scheduling timer from window-configuration-change-hook Pip Cet
2015-08-30 15:01 ` Eli Zaretskii
2015-08-30 15:24   ` Pip Cet
2015-08-30 15:27     ` Pip Cet
2015-08-30 16:24       ` Pip Cet
2015-08-30 18:10         ` martin rudalics
2015-08-30 18:20           ` Pip Cet
2015-08-30 19:50             ` Eli Zaretskii
2015-08-30 18:59           ` Pip Cet
2015-08-31  9:20             ` martin rudalics
2015-08-30 16:39     ` Eli Zaretskii
2015-08-30 16:42       ` Pip Cet
2015-08-30 19:44         ` Eli Zaretskii
2015-08-30 20:56           ` Pip Cet
2015-08-30 21:13             ` Pip Cet
2015-08-31 14:31             ` Eli Zaretskii
2015-09-01 10:20               ` Pip Cet
2015-09-01 15:03                 ` Eli Zaretskii
2015-09-01 15:22                   ` Pip Cet
2015-09-01 16:01                     ` Eli Zaretskii
2015-09-01 16:02                       ` Pip Cet
2015-09-01 16:23                         ` Eli Zaretskii
2015-09-02  7:02                       ` martin rudalics
2015-09-02 14:32                         ` Eli Zaretskii
2015-09-03 15:36                         ` Stefan Monnier
2015-09-05  7:38                           ` Eli Zaretskii
2015-09-05 15:18                             ` Stefan Monnier
2015-09-05 15:27                               ` Eli Zaretskii
2015-09-06 22:11                                 ` Stefan Monnier
2022-04-29 12:52                             ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-04-29 13:40                               ` Eli Zaretskii
2022-04-29 13:44                                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-04-29 15:02                                   ` Pip Cet
2015-09-05 16:59                           ` Pip Cet
2015-09-06 22:22                             ` Stefan Monnier
2015-09-08 15:55                               ` Pip Cet
2015-09-01 15:14                 ` Pip Cet
2015-09-01 16:04                   ` Eli Zaretskii
2015-09-01 16:56                     ` Pip Cet
2015-09-01 17:19                       ` Eli Zaretskii
2015-09-01 20:48                         ` Pip Cet
2015-09-02 15:08                           ` Eli Zaretskii
2015-09-02 16:09                             ` Pip Cet
2015-09-02 19:13                               ` Eli Zaretskii
2015-09-02 22:08                                 ` Pip Cet
2020-09-07 17:07                           ` Lars Ingebrigtsen
2020-09-07 17:47                             ` Pip Cet
2020-09-07 19:09                             ` Eli Zaretskii
2020-09-08  9:57                               ` Lars Ingebrigtsen
2022-04-29 12:14                                 ` Lars Ingebrigtsen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).