unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#25265: make-thread crashes in OS X 10.6
@ 2016-12-24 11:06 Charles A. Roelli
  2016-12-24 17:51 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Charles A. Roelli @ 2016-12-24 11:06 UTC (permalink / raw)
  To: 25265

Running Emacs -Q on the latest commit (e36a3882),

M-: (make-thread (lambda () "string"))

appears to hang Emacs immediately.

Here's the output of "bt" and "thread apply all bt":

(gdb) bt
#0  0x00000001000cd2b6 in stack_overflow [inlined] () at /Code/emacs-build-threads/src/sysdep.c:1767
#1  0x00000001000cd2b6 in handle_sigsegv (sig=1884515520, siginfo=0x70, arg=0x100621fe8) at sysdep.c:1800
#2  <signal handler called>
#3  0x000000010012d7dc in unbind_to (count=Cannot access memory at address 0x0
) at eval.c:3499
#4  0x00000001001af2c4 in ns_select (nfds=1606412240, readfds=0x7fff5fbfeb80, writefds=0x7fff5fbfeb00, exceptfds=0x7fff5fbfe7d0, timeout=0x7fff5fbfe7d0, sigmask=0x7fff5fbfe7d0) at nsterm.m:4210
#5  0x000000010018d210 in really_call_select (arg=0x7fff5fbfe840) at thread.c:520
#6  0x000000010010ca06 in flush_stack_call_func (func=0, arg=0x5fbfeae800000000) at alloc.c:5137
#7  0x000000010018d1a7 in thread_select (func=0x5fbfeae800000000, max_fds=0, rfds=0x7fff5fbfe828, wfds=0x0, efds=0x7fff5fbfeae8, timeout=0x0, sigmask=0x0) at thread.c:543
#8  0x00000001001752ad in wait_reading_process_output (time_limit=140734799801728, nsecs=0, read_kbd=1606413696, wait_for_cell=140734799801728, wait_proc=0x7fff5fbfed80, just_wait_proc=0, do_display=true) at process.c:5344
#9  0x000000010000471a in sit_for (timeout=30, display_option=0, reading=false) at dispnew.c:5763
#10 0x00000001000bf6f2 in read_char (commandflag=1606414816, map=140734799802848, prev_event=4294967295, used_mouse_menu=0x7fff5fbff1e0, end_time=0x7fff5fbff1e0) at keyboard.c:2725
#11 0x00000001000c26ee in read_key_sequence () at keyboard.c:2599
#12 0x00000001000c4afd in command_loop_1 () at keyboard.c:1373
#13 0x000000010012e2a4 in internal_condition_case (bfun=0x1000c3690 <command_loop_1>, handlers=499667000, hfun=0x7fff5fbff500) at eval.c:1334
#14 0x00000001000c4d50 in command_loop_2 (ignore=499667000) at keyboard.c:1115
#15 0x000000010012e32f in internal_catch (tag=499667000, func=0x1000c4d20 <command_loop_2>, arg=499667000) at eval.c:1100
#16 0x00000001000c4ec6 in command_loop [inlined] () at /Code/emacs-build-threads/src/keyboard.c:1094
#17 0x00000001000c4ec6 in recursive_edit_1 () at keyboard.c:1800
#18 0x00000001000b60df in Frecursive_edit () at keyboard.c:771
#19 0x00000001000b2993 in main (argc=4002, argv=0x7fff5fbff780) at emacs.c:1691

Program received signal SIGSTOP, Stopped (signal).
0x00007fff838f8430 in malloc_gdb_po_unsafe ()
Unsafe to run code: malloc zone lock is held for some zone..
Canceling call as the malloc lock is held so it isn't safe to call the runtime.
Issue the command:
    set objc-non-blocking-mode off
to override this check if you are sure your call doesn't use the malloc libraries or the ObjC runtime.
(gdb) set objc-non-blocking-mode off
(gdb) continue
Continuing.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000040
0x00000001000cd2b6 in stack_overflow [inlined] () at /Code/emacs-build-threads/src/sysdep.c:1767
1767	  char *bot = stack_bottom;

(gdb) thread apply all bt

Thread 4 (process 17249):
#0  0x00007fff8385f932 in select$DARWIN_EXTSN ()
#1  0x000000010019fa79 in -[EmacsApp fd_handler:] (self=0x8, _cmd=0x102380ad0, unused=0x0) at nsterm.m:5512
#2  0x00007fff89967114 in __NSThread__main__ ()
#3  0x00007fff83854fd6 in _pthread_start ()
#4  0x00007fff83854e89 in thread_start ()

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000040
[Switching to process 17249 thread 0x903]
0x00000001000cd2b6 in stack_overflow [inlined] () at /Code/emacs-build-threads/src/sysdep.c:1767
1767	  char *bot = stack_bottom;
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (backtrace_top) will be abandoned.

Thread 3 (process 17249):
#0  0x00007fff8385f932 in select$DARWIN_EXTSN ()
#1  0x0000000100ad3983 in g_poll (fds=0x101a37ce0, nfds=28839360, timeout=0) at gpoll.c:385
#2  0x0000000100ac6dfd in g_main_context_iterate () at gmain.c:3198
#3  0x0000000100ac6ee2 in g_main_context_iteration (context=0x6, may_block=1) at gmain.c:3869
#4  0x0000000100ac5e71 in glib_worker_main (data=0x6) at gmain.c:5622
#5  0x0000000100aeb75a in g_thread_proxy (data=0x6) at gthread.c:764
#6  0x00007fff83854fd6 in _pthread_start ()
#7  0x00007fff83854e89 in thread_start ()

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000040
[Switching to process 17249 thread 0x903]
0x00000001000cd2b6 in stack_overflow [inlined] () at /Code/emacs-build-threads/src/sysdep.c:1767
1767	  char *bot = stack_bottom;
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (backtrace_top) will be abandoned.

Thread 2 (process 17249):
#0  0x00007fff8383544b in pthread_workqueue_additem_np ()
#1  0x00007fff83834d0a in _dispatch_queue_wakeup_global ()
#2  0x00007fff83834c28 in _dispatch_wakeup ()
#3  0x00007fff8394fd3f in _dispatch_queue_push_list_slow ()
#4  0x00007fff83834c9c in _dispatch_wakeup ()
#5  0x00007fff8394fd3f in _dispatch_queue_push_list_slow ()
#6  0x00007fff83834c9c in _dispatch_wakeup ()
#7  0x00007fff83836ebe in _dispatch_run_timers ()
#8  0x00007fff83836aa2 in _dispatch_mgr_invoke ()
#9  0x00007fff838367b4 in _dispatch_queue_invoke ()
#10 0x00007fff838362de in _dispatch_worker_thread2 ()
#11 0x00007fff83835c08 in _pthread_wqthread ()
#12 0x00007fff83835aa5 in start_wqthread ()

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000070
backtrace_top () at eval.c:202
202	  while (backtrace_p (pdl) && pdl->kind != SPECPDL_BACKTRACE)
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (backtrace_top) will be abandoned.

Thread 1 (process 17249):
#0  0x00000001000cd2b6 in stack_overflow [inlined] () at /Code/emacs-build-threads/src/sysdep.c:1767
#1  0x00000001000cd2b6 in handle_sigsegv (sig=1884515520, siginfo=0x70, arg=0x100621fe8) at sysdep.c:1800
#2  <signal handler called>
#3  0x000000010012d7dc in unbind_to (count=Cannot access memory at address 0x0
) at eval.c:3499
#4  0x00000001001af2c4 in ns_select (nfds=1606412240, readfds=0x7fff5fbfeb80, writefds=0x7fff5fbfeb00, exceptfds=0x7fff5fbfe7d0, timeout=0x7fff5fbfe7d0, sigmask=0x7fff5fbfe7d0) at nsterm.m:4210
#5  0x000000010018d210 in really_call_select (arg=0x7fff5fbfe840) at thread.c:520
#6  0x000000010010ca06 in flush_stack_call_func (func=0, arg=0x5fbfeae800000000) at alloc.c:5137
#7  0x000000010018d1a7 in thread_select (func=0x5fbfeae800000000, max_fds=0, rfds=0x7fff5fbfe828, wfds=0x0, efds=0x7fff5fbfeae8, timeout=0x0, sigmask=0x0) at thread.c:543
#8  0x00000001001752ad in wait_reading_process_output (time_limit=140734799801728, nsecs=0, read_kbd=1606413696, wait_for_cell=140734799801728, wait_proc=0x7fff5fbfed80, just_wait_proc=0, do_display=true) at process.c:5344
#9  0x000000010000471a in sit_for (timeout=30, display_option=0, reading=false) at dispnew.c:5763
#10 0x00000001000bf6f2 in read_char (commandflag=1606414816, map=140734799802848, prev_event=4294967295, used_mouse_menu=0x7fff5fbff1e0, end_time=0x7fff5fbff1e0) at keyboard.c:2725
#11 0x00000001000c26ee in read_key_sequence () at keyboard.c:2599
#12 0x00000001000c4afd in command_loop_1 () at keyboard.c:1373
#13 0x000000010012e2a4 in internal_condition_case (bfun=0x1000c3690 <command_loop_1>, handlers=499667000, hfun=0x7fff5fbff500) at eval.c:1334
#14 0x00000001000c4d50 in command_loop_2 (ignore=499667000) at keyboard.c:1115
#15 0x000000010012e32f in internal_catch (tag=499667000, func=0x1000c4d20 <command_loop_2>, arg=499667000) at eval.c:1100
#16 0x00000001000c4ec6 in command_loop [inlined] () at /Code/emacs-build-threads/src/keyboard.c:1094
#17 0x00000001000c4ec6 in recursive_edit_1 () at keyboard.c:1800
#18 0x00000001000b60df in Frecursive_edit () at keyboard.c:771
#19 0x00000001000b2993 in main (argc=4002, argv=0x7fff5fbff780) at emacs.c:1691

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000070
backtrace_top () at eval.c:202
202	  while (backtrace_p (pdl) && pdl->kind != SPECPDL_BACKTRACE)
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (backtrace_top) will be abandoned.
(gdb) bt
#0  backtrace_top () at eval.c:202
#1  <function called from gdb>
#2  0x00000001000cd2b6 in stack_overflow [inlined] () at /Code/emacs-build-threads/src/sysdep.c:1767
#3  0x00000001000cd2b6 in handle_sigsegv (sig=1884515520, siginfo=0x70, arg=0x100621fe8) at sysdep.c:202
#4  <signal handler called>
#5  0x000000010012d7dc in unbind_to (count=Cannot access memory at address 0x0
) at eval.c:3499
#6  0x00000001001af2c4 in ns_select (nfds=1606412240, readfds=0x7fff5fbfeb80, writefds=0x7fff5fbfeb00, exceptfds=0x7fff5fbfe7d0, timeout=0x7fff5fbfe7d0, sigmask=0x7fff5fbfe7d0) at nsterm.m:4210
#7  0x000000010018d210 in really_call_select (arg=0x7fff5fbfe840) at thread.c:520
#8  0x000000010010ca06 in flush_stack_call_func (func=0, arg=0x5fbfeae800000000) at alloc.c:5137
#9  0x000000010018d1a7 in thread_select (func=0x5fbfeae800000000, max_fds=0, rfds=0x7fff5fbfe828, wfds=0x0, efds=0x7fff5fbfeae8, timeout=0x0, sigmask=0x0) at thread.c:543
#10 0x00000001001752ad in wait_reading_process_output (time_limit=140734799801728, nsecs=0, read_kbd=1606413696, wait_for_cell=140734799801728, wait_proc=0x7fff5fbfed80, just_wait_proc=0, do_display=true) at process.c:5344
#11 0x000000010000471a in sit_for (timeout=30, display_option=0, reading=false) at dispnew.c:5763
#12 0x00000001000bf6f2 in read_char (commandflag=1606414816, map=140734799802848, prev_event=4294967295, used_mouse_menu=0x7fff5fbff1e0, end_time=0x7fff5fbff1e0) at keyboard.c:2725
#13 0x00000001000c26ee in read_key_sequence () at keyboard.c:2599
#14 0x00000001000c4afd in command_loop_1 () at keyboard.c:1373
#15 0x000000010012e2a4 in internal_condition_case (bfun=0x1000c3690 <command_loop_1>, handlers=499667000, hfun=0x7fff5fbff500) at eval.c:1334
#16 0x00000001000c4d50 in command_loop_2 (ignore=499667000) at keyboard.c:1115
#17 0x000000010012e32f in internal_catch (tag=499667000, func=0x1000c4d20 <command_loop_2>, arg=499667000) at eval.c:1100
#18 0x00000001000c4ec6 in command_loop [inlined] () at /Code/emacs-build-threads/src/keyboard.c:1094
#19 0x00000001000c4ec6 in recursive_edit_1 () at keyboard.c:202
#20 0x00000001000b60df in Frecursive_edit () at keyboard.c:771
#21 0x00000001000b2993 in main (argc=4002, argv=0x7fff5fbff780) at emacs.c:1691

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000070
[Switching to process 17249 thread 0x1703]
backtrace_top () at eval.c:202
202	  while (backtrace_p (pdl) && pdl->kind != SPECPDL_BACKTRACE)
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (backtrace_top) will be abandoned.

=========
===END===
=========

Thanks,
Charles






^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-24 11:06 bug#25265: make-thread crashes in OS X 10.6 Charles A. Roelli
@ 2016-12-24 17:51 ` Eli Zaretskii
  2016-12-25 15:52   ` Eli Zaretskii
  2017-03-06 20:02 ` bug#25265: make-thread crashes in OS X 10.6 Alan Third
  2017-05-02 20:49 ` Alan Third
  2 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-24 17:51 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: 25265

> From: charles@aurox.ch (Charles A. Roelli)
> Date: Sat, 24 Dec 2016 12:06:45 +0100
> 
> Running Emacs -Q on the latest commit (e36a3882),
> 
> M-: (make-thread (lambda () "string"))
> 
> appears to hang Emacs immediately.

I cannot reproduce this, neither on GNU/Linux nor on MS-Windows.  So I
guess this is NS-specific.

I see that ns_select calls block/unblock_input, which makes it unsafe
when more than one thread is running.  It's quite possible that this
is the reason.

I hope some NS expert could take a look at this and fix this.

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-24 17:51 ` Eli Zaretskii
@ 2016-12-25 15:52   ` Eli Zaretskii
  2016-12-26 13:09     ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-25 15:52 UTC (permalink / raw)
  To: charles; +Cc: 25265

> Date: Sat, 24 Dec 2016 19:51:13 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 25265@debbugs.gnu.org
> 
> I see that ns_select calls block/unblock_input, which makes it unsafe
> when more than one thread is running.  It's quite possible that this
> is the reason.
> 
> I hope some NS expert could take a look at this and fix this.

One idea is to write an implementation of block_input and
unblock_input that uses atomic increment and decrement functions, then
use those only in ns_select.

Could someone with access to NS please try that?





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-25 15:52   ` Eli Zaretskii
@ 2016-12-26 13:09     ` Alan Third
  2016-12-26 15:52       ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2016-12-26 13:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

On Sun, Dec 25, 2016 at 05:52:37PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 24 Dec 2016 19:51:13 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: 25265@debbugs.gnu.org
> > 
> > I see that ns_select calls block/unblock_input, which makes it unsafe
> > when more than one thread is running.  It's quite possible that this
> > is the reason.
> > 
> > I hope some NS expert could take a look at this and fix this.
> 
> One idea is to write an implementation of block_input and
> unblock_input that uses atomic increment and decrement functions, then
> use those only in ns_select.
> 
> Could someone with access to NS please try that?

I’ve just tried the below patch and it doesn’t make any difference: it
still hangs immediately.

Back in October Ken Raeburn listed some things that need done in
ns_select which are probably relevant:

https://lists.gnu.org/archive/html/emacs-devel/2016-10/msg00525.html

2 files changed, 23 insertions(+)
src/blockinput.h |  8 ++++++++
src/keyboard.c   | 15 +++++++++++++++

modified   src/blockinput.h
@@ -49,10 +49,18 @@ extern volatile int interrupt_input_blocked;
 
 /* Begin critical section. */
 
+#ifdef NS_IMPL_COCOA
+#include <libkern/OSAtomic.h>
+#endif
+
 INLINE void
 block_input (void)
 {
+#ifdef NS_IMPL_COCOA
+  OSAtomicIncrement32(&interrupt_input_blocked);
+#else
   interrupt_input_blocked++;
+#endif
 }
 
 extern void unblock_input (void);
modified   src/keyboard.c
@@ -54,6 +54,10 @@ along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.  */
 #include <sys/ioctl.h>
 #endif /* not MSDOS */
 
+#ifdef NS_IMPL_COCOA
+#include <libkern/OSAtomic.h>
+#endif
+
 #if defined USABLE_FIONREAD && defined USG5_4
 # include <sys/filio.h>
 #endif
@@ -7183,7 +7187,18 @@ unblock_input_to (int level)
 void
 unblock_input (void)
 {
+#ifdef NS_IMPL_COCOA
+  OSAtomicDecrement32(&interrupt_input_blocked);
+  if (interrupt_input_blocked == 0)
+    {
+      if (pending_signals && !fatal_error_in_progress)
+	process_pending_signals ();
+    }
+  else if (interrupt_input_blocked < 0)
+    emacs_abort ();
+#else  
   unblock_input_to (interrupt_input_blocked - 1);
+#endif
 }
 
 /* Undo any number of BLOCK_INPUT calls,


-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-26 13:09     ` Alan Third
@ 2016-12-26 15:52       ` Eli Zaretskii
  2016-12-26 20:56         ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-26 15:52 UTC (permalink / raw)
  To: Alan Third; +Cc: charles, 25265

> Date: Mon, 26 Dec 2016 13:09:17 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> I’ve just tried the below patch and it doesn’t make any difference: it
> still hangs immediately.

Can you try and figure out why it hangs?

> Back in October Ken Raeburn listed some things that need done in
> ns_select which are probably relevant:
> 
> https://lists.gnu.org/archive/html/emacs-devel/2016-10/msg00525.html

Hmm... none of the changes Ken mentioned were in the concurrency
branch when I merged it.  Perhaps Ken didn't push the changes back
then.

The problems in xg_select were solved in a different way, I believe.
The problems in ns_select are the issue at hand here.

Wrt static variables used by ns_select: either they should become
members of struct thread_state, or the calls to release_global_lock
and acquire_global_lock should be moved into ns_select (which was what
Ken was proposing, AFAIU).  I'm not sure which possibility is better,
nor whether this is all that needs to be done, because I don't
understand well enough the semantics of some parts of ns_select, in
particular where it says

    [NSApp run];

Can this part and its surrounding code be made thread-safe?  The call
to record_unwind_protect in particular confuses me: that seems to hint
the code wants to be interruptible here, which might require stuff
like maybe_reacquire_global_lock I added to keyboard.c, to avoid
problems when Emacs gets SIGINT on a TTY frame.

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-26 15:52       ` Eli Zaretskii
@ 2016-12-26 20:56         ` Alan Third
  2016-12-27  7:30           ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2016-12-26 20:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

On Mon, Dec 26, 2016 at 05:52:30PM +0200, Eli Zaretskii wrote:
> > Date: Mon, 26 Dec 2016 13:09:17 +0000
> > From: Alan Third <alan@idiocy.org>
> > Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> > 
> > I’ve just tried the below patch and it doesn’t make any difference: it
> > still hangs immediately.
> 
> Can you try and figure out why it hangs?

I’m not having any luck with this. My entirely uneducated guess is
that the main thread can’t access the contents of current_thread. It
crashes on this line:

    while (specpdl_ptr != specpdl + count)

with EXC_BAD_ACCESS. Only three things are being accessed: count,
which is a function argument, and specpdl and specpdl_ptr are
#defines to variables within current_thread.

I can’t access anything within current_thread using lldb, but I can
see current_thread itself. I don’t know if that signifies anything.

I’ll put the complete backtrace at the bottom of this email.

> > Back in October Ken Raeburn listed some things that need done in
> > ns_select which are probably relevant:
> > 
> > https://lists.gnu.org/archive/html/emacs-devel/2016-10/msg00525.html
> 
> Hmm... none of the changes Ken mentioned were in the concurrency
> branch when I merged it.  Perhaps Ken didn't push the changes back
> then.
> 
> The problems in xg_select were solved in a different way, I believe.
> The problems in ns_select are the issue at hand here.
> 
> Wrt static variables used by ns_select: either they should become
> members of struct thread_state, or the calls to release_global_lock
> and acquire_global_lock should be moved into ns_select (which was what
> Ken was proposing, AFAIU).  I'm not sure which possibility is better,
> nor whether this is all that needs to be done, because I don't
> understand well enough the semantics of some parts of ns_select, in
> particular where it says
> 
>     [NSApp run];
> 
> Can this part and its surrounding code be made thread-safe?

I think this particular method call has to be done *only* from the
main thread. I imagine that could be a problem.

> The call to record_unwind_protect in particular confuses me: that
> seems to hint the code wants to be interruptible here, which might
> require stuff like maybe_reacquire_global_lock I added to
> keyboard.c, to avoid problems when Emacs gets SIGINT on a TTY frame.

-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-26 20:56         ` Alan Third
@ 2016-12-27  7:30           ` Eli Zaretskii
  2016-12-27 10:44             ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-27  7:30 UTC (permalink / raw)
  To: Alan Third; +Cc: charles, 25265

> Date: Mon, 26 Dec 2016 20:56:32 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> > Can you try and figure out why it hangs?
> 
> I’m not having any luck with this. My entirely uneducated guess is
> that the main thread can’t access the contents of current_thread. It
> crashes on this line:
> 
>     while (specpdl_ptr != specpdl + count)
> 
> with EXC_BAD_ACCESS. Only three things are being accessed: count,
> which is a function argument, and specpdl and specpdl_ptr are
> #defines to variables within current_thread.

The loop above is in unbind_to, right?

> I can’t access anything within current_thread using lldb, but I can
> see current_thread itself. I don’t know if that signifies anything.

I have bad experience with using lldb (not on OS X, though), so I
don't trust it.  Can you use GDB instead?  Failing that, is it
possible to just fprintf these values to stderr?

It is strange that this is where it hangs, since these are
thread-specific variables.  Maybe the problem is that apploopnr is a
static variable, and some other thread stomps on it?

> I’ll put the complete backtrace at the bottom of this email.

I don't see any backtraces attached.

> >     [NSApp run];
> > 
> > Can this part and its surrounding code be made thread-safe?
> 
> I think this particular method call has to be done *only* from the
> main thread. I imagine that could be a problem.

It could be a problem, yes.  But what does this do, exactly, and why
does it need to be called as part of ns_select?

Also, why does ns_select sometimes call pselect instead -- is that for
non-interactive or non-GUI sessions or something?

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-27  7:30           ` Eli Zaretskii
@ 2016-12-27 10:44             ` Alan Third
  2016-12-27 11:13               ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2016-12-27 10:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

On Tue, Dec 27, 2016 at 09:30:28AM +0200, Eli Zaretskii wrote:
> > Date: Mon, 26 Dec 2016 20:56:32 +0000
> > From: Alan Third <alan@idiocy.org>
> > Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> > 
> > I’m not having any luck with this. My entirely uneducated guess is
> > that the main thread can’t access the contents of current_thread. It
> > crashes on this line:
> > 
> >     while (specpdl_ptr != specpdl + count)
> > 
> > with EXC_BAD_ACCESS. Only three things are being accessed: count,
> > which is a function argument, and specpdl and specpdl_ptr are
> > #defines to variables within current_thread.
> 
> The loop above is in unbind_to, right?

Yes.

> > I can’t access anything within current_thread using lldb, but I can
> > see current_thread itself. I don’t know if that signifies anything.
> 
> I have bad experience with using lldb (not on OS X, though), so I
> don't trust it.  Can you use GDB instead?  Failing that, is it
> possible to just fprintf these values to stderr?

OK, I’ve ended up just using printf and it looks like in this thread
current_thread is null. GDB says it’s null too. Here’s what I put in just
before the while loop:

  printf("AT: %d\n", current_thread);
  fflush(stdout);

And here’s the output from GDB. You can see where current_thread
switches from non‐null to null, just before Emacs hangs:

AT: 7842432
AT: 43712208
AT: 43712208
AT: 0

Thread 2 received signal SIGSEGV, Segmentation fault.
0x0000000100200920 in unbind_to (count=0, value=0) at eval.c:3502
3502	  while (specpdl_ptr != specpdl + count)

(gdb) p current_thread
$1 = (struct thread_state *) 0x0

> > I’ll put the complete backtrace at the bottom of this email.
> 
> I don't see any backtraces attached.

I’ve actually put it in this time. Sorry about that.

> > >     [NSApp run];
> > > 
> > > Can this part and its surrounding code be made thread-safe?
> > 
> > I think this particular method call has to be done *only* from the
> > main thread. I imagine that could be a problem.
> 
> It could be a problem, yes.  But what does this do, exactly, and why
> does it need to be called as part of ns_select?

It runs an event loop that picks up all GUI events and then dispatches
them. It’s part of NextStep. It’s unclear to me why it’s run in
ns_select, but presumably it’s because it needs to be run somewhere
and whoever wrote it thought that was a good place.

If we need to, I expect we could move it to its own thread. That seems
to be a known pattern in the GNUStep/Cocoa world.

> Also, why does ns_select sometimes call pselect instead -- is that for
> non-interactive or non-GUI sessions or something?

It looks that way. It uses it if NSApp isn’t defined, which will
happen if Emacs isn’t being run as a GUI application. I’m pretty sure
that Emacs doesn’t even use ns_select when it’s run in the terminal,
though. It also uses it when timeouts are set to zero. I don’t know
what that’s about.

The backtrace:

Thread 2 received signal SIGSEGV, Segmentation fault.
0x0000000100200933 in unbind_to (count=4, value=0) at eval.c:3499
3499	  while (specpdl_ptr != specpdl + count)
(gdb) thread apply all bt

Thread 13 (Thread 0x1a33 of process 45034):
#0  0x00007fffb03594e2 in ?? ()
#1  0x00007fffb0441791 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 12 (Thread 0x190f of process 45034):
#0  0x00007fffb04411e0 in ?? ()
#1  0x0000000000000000 in ?? ()

Thread 11 (Thread 0x113f of process 45034):
#0  0x00007fffb03594e2 in ?? ()
#1  0x00007fffb0441791 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x1d03 of process 45034):
#0  0x00007fffb035138a in ?? ()
#1  0x00007fffb03507d7 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x1507 of process 45034):
#0  0x00007fffb04411e0 in ?? ()
#1  0x000000000ee6b280 in ?? ()
---Type <return> to continue, or q <return> to quit---
#2  0x000000003b99b0ae in ?? ()
#3  0x00007000029d1aa0 in ?? ()
#4  0x00007000029d1a9c in ?? ()
#5  0x00007000029d17b0 in ?? ()
#6  0x00007000029d1a30 in ?? ()
#7  0xe9bfbe3cda25000b in ?? ()
#8  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x1407 of process 45034):
#0  0x00007fffb0358f4a in ?? ()
#1  0x0000000100b0cc0e in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x120b of process 45034):
#0  0x00007fffb0358f4a in ?? ()
#1  0x00000001002cca9a in -[EmacsApp fd_handler:] (self=0x1016217e0, 
    _cmd=0x100371b45, unused=0x0) at nsterm.m:5512
#2  0x00007fff9c6f6c6d in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x1707 of process 45034):
#0  0x0000000100200933 in unbind_to (count=4, value=0) at eval.c:3499
#1  0x00000001002c6962 in ns_select (nfds=0, readfds=0x7fff5fbfe5a8, 
---Type <return> to continue, or q <return> to quit---
    writefds=0x7fff5fbfe528, exceptfds=0x0, timeout=0x7fff5fbfe500, 
    sigmask=0x0) at nsterm.m:4210
#2  0x00000001002a4e3f in really_call_select (arg=0x7fff5fbfdee8)
    at thread.c:520
#3  0x00000001001d5295 in flush_stack_call_func (
    func=0x1002a4dc0 <really_call_select>, arg=0x7fff5fbfdee8) at alloc.c:5137
#4  0x00000001002a4da6 in thread_select (func=0x1002c62d0 <ns_select>, 
    max_fds=0, rfds=0x7fff5fbfe5a8, wfds=0x7fff5fbfe528, efds=0x0, 
    timeout=0x7fff5fbfe500, sigmask=0x0) at thread.c:543
#5  0x0000000100274571 in wait_reading_process_output (time_limit=30, nsecs=0, 
    read_kbd=-1, do_display=true, wait_for_cell=0, wait_proc=0x0, 
    just_wait_proc=0) at process.c:5344
#6  0x00000001000092b4 in sit_for (timeout=122, reading=true, display_option=1)
    at dispnew.c:5763
#7  0x0000000100140c5e in read_char (commandflag=1, map=4337270851, 
    prev_event=0, used_mouse_menu=0x7fff5fbff037, end_time=0x0)
    at keyboard.c:2729
#8  0x000000010013c0f0 in read_key_sequence (keybuf=0x7fff5fbff360, 
    bufsize=30, prompt=0, dont_downcase_last=false, 
    can_return_switch_frame=true, fix_current_buffer=true, 
    prevent_redisplay=false) at keyboard.c:9154
#9  0x000000010013ac32 in command_loop_1 () at keyboard.c:1377
#10 0x000000010020455f in internal_condition_case (
---Type <return> to continue, or q <return> to quit---
    bfun=0x10013a690 <command_loop_1>, handlers=17232, 
    hfun=0x1001505c0 <cmd_error>) at eval.c:1334
#11 0x00000001001504bc in command_loop_2 (ignore=0) at keyboard.c:1119
#12 0x0000000100203cf8 in internal_catch (tag=43536, 
    func=0x100150490 <command_loop_2>, arg=0) at eval.c:1100
#13 0x0000000100139be8 in command_loop () at keyboard.c:1098
#14 0x0000000100139a50 in recursive_edit_1 () at keyboard.c:704
#15 0x0000000100139d88 in Frecursive_edit () at keyboard.c:775
#16 0x0000000100137a7d in main (argc=2, argv=0x7fff5fbff9a0) at emacs.c:1690
(gdb) p current_thread.m_specpdl
Cannot access memory at address 0x70
(gdb) p current_thread->m_specpdl
Cannot access memory at address 0x70
(gdb) 


-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-27 10:44             ` Alan Third
@ 2016-12-27 11:13               ` Eli Zaretskii
  2016-12-28 19:36                 ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-27 11:13 UTC (permalink / raw)
  To: Alan Third; +Cc: charles, 25265

> Date: Tue, 27 Dec 2016 10:44:24 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> OK, I’ve ended up just using printf and it looks like in this thread
> current_thread is null.

This happens in only one place: run_thread, after a thread exits.  And
yes, it's a bad idea to rely on current_thread being non-NULL in code
that is run when other threads could be running.

More generally, this means calling unbind_to, i.e. unwinding the Lisp
stack, is not going to work in this context at all, because that in
effect runs Lisp when more than one thread could run their code.

> > > >     [NSApp run];
> > > > 
> > > > Can this part and its surrounding code be made thread-safe?
> > > 
> > > I think this particular method call has to be done *only* from the
> > > main thread. I imagine that could be a problem.
> > 
> > It could be a problem, yes.  But what does this do, exactly, and why
> > does it need to be called as part of ns_select?
> 
> It runs an event loop that picks up all GUI events and then dispatches
> them. It’s part of NextStep. It’s unclear to me why it’s run in
> ns_select, but presumably it’s because it needs to be run somewhere
> and whoever wrote it thought that was a good place.
> 
> If we need to, I expect we could move it to its own thread. That seems
> to be a known pattern in the GNUStep/Cocoa world.

That's probably the only reasonable way.  But why does it use
record_unwind_protect and unbind_to in the first place?  What happens
if the user presses C-g while the event loop runs?

> I’m pretty sure that Emacs doesn’t even use ns_select when it’s run
> in the terminal, though.

Actually, I think it has to use ns_select, because subprocesses are
still supported on a text terminal, and reading their output calls
ns_select.  Likewise network connections.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-27 11:13               ` Eli Zaretskii
@ 2016-12-28 19:36                 ` Alan Third
  2016-12-29 17:12                   ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2016-12-28 19:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

On Tue, Dec 27, 2016 at 01:13:11PM +0200, Eli Zaretskii wrote:
> > Date: Tue, 27 Dec 2016 10:44:24 +0000
> > From: Alan Third <alan@idiocy.org>
> > Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> > > > >     [NSApp run];
> > > > > 
> > > > > Can this part and its surrounding code be made thread-safe?
> > > > 
> > > > I think this particular method call has to be done *only* from the
> > > > main thread. I imagine that could be a problem.
> > > 
> > > It could be a problem, yes.  But what does this do, exactly, and why
> > > does it need to be called as part of ns_select?
> > 
> > It runs an event loop that picks up all GUI events and then dispatches
> > them. It’s part of NextStep. It’s unclear to me why it’s run in
> > ns_select, but presumably it’s because it needs to be run somewhere
> > and whoever wrote it thought that was a good place.
> > 
> > If we need to, I expect we could move it to its own thread. That seems
> > to be a known pattern in the GNUStep/Cocoa world.
> 
> That's probably the only reasonable way.  But why does it use
> record_unwind_protect and unbind_to in the first place?  What happens
> if the user presses C-g while the event loop runs?

It was added in change 08f27aa39c498d34418c7420b33bf2db913827c3 which
relates to bug 18345. Previously it was simply apploopnr--, but
sometimes users would find themselves in a situation where that line
was never run (hitting C‐g while the event loop runs?) so the next
time ns_select (or ns_read_socket) ran it aborted. I take it unbind_to
fixes that particular problem.

Is there a safer way of ensuring that apploopnr is decremented?

I’m slowly beginning to understand what’s going on in ns_select. It
seems the idea is that it should detect both input on file descriptors
(using pselect in the background), and NS events coming from [NSApp
run].

The following is for my own future reference, if nothing else.

There is another thread that runs in a loop (fd_handler), and when
it’s signalled from ns_select, it runs pselect. At the same time
ns_select sets up a timer, then it calls [NSApp run].

If either the pselect thread, or the timer run out before any input is
detected, they send a user defined event to the NSApp event loop so it
ends.

Similarly if the pselect thread detects input it sends an event to the
NSApp loop, which then ends.

If there’s NS input, it’s processed by the NSApp loop, which then
ends.

Whatever happens, the NSApp loop eventually returns control back to
ns_select, which works out what happened by examining the last NS
event and returns the relevant value.

I have my doubts this is thread safe. The communication between
ns_select and fd_handler is done by setting static variables and then
running:

    c = 'g';
    emacs_write_sig (selfds[1], &c, 1);

I don’t really know what that does. Sends something on an fd?

Then it writes back to the same global variables for ns_select to pick
up.

Now I’ve written it down it seems a lot less complicated...
-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-28 19:36                 ` Alan Third
@ 2016-12-29 17:12                   ` Eli Zaretskii
  2016-12-30 18:45                     ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-29 17:12 UTC (permalink / raw)
  To: Alan Third; +Cc: charles, 25265

> Date: Wed, 28 Dec 2016 19:36:33 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> > That's probably the only reasonable way.  But why does it use
> > record_unwind_protect and unbind_to in the first place?  What happens
> > if the user presses C-g while the event loop runs?
> 
> It was added in change 08f27aa39c498d34418c7420b33bf2db913827c3 which
> relates to bug 18345. Previously it was simply apploopnr--, but
> sometimes users would find themselves in a situation where that line
> was never run (hitting C‐g while the event loop runs?) so the next
> time ns_select (or ns_read_socket) ran it aborted. I take it unbind_to
> fixes that particular problem.
> 
> Is there a safer way of ensuring that apploopnr is decremented?

There could be, but I see that ns_select and ns_read_socket use it for
some kind of synchronization between them, which I don't really
understand, so I cannot answer that question.

> I’m slowly beginning to understand what’s going on in ns_select. It
> seems the idea is that it should detect both input on file descriptors
> (using pselect in the background), and NS events coming from [NSApp
> run].

What do "NS events" include?  Do they include, for example, keyboard
input?

Also, does C-g cause a signal or some other async event on NS, which
could potentially interrupt the [NSApp run] stuff?  Or could SIGIO do
that (does NS use SIGIO for input?)?

If nothing could interrupt/QUIT [NSApp run], then I don't really
understand why we have unwind_protect there.  The bug in question
cites some undisclosed Lisp for reproducing the problem, so I wonder
how could that cause [NSApp run] to be interrupted and longjmp to
higher level.

> There is another thread that runs in a loop (fd_handler), and when
> it’s signalled from ns_select, it runs pselect. At the same time
> ns_select sets up a timer, then it calls [NSApp run].

(I think ns_select only sets up a timer when there are no descriptors
to watch, to avoid waking up the fd_handler thread in that case.)

So this means there are 2 jobs to be done here: the pselect call and
the [NSApp run] call.

> If either the pselect thread, or the timer run out before any input is
> detected, they send a user defined event to the NSApp event loop so it
> ends.
> 
> Similarly if the pselect thread detects input it sends an event to the
> NSApp loop, which then ends.
> 
> If there’s NS input, it’s processed by the NSApp loop

Processed how?  Shouldn't Emacs be involved in this processing?  IOW,
these events should be read by Emacs, via the read_socket_hook.

> Whatever happens, the NSApp loop eventually returns control back to
> ns_select, which works out what happened by examining the last NS
> event and returns the relevant value.
> 
> I have my doubts this is thread safe.

It isn't.  Unless each Emacs application thread will have its own
fd_handler thread.

One possible solution might be to let only one thread, say the main
thread, to call [NSApp run].  The other threads, when they get into
ns_select, will behave as if Emacs runs in non-GUI mode, and will only
call pselect.  Not sure what this will mean from the POV of all
threads being equal (since the delicate dance between ns_select and
ns_read_socket is still unclear to me), but at least it might avoid
crashes and hangs.  Can you try something like that?

> The communication between ns_select and fd_handler is done by
> setting static variables and then running:
> 
>     c = 'g';
>     emacs_write_sig (selfds[1], &c, 1);
> 
> I don’t really know what that does. Sends something on an fd?

Yes.  I guess 'g' stands for GO and 's' stands for SUSPEND.  These are
commands to the fd_handler thread, or so it seems.

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-29 17:12                   ` Eli Zaretskii
@ 2016-12-30 18:45                     ` Alan Third
  2016-12-30 21:08                       ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2016-12-30 18:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

On Thu, Dec 29, 2016 at 07:12:54PM +0200, Eli Zaretskii wrote:
> > Date: Wed, 28 Dec 2016 19:36:33 +0000
> > From: Alan Third <alan@idiocy.org>
> > Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> > 
> > I’m slowly beginning to understand what’s going on in ns_select. It
> > seems the idea is that it should detect both input on file descriptors
> > (using pselect in the background), and NS events coming from [NSApp
> > run].
> 
> What do "NS events" include?  Do they include, for example, keyboard
> input?

As far as I can tell it’s mouse and keyboard input, as well as
messages from the ‘window manager’, telling it to close the frame,
etc.

> > There is another thread that runs in a loop (fd_handler), and when
> > it’s signalled from ns_select, it runs pselect. At the same time
> > ns_select sets up a timer, then it calls [NSApp run].
> 
> (I think ns_select only sets up a timer when there are no descriptors
> to watch, to avoid waking up the fd_handler thread in that case.)
> 
> So this means there are 2 jobs to be done here: the pselect call and
> the [NSApp run] call.

Correct on both counts.

> > If there’s NS input, it’s processed by the NSApp loop
> 
> Processed how?  Shouldn't Emacs be involved in this processing?  IOW,
> these events should be read by Emacs, via the read_socket_hook.

Ah! Is this the missing piece of the puzzle? When the [NSApp run] loop
receives an event, say keyboard input, it creates an emacs_event and
then raises SIGIO (via hold_event). SIGIO causes ns_read_socket to be
run, which ALSO tries to run [NSApp run].

Am I right in thinking that raising SIGIO will cause ns_read_socket to
be potentially run immediately? Asynchronously?

I’ve just commented out the section of ns_read_socket that calls
[NSApp run] and I can’t see any difference in behaviour. I suspect
that someone’s doubled up on it when they didn’t need to.

> One possible solution might be to let only one thread, say the main
> thread, to call [NSApp run].  The other threads, when they get into
> ns_select, will behave as if Emacs runs in non-GUI mode, and will only
> call pselect.  Not sure what this will mean from the POV of all
> threads being equal (since the delicate dance between ns_select and
> ns_read_socket is still unclear to me), but at least it might avoid
> crashes and hangs.  Can you try something like that?

Yes, I will. Am I right in thinking that if we remove all the NSApp
junk from ns_select it will literally just be calling pselect with
the same arguments?

So, my plan of action:

Run [NSApp run] in it’s own thread with no flow control (unless it’s
important that emacs events are only created at specific times?)

Replace ns_select with pselect.

Thanks for helping with this, I don’t think I’d be able to work it out
on my own.
-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-30 18:45                     ` Alan Third
@ 2016-12-30 21:08                       ` Eli Zaretskii
  2016-12-30 22:05                         ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-30 21:08 UTC (permalink / raw)
  To: Alan Third; +Cc: charles, 25265

> Date: Fri, 30 Dec 2016 18:45:32 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> > > If there’s NS input, it’s processed by the NSApp loop
> > 
> > Processed how?  Shouldn't Emacs be involved in this processing?  IOW,
> > these events should be read by Emacs, via the read_socket_hook.
> 
> Ah! Is this the missing piece of the puzzle? When the [NSApp run] loop
> receives an event, say keyboard input, it creates an emacs_event and
> then raises SIGIO (via hold_event). SIGIO causes ns_read_socket to be
> run, which ALSO tries to run [NSApp run].
> 
> Am I right in thinking that raising SIGIO will cause ns_read_socket to
> be potentially run immediately? Asynchronously?

I very much hope not.  We don't run any non-trivial code from a signal
handler.  I'd expect SIGIO just to set a flag, and then the resulting
input be processed when we call unblock_input next time, and the
blocking level gets to zero.  Then we run process_pending_signals,
which calls handle_async_input, and that's where ns_read_socket will
be called by gobble_input.

> I’ve just commented out the section of ns_read_socket that calls
> [NSApp run] and I can’t see any difference in behaviour. I suspect
> that someone’s doubled up on it when they didn’t need to.

I cannot help you here.  Maybe it's needed for Emacs to be more
responsive?  If you run "git log -L" on ns_read_socket, does the
history tell anything about why this call was added?  Perhaps some
discussion or bug report?

> > One possible solution might be to let only one thread, say the main
> > thread, to call [NSApp run].  The other threads, when they get into
> > ns_select, will behave as if Emacs runs in non-GUI mode, and will only
> > call pselect.  Not sure what this will mean from the POV of all
> > threads being equal (since the delicate dance between ns_select and
> > ns_read_socket is still unclear to me), but at least it might avoid
> > crashes and hangs.  Can you try something like that?
> 
> Yes, I will. Am I right in thinking that if we remove all the NSApp
> junk from ns_select it will literally just be calling pselect with
> the same arguments?

It looks like that, yes.

> So, my plan of action:
> 
> Run [NSApp run] in it’s own thread with no flow control (unless it’s
> important that emacs events are only created at specific times?)

How will that thread communicate the events to Emacs?

> Thanks for helping with this, I don’t think I’d be able to work it out
> on my own.

Thank you for digging into the problem.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-30 21:08                       ` Eli Zaretskii
@ 2016-12-30 22:05                         ` Alan Third
  2016-12-31  9:20                           ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2016-12-30 22:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

On Fri, Dec 30, 2016 at 11:08:54PM +0200, Eli Zaretskii wrote:
> > Date: Fri, 30 Dec 2016 18:45:32 +0000
> > From: Alan Third <alan@idiocy.org>
> > Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> > 
> > Ah! Is this the missing piece of the puzzle? When the [NSApp run] loop
> > receives an event, say keyboard input, it creates an emacs_event and
> > then raises SIGIO (via hold_event). SIGIO causes ns_read_socket to be
> > run, which ALSO tries to run [NSApp run].
> > 
> > Am I right in thinking that raising SIGIO will cause ns_read_socket to
> > be potentially run immediately? Asynchronously?
> 
> I very much hope not.  We don't run any non-trivial code from a signal
> handler.  I'd expect SIGIO just to set a flag, and then the resulting
> input be processed when we call unblock_input next time, and the
> blocking level gets to zero.  Then we run process_pending_signals,
> which calls handle_async_input, and that's where ns_read_socket will
> be called by gobble_input.

OK. I am, again, none the wiser.

> > I’ve just commented out the section of ns_read_socket that calls
> > [NSApp run] and I can’t see any difference in behaviour. I suspect
> > that someone’s doubled up on it when they didn’t need to.
> 
> I cannot help you here.  Maybe it's needed for Emacs to be more
> responsive?  If you run "git log -L" on ns_read_socket, does the
> history tell anything about why this call was added?  Perhaps some
> discussion or bug report?

It’s been there since the NS port was integrated, it seems.

> > So, my plan of action:
> > 
> > Run [NSApp run] in it’s own thread with no flow control (unless it’s
> > important that emacs events are only created at specific times?)
> 
> How will that thread communicate the events to Emacs?

I’m hoping using the same method as it does now. It creates emacs
events from the NS events, and then fires SIGIO.

I have run into a problem almost right away, though. I don’t know how
to go about creating this thread.

The NSApp loop needs to run in the ‘main’ thread on macOS. I
understand the main thread is always the original thread, so if I want
to use it for the NSApp loop, I need to move Emacs’ normal operation
into a child thread.

I don’t have any experience with pthreads and can’t see any obvious
way of doing it (and I think, from my experiments, it breaks unexec or
something). Is it possible, or can you think of a better way of
handling this?

Thanks!
-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-30 22:05                         ` Alan Third
@ 2016-12-31  9:20                           ` Eli Zaretskii
  2016-12-31 16:09                             ` bug#25265: [PATCH] Rework NS event handling (bug#25265) Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-31  9:20 UTC (permalink / raw)
  To: Alan Third; +Cc: charles, 25265

> Date: Fri, 30 Dec 2016 22:05:44 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> > > Am I right in thinking that raising SIGIO will cause ns_read_socket to
> > > be potentially run immediately? Asynchronously?
> > 
> > I very much hope not.  We don't run any non-trivial code from a signal
> > handler.  I'd expect SIGIO just to set a flag, and then the resulting
> > input be processed when we call unblock_input next time, and the
> > blocking level gets to zero.  Then we run process_pending_signals,
> > which calls handle_async_input, and that's where ns_read_socket will
> > be called by gobble_input.
> 
> OK. I am, again, none the wiser.

Don't give up ;-)

> > > So, my plan of action:
> > > 
> > > Run [NSApp run] in it’s own thread with no flow control (unless it’s
> > > important that emacs events are only created at specific times?)
> > 
> > How will that thread communicate the events to Emacs?
> 
> I’m hoping using the same method as it does now. It creates emacs
> events from the NS events, and then fires SIGIO.
> 
> I have run into a problem almost right away, though. I don’t know how
> to go about creating this thread.
> 
> The NSApp loop needs to run in the ‘main’ thread on macOS. I
> understand the main thread is always the original thread, so if I want
> to use it for the NSApp loop, I need to move Emacs’ normal operation
> into a child thread.

How about a more modest goal instead?  The code in ns_select can
detect when it is run by the main thread: we have a function,
main_thread_p, for that.  (For it to work, you need to pass
current_thread to it, so we will have to make sure the first part of
ns_select, up to the pselect call, is run with the global lock owned
by a single thread.  The easiest way of achieving that is to do for
ns_select the same change I did yesterday for xg_select: call
ns_select directly in process.c:wait_reading_process_output, then
change ns_select to call thread_select instead of pselect.)

When ns_select detects it's run by a non-main thread, it will only
call thread_select and return its result.  Otherwise, it will call
thread_select with a very small timeout and with NULL descriptor sets
(to let other threads an opportunity to run, and when that returns, do
the [NSApp run] dance with the surrounding code.  Note that the latter
runs after the main thread has re-acquired the global lock, so no
other Lisp threads could be active at that point.

Would that work?

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: [PATCH] Rework NS event handling (bug#25265)
  2016-12-31  9:20                           ` Eli Zaretskii
@ 2016-12-31 16:09                             ` Alan Third
  2016-12-31 16:25                               ` Eli Zaretskii
  2017-01-01 15:03                               ` Alan Third
  0 siblings, 2 replies; 40+ messages in thread
From: Alan Third @ 2016-12-31 16:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

* src/nsterm.m (unwind_apploopnr): Remove.
(ns_read_socket): Remove references to apploopnr.  Make processing the
NS event loop conditional on being in the main thread.
(ns_select): Remove references to apploopnr.  Remove all fd_handler
related stuff.  Check if there are events waiting on the NS event
queue rather than running the event loop.  Remove unused variables and
code.
(fd_handler): Remove.
(ns_term_init): Remove creation of fd_handler thread.
(hold_event, EmacsApp:sendEvent, EmacsView:mouseMoved,
EmacsView:windowDidExpose): Remove send_appdefined.
(ns_send_appdefined): Always check the event queue for
applicationDefined events rather than relying on send_appdefined var.
* src/nsterm.h: Remove reference to fd_handler method.
---
 src/nsterm.h |   1 -
 src/nsterm.m | 380 +++++++++++------------------------------------------------
 2 files changed, 68 insertions(+), 313 deletions(-)

diff --git a/src/nsterm.h b/src/nsterm.h
index 35c6e1a..dc222a7 100644
--- a/src/nsterm.h
+++ b/src/nsterm.h
@@ -392,7 +392,6 @@ char const * nstrace_fullscreen_type_name (int);
 - (void)sendEvent: (NSEvent *)theEvent;
 - (void)showPreferencesWindow: (id)sender;
 - (BOOL) openFile: (NSString *)fileName;
-- (void)fd_handler: (id)unused;
 - (void)timeout_handler: (NSTimer *)timedEntry;
 - (BOOL)fulfillService: (NSString *)name withArg: (NSString *)arg;
 #ifdef NS_IMPL_GNUSTEP
diff --git a/src/nsterm.m b/src/nsterm.m
index 7e6ec85..98fd8ab 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -279,18 +279,10 @@ - (NSColor *)colorUsingDefaultColorSpace
 /*static int debug_lock = 0; */
 
 /* event loop */
-static BOOL send_appdefined = YES;
 #define NO_APPDEFINED_DATA (-8)
 static int last_appdefined_event_data = NO_APPDEFINED_DATA;
 static NSTimer *timed_entry = 0;
 static NSTimer *scroll_repeat_entry = nil;
-static fd_set select_readfds, select_writefds;
-enum { SELECT_HAVE_READ = 1, SELECT_HAVE_WRITE = 2, SELECT_HAVE_TMO = 4 };
-static int select_nfds = 0, select_valid = 0;
-static struct timespec select_timeout = { 0, 0 };
-static int selfds[2] = { -1, -1 };
-static pthread_mutex_t select_mutex;
-static int apploopnr = 0;
 static NSAutoreleasePool *outerpool;
 static struct input_event *emacs_event = NULL;
 static struct input_event *q_event_ptr = NULL;
@@ -457,7 +449,6 @@ - (NSColor *)colorUsingDefaultColorSpace
   hold_event_q.q[hold_event_q.nr++] = *event;
   /* Make sure ns_read_socket is called, i.e. we have input.  */
   raise (SIGIO);
-  send_appdefined = YES;
 }
 
 static Lisp_Object
@@ -3872,31 +3863,17 @@ overwriting cursor (usually when cursor on a tab) */
       return;
     }
 
-  /* Only post this event if we haven't already posted one.  This will end
-       the [NXApp run] main loop after having processed all events queued at
-       this moment.  */
-
-#ifdef NS_IMPL_COCOA
-  if (! send_appdefined)
-    {
-      /* OS X 10.10.1 swallows the AppDefined event we are sending ourselves
-         in certain situations (rapid incoming events).
-         So check if we have one, if not add one.  */
-      NSEvent *appev = [NSApp nextEventMatchingMask:NSEventMaskApplicationDefined
-                                          untilDate:[NSDate distantPast]
-                                             inMode:NSDefaultRunLoopMode
-                                            dequeue:NO];
-      if (! appev) send_appdefined = YES;
-    }
-#endif
-
-  if (send_appdefined)
+  /* Only post this event if we haven't already posted one.  This will
+     end the [NXApp run] main loop after having processed all events
+     queued at this moment.  */
+  NSEvent *appev = [NSApp nextEventMatchingMask:NSEventMaskApplicationDefined
+                                      untilDate:[NSDate distantPast]
+                                         inMode:NSDefaultRunLoopMode
+                                        dequeue:NO];
+  if (! appev)
     {
       NSEvent *nxev;
 
-      /* We only need one NX_APPDEFINED event to stop NXApp from running.  */
-      send_appdefined = NO;
-
       /* Don't need wakeup timer any more */
       if (timed_entry)
         {
@@ -4011,14 +3988,6 @@ in certain situations (rapid incoming events).
 }
 #endif /* NS_IMPL_COCOA */
 
-static void
-unwind_apploopnr (Lisp_Object not_used)
-{
-  --apploopnr;
-  n_emacs_events_pending = 0;
-  ns_finish_events ();
-  q_event_ptr = NULL;
-}
 
 static int
 ns_read_socket (struct terminal *terminal, struct input_event *hold_quit)
@@ -4029,13 +3998,10 @@ in certain situations (rapid incoming events).
    -------------------------------------------------------------------------- */
 {
   struct input_event ev;
-  int nevents;
+  int nevents = 0;
 
   NSTRACE_WHEN (NSTRACE_GROUP_EVENTS, "ns_read_socket");
 
-  if (apploopnr > 0)
-    return -1; /* Already within event loop. */
-
 #ifdef HAVE_NATIVE_FS
   check_native_fs ();
 #endif
@@ -4052,54 +4018,49 @@ in certain situations (rapid incoming events).
       return i;
     }
 
-  block_input ();
-  n_emacs_events_pending = 0;
-  ns_init_events (&ev);
-  q_event_ptr = hold_quit;
-
-  /* we manage autorelease pools by allocate/reallocate each time around
-     the loop; strict nesting is occasionally violated but seems not to
-     matter.. earlier methods using full nesting caused major memory leaks */
-  [outerpool release];
-  outerpool = [[NSAutoreleasePool alloc] init];
-
-  /* If have pending open-file requests, attend to the next one of those. */
-  if (ns_pending_files && [ns_pending_files count] != 0
-      && [(EmacsApp *)NSApp openFile: [ns_pending_files objectAtIndex: 0]])
+  if ([NSThread isMainThread])
     {
-      [ns_pending_files removeObjectAtIndex: 0];
-    }
-  /* Deal with pending service requests. */
-  else if (ns_pending_service_names && [ns_pending_service_names count] != 0
-    && [(EmacsApp *)
-         NSApp fulfillService: [ns_pending_service_names objectAtIndex: 0]
-                      withArg: [ns_pending_service_args objectAtIndex: 0]])
-    {
-      [ns_pending_service_names removeObjectAtIndex: 0];
-      [ns_pending_service_args removeObjectAtIndex: 0];
-    }
-  else
-    {
-      ptrdiff_t specpdl_count = SPECPDL_INDEX ();
-      /* Run and wait for events.  We must always send one NX_APPDEFINED event
-         to ourself, otherwise [NXApp run] will never exit.  */
-      send_appdefined = YES;
-      ns_send_appdefined (-1);
-
-      if (++apploopnr != 1)
+      block_input ();
+      n_emacs_events_pending = 0;
+      ns_init_events (&ev);
+      q_event_ptr = hold_quit;
+
+      /* we manage autorelease pools by allocate/reallocate each time around
+         the loop; strict nesting is occasionally violated but seems not to
+         matter.. earlier methods using full nesting caused major memory leaks */
+      [outerpool release];
+      outerpool = [[NSAutoreleasePool alloc] init];
+
+      /* If have pending open-file requests, attend to the next one of those. */
+      if (ns_pending_files && [ns_pending_files count] != 0
+          && [(EmacsApp *)NSApp openFile: [ns_pending_files objectAtIndex: 0]])
         {
-          emacs_abort ();
+          [ns_pending_files removeObjectAtIndex: 0];
         }
-      record_unwind_protect (unwind_apploopnr, Qt);
-      [NSApp run];
-      unbind_to (specpdl_count, Qnil);  /* calls unwind_apploopnr */
-    }
+      /* Deal with pending service requests. */
+      else if (ns_pending_service_names && [ns_pending_service_names count] != 0
+               && [(EmacsApp *)
+                    NSApp fulfillService: [ns_pending_service_names objectAtIndex: 0]
+                                 withArg: [ns_pending_service_args objectAtIndex: 0]])
+        {
+          [ns_pending_service_names removeObjectAtIndex: 0];
+          [ns_pending_service_args removeObjectAtIndex: 0];
+        }
+      else
+        {
+          /* Run and wait for events.  We must always send one NX_APPDEFINED event
+             to ourself, otherwise [NXApp run] will never exit.  */
+          ns_send_appdefined (-1);
 
-  nevents = n_emacs_events_pending;
-  n_emacs_events_pending = 0;
-  ns_finish_events ();
-  q_event_ptr = NULL;
-  unblock_input ();
+          [NSApp run];
+        }
+
+      nevents = n_emacs_events_pending;
+      n_emacs_events_pending = 0;
+      ns_finish_events ();
+      q_event_ptr = NULL;
+      unblock_input ();
+    }
 
   return nevents;
 }
@@ -4114,15 +4075,11 @@ in certain situations (rapid incoming events).
    -------------------------------------------------------------------------- */
 {
   int result;
-  int t, k, nr = 0;
-  struct input_event event;
-  char c;
+  NSDate *timeout_date = nil;
+  NSEvent *ns_event;
 
   NSTRACE_WHEN (NSTRACE_GROUP_EVENTS, "ns_select");
 
-  if (apploopnr > 0)
-    return -1; /* Already within event loop. */
-
 #ifdef HAVE_NATIVE_FS
   check_native_fs ();
 #endif
@@ -4135,121 +4092,34 @@ in certain situations (rapid incoming events).
       return -1;
     }
 
-  for (k = 0; k < nfds+1; k++)
-    {
-      if (readfds && FD_ISSET(k, readfds)) ++nr;
-      if (writefds && FD_ISSET(k, writefds)) ++nr;
-    }
-
   if (NSApp == nil
+      || ![NSThread isMainThread]
       || (timeout && timeout->tv_sec == 0 && timeout->tv_nsec == 0))
-    return pselect (nfds, readfds, writefds, exceptfds, timeout, sigmask);
+    return pselect(nfds, readfds, writefds,
+                   exceptfds, timeout, sigmask);
+
+  result = pselect(nfds, readfds, writefds, exceptfds,
+                   &(struct timespec){.tv_sec = 0, .tv_nsec = 100},
+                   sigmask);
 
   [outerpool release];
   outerpool = [[NSAutoreleasePool alloc] init];
 
-
-  send_appdefined = YES;
-  if (nr > 0)
-    {
-      pthread_mutex_lock (&select_mutex);
-      select_nfds = nfds;
-      select_valid = 0;
-      if (readfds)
-        {
-          select_readfds = *readfds;
-          select_valid += SELECT_HAVE_READ;
-        }
-      if (writefds)
-        {
-          select_writefds = *writefds;
-          select_valid += SELECT_HAVE_WRITE;
-        }
-
-      if (timeout)
-        {
-          select_timeout = *timeout;
-          select_valid += SELECT_HAVE_TMO;
-        }
-
-      pthread_mutex_unlock (&select_mutex);
-
-      /* Inform fd_handler that select should be called */
-      c = 'g';
-      emacs_write_sig (selfds[1], &c, 1);
-    }
-  else if (nr == 0 && timeout)
+  if (timeout)
     {
-      /* No file descriptor, just a timeout, no need to wake fd_handler  */
       double time = timespectod (*timeout);
-      timed_entry = [[NSTimer scheduledTimerWithTimeInterval: time
-                                                      target: NSApp
-                                                    selector:
-                                  @selector (timeout_handler:)
-                                                    userInfo: 0
-                                                     repeats: NO]
-                      retain];
-    }
-  else /* No timeout and no file descriptors, can this happen?  */
-    {
-      /* Send appdefined so we exit from the loop */
-      ns_send_appdefined (-1);
-    }
-
-  block_input ();
-  ns_init_events (&event);
-  if (++apploopnr != 1)
-    {
-      emacs_abort ();
-    }
-
-  {
-    ptrdiff_t specpdl_count = SPECPDL_INDEX ();
-    record_unwind_protect (unwind_apploopnr, Qt);
-    [NSApp run];
-    unbind_to (specpdl_count, Qnil);  /* calls unwind_apploopnr */
-  }
-
-  ns_finish_events ();
-  if (nr > 0 && readfds)
-    {
-      c = 's';
-      emacs_write_sig (selfds[1], &c, 1);
+      timeout_date = [NSDate dateWithTimeIntervalSinceNow:time];
     }
-  unblock_input ();
 
-  t = last_appdefined_event_data;
+  /* Listen for a new NSEvent. */
+  ns_event = [NSApp nextEventMatchingMask:NSEventMaskAny
+                                untilDate:timeout_date
+                                   inMode:NSDefaultRunLoopMode
+                                  dequeue:NO];
 
-  if (t != NO_APPDEFINED_DATA)
+  if (ns_event != nil)
     {
-      last_appdefined_event_data = NO_APPDEFINED_DATA;
-
-      if (t == -2)
-        {
-          /* The NX_APPDEFINED event we received was a timeout. */
-          result = 0;
-        }
-      else if (t == -1)
-        {
-          /* The NX_APPDEFINED event we received was the result of
-             at least one real input event arriving.  */
-          errno = EINTR;
-          result = -1;
-        }
-      else
-        {
-          /* Received back from select () in fd_handler; copy the results */
-          pthread_mutex_lock (&select_mutex);
-          if (readfds) *readfds = select_readfds;
-          if (writefds) *writefds = select_writefds;
-          pthread_mutex_unlock (&select_mutex);
-          result = t;
-        }
-    }
-  else
-    {
-      errno = EINTR;
-      result = -1;
+      raise (SIGIO);
     }
 
   return result;
@@ -4765,21 +4635,6 @@ static Lisp_Object ns_string_to_lispmod (const char *s)
   baud_rate = 38400;
   Fset_input_interrupt_mode (Qnil);
 
-  if (selfds[0] == -1)
-    {
-      if (emacs_pipe (selfds) != 0)
-        {
-          fprintf (stderr, "Failed to create pipe: %s\n",
-                   emacs_strerror (errno));
-          emacs_abort ();
-        }
-
-      fcntl (selfds[0], F_SETFL, O_NONBLOCK|fcntl (selfds[0], F_GETFL));
-      FD_ZERO (&select_readfds);
-      FD_ZERO (&select_writefds);
-      pthread_mutex_init (&select_mutex, NULL);
-    }
-
   ns_pending_files = [[NSMutableArray alloc] init];
   ns_pending_service_names = [[NSMutableArray alloc] init];
   ns_pending_service_args = [[NSMutableArray alloc] init];
@@ -4792,11 +4647,6 @@ Needs to be here because ns_initialize_display_info () uses AppKit classes.
     return NULL;
   [NSApp setDelegate: NSApp];
 
-  /* Start the select thread.  */
-  [NSThread detachNewThreadSelector:@selector (fd_handler:)
-                           toTarget:NSApp
-                         withObject:nil];
-
   /* debugging: log all notifications */
   /*   [[NSNotificationCenter defaultCenter] addObserver: NSApp
                                          selector: @selector (logNotification:)
@@ -5178,10 +5028,6 @@ - (void)sendEvent: (NSEvent *)theEvent
           last_appdefined_event_data = [theEvent data1];
           [self stop: self];
         }
-      else
-        {
-          send_appdefined = YES;
-        }
     }
 
 
@@ -5484,95 +5330,6 @@ - (void)sendFromMainThread:(id)unused
   ns_send_appdefined (nextappdefined);
 }
 
-- (void)fd_handler:(id)unused
-/* --------------------------------------------------------------------------
-     Check data waiting on file descriptors and terminate if so
-   -------------------------------------------------------------------------- */
-{
-  int result;
-  int waiting = 1, nfds;
-  char c;
-
-  fd_set readfds, writefds, *wfds;
-  struct timespec timeout, *tmo;
-  NSAutoreleasePool *pool = nil;
-
-  /* NSTRACE ("fd_handler"); */
-
-  for (;;)
-    {
-      [pool release];
-      pool = [[NSAutoreleasePool alloc] init];
-
-      if (waiting)
-        {
-          fd_set fds;
-          FD_ZERO (&fds);
-          FD_SET (selfds[0], &fds);
-          result = select (selfds[0]+1, &fds, NULL, NULL, NULL);
-          if (result > 0 && read (selfds[0], &c, 1) == 1 && c == 'g')
-	    waiting = 0;
-        }
-      else
-        {
-          pthread_mutex_lock (&select_mutex);
-          nfds = select_nfds;
-
-          if (select_valid & SELECT_HAVE_READ)
-            readfds = select_readfds;
-          else
-            FD_ZERO (&readfds);
-
-          if (select_valid & SELECT_HAVE_WRITE)
-            {
-              writefds = select_writefds;
-              wfds = &writefds;
-            }
-          else
-            wfds = NULL;
-          if (select_valid & SELECT_HAVE_TMO)
-            {
-              timeout = select_timeout;
-              tmo = &timeout;
-            }
-          else
-            tmo = NULL;
-
-          pthread_mutex_unlock (&select_mutex);
-
-          FD_SET (selfds[0], &readfds);
-          if (selfds[0] >= nfds) nfds = selfds[0]+1;
-
-          result = pselect (nfds, &readfds, wfds, NULL, tmo, NULL);
-
-          if (result == 0)
-            ns_send_appdefined (-2);
-          else if (result > 0)
-            {
-              if (FD_ISSET (selfds[0], &readfds))
-                {
-                  if (read (selfds[0], &c, 1) == 1 && c == 's')
-		    waiting = 1;
-                }
-              else
-                {
-                  pthread_mutex_lock (&select_mutex);
-                  if (select_valid & SELECT_HAVE_READ)
-                    select_readfds = readfds;
-                  if (select_valid & SELECT_HAVE_WRITE)
-                    select_writefds = writefds;
-                  if (select_valid & SELECT_HAVE_TMO)
-                    select_timeout = timeout;
-                  pthread_mutex_unlock (&select_mutex);
-
-                  ns_send_appdefined (result);
-                }
-            }
-          waiting = 1;
-        }
-    }
-}
-
 
 
 /* ==========================================================================
@@ -6361,7 +6118,7 @@ - (void)mouseMoved: (NSEvent *)e
                       help_echo_object, help_echo_pos);
     }
 
-  if (emacsframe->mouse_moved && send_appdefined)
+  if (emacsframe->mouse_moved)
     ns_send_appdefined (-1);
 }
 
@@ -7058,8 +6815,7 @@ - (void)windowDidExpose: sender
   SET_FRAME_VISIBLE (emacsframe, 1);
   SET_FRAME_GARBAGED (emacsframe);
 
-  if (send_appdefined)
-    ns_send_appdefined (-1);
+  ns_send_appdefined (-1);
 }
 
 
-- 

Here we go. This seems to work. It turns out there are trade‐offs
between the GUI and network performance. If I remove
NSApp:nextEventMatchingMask from ns_select, eww loads web pages
*significantly* faster, but the scrollbars become effectively
unusable.

-- 
Alan Third





^ permalink raw reply related	[flat|nested] 40+ messages in thread

* bug#25265: [PATCH] Rework NS event handling (bug#25265)
  2016-12-31 16:09                             ` bug#25265: [PATCH] Rework NS event handling (bug#25265) Alan Third
@ 2016-12-31 16:25                               ` Eli Zaretskii
  2016-12-31 16:46                                 ` Alan Third
  2017-01-01 15:03                               ` Alan Third
  1 sibling, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2016-12-31 16:25 UTC (permalink / raw)
  To: Alan Third; +Cc: charles, 25265

> Date: Sat, 31 Dec 2016 16:09:30 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> Here we go. This seems to work. It turns out there are trade‐offs
> between the GUI and network performance. If I remove
> NSApp:nextEventMatchingMask from ns_select, eww loads web pages
> *significantly* faster, but the scrollbars become effectively
> unusable.

Wow, thanks a lot!

I cannot review the patch, as I don't really understand the details.
So I suggest to push this and see if users report bugs.

(I presume you have run all the tests, including those in
test/src/thread-tests.el and those posted in the few recent
discussions related to thread-related problems.  If not, I very much
recommend that you do that, to see if the new code passes these
"entry-level" tests.)





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: [PATCH] Rework NS event handling (bug#25265)
  2016-12-31 16:25                               ` Eli Zaretskii
@ 2016-12-31 16:46                                 ` Alan Third
  0 siblings, 0 replies; 40+ messages in thread
From: Alan Third @ 2016-12-31 16:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

On Sat, Dec 31, 2016 at 06:25:58PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 31 Dec 2016 16:09:30 +0000
> > From: Alan Third <alan@idiocy.org>
> > Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> > 
> > Here we go. This seems to work. It turns out there are trade‐offs
> > between the GUI and network performance. If I remove
> > NSApp:nextEventMatchingMask from ns_select, eww loads web pages
> > *significantly* faster, but the scrollbars become effectively
> > unusable.
> 
> Wow, thanks a lot!
> 
> I cannot review the patch, as I don't really understand the details.
> So I suggest to push this and see if users report bugs.

I forgot to say before, I think there may be slightly more lag then
before when using the scroll bars. I don’t really use them so I don’t
know if people will be bothered by it. With NSTrace turned on I can
actually crash Emacs by waving the scroll bars about, but I can’t with
NSTrace off. Everything else seems fast enough. If I push it I suppose
we’ll find out soon enough if it’s acceptable.

> (I presume you have run all the tests, including those in
> test/src/thread-tests.el and those posted in the few recent
> discussions related to thread-related problems.  If not, I very much
> recommend that you do that, to see if the new code passes these
> "entry-level" tests.)

I hadn’t, but I have now. Make check reports 3 unexpected results as
usual, and the log for thread-tests.el shows 27/27 passed.

I tried this from one of the other threads (I’ve not looked through
them completely yet):

(dotimes (x 10) (make-thread (lambda ()
                               (let ((n (random 10)))
                                 (with-current-buffer "z"
                                   (sleep-for n)
                                   (insert (format "Foo:%d\n" n)))))))

and it seems to do what it’s supposed to do.
-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: [PATCH] Rework NS event handling (bug#25265)
  2016-12-31 16:09                             ` bug#25265: [PATCH] Rework NS event handling (bug#25265) Alan Third
  2016-12-31 16:25                               ` Eli Zaretskii
@ 2017-01-01 15:03                               ` Alan Third
  2017-01-01 15:42                                 ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: Alan Third @ 2017-01-01 15:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: charles, 25265

On Sat, Dec 31, 2016 at 04:09:30PM +0000, Alan Third wrote:
> * src/nsterm.m (unwind_apploopnr): Remove.
> (ns_read_socket): Remove references to apploopnr.  Make processing the
> NS event loop conditional on being in the main thread.
> (ns_select): Remove references to apploopnr.  Remove all fd_handler
> related stuff.  Check if there are events waiting on the NS event
> queue rather than running the event loop.  Remove unused variables and
> code.
> (fd_handler): Remove.
> (ns_term_init): Remove creation of fd_handler thread.
> (hold_event, EmacsApp:sendEvent, EmacsView:mouseMoved,
> EmacsView:windowDidExpose): Remove send_appdefined.
> (ns_send_appdefined): Always check the event queue for
> applicationDefined events rather than relying on send_appdefined var.
> * src/nsterm.h: Remove reference to fd_handler method.

OK, I’m running into performance bugs with this almost straight away.

It all looks OK until I start flyspell-mode. Then it appears that
redisplay is only called every two or three keypresses. It looks like
Emacs is still going fine, though, as messages to the modeline appear,
even if the action isn’t immediately displayed in the buffer.

For example, I open up an org file and start flyspell-mode, then I hit
the down arrow which should take me to a heading but the cursor
doesn’t move. Then I hit TAB, and I get a message in the modeline
telling me that the section associated with the heading has been
expanded, but the buffer is still displayed with the cursor on the
previous line and the section not expanded. Finally I hit the down
arrow again and the buffer updates to display the expanded section and
the cursor where I’d expect it.

emacsclient runs with a delay, which I guess corresponds to the
timeout on the NS event queue check.

I’m not at all sure how to fix these problems.

One option I thought about is to wrap the fd’s in ns_select with
NSFileHandle, which should then mean we could look for notifications
on the NS event queue, and emulate select that way. Unfortunately as
far as I can see NSFileHandle only provides an event for ‘data
available for reading’, and it looks like we’d need to be able to spot
write availability too.

(See under Notifications:
https://developer.apple.com/reference/foundation/nsfilehandle)
-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: [PATCH] Rework NS event handling (bug#25265)
  2017-01-01 15:03                               ` Alan Third
@ 2017-01-01 15:42                                 ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2017-01-01 15:42 UTC (permalink / raw)
  To: Alan Third; +Cc: charles, 25265

> Date: Sun, 1 Jan 2017 15:03:52 +0000
> From: Alan Third <alan@idiocy.org>
> Cc: charles@aurox.ch, 25265@debbugs.gnu.org
> 
> OK, I’m running into performance bugs with this almost straight away.
> 
> It all looks OK until I start flyspell-mode. Then it appears that
> redisplay is only called every two or three keypresses. It looks like
> Emacs is still going fine, though, as messages to the modeline appear,
> even if the action isn’t immediately displayed in the buffer.
> 
> For example, I open up an org file and start flyspell-mode, then I hit
> the down arrow which should take me to a heading but the cursor
> doesn’t move. Then I hit TAB, and I get a message in the modeline
> telling me that the section associated with the heading has been
> expanded, but the buffer is still displayed with the cursor on the
> previous line and the section not expanded. Finally I hit the down
> arrow again and the buffer updates to display the expanded section and
> the cursor where I’d expect it.
> 
> emacsclient runs with a delay, which I guess corresponds to the
> timeout on the NS event queue check.
> 
> I’m not at all sure how to fix these problems.

Do you understand the problems, though?

The crucial question is why isn't redisplay being called as it was
called before the changes?  One way of answering that would be to run
the previous version under a debugger, put a breakpoint in
redisplay_internal, show a backtrace to see who called it, and
continue.  Then do the same in the new version, and see if you can
spot any differences.

Do you really need to type several keys to get redisplay?  Or is it
enough to type just one and wait for some time?  IOW, if you type one
key, then wait, does redisplay ever happen?

Thanks.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-24 11:06 bug#25265: make-thread crashes in OS X 10.6 Charles A. Roelli
  2016-12-24 17:51 ` Eli Zaretskii
@ 2017-03-06 20:02 ` Alan Third
  2017-03-08 20:17   ` Charles A. Roelli
  2017-05-02 20:49 ` Alan Third
  2 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2017-03-06 20:02 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: 25265

charles@aurox.ch (Charles A. Roelli) writes:

> Running Emacs -Q on the latest commit (e36a3882),
>
> M-: (make-thread (lambda () "string"))
>
> appears to hang Emacs immediately.

I've just pushed a new fix for this to master. If anyone is able to test
it and let me know if they run into any problems, I would be grateful.

-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-03-06 20:02 ` bug#25265: make-thread crashes in OS X 10.6 Alan Third
@ 2017-03-08 20:17   ` Charles A. Roelli
  2017-03-14 14:49     ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Charles A. Roelli @ 2017-03-08 20:17 UTC (permalink / raw)
  To: Alan Third; +Cc: 25265

On Mon, Mar 06 2017 at 08:02:16 pm, Alan Third wrote:

> charles@aurox.ch (Charles A. Roelli) writes:
>
>> Running Emacs -Q on the latest commit (e36a3882),
>>
>> M-: (make-thread (lambda () "string"))
>>
>> appears to hang Emacs immediately.
>
> I've just pushed a new fix for this to master. If anyone is able to test
> it and let me know if they run into any problems, I would be grateful.

Fixes the bug for me!  Thank you for working on this.  I didn't notice
any adverse changes in Emacs behavior with your patch, but I haven't yet
been running it for too long.  Will report back soon again.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-03-08 20:17   ` Charles A. Roelli
@ 2017-03-14 14:49     ` Alan Third
  0 siblings, 0 replies; 40+ messages in thread
From: Alan Third @ 2017-03-14 14:49 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: 25265

On Wed, Mar 08, 2017 at 09:17:48PM +0100, Charles A. Roelli wrote:
> On Mon, Mar 06 2017 at 08:02:16 pm, Alan Third wrote:
> 
> > charles@aurox.ch (Charles A. Roelli) writes:
> >
> >> Running Emacs -Q on the latest commit (e36a3882),
> >>
> >> M-: (make-thread (lambda () "string"))
> >>
> >> appears to hang Emacs immediately.
> >
> > I've just pushed a new fix for this to master. If anyone is able to test
> > it and let me know if they run into any problems, I would be grateful.
> 
> Fixes the bug for me!  Thank you for working on this.  I didn't notice
> any adverse changes in Emacs behavior with your patch, but I haven't yet
> been running it for too long.  Will report back soon again.

I’ve reverted the changes as it was causing Emacs to hang under
certain circumstances.

I’m all out of ideas for how to sort this concurrency thing.
-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2016-12-24 11:06 bug#25265: make-thread crashes in OS X 10.6 Charles A. Roelli
  2016-12-24 17:51 ` Eli Zaretskii
  2017-03-06 20:02 ` bug#25265: make-thread crashes in OS X 10.6 Alan Third
@ 2017-05-02 20:49 ` Alan Third
  2017-06-12 19:32   ` Charles A. Roelli
  2 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2017-05-02 20:49 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: 25265

[-- Attachment #1: Type: text/plain, Size: 997 bytes --]

On Sat, Dec 24, 2016 at 12:06:45PM +0100, Charles A. Roelli wrote:
> Running Emacs -Q on the latest commit (e36a3882),
> 
> M-: (make-thread (lambda () "string"))
> 
> appears to hang Emacs immediately.

I’ve been working on this on and off for a while now, and I just can’t
fix it.

I’ve attached two patches that together are the best I’ve managed to
achieve, but unfortunately it randomly freezes up with 100% CPU usage.

I’ve not yet managed to work out what it’s doing when it goes into the
100% CPU loop. I can only assume I’ve missed some crucial case in
ns_select or something.

The first patch stops the NS port from using SIGIO, as it seems to be
the source of a number of problems. The second removes the NS event
loop in ns_select as it requires block_input/unblock_input wrapped
around it, but that’s what’s causing the crash in make-thread.

Instead it just looks for whether there is a new NS event as I don’t
think that requires blocking input.

-- 
Alan Third

[-- Attachment #2: Remove SIGIO stuff --]
[-- Type: text/plain, Size: 2831 bytes --]

From bff3dbfe43a24ee924edd777a7a876b627f94f11 Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Sat, 29 Apr 2017 23:35:16 +0100
Subject: [PATCH] Don't use SIGIO in NS

* src/keyboard.c (handle_user_signal) [HAVE_NS]: Don't handle SIGIO.
* src/keyboard.h (handle_input_available_signal) [HAVE_NS]: Make
available to external code.
* src/nsterm.m (hold_event, ns_Select): Call handle_input_available_signal
directly.
* src/sysdep.c (emacs_sigaction_init) [HAVE_NS]: Ignore SIGIO.
---
 src/keyboard.c |  3 ++-
 src/keyboard.h | 11 +++++++++++
 src/nsterm.m   |  4 ++--
 src/sysdep.c   |  2 +-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/keyboard.c b/src/keyboard.c
index c9fa2a9f5e..d22f7b08c6 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -7264,7 +7264,8 @@ handle_user_signal (int sig)
           }
 
 	p->npending++;
-#ifdef USABLE_SIGIO
+#if defined (USABLE_SIGIO) && !defined (HAVE_NS)
+        /* Dont use the interrupt handler in NS. */
 	if (interrupt_input)
 	  handle_input_available_signal (sig);
 	else
diff --git a/src/keyboard.h b/src/keyboard.h
index 2219c01135..29279bef86 100644
--- a/src/keyboard.h
+++ b/src/keyboard.h
@@ -495,6 +495,17 @@ extern void mark_kboards (void);
 extern const char *const lispy_function_keys[];
 #endif
 
+#ifdef HAVE_NS
+/* The NS port only uses SIGIO internally, and even then only in two
+   places in the code, but it causes crashes if certain code is not
+   properly wrapped in block_input/unblock_input.
+
+   Since it's only used internally, we should be able to disable it,
+   and call the SIGIO handler directly.
+ */
+extern void handle_input_available_signal (int sig);
+#endif
+
 extern char const DEV_TTY[];
 
 INLINE_HEADER_END
diff --git a/src/nsterm.m b/src/nsterm.m
index c22c5a70ba..00b7a89472 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -472,7 +472,7 @@ - (NSColor *)colorUsingDefaultColorSpace
 
   hold_event_q.q[hold_event_q.nr++] = *event;
   /* Make sure ns_read_socket is called, i.e. we have input.  */
-  raise (SIGIO);
+  handle_input_available_signal (SIGIO);
   send_appdefined = YES;
 }
 
@@ -4289,7 +4289,7 @@ in certain situations (rapid incoming events).
   if (hold_event_q.nr > 0)
     {
       /* We already have events pending. */
-      raise (SIGIO);
+      handle_input_available_signal (SIGIO);
       errno = EINTR;
       return -1;
     }
diff --git a/src/sysdep.c b/src/sysdep.c
index 91b2a5cb94..70fdf92964 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -1606,7 +1606,7 @@ emacs_sigaction_init (struct sigaction *action, signal_handler_t handler)
     {
       sigaddset (&action->sa_mask, SIGINT);
       sigaddset (&action->sa_mask, SIGQUIT);
-#ifdef USABLE_SIGIO
+#if defined (USABLE_SIGIO) && !defined (HAVE_NS)
       sigaddset (&action->sa_mask, SIGIO);
 #endif
     }
-- 
2.12.0


[-- Attachment #3: Remove NS event loop from ns_select --]
[-- Type: text/plain, Size: 5727 bytes --]

From f759adb30eaf12af474057ebbc98f5f76cc590bf Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Sun, 30 Apr 2017 10:04:16 +0100
Subject: [PATCH] Remove event loop from ns_select

* src/nsterm.m (ns_select): Remove event processing loop and replace
with simple test for a new event.
(ns_send_appdefined): Remove redundant timer code.
---
 src/nsterm.m | 92 ++++++++++++++++++++++++++++--------------------------------
 1 file changed, 43 insertions(+), 49 deletions(-)

diff --git a/src/nsterm.m b/src/nsterm.m
index 00b7a89472..462ab176c9 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -282,7 +282,6 @@ - (NSColor *)colorUsingDefaultColorSpace
 static BOOL send_appdefined = YES;
 #define NO_APPDEFINED_DATA (-8)
 static int last_appdefined_event_data = NO_APPDEFINED_DATA;
-static NSTimer *timed_entry = 0;
 static NSTimer *scroll_repeat_entry = nil;
 static fd_set select_readfds, select_writefds;
 enum { SELECT_HAVE_READ = 1, SELECT_HAVE_WRITE = 2, SELECT_HAVE_TMO = 4 };
@@ -4074,14 +4073,6 @@ in certain situations (rapid incoming events).
       /* We only need one NX_APPDEFINED event to stop NXApp from running.  */
       send_appdefined = NO;
 
-      /* Don't need wakeup timer any more */
-      if (timed_entry)
-        {
-          [timed_entry invalidate];
-          [timed_entry release];
-          timed_entry = nil;
-        }
-
       nxev = [NSEvent otherEventWithType: NSEventTypeApplicationDefined
                                 location: NSMakePoint (0, 0)
                            modifierFlags: 0
@@ -4277,9 +4268,11 @@ in certain situations (rapid incoming events).
 {
   int result;
   int t, k, nr = 0;
-  struct input_event event;
   char c;
 
+  NSDate *timeout_date = nil;
+  NSEvent *ns_event;
+
   NSTRACE_WHEN (NSTRACE_GROUP_EVENTS, "ns_select");
 
 #ifdef HAVE_NATIVE_FS
@@ -4337,70 +4330,71 @@ in certain situations (rapid incoming events).
       /* Inform fd_handler that select should be called */
       c = 'g';
       emacs_write_sig (selfds[1], &c, 1);
+      /* We rely on fd_handler timing out to cause
+         nextEventMatchingMask below to return, so set it's timeout to
+         an unreasonably long time. */
+      timeout_date = [NSDate distantFuture];
     }
   else if (nr == 0 && timeout)
     {
-      /* No file descriptor, just a timeout, no need to wake fd_handler  */
+      /* No file descriptor, just a timeout, no need to wake
+         fd_handler. Set nextEventMatchingMask timeout. */
       double time = timespectod (*timeout);
-      timed_entry = [[NSTimer scheduledTimerWithTimeInterval: time
-                                                      target: NSApp
-                                                    selector:
-                                  @selector (timeout_handler:)
-                                                    userInfo: 0
-                                                     repeats: NO]
-                      retain];
-    }
-  else /* No timeout and no file descriptors, can this happen?  */
-    {
-      /* Send appdefined so we exit from the loop */
-      ns_send_appdefined (-1);
+      timeout_date = [NSDate dateWithTimeIntervalSinceNow: time];
     }
 
-  block_input ();
-  ns_init_events (&event);
-
-  [NSApp run];
+  /* Listen for a new NSEvent. */
+  ns_event = [NSApp nextEventMatchingMask: NSEventMaskAny
+                                untilDate: timeout_date
+                                   inMode: NSDefaultRunLoopMode
+                                  dequeue: NO];
 
-  ns_finish_events ();
   if (nr > 0 && readfds)
     {
       c = 's';
       emacs_write_sig (selfds[1], &c, 1);
     }
-  unblock_input ();
-
-  t = last_appdefined_event_data;
 
-  if (t != NO_APPDEFINED_DATA)
+  if (ns_event != nil)
     {
-      last_appdefined_event_data = NO_APPDEFINED_DATA;
-
-      if (t == -2)
+      if ([ns_event type] == NSEventTypeApplicationDefined)
         {
-          /* The NX_APPDEFINED event we received was a timeout. */
-          result = 0;
+          if ([ns_event data1] < 0)
+            {
+              /* The NX_APPDEFINED event we received was a timeout. */
+              result = 0;
+            }
+          else
+            {
+              /* Received back from select () in fd_handler; copy the results */
+              pthread_mutex_lock (&select_mutex);
+              if (readfds) *readfds = select_readfds;
+              if (writefds) *writefds = select_writefds;
+              pthread_mutex_unlock (&select_mutex);
+              result = [ns_event data1];
+            }
+
+          /* Remove the NX_APPDEFINED event from the queue as it's no
+             longer needed. */
+          [NSApp nextEventMatchingMask: NSEventMaskAny
+                             untilDate: nil
+                                inMode: NSDefaultRunLoopMode
+                               dequeue: YES];
         }
-      else if (t == -1)
+      else
         {
-          /* The NX_APPDEFINED event we received was the result of
-             at least one real input event arriving.  */
+          /* A real NSEvent came in. */
+          handle_input_available_signal (0);
           errno = EINTR;
           result = -1;
         }
-      else
-        {
-          /* Received back from select () in fd_handler; copy the results */
-          pthread_mutex_lock (&select_mutex);
-          if (readfds) *readfds = select_readfds;
-          if (writefds) *writefds = select_writefds;
-          pthread_mutex_unlock (&select_mutex);
-          result = t;
-        }
     }
   else
     {
       errno = EINTR;
       result = -1;
+      /* Reading from the NSEvent queue timed out. */
+      result = 0;
     }
 
   return result;
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-05-02 20:49 ` Alan Third
@ 2017-06-12 19:32   ` Charles A. Roelli
  2017-06-13 20:46     ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Charles A. Roelli @ 2017-06-12 19:32 UTC (permalink / raw)
  To: Alan Third; +Cc: 25265

The issue seems to be improved now (not sure what changed it).

I tried this sample code, and it worked without a crash:

(dotimes (i 100)
   (make-thread (lambda () "string")))

Then I tried this code, and there's a crash every few runs (or
sometimes, an infinite loop that can't be exited without killing the
process).  The crash normally happens when I supply input (via the
keyboard), and the loop seems to be happen randomly.

(make-thread (lambda ()
            (dotimes (i 10)
               (sit-for 1)
               (goto-char (random (buffer-size))))))

I also noticed GDB's I/O buffer printing many of these warnings as
soon as the `make-thread' form was called:

2017-06-12 21:13:55.943 Emacs[10829:6683] *** __NSAutoreleaseNoPool(): 
Object 0x10216bf50 of class NSBezierPath autoreleased with no pool in 
place - just leaking
2017-06-12 21:13:55.944 Emacs[10829:6683] *** __NSAutoreleaseNoPool(): 
Object 0x101ec41b0 of class NSBezierPath autoreleased with no pool in 
place - just leaking
2017-06-12 21:13:56.443 Emacs[10829:6683] *** __NSAutoreleaseNoPool(): 
Object 0x10216c0f0 of class NSBezierPath autoreleased with no pool in 
place - just leaking
2017-06-12 21:13:56.444 Emacs[10829:6683] *** __NSAutoreleaseNoPool(): 
Object 0x101ec43e0 of class NSBezierPath autoreleased with no pool in 
place - just leaking



On 02/05/2017 22:49, Alan Third wrote:
> On Sat, Dec 24, 2016 at 12:06:45PM +0100, Charles A. Roelli wrote:
>> Running Emacs -Q on the latest commit (e36a3882),
>>
>> M-: (make-thread (lambda () "string"))
>>
>> appears to hang Emacs immediately.
> I’ve been working on this on and off for a while now, and I just can’t
> fix it.
>
> I’ve attached two patches that together are the best I’ve managed to
> achieve, but unfortunately it randomly freezes up with 100% CPU usage.
>
> I’ve not yet managed to work out what it’s doing when it goes into the
> 100% CPU loop. I can only assume I’ve missed some crucial case in
> ns_select or something.
>
> The first patch stops the NS port from using SIGIO, as it seems to be
> the source of a number of problems. The second removes the NS event
> loop in ns_select as it requires block_input/unblock_input wrapped
> around it, but that’s what’s causing the crash in make-thread.
>
> Instead it just looks for whether there is a new NS event as I don’t
> think that requires blocking input.
>






^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-12 19:32   ` Charles A. Roelli
@ 2017-06-13 20:46     ` Alan Third
  2017-06-15 18:57       ` Charles A. Roelli
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2017-06-13 20:46 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: 25265

[-- Attachment #1: Type: text/plain, Size: 1984 bytes --]

On Mon, Jun 12, 2017 at 09:32:21PM +0200, Charles A. Roelli wrote:
> The issue seems to be improved now (not sure what changed it).
> 
> I tried this sample code, and it worked without a crash:
> 
> (dotimes (i 100)
>   (make-thread (lambda () "string")))
> 
> Then I tried this code, and there's a crash every few runs (or
> sometimes, an infinite loop that can't be exited without killing the
> process).  The crash normally happens when I supply input (via the
> keyboard), and the loop seems to be happen randomly.
> 
> (make-thread (lambda ()
>            (dotimes (i 10)
>               (sit-for 1)
>               (goto-char (random (buffer-size))))))

I went back and noticed an approach Eli suggested that I had given up
on, but understood this time round.

I’ve attached a patch that seems to not crash. It introduces a warning
or two, and test/src/thread-tests.el randomly fails up to two tests,
but no crashes afaics.

> I also noticed GDB's I/O buffer printing many of these warnings as
> soon as the `make-thread' form was called:
> 
> 2017-06-12 21:13:55.943 Emacs[10829:6683] *** __NSAutoreleaseNoPool():
> Object 0x10216bf50 of class NSBezierPath autoreleased with no pool in place
> - just leaking
> 2017-06-12 21:13:55.944 Emacs[10829:6683] *** __NSAutoreleaseNoPool():
> Object 0x101ec41b0 of class NSBezierPath autoreleased with no pool in place
> - just leaking
> 2017-06-12 21:13:56.443 Emacs[10829:6683] *** __NSAutoreleaseNoPool():
> Object 0x10216c0f0 of class NSBezierPath autoreleased with no pool in place
> - just leaking
> 2017-06-12 21:13:56.444 Emacs[10829:6683] *** __NSAutoreleaseNoPool():
> Object 0x101ec43e0 of class NSBezierPath autoreleased with no pool in place
> - just leaking

It must be a sub‐thread running some objective c code without an
autorelease pool. It would be nice to be able to work out where that’s
happening. Perhaps threads will need to define an autorelease pool on
creation, and drain it on ending...
-- 
Alan Third

[-- Attachment #2: 0001-Fix-threads-on-NS-bug-25265.patch --]
[-- Type: text/plain, Size: 2624 bytes --]

From 22c33d9f2516dd488c3d76ccb3360e4855df7bb6 Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Tue, 13 Jun 2017 21:40:25 +0100
Subject: [PATCH] Fix threads on NS (bug#25265)

---
 src/nsterm.h  |  2 +-
 src/nsterm.m  | 10 ++++++++--
 src/process.c |  4 ++++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/nsterm.h b/src/nsterm.h
index bed0b92c79..10d5db2cff 100644
--- a/src/nsterm.h
+++ b/src/nsterm.h
@@ -1226,7 +1226,7 @@ extern void x_set_z_group (struct frame *f, Lisp_Object new_value,
                            Lisp_Object old_value);
 extern int ns_select (int nfds, fd_set *readfds, fd_set *writefds,
 		      fd_set *exceptfds, struct timespec const *timeout,
-		      sigset_t const *sigmask);
+		      sigset_t *sigmask);
 extern unsigned long ns_get_rgb_color (struct frame *f,
                                        float r, float g, float b, float a);
 
diff --git a/src/nsterm.m b/src/nsterm.m
index e05dbf45fb..8fc27ed752 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -4294,7 +4294,7 @@ in certain situations (rapid incoming events).
 int
 ns_select (int nfds, fd_set *readfds, fd_set *writefds,
 	   fd_set *exceptfds, struct timespec const *timeout,
-	   sigset_t const *sigmask)
+	   sigset_t *sigmask)
 /* --------------------------------------------------------------------------
      Replacement for select, checking for events
    -------------------------------------------------------------------------- */
@@ -4327,7 +4327,13 @@ in certain situations (rapid incoming events).
   if (NSApp == nil
       || ![NSThread isMainThread]
       || (timeout && timeout->tv_sec == 0 && timeout->tv_nsec == 0))
-    return pselect (nfds, readfds, writefds, exceptfds, timeout, sigmask);
+    return thread_select(pselect, nfds, readfds, writefds,
+                         exceptfds, timeout, sigmask);
+  else
+    {
+      struct timespec t = {0, 0};
+      thread_select(pselect, 0, NULL, NULL, NULL, &t, sigmask);
+    }
 
   [outerpool release];
   outerpool = [[NSAutoreleasePool alloc] init];
diff --git a/src/process.c b/src/process.c
index 2a1c2eecde..eec907e14a 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5371,6 +5371,10 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	  nfds = xg_select (max_desc + 1,
 			    &Available, (check_write ? &Writeok : 0),
 			    NULL, &timeout, NULL);
+#elif defined HAVE_NS
+          nfds = ns_select (max_desc + 1,
+			    &Available, (check_write ? &Writeok : 0),
+			    NULL, &timeout, NULL);
 #else  /* !HAVE_GLIB */
 	  nfds = thread_select (
 # ifdef HAVE_NS
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-13 20:46     ` Alan Third
@ 2017-06-15 18:57       ` Charles A. Roelli
  2017-06-15 19:04         ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Charles A. Roelli @ 2017-06-15 18:57 UTC (permalink / raw)
  To: Alan Third; +Cc: 25265

Well done, it works here without a crash.

I ran all the tests and 'thread-tests.el' passed.  But I couldn't find
how to rerun it.  What command did you use?


On 13/06/2017 22:46, Alan Third wrote:
> On Mon, Jun 12, 2017 at 09:32:21PM +0200, Charles A. Roelli wrote:
>> The issue seems to be improved now (not sure what changed it).
>>
>> I tried this sample code, and it worked without a crash:
>>
>> (dotimes (i 100)
>>    (make-thread (lambda () "string")))
>>
>> Then I tried this code, and there's a crash every few runs (or
>> sometimes, an infinite loop that can't be exited without killing the
>> process).  The crash normally happens when I supply input (via the
>> keyboard), and the loop seems to be happen randomly.
>>
>> (make-thread (lambda ()
>>             (dotimes (i 10)
>>                (sit-for 1)
>>                (goto-char (random (buffer-size))))))
> I went back and noticed an approach Eli suggested that I had given up
> on, but understood this time round.
>
> I’ve attached a patch that seems to not crash. It introduces a warning
> or two, and test/src/thread-tests.el randomly fails up to two tests,
> but no crashes afaics.
>
>> I also noticed GDB's I/O buffer printing many of these warnings as
>> soon as the `make-thread' form was called:
>>
>> 2017-06-12 21:13:55.943 Emacs[10829:6683] *** __NSAutoreleaseNoPool():
>> Object 0x10216bf50 of class NSBezierPath autoreleased with no pool in place
>> - just leaking
>> 2017-06-12 21:13:55.944 Emacs[10829:6683] *** __NSAutoreleaseNoPool():
>> Object 0x101ec41b0 of class NSBezierPath autoreleased with no pool in place
>> - just leaking
>> 2017-06-12 21:13:56.443 Emacs[10829:6683] *** __NSAutoreleaseNoPool():
>> Object 0x10216c0f0 of class NSBezierPath autoreleased with no pool in place
>> - just leaking
>> 2017-06-12 21:13:56.444 Emacs[10829:6683] *** __NSAutoreleaseNoPool():
>> Object 0x101ec43e0 of class NSBezierPath autoreleased with no pool in place
>> - just leaking
> It must be a sub‐thread running some objective c code without an
> autorelease pool. It would be nice to be able to work out where that’s
> happening. Perhaps threads will need to define an autorelease pool on
> creation, and drain it on ending...






^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-15 18:57       ` Charles A. Roelli
@ 2017-06-15 19:04         ` Alan Third
  2017-06-15 19:14           ` Noam Postavsky
  2017-06-16 19:45           ` Alan Third
  0 siblings, 2 replies; 40+ messages in thread
From: Alan Third @ 2017-06-15 19:04 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: 25265

On Thu, Jun 15, 2017 at 08:57:45PM +0200, Charles A. Roelli wrote:
> Well done, it works here without a crash.
> 
> I ran all the tests and 'thread-tests.el' passed.  But I couldn't find
> how to rerun it.  What command did you use?

M-x ert<RET><RET>

Last time I tried I couldn’t get two tests to fail, so I don’t know
what the second one was. The other test fails more often than not
here.

I think I’ll probably have to go back through old threading bugs and
see if I can pick up any hints as to what might cause the test to
fail.

My output:

Selector: t
Passed:  27
Failed:  1 (1 unexpected)
Skipped: 0
Total:   28/28

Started at:   2017-06-15 20:03:07+0100
Finished.
Finished at:  2017-06-15 20:03:12+0100

.F..........................

F thread-signal-early
    Test signaling a thread as soon as it is started by the OS.
    (ert-test-failed
     ((should-not
       (thread-alive-p thread))
      :form
      (thread-alive-p #<thread 0x102238000>)
      :value t))

-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-15 19:04         ` Alan Third
@ 2017-06-15 19:14           ` Noam Postavsky
  2017-06-16 19:45           ` Alan Third
  1 sibling, 0 replies; 40+ messages in thread
From: Noam Postavsky @ 2017-06-15 19:14 UTC (permalink / raw)
  To: Alan Third; +Cc: Charles A. Roelli, 25265

On Thu, Jun 15, 2017 at 3:04 PM, Alan Third <alan@idiocy.org> wrote:
> On Thu, Jun 15, 2017 at 08:57:45PM +0200, Charles A. Roelli wrote:
>>
>> I ran all the tests and 'thread-tests.el' passed.  But I couldn't find
>> how to rerun it.  What command did you use?
>
> M-x ert<RET><RET>

R (ert-results-rerun-all-tests) in the *ert* buffer should also work.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-15 19:04         ` Alan Third
  2017-06-15 19:14           ` Noam Postavsky
@ 2017-06-16 19:45           ` Alan Third
  2017-06-16 20:05             ` Noam Postavsky
  1 sibling, 1 reply; 40+ messages in thread
From: Alan Third @ 2017-06-16 19:45 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: 25265

On Thu, Jun 15, 2017 at 08:04:32PM +0100, Alan Third wrote:
> Last time I tried I couldn’t get two tests to fail, so I don’t know
> what the second one was. The other test fails more often than not
> here.

I got two to fail this time:

Selector: t
Passed:  26
Failed:  2 (2 unexpected)
Skipped: 0
Total:   28/28

Started at:   2017-06-16 20:36:24+0100
Finished.
Finished at:  2017-06-16 20:36:29+0100

.F........F.................

F thread-signal-early
    Test signaling a thread as soon as it is started by the OS.
    (ert-test-failed
     ((should-not
       (thread-alive-p thread))
      :form
      (thread-alive-p #<thread 0x102151c50>)
      :value t))

F threads-condvar-wait
    test waiting on conditional variable
    (ert-test-failed
     ((should
       (=
	(length
	 (all-threads))
	1))
      :form
      (= 2 1)
      :value nil))


Anyone got any bright ideas where to start with debugging these? The
first one looks like the thread’s starting too quick?

The second looks like the thread’s not dying quick enough?

Is this possibly just an artifact of the way that ns_select calls
thread_select then afterwards does its actual pselect/NS Event stuff?

-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-16 19:45           ` Alan Third
@ 2017-06-16 20:05             ` Noam Postavsky
  2017-06-16 20:51               ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Noam Postavsky @ 2017-06-16 20:05 UTC (permalink / raw)
  To: Alan Third; +Cc: Charles A. Roelli, 25265

On Fri, Jun 16, 2017 at 3:45 PM, Alan Third <alan@idiocy.org> wrote:
> The first one looks like the thread’s starting too quick?

I think it's rather failing to be killed by the `thread-signal' call.

> The second looks like the thread’s not dying quick enough?

Same for that one. I don't know much about the thread internals, and
even less macOS internals though...





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-16 20:05             ` Noam Postavsky
@ 2017-06-16 20:51               ` Alan Third
  2017-06-18 13:05                 ` Charles A. Roelli
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2017-06-16 20:51 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: Charles A. Roelli, 25265

[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]

On Fri, Jun 16, 2017 at 04:05:58PM -0400, Noam Postavsky wrote:
> On Fri, Jun 16, 2017 at 3:45 PM, Alan Third <alan@idiocy.org> wrote:
> > The first one looks like the thread’s starting too quick?
> 
> I think it's rather failing to be killed by the `thread-signal' call.

Ah, yes. I completely misread the test.

> > The second looks like the thread’s not dying quick enough?
> 
> Same for that one. I don't know much about the thread internals, and
> even less macOS internals though...

Well, I’ve discovered that if I keep the mouse moving over the Emacs
frame while the tests are running then none of them fail, so I think
it’s something to do with ns_select jamming everything up for the
duration of its timeout.

I could probably get round that by getting the thread to fire off an
NS app defined event when it dies. That would break ns_select out of
its event loop, presumably letting the dying thread actually die...

Amazingly that seems to have worked! Updated patch attached.
-- 
Alan Third

[-- Attachment #2: 0001-Fix-threads-on-NS-bug-25265.patch --]
[-- Type: text/plain, Size: 4787 bytes --]

From ab35f15d5091c29011f6a538009612f1b5c5137d Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Tue, 13 Jun 2017 21:40:25 +0100
Subject: [PATCH] Fix threads on NS (bug#25265)

src/nsterm.h (ns_select): Compiler doesn't like sigmask being const.
(ns_select_break) [HAVE_PTHREAD]: New function.
src/nsterm.m (ns_select): Call thread_select from within ns_select.
(ns_select_break) [HAVE_PTHREAD]: New function.
(ns_send_appdefined): Don't wait for main thread when sending app
defined event.
src/process.c (wait_reading_process_output): Call thread_select from
within ns_select.
src/systhread.c (sys_cond_broadcast) [HAVE_NS]: Break ns_select out of
its event loop.
---
 src/nsterm.h    |  7 +++++--
 src/nsterm.m    | 21 +++++++++++++++++----
 src/process.c   | 13 ++++++-------
 src/systhread.c |  7 +++++++
 4 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/src/nsterm.h b/src/nsterm.h
index bed0b92c79..eac54e1bcf 100644
--- a/src/nsterm.h
+++ b/src/nsterm.h
@@ -1225,8 +1225,11 @@ extern void x_set_no_accept_focus (struct frame *f, Lisp_Object new_value,
 extern void x_set_z_group (struct frame *f, Lisp_Object new_value,
                            Lisp_Object old_value);
 extern int ns_select (int nfds, fd_set *readfds, fd_set *writefds,
-		      fd_set *exceptfds, struct timespec const *timeout,
-		      sigset_t const *sigmask);
+		      fd_set *exceptfds, struct timespec *timeout,
+		      sigset_t *sigmask);
+#ifdef HAVE_PTHREAD
+extern void ns_select_break (void);
+#endif
 extern unsigned long ns_get_rgb_color (struct frame *f,
                                        float r, float g, float b, float a);
 
diff --git a/src/nsterm.m b/src/nsterm.m
index e05dbf45fb..29a45ed851 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -4068,7 +4068,7 @@ overwriting cursor (usually when cursor on a tab) */
       app->nextappdefined = value;
       [app performSelectorOnMainThread:@selector (sendFromMainThread:)
                             withObject:nil
-                         waitUntilDone:YES];
+                         waitUntilDone:NO];
       return;
     }
 
@@ -4293,8 +4293,8 @@ in certain situations (rapid incoming events).
 
 int
 ns_select (int nfds, fd_set *readfds, fd_set *writefds,
-	   fd_set *exceptfds, struct timespec const *timeout,
-	   sigset_t const *sigmask)
+	   fd_set *exceptfds, struct timespec *timeout,
+	   sigset_t *sigmask)
 /* --------------------------------------------------------------------------
      Replacement for select, checking for events
    -------------------------------------------------------------------------- */
@@ -4327,7 +4327,13 @@ in certain situations (rapid incoming events).
   if (NSApp == nil
       || ![NSThread isMainThread]
       || (timeout && timeout->tv_sec == 0 && timeout->tv_nsec == 0))
-    return pselect (nfds, readfds, writefds, exceptfds, timeout, sigmask);
+    return thread_select(pselect, nfds, readfds, writefds,
+                         exceptfds, timeout, sigmask);
+  else
+    {
+      struct timespec t = {0, 0};
+      thread_select(pselect, 0, NULL, NULL, NULL, &t, sigmask);
+    }
 
   [outerpool release];
   outerpool = [[NSAutoreleasePool alloc] init];
@@ -4430,6 +4436,13 @@ in certain situations (rapid incoming events).
   return result;
 }
 
+#ifdef HAVE_PTHREAD
+void
+ns_select_break ()
+{
+  ns_send_appdefined(-1);
+}
+#endif
 
 
 /* ==========================================================================
diff --git a/src/process.c b/src/process.c
index 2a1c2eecde..abd017bb90 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5371,14 +5371,13 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	  nfds = xg_select (max_desc + 1,
 			    &Available, (check_write ? &Writeok : 0),
 			    NULL, &timeout, NULL);
+#elif defined HAVE_NS
+          /* And NS builds call thread_select in ns_select. */
+          nfds = ns_select (max_desc + 1,
+			    &Available, (check_write ? &Writeok : 0),
+			    NULL, &timeout, NULL);
 #else  /* !HAVE_GLIB */
-	  nfds = thread_select (
-# ifdef HAVE_NS
-				ns_select
-# else
-				pselect
-# endif
-				, max_desc + 1,
+	  nfds = thread_select (pselect, max_desc + 1,
 				&Available,
 				(check_write ? &Writeok : 0),
 				NULL, &timeout, NULL);
diff --git a/src/systhread.c b/src/systhread.c
index a84060c18f..53dfe7a9c4 100644
--- a/src/systhread.c
+++ b/src/systhread.c
@@ -20,6 +20,10 @@ along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.  */
 #include <setjmp.h>
 #include "lisp.h"
 
+#ifdef HAVE_NS
+#include "nsterm.h"
+#endif
+
 #ifndef THREADS_ENABLED
 
 void
@@ -130,6 +134,9 @@ void
 sys_cond_broadcast (sys_cond_t *cond)
 {
   pthread_cond_broadcast (cond);
+#ifdef HAVE_NS
+  ns_select_break ();
+#endif
 }
 
 void
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-16 20:51               ` Alan Third
@ 2017-06-18 13:05                 ` Charles A. Roelli
  2017-06-18 14:01                   ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Charles A. Roelli @ 2017-06-18 13:05 UTC (permalink / raw)
  To: Alan Third, Noam Postavsky; +Cc: 25265

The tests sometimes all pass with the updated patch, but then I got
this once (the errors you've seen previously, I think):

Selector: t
Passed:  26
Failed:  2 (2 unexpected)
Skipped: 0
Total:   28/28

Started at:   2017-06-18 14:54:43+0200
Finished.
Finished at:  2017-06-18 14:54:47+0200

.F........F.................

F thread-signal-early
     Test signaling a thread as soon as it is started by the OS.
     (ert-test-failed
      ((should-not
        (thread-alive-p thread))
       :form
       (thread-alive-p #<thread 0x102da5498>)
       :value t))

F threads-condvar-wait
     test waiting on conditional variable
     (ert-test-failed
      ((should
        (=
     (length
      (all-threads))
     1))
       :form
       (= 2 1)
       :value nil))

from nextstep/Emacs.app/Contents/MacOS/Emacs -Q -l ert -l 
test/src/thread-tests.el


I also tried this way of running the tests in batch mode:

cd test && rm -rf src/thread-tests.log && make src/thread-tests.log

but got a segfault:

   GEN      src/thread-tests.log
/bin/sh: line 1: 87214 Segmentation fault      HOME=/nonexistent 
EMACSLOADPATH= LC_ALL=C 
EMACS_TEST_DIRECTORY=/Volumes/Toblerone/Code/emacs-devel/test 
"../src/emacs" -batch --no-site-file --no-site-lisp -L ":." -l ert -l 
src/thread-tests.el --eval "(ert-run-tests-batch-and-exit nil)" > 
src/thread-tests.log 2>&1
Running 28 tests (2017-06-18 15:00:03+0200)
make: *** [src/thread-tests.log] Error 139

I'm not sure where the problem is happening (or if it's specific to
NS).  The file 'src/thread-tests.log' also doesn't give any extra
information (besides 'Running 28 tests (2017-06-18 15:00:03+0200)').



On 16/06/2017 22:51, Alan Third wrote:
> On Fri, Jun 16, 2017 at 04:05:58PM -0400, Noam Postavsky wrote:
>> On Fri, Jun 16, 2017 at 3:45 PM, Alan Third <alan@idiocy.org> wrote:
>>> The first one looks like the thread’s starting too quick?
>> I think it's rather failing to be killed by the `thread-signal' call.
> Ah, yes. I completely misread the test.
>
>>> The second looks like the thread’s not dying quick enough?
>> Same for that one. I don't know much about the thread internals, and
>> even less macOS internals though...
> Well, I’ve discovered that if I keep the mouse moving over the Emacs
> frame while the tests are running then none of them fail, so I think
> it’s something to do with ns_select jamming everything up for the
> duration of its timeout.
>
> I could probably get round that by getting the thread to fire off an
> NS app defined event when it dies. That would break ns_select out of
> its event loop, presumably letting the dying thread actually die...
>
> Amazingly that seems to have worked! Updated patch attached.






^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-18 13:05                 ` Charles A. Roelli
@ 2017-06-18 14:01                   ` Alan Third
  2017-06-19 18:34                     ` Charles A. Roelli
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2017-06-18 14:01 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: Noam Postavsky, 25265

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Sun, Jun 18, 2017 at 03:05:04PM +0200, Charles A. Roelli wrote:
> The tests sometimes all pass with the updated patch, but then I got
> this once (the errors you've seen previously, I think):
> 
<snip>

Yeah, they look the same. I’m not sure what I can do to fix them if
they’re still happening.

> I also tried this way of running the tests in batch mode:
> 
> cd test && rm -rf src/thread-tests.log && make src/thread-tests.log
> 
> but got a segfault:
> 
>   GEN      src/thread-tests.log
> /bin/sh: line 1: 87214 Segmentation fault      HOME=/nonexistent
> EMACSLOADPATH= LC_ALL=C
> EMACS_TEST_DIRECTORY=/Volumes/Toblerone/Code/emacs-devel/test "../src/emacs"
> -batch --no-site-file --no-site-lisp -L ":." -l ert -l src/thread-tests.el
> --eval "(ert-run-tests-batch-and-exit nil)" > src/thread-tests.log 2>&1
> Running 28 tests (2017-06-18 15:00:03+0200)
> make: *** [src/thread-tests.log] Error 139
> 
> I'm not sure where the problem is happening (or if it's specific to
> NS).  The file 'src/thread-tests.log' also doesn't give any extra
> information (besides 'Running 28 tests (2017-06-18 15:00:03+0200)').

Yeah, I think that’s because I, accidentally, have the non‐GUI version
trying to run some GUI stuff. I noticed this myself yesterday.

Running NS Emacs without a GUI has always worked with threads because
it just bypasses the NS run loop which is the problem.

I’ve attached a new patch that actually checks whether we’re running
the NS GUI.
-- 
Alan Third

[-- Attachment #2: 0001-Fix-threads-on-NS-bug-25265.patch --]
[-- Type: text/plain, Size: 5282 bytes --]

From 62ac72333084ae32719d63b82166f1c6daee1ed9 Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Tue, 13 Jun 2017 21:40:25 +0100
Subject: [PATCH] Fix threads on NS (bug#25265)

src/nsterm.h (ns_select): Compiler doesn't like sigmask being const.
(ns_run_loop_break) [HAVE_PTHREAD]: New function.
src/nsterm.m (ns_select): Call thread_select from within ns_select.
(ns_run_loop_break) [HAVE_PTHREAD]: New function.
(ns_send_appdefined): Don't wait for main thread when sending app
defined event.
src/process.c (wait_reading_process_output): Call thread_select from
within ns_select.
src/systhread.c (sys_cond_broadcast) [HAVE_NS]: Break ns_select out of
its event loop using ns_run_loop_break.
---
 src/nsterm.h    |  7 +++++--
 src/nsterm.m    | 26 ++++++++++++++++++++++----
 src/process.c   | 13 ++++++-------
 src/systhread.c | 11 +++++++++++
 4 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/src/nsterm.h b/src/nsterm.h
index bed0b92c79..73d846d114 100644
--- a/src/nsterm.h
+++ b/src/nsterm.h
@@ -1225,8 +1225,11 @@ extern void x_set_no_accept_focus (struct frame *f, Lisp_Object new_value,
 extern void x_set_z_group (struct frame *f, Lisp_Object new_value,
                            Lisp_Object old_value);
 extern int ns_select (int nfds, fd_set *readfds, fd_set *writefds,
-		      fd_set *exceptfds, struct timespec const *timeout,
-		      sigset_t const *sigmask);
+		      fd_set *exceptfds, struct timespec *timeout,
+		      sigset_t *sigmask);
+#ifdef HAVE_PTHREAD
+extern void ns_run_loop_break (void);
+#endif
 extern unsigned long ns_get_rgb_color (struct frame *f,
                                        float r, float g, float b, float a);
 
diff --git a/src/nsterm.m b/src/nsterm.m
index e05dbf45fb..bf83550b3d 100644
--- a/src/nsterm.m
+++ b/src/nsterm.m
@@ -4068,7 +4068,7 @@ overwriting cursor (usually when cursor on a tab) */
       app->nextappdefined = value;
       [app performSelectorOnMainThread:@selector (sendFromMainThread:)
                             withObject:nil
-                         waitUntilDone:YES];
+                         waitUntilDone:NO];
       return;
     }
 
@@ -4293,8 +4293,8 @@ in certain situations (rapid incoming events).
 
 int
 ns_select (int nfds, fd_set *readfds, fd_set *writefds,
-	   fd_set *exceptfds, struct timespec const *timeout,
-	   sigset_t const *sigmask)
+	   fd_set *exceptfds, struct timespec *timeout,
+	   sigset_t *sigmask)
 /* --------------------------------------------------------------------------
      Replacement for select, checking for events
    -------------------------------------------------------------------------- */
@@ -4327,7 +4327,13 @@ in certain situations (rapid incoming events).
   if (NSApp == nil
       || ![NSThread isMainThread]
       || (timeout && timeout->tv_sec == 0 && timeout->tv_nsec == 0))
-    return pselect (nfds, readfds, writefds, exceptfds, timeout, sigmask);
+    return thread_select(pselect, nfds, readfds, writefds,
+                         exceptfds, timeout, sigmask);
+  else
+    {
+      struct timespec t = {0, 0};
+      thread_select(pselect, 0, NULL, NULL, NULL, &t, sigmask);
+    }
 
   [outerpool release];
   outerpool = [[NSAutoreleasePool alloc] init];
@@ -4430,6 +4436,18 @@ in certain situations (rapid incoming events).
   return result;
 }
 
+#ifdef HAVE_PTHREAD
+void
+ns_run_loop_break ()
+/* Break out of the NS run loop in ns_select or ns_read_socket. */
+{
+  NSTRACE_WHEN (NSTRACE_GROUP_EVENTS, "ns_run_loop_break");
+
+  /* If we don't have a GUI, don't send the event. */
+  if (NSApp != NULL)
+    ns_send_appdefined(-1);
+}
+#endif
 
 
 /* ==========================================================================
diff --git a/src/process.c b/src/process.c
index 2a1c2eecde..abd017bb90 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5371,14 +5371,13 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	  nfds = xg_select (max_desc + 1,
 			    &Available, (check_write ? &Writeok : 0),
 			    NULL, &timeout, NULL);
+#elif defined HAVE_NS
+          /* And NS builds call thread_select in ns_select. */
+          nfds = ns_select (max_desc + 1,
+			    &Available, (check_write ? &Writeok : 0),
+			    NULL, &timeout, NULL);
 #else  /* !HAVE_GLIB */
-	  nfds = thread_select (
-# ifdef HAVE_NS
-				ns_select
-# else
-				pselect
-# endif
-				, max_desc + 1,
+	  nfds = thread_select (pselect, max_desc + 1,
 				&Available,
 				(check_write ? &Writeok : 0),
 				NULL, &timeout, NULL);
diff --git a/src/systhread.c b/src/systhread.c
index a84060c18f..aee12a9b48 100644
--- a/src/systhread.c
+++ b/src/systhread.c
@@ -20,6 +20,10 @@ along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.  */
 #include <setjmp.h>
 #include "lisp.h"
 
+#ifdef HAVE_NS
+#include "nsterm.h"
+#endif
+
 #ifndef THREADS_ENABLED
 
 void
@@ -130,6 +134,13 @@ void
 sys_cond_broadcast (sys_cond_t *cond)
 {
   pthread_cond_broadcast (cond);
+#ifdef HAVE_NS
+  /* Send an app defined event to break out of the NS run loop.
+     It seems that if ns_select is running the NS run loop, this
+     broadcast has no effect until the loop is done, breaking a couple
+     of tests in thread-tests.el. */
+  ns_run_loop_break ();
+#endif
 }
 
 void
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-18 14:01                   ` Alan Third
@ 2017-06-19 18:34                     ` Charles A. Roelli
  2017-07-01 12:04                       ` Alan Third
  0 siblings, 1 reply; 40+ messages in thread
From: Charles A. Roelli @ 2017-06-19 18:34 UTC (permalink / raw)
  To: Alan Third; +Cc: Noam Postavsky, 25265

On 18/06/2017 16:01, Alan Third wrote:
> On Sun, Jun 18, 2017 at 03:05:04PM +0200, Charles A. Roelli wrote:
>> The tests sometimes all pass with the updated patch, but then I got
>> this once (the errors you've seen previously, I think):
>>
> <snip>
>
> Yeah, they look the same. I’m not sure what I can do to fix them if
> they’re still happening.

Wish I could help here.  Thanks for getting this far!

>
>> I also tried this way of running the tests in batch mode:
>>
>> cd test && rm -rf src/thread-tests.log && make src/thread-tests.log
>>
>> but got a segfault:
>>
>>    GEN      src/thread-tests.log
>> /bin/sh: line 1: 87214 Segmentation fault      HOME=/nonexistent
>> EMACSLOADPATH= LC_ALL=C
>> EMACS_TEST_DIRECTORY=/Volumes/Toblerone/Code/emacs-devel/test "../src/emacs"
>> -batch --no-site-file --no-site-lisp -L ":." -l ert -l src/thread-tests.el
>> --eval "(ert-run-tests-batch-and-exit nil)" > src/thread-tests.log 2>&1
>> Running 28 tests (2017-06-18 15:00:03+0200)
>> make: *** [src/thread-tests.log] Error 139
>>
>> I'm not sure where the problem is happening (or if it's specific to
>> NS).  The file 'src/thread-tests.log' also doesn't give any extra
>> information (besides 'Running 28 tests (2017-06-18 15:00:03+0200)').
> Yeah, I think that’s because I, accidentally, have the non‐GUI version
> trying to run some GUI stuff. I noticed this myself yesterday.
>
> Running NS Emacs without a GUI has always worked with threads because
> it just bypasses the NS run loop which is the problem.
>
> I’ve attached a new patch that actually checks whether we’re running
> the NS GUI.

Thanks, the new patch fixes the issue.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-06-19 18:34                     ` Charles A. Roelli
@ 2017-07-01 12:04                       ` Alan Third
  2017-07-04  6:59                         ` Charles A. Roelli
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Third @ 2017-07-01 12:04 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: Noam Postavsky, 25265

On Mon, Jun 19, 2017 at 08:34:00PM +0200, Charles A. Roelli wrote:
> On 18/06/2017 16:01, Alan Third wrote:
> > 
> > I’ve attached a new patch that actually checks whether we’re running
> > the NS GUI.
> 
> Thanks, the new patch fixes the issue.

I’ve pushed this patch up to master. Even if it’s still throwing up
errors in the thread tests occasionally, it’s far better than just
crashing.

Should we close this bug report, or would you rather leave it open
until we can be sure it’s all working fine?
-- 
Alan Third





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-07-01 12:04                       ` Alan Third
@ 2017-07-04  6:59                         ` Charles A. Roelli
  2017-07-04 12:04                           ` npostavs
  0 siblings, 1 reply; 40+ messages in thread
From: Charles A. Roelli @ 2017-07-04  6:59 UTC (permalink / raw)
  To: Alan Third; +Cc: Noam Postavsky, 25265

Up to you.  Thank you for pushing the fix.



> On 1 Jul 2017, at 14:04, Alan Third <alan@idiocy.org> wrote:
> 
>> On Mon, Jun 19, 2017 at 08:34:00PM +0200, Charles A. Roelli wrote:
>>> On 18/06/2017 16:01, Alan Third wrote:
>>> 
>>> I’ve attached a new patch that actually checks whether we’re running
>>> the NS GUI.
>> 
>> Thanks, the new patch fixes the issue.
> 
> I’ve pushed this patch up to master. Even if it’s still throwing up
> errors in the thread tests occasionally, it’s far better than just
> crashing.
> 
> Should we close this bug report, or would you rather leave it open
> until we can be sure it’s all working fine?
> -- 
> Alan Third






^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
  2017-07-04  6:59                         ` Charles A. Roelli
@ 2017-07-04 12:04                           ` npostavs
       [not found]                             ` <20170705193642.GA18888@breton.holly.idiocy.org>
  0 siblings, 1 reply; 40+ messages in thread
From: npostavs @ 2017-07-04 12:04 UTC (permalink / raw)
  To: Charles A. Roelli; +Cc: Alan Third, 25265

>
>> On 1 Jul 2017, at 14:04, Alan Third <alan@idiocy.org> wrote:
>> 
>> Should we close this bug report, or would you rather leave it open
>> until we can be sure it’s all working fine?

I would suggest closing this bug and opening a new one for the test
failure, keeps things more readable and searchable.





^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
       [not found]                             ` <20170705193642.GA18888@breton.holly.idiocy.org>
@ 2017-07-06  9:25                               ` Charles A. Roelli
  2017-07-06 17:10                               ` Charles A. Roelli
  1 sibling, 0 replies; 40+ messages in thread
From: Charles A. Roelli @ 2017-07-06  9:25 UTC (permalink / raw)
  To: Alan Third; +Cc: npostavs, 25265

Sure, will get to it later today.


> On 5 Jul 2017, at 21:36, Alan Third <alan@idiocy.org> wrote:
> 
>> On Tue, Jul 04, 2017 at 08:04:56AM -0400, npostavs@users.sourceforge.net wrote:
>>>> 
>>>> On 1 Jul 2017, at 14:04, Alan Third <alan@idiocy.org> wrote:
>>>> 
>>>> Should we close this bug report, or would you rather leave it open
>>>> until we can be sure it’s all working fine?
>> 
>> I would suggest closing this bug and opening a new one for the test
>> failure, keeps things more readable and searchable.
> 
> Sounds sensible.
> 
> Charles, I can’t reproduce the errors, so if you’re still getting them
> could you please raise a new bug report for them?
> 
> Thanks, all.
> -- 
> Alan Third






^ permalink raw reply	[flat|nested] 40+ messages in thread

* bug#25265: make-thread crashes in OS X 10.6
       [not found]                             ` <20170705193642.GA18888@breton.holly.idiocy.org>
  2017-07-06  9:25                               ` Charles A. Roelli
@ 2017-07-06 17:10                               ` Charles A. Roelli
  1 sibling, 0 replies; 40+ messages in thread
From: Charles A. Roelli @ 2017-07-06 17:10 UTC (permalink / raw)
  To: Alan Third, npostavs; +Cc: 25265

I've just made a new report (bug #27598) for thread-signal-early.


I wasn't able to reproduce the other issue this time around.


On 05/07/2017 21:36, Alan Third wrote:
> On Tue, Jul 04, 2017 at 08:04:56AM -0400, npostavs@users.sourceforge.net wrote:
>>>> On 1 Jul 2017, at 14:04, Alan Third <alan@idiocy.org> wrote:
>>>>
>>>> Should we close this bug report, or would you rather leave it open
>>>> until we can be sure it’s all working fine?
>> I would suggest closing this bug and opening a new one for the test
>> failure, keeps things more readable and searchable.
> Sounds sensible.
>
> Charles, I can’t reproduce the errors, so if you’re still getting them
> could you please raise a new bug report for them?
>
> Thanks, all.






^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2017-07-06 17:10 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-24 11:06 bug#25265: make-thread crashes in OS X 10.6 Charles A. Roelli
2016-12-24 17:51 ` Eli Zaretskii
2016-12-25 15:52   ` Eli Zaretskii
2016-12-26 13:09     ` Alan Third
2016-12-26 15:52       ` Eli Zaretskii
2016-12-26 20:56         ` Alan Third
2016-12-27  7:30           ` Eli Zaretskii
2016-12-27 10:44             ` Alan Third
2016-12-27 11:13               ` Eli Zaretskii
2016-12-28 19:36                 ` Alan Third
2016-12-29 17:12                   ` Eli Zaretskii
2016-12-30 18:45                     ` Alan Third
2016-12-30 21:08                       ` Eli Zaretskii
2016-12-30 22:05                         ` Alan Third
2016-12-31  9:20                           ` Eli Zaretskii
2016-12-31 16:09                             ` bug#25265: [PATCH] Rework NS event handling (bug#25265) Alan Third
2016-12-31 16:25                               ` Eli Zaretskii
2016-12-31 16:46                                 ` Alan Third
2017-01-01 15:03                               ` Alan Third
2017-01-01 15:42                                 ` Eli Zaretskii
2017-03-06 20:02 ` bug#25265: make-thread crashes in OS X 10.6 Alan Third
2017-03-08 20:17   ` Charles A. Roelli
2017-03-14 14:49     ` Alan Third
2017-05-02 20:49 ` Alan Third
2017-06-12 19:32   ` Charles A. Roelli
2017-06-13 20:46     ` Alan Third
2017-06-15 18:57       ` Charles A. Roelli
2017-06-15 19:04         ` Alan Third
2017-06-15 19:14           ` Noam Postavsky
2017-06-16 19:45           ` Alan Third
2017-06-16 20:05             ` Noam Postavsky
2017-06-16 20:51               ` Alan Third
2017-06-18 13:05                 ` Charles A. Roelli
2017-06-18 14:01                   ` Alan Third
2017-06-19 18:34                     ` Charles A. Roelli
2017-07-01 12:04                       ` Alan Third
2017-07-04  6:59                         ` Charles A. Roelli
2017-07-04 12:04                           ` npostavs
     [not found]                             ` <20170705193642.GA18888@breton.holly.idiocy.org>
2017-07-06  9:25                               ` Charles A. Roelli
2017-07-06 17:10                               ` Charles A. Roelli

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).