unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
@ 2012-12-05  7:22 Stephen Powell
  2012-12-05 16:42 ` Eli Zaretskii
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Stephen Powell @ 2012-12-05  7:22 UTC (permalink / raw)
  To: 13086

I'm using a checkout of the latest trunk.  While reading mail in gnus
emacs seems to hang using 100% cpu.  It is easy to reproduce.  I have a
live gdb session with the hang.

This is the gdb log:

(gdb) info threads

  Id   Target Id         Frame
* 8    Thread 7752.0x41c 0x776a000d in ntdll!LdrFindResource_U ()
   from C:\Windows\SysWOW64\ntdll.dll
  7    Thread 7752.0x1b24 0x776af8b1 in ntdll!RtlUpdateClonedSRWLock ()
   from C:\Windows\SysWOW64\ntdll.dll
  6    Thread 7752.0x18dc 0x776af8e5 in ntdll!RtlUpdateClonedSRWLock ()
   from C:\Windows\SysWOW64\ntdll.dll
  5    Thread 7752.0x1804 0x776af8b1 in ntdll!RtlUpdateClonedSRWLock ()
   from C:\Windows\SysWOW64\ntdll.dll
  4    Thread 7752.0x16cc 0x751078d7 in USER32!IsDialogMessage ()
   from C:\Windows\syswow64\user32.dll
  3    Thread 7752.0x11d0 0x776afd71 in ntdll!RtlFindSetBits ()
   from C:\Windows\SysWOW64\ntdll.dll
  2    Thread 7752.0x1784 0x776b013d in
ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from C:\Windows\SysWOW64\ntdll.dll
  1    Thread 7752.0x1bb0 waitpid (pid=5528, status=0x0, options=1) at
w32proc.c:1126

(gdb) thread apply all bt full

Thread 8 (Thread 7752.0x41c):
#0  0x776a000d in ntdll!LdrFindResource_U () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x7772f896 in ntdll!RtlQueryTimeZoneInformation () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x00960fb2 in ?? ()
No symbol table info available.
#3  0x752033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
   from C:\Windows\syswow64\kernel32.dll
No symbol table info available.
#4  0x6d4bffd4 in ?? ()
No symbol table info available.
#5  0x776c9ef2 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#6  0x00000000 in ?? ()
No symbol table info available.

Thread 7 (Thread 7752.0x1b24):
#0  0x776af8b1 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x776af8b1 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x74686f0f in ?? () from C:\Windows\System32\mswsock.dll
No symbol table info available.
#3  0x746876a6 in ?? () from C:\Windows\System32\mswsock.dll
No symbol table info available.
#4  0x755e6b87 in WSAAddressToStringW () from C:\Windows\syswow64\ws2_32.dll
No symbol table info available.
#5  0x0105f78e in _sys_read_ahead (fd=5) at w32.c:6115
        nblock = 0
        cp = 0x144aaf0
        rc = 0
#6  0x01022520 in reader_thread (arg=0x144aaf0) at w32proc.c:952
        rc = 3
        cp = 0x144aaf0
#7  0x752033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
   from C:\Windows\syswow64\kernel32.dll
No symbol table info available.
#8  0x00c9ffd4 in ?? ()
No symbol table info available.
#9  0x776c9ef2 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#10 0x0144aaf0 in child_procs ()
No symbol table info available.
#11 0x6d140fee in ?? ()
No symbol table info available.
#12 0x00000000 in ?? ()
No symbol table info available.

Thread 6 (Thread 7752.0x18dc):
#0  0x776af8e5 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x776af8e5 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x7539dd54 in ReadFile () from C:\Windows\syswow64\KernelBase.dll
No symbol table info available.
#3  0x00000314 in ?? ()
No symbol table info available.
#4  0x00000000 in ?? ()
No symbol table info available.

Thread 5 (Thread 7752.0x1804):
#0  0x776af8b1 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x776af8b1 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x753a149d in WaitForSingleObjectEx () from
C:\Windows\syswow64\KernelBase.dll
No symbol table info available.
#3  0x00000304 in ?? ()
No symbol table info available.
#4  0x00000000 in ?? ()
No symbol table info available.

Thread 4 (Thread 7752.0x16cc):
#0  0x751078d7 in USER32!IsDialogMessage () from
C:\Windows\syswow64\user32.dll
No symbol table info available.
#1  0x751078d7 in USER32!IsDialogMessage () from
C:\Windows\syswow64\user32.dll
No symbol table info available.
#2  0x7510790d in USER32!GetCursorPos () from C:\Windows\syswow64\user32.dll
No symbol table info available.
#3  0x6bb2fec0 in ?? ()
No symbol table info available.
#4  0x010e4717 in w32_msg_pump (msg_buf=0x6bb2ff28) at w32fns.c:2394
        msg = {
          hwnd = 0x40916,
          message = 49478,
          wParam = 0,
          lParam = 0,
          time = 447382693,
          pt = {
            x = 322,
            y = 1112
          }
        }
        result = 0
        focus_window = 0x403
#5  0x010e4955 in w32_msg_worker@4 (arg=0x0) at w32fns.c:2620
        msg = {
          hwnd = 0x0,
          message = 0,
          wParam = 0,
          lParam = 0,
          time = 0,
          pt = {
            x = 0,
            y = 0
          }
        }
        dummy_buf = {
          next = 0x0,
          w32msg = {
            msg = {
              hwnd = 0x0,
              message = 0,
              wParam = 0,
              lParam = 0,
              time = 0,
              pt = {
                x = 0,
                y = 0
              }
            },
            dwModifiers = 0,
            rect = {
              left = 0,
              top = 0,
              right = 0,
              bottom = 0
            }
          },
          result = 0,
          completed = 0
        }
#6  0x752033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
   from C:\Windows\syswow64\kernel32.dll
No symbol table info available.
#7  0x6bb2ffd4 in ?? ()
No symbol table info available.
#8  0x776c9ef2 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#9  0x00000000 in ?? ()
No symbol table info available.

Thread 3 (Thread 7752.0x11d0):
#0  0x776afd71 in ntdll!RtlFindSetBits () from C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x753a3bc8 in SleepEx () from C:\Windows\syswow64\KernelBase.dll
No symbol table info available.
#2  0x00000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 7752.0x1784):
#0  0x776b013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x776b013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x776e2f51 in ntdll!RtlWeaklyEnumerateEntryHashTable () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#3  0x00000003 in ?? ()
No symbol table info available.
#4  0x00cfb2f8 in ?? ()
No symbol table info available.
#5  0x752033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
   from C:\Windows\syswow64\kernel32.dll
No symbol table info available.
#6  0x6b17ffd4 in ?? ()
No symbol table info available.
#7  0x776c9ef2 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#8  0x00cfb2c8 in ?? ()
No symbol table info available.
#9  0x776c9ec5 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#10 0x776e2e25 in ntdll!RtlWeaklyEnumerateEntryHashTable () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#11 0x00cfb2c8 in ?? ()
No symbol table info available.
#12 0x00000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 7752.0x1bb0):
#0  waitpid (pid=5528, status=0x0, options=1) at w32proc.c:1126
        active = 0
        retval = 1
        nh = 0
        cp = 0x144a9e8
        cps = {0x6d550152, 0x9, 0xcc01b4, 0x0, 0xcc0000, 0xd01cc0,
0x140027, 0x5a40,
          0x1a0026, 0x5a40, 0x33267da <__register_frame_info+53635034>,
          0x3315832 <__register_frame_info+53565490>, 0xe0028, 0xd01c18,
0x1, 0xd01cb8,
          0x88f13c, 0x776be023 <ntdll!LdrWx86FormatVirtualImage+679>,
0x9, 0x88f1a4,
          0xd01cc0, 0xd01cb8, 0x88f1d0, 0x753a161b
<WaitForMultipleObjectsEx+306>, 0xcc0000,
          0x0, 0x753a1654 <WaitForMultipleObjectsEx+363>, 0x279bba63,
0x9, 0xd01cb0,
          0xd01c90, 0x24}
        wait_hnd = {0x1, 0x88f0a8, 0x11d8ff1 <get_next_msg+550>,
0x144df8c, 0x140027, 0x5a40,
          0x33267d8 <__register_frame_info+53635032>,
          0x370522a <__register_frame_info+57692714>, 0xcc9438,
0x88f0a8, 0x24, 0x0, 0x0,
          0xce7128, 0xcc9128, 0xcc924c, 0xd01cc0, 0xcc9658, 0x11c37d6
<w32_read_socket+5951>,
          0x88f044, 0x0, 0x88f1c0, 0x777071d5
<ntdll!AlpcMaxAllowedMessageLength+36229>,
          0x1ab62262, 0xfffffffe, 0x776be38c
<ntdll!LdrWx86FormatVirtualImage+1552>,
          0x776be0f2 <ntdll!LdrWx86FormatVirtualImage+886>, 0x0, 0x9,
0xd01c90,
          0x33267d8 <__register_frame_info+53635032>,
          0x776be38c <ntdll!LdrWx86FormatVirtualImage+1552>}
        timeout_ms = 0
        dont_wait = 1
#1  0x010e04ab in get_child_status (child=5528, status=0x0, options=1,
interruptible=false)
    at sysdep.c:290
        pid = -1
#2  0x010e054b in child_status_changed (child=5528, status=0x0,
options=0) at sysdep.c:333
No locals.
#3  0x0101f1b9 in handle_child_signal (sig=18) at process.c:6236
        deleted_pid = 5528
        all_pids_are_fixnums = false
        xpid = 22112
        tail = 92692214
#4  0x010e10cf in deliver_process_signal (sig=18, handler=0x101f118
<handle_child_signal>)
    at sysdep.c:1579
        old_errno = 0
        on_main_thread = true
#5  0x0101f3a3 in deliver_child_signal (sig=18) at process.c:6277
No locals.
#6  0x01023acd in sys_select (nfds=7, rfds=0x88f674, wfds=0x0, efds=0x0,
timeout=0x88f664,
    ignored=0x0) at w32proc.c:2032
        orfds = {
          bits = {121, 0}
        }
        timeout_ms = 189
        start_time = 447287689
        i = 7
        nh = 6
        nc = 2
        nr = 1
        active = 6
        cp = 0x144ab48
        cps = {0x144ab48, 0x144aa98, 0x6d6, 0x15846c40, 0x0, 0xb43e940,
0x0,
          0x35e2750 <__register_frame_info+56502096>, 0x7, 0x1, 0x88f5b8,
          0x102f0ae <gobble_input+294>, 0x34c0ee0
<__register_frame_info+55316192>, 0x88f574,
          0x24, 0x31, 0x26165dc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x5277405,
          0x0, 0x0, 0x0, 0x0, 0x0}
        wait_hnd = {0x1e4, 0x1e0, 0x2ec, 0x340, 0x318, 0x124, 0x198,
0x30c, 0x7, 0x1,
          0x88f418, 0x11d8ff1 <get_next_msg+550>, 0x144df8c, 0x0, 0x10,
0x0, 0x5b06f40,
          0x88f40c, 0x88f418, 0x102f4c6 <unblock_input+22>, 0x0,
0x279dd5f7, 0x88f438, 0x0,
          0x708, 0x0, 0x88f558, 0x11c37d6 <w32_read_socket+5951>,
0x88f4b8, 0x0, 0xa5,
          0x1040001 <Finsert_file_contents+10749>, 0x18,
          0x331581a <__register_frame_info+53565466>, 0x88f468, 0x0,
0x0, 0x0, 0x708, 0x0,
          0x0, 0x88f4d0, 0x88f498, 0x102b3d1 <decode_timer+156>, 0x6d6,
0x15846c40,
          0xd9e9a240, 0x15846c40, 0x31, 0x6d6, 0x88f498, 0x10e0200
<EMACS_TIME_LT+38>, 0x0,
          0x279dd577, 0x88f4b8, 0x1058a20 <gettimeofday+17>, 0x708, 0x0,
0x88f4d8,
          0x10aced1 <decode_time_components+543>, 0x708, 0x0, 0xa5,
          0x1040001 <Finsert_file_contents+10749>, 0x18,
          0x331581a <__register_frame_info+53565466>, 0x88f4e8, 0x0,
0x0, 0x0, 0x708, 0x0,
          0x0, 0x88f550, 0x88f518, 0x102b3d1 <decode_timer+156>, 0x6d6,
0x15846c40,
          0xd9e9a240, 0x15846c40, 0x31, 0x6d6, 0x88f518, 0x10e0200
<EMACS_TIME_LT+38>, 0x0,
          0xb43e940, 0x6d6, 0x15846c40, 0x7, 0x1, 0x88f5a8, 0x102b714
<timer_check_2+828>,
          0x0, 0xb43e940, 0x6d6, 0x144df00}
        fdindex = {-1, 0, 3, 4, 5, 6, 0, 9, 53565466, 64, 8975112,
17756670, 0, 95461478,
          86471685, 53565490, 57693122, 0, 1000000, 95448640, 0, 0,
8975160, 1992716691,
          664654535, 0, 8975396, 7, 8975396, 839487998, 1992788202,
839487998, 3154135,
          10000000, 0, 7, 64, 1992788219, 216652739, 122163710,
30266038, 0, 95297760,
          95448896, 8975252, 2003599699, 8975256, 16854003, 95448832,
13641448, 8975248,
          2003558435, 1, 95448640, 95448672, 1992716691, 664654407, 0,
8975524, 7, 8975524,
          839487998, 1992788202, 839487998}
#7  0x0101cab9 in wait_reading_process_output (time_limit=0, nsecs=0,
read_kbd=-1,
    do_display=true, wait_for_cell=53565466, wait_proc=0x0,
just_wait_proc=0)
    at process.c:4544
        timeout_reduced_for_timers = 1
        channel = 7
        nfds = 1
        Available = {
          bits = {64, 0}
        }
        Writeok = {
          bits = {0, 0}
        }
        check_write = 0
        check_delay = 2
        no_avail = 0
        xerrno = 0
        proc = 93645125
        timeout = {
          tv_sec = 0,
          tv_nsec = 189000000
        }
        end_time = {
          tv_sec = 310000000,
          tv_nsec = 1800
        }
        wait_channel = -1
        got_some_input = 1
        count = 2
#8  0x0102aa7e in kbd_buffer_get_event (kbp=0x88f7bc,
used_mouse_menu=0x88fa53, end_time=0x0)
    at keyboard.c:3784
        obj = 16949402
#9  0x01028e14 in read_char (commandflag=1, nmaps=8, maps=0x88f960,
prev_event=53565466,
    used_mouse_menu=0x88fa53, end_time=0x0) at keyboard.c:2756
        kb = 0x1
        c = 53565466
        jmpcount = 2
        local_getcjmp = {8976632, 0, 0, 0, 8976268, 16942778, 8978372,
0, 8976676, 16800372,
          53670346, 53565466, 53593690, 53565442, 8976800, 53670466}
        save_jump = {0 <repeats 16 times>}
        tem = 18913693
        save = 84664620
        previous_echo_area_message = 53565466
        also_record = 53565466
        reread = false
        gcpro1 = {
          next = 0x88f800,
          var = 0x331c65a <__register_frame_info+53593690>,
          nvars = 8976376
        }
        gcpro2 = {
          next = 0x3335eea <__register_frame_info+53698282>,
          var = 0x331581a <__register_frame_info+53565466>,
          nvars = 53565466
        }
        polling_stopped_here = true
        orig_kboard = 0x3318680 <__register_frame_info+53577344>
#10 0x01032c44 in read_key_sequence (keybuf=0x88fbd0, bufsize=30,
prompt=53565466,
    dont_downcase_last=false, can_return_switch_frame=true,
fix_current_buffer=true)
    at keyboard.c:9201
        interrupted_kboard = 0x3318680 <__register_frame_info+53577344>
        interrupted_frame = 0x35e2750 <__register_frame_info+56502096>
        key = 53565466
        used_mouse_menu = false
        echo_local_start = 0
        last_real_key_start = 0
        keys_local_start = 0
        local_first_binding = 0
        from_string = 53565466
        count = 2
        t = 0
        echo_start = 0
        keys_start = 0
        nmaps = 8
        nmaps_allocated = 8
        defs = 0x88f920
        submaps = 0x88f960
        orig_local_map = 81398110
        orig_keymap = 53565466
        localized_local_map = 0
        first_binding = 0
        first_unbound = 31
        mock_input = 0
        fkey = {
          parent = 57361686,
          map = 57361686,
          start = 0,
          end = 0
        }
        keytran = {
          parent = 53554886,
          map = 53554886,
          start = 0,
          end = 0
        }
        indec = {
          parent = 57361678,
          map = 57361678,
          start = 0,
          end = 0
        }
        shift_translated = false
        delayed_switch_frame = 53565466
        original_uppercase = 53635034
        original_uppercase_position = -1
        dummyflag = false
        starting_buffer = 0x5a24600
        fake_prefixed_keys = 53565466
        gcpro1 = {
          next = 0x88faa8,
          var = 0x10056fe <set_internal+945>,
          nvars = 20976016
        }
#11 0x01026a27 in command_loop_1 () at keyboard.c:1448
        cmd = 81168962
        keybuf = {440, 456, 432, 20757259, 56502096, 53565466, 53565466,
18921393, 8977624,
          8977628, 8977448, 16933724, 90330742, 53565490, 8977487,
53565466, 53670346,
          53565466, 8977448, 56502096, 53577344, 2130567168, 8977544,
16933296, 90330742,
          8977487, 8977512, 16829167, 2, 55249086}
        i = 1
        prev_modiff = 99
        prev_buffer = 0x595a200
        already_adjusted = false
#12 0x0100b9ee in internal_condition_case (bfun=0x10266c4 <command_loop_1>,
    handlers=53615762, hfun=0x1026093 <cmd_error>) at eval.c:1192
        val = 55249086
        c = {
          tag = 53565466,
          val = 53565466,
          next = 0x88fd74,
          gcpro = 0x0,
          jmp = {8977720, 0, 0, 0, 8977548, 16824731, 8978372, 0, 2, 2,
1992732168, 8977624,
            8977704, 8977768, 1992740619, 8977704},
          backlist = 0x0,
          handlerlist = 0x0,
          lisp_eval_depth = 0,
          pdlcount = 2,
          poll_suppress_count = 0,
          interrupt_input_blocked = 0,
          byte_stack = 0x0
        }
        h = {
          handler = 53615762,
          var = 53565466,
          chosen_clause = 53565490,
          tag = 0x88fcc0,
          next = 0x0
        }
#13 0x0102643c in command_loop_2 (ignore=53565466) at keyboard.c:1163
        val = 0
#14 0x0100b565 in internal_catch (tag=53605618, func=0x1026418
<command_loop_2>, arg=53565466)
    at eval.c:963
        c = {
          tag = 53605618,
          val = 53565466,
          next = 0x0,
          gcpro = 0x0,
          jmp = {8977896, 2130567168, 0, 0, 8977756, 16823638, 8978372,
0, 53565466,
            53591552, 10623408, 10618832, 2130567168, 8977896, 16798209,
21265788},
          backlist = 0x0,
          handlerlist = 0x0,
          lisp_eval_depth = 0,
          pdlcount = 2,
          poll_suppress_count = 0,
          interrupt_input_blocked = 0,
          byte_stack = 0x0
        }
#15 0x010263f4 in command_loop () at keyboard.c:1142
No locals.
#16 0x01025cd8 in recursive_edit_1 () at keyboard.c:774
        count = 1
        val = 1992664130
#17 0x01025e29 in Frecursive_edit () at keyboard.c:838
        count = 0
        buffer = 53565466
#18 0x01002827 in main (argc=2, argv=0xa22ca0) at emacs.c:1560
        dummy = 8978152
        stack_bottom_variable = 0 '\000'
        do_initial_setlocale = true
        dumping = false
        skip_args = 0
        no_loadup = false
        junk = 0x0
        dname_arg = 0x0
        ch_to_dir = 0x0






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05  7:22 bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126 Stephen Powell
@ 2012-12-05 16:42 ` Eli Zaretskii
  2012-12-05 19:28 ` Stephen Powell
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-05 16:42 UTC (permalink / raw)
  To: Stephen Powell; +Cc: 13086

> Date: Wed, 05 Dec 2012 07:22:29 +0000
> From: Stephen Powell <stephen_powell@optusnet.com.au>
> 
> I'm using a checkout of the latest trunk.  While reading mail in gnus
> emacs seems to hang using 100% cpu.  It is easy to reproduce.  I have a
> live gdb session with the hang.

100% CPU usage is not a hang, it's probably an infloop.  Can you use
the 'finish' command, as described in etc/DEBUG, to find out where it
loops?

> Thread 1 (Thread 7752.0x1bb0):
> #0  waitpid (pid=5528, status=0x0, options=1) at w32proc.c:1126

Line 1126 of w32proc.c is the opening brace of the waitpid function.
I find it hard to believe it is looping there ;-)

So it's probably looping in the caller's loop, in get_child_status.
Can you see if that's the case, and if so, what value does waitpid
return to its caller?  Is that zero, by chance?

Also, is the process whose PID is 5528 still running?  If so, what
process is it?  Or is it a network connection?

Thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05  7:22 bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126 Stephen Powell
  2012-12-05 16:42 ` Eli Zaretskii
@ 2012-12-05 19:28 ` Stephen Powell
  2012-12-05 21:25   ` Eli Zaretskii
  2012-12-06  5:07 ` Stephen Powell
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Stephen Powell @ 2012-12-05 19:28 UTC (permalink / raw)
  To: eliz; +Cc: 13086, stephen_powell

Eli Zaretskii <eliz@gnu.org> writes:

> Can you use the 'finish' command, as described in etc/DEBUG, to find
> out where it loops?

(gdb) thread 1
[Switching to thread 1 (Thread 7752.0x1bb0)]
#0  waitpid (pid=5528, status=0x0, options=1) at w32proc.c:1126
1126    {
(gdb) finish
Run till exit from #0  waitpid (pid=5528, status=0x0, options=1) at
w32proc.c:1126
0x010e04ab in get_child_status (child=5528, status=0x0, options=1,
interruptible=false)
    at sysdep.c:290
290       while ((pid = waitpid (child, status, options)) < 0)
Value returned is $1 = -1
(gdb) finish
Run till exit from #0  0x010e04ab in get_child_status (child=5528,
status=0x0, options=1,
    interruptible=false) at sysdep.c:290

Hung here.

> So it's probably looping in the caller's loop, in get_child_status.
> Can you see if that's the case, and if so, what value does waitpid
> return to its caller?  Is that zero, by chance?

-1

> Also, is the process whose PID is 5528 still running?

process 5528 was not found on 88RCB2S





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05 19:28 ` Stephen Powell
@ 2012-12-05 21:25   ` Eli Zaretskii
  2012-12-06  2:04     ` Paul Eggert
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-05 21:25 UTC (permalink / raw)
  To: Stephen Powell, Paul Eggert; +Cc: 13086

> Date: Wed, 05 Dec 2012 19:28:20 +0000
> From: Stephen Powell <stephen_powell@optusnet.com.au>
> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Can you use the 'finish' command, as described in etc/DEBUG, to find
> > out where it loops?
> 
> (gdb) thread 1
> [Switching to thread 1 (Thread 7752.0x1bb0)]
> #0  waitpid (pid=5528, status=0x0, options=1) at w32proc.c:1126
> 1126    {
> (gdb) finish
> Run till exit from #0  waitpid (pid=5528, status=0x0, options=1) at
> w32proc.c:1126
> 0x010e04ab in get_child_status (child=5528, status=0x0, options=1,
> interruptible=false)
>     at sysdep.c:290
> 290       while ((pid = waitpid (child, status, options)) < 0)
> Value returned is $1 = -1
> (gdb) finish
> Run till exit from #0  0x010e04ab in get_child_status (child=5528,
> status=0x0, options=1,
>     interruptible=false) at sysdep.c:290
> 
> Hung here.

I see.  I wonder why it returned -1 the first time, when the process
was still running or just exited.  Can you try setting breakpoints in
a way that will be able to explain that?

I also don't understand the logic of the loop in get_child_status.
Paul, can you tell why it loops if waitpid is called with WNOHANG and
returns -1?  Won't that cause a busy-wait loop that pegs the CPU?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05 21:25   ` Eli Zaretskii
@ 2012-12-06  2:04     ` Paul Eggert
  2012-12-06  3:51       ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2012-12-06  2:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086, Stephen Powell

On 12/05/2012 01:25 PM, Eli Zaretskii wrote:
> Paul, can you tell why it loops if waitpid is called with WNOHANG and
> returns -1?  Won't that cause a busy-wait loop that pegs the CPU?

If waitpid is called with WNOHANG and returns -1, then
errno must equal EINTR.  (There's an assertion for this.)
Since the wait was interrupted by a signal, it's supposed
to be retried.

Presumably Emacs is confused, and is waiting for a process
that it already waited for.  That's a bug and we should fix it.
I can't tell from the info so far whether the bug is in
the w32 implementation of waitpid, or in Emacs proper.

Was this emacs compiled with error-checking turned on
(./configure --enable-checking, or whatever equivalent is
use in the Microsoft Windows world)?
If so, I'm puzzled as to why the 'eassert (errno == EINTR)'
didn't fire.  If not, can you reproduce the bug with
error-checking enabled?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06  2:04     ` Paul Eggert
@ 2012-12-06  3:51       ` Eli Zaretskii
  2012-12-06  5:07         ` Paul Eggert
  2012-12-06  7:35         ` Paul Eggert
  0 siblings, 2 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06  3:51 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 13086, stephen_powell

> Date: Wed, 05 Dec 2012 18:04:04 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: Stephen Powell <stephen_powell@optusnet.com.au>, 
>  13086@debbugs.gnu.org
> 
> On 12/05/2012 01:25 PM, Eli Zaretskii wrote:
> > Paul, can you tell why it loops if waitpid is called with WNOHANG and
> > returns -1?  Won't that cause a busy-wait loop that pegs the CPU?
> 
> If waitpid is called with WNOHANG and returns -1, then
> errno must equal EINTR.  (There's an assertion for this.)

Is that necessarily the case?  I don't see that in the Posix
specification, which says:

  RETURN VALUE

      If wait() or waitpid() returns because the status of a child
      process is available, these functions will return a value equal to
      the process ID of the child process for which status is
      reported. If wait() or waitpid() returns due to the delivery of a
      signal to the calling process, -1 will be returned and errno will
      be set to [EINTR]. If waitpid() was invoked with WNOHANG set in
      options, it has at least one child process specified by pid for
      which status is not available, and status is not available for any
      process specified by pid, 0 will be returned. Otherwise, (pid_t)-1
      will be returned, and errno will be set to indicate the error. 

   ERRORS

      The waitpid() function will fail if:

      [ECHILD]
	  The process or process group specified by pid does not exist
	  or is not a child of the calling process.
      [EINTR]
	  The function was interrupted by a signal. The value of the
	  location pointed to by stat_loc is undefined.
      [EINVAL]
	  The options argument is not valid. 

My reading of this, and specifically of the last sentence under
"RETURN VALUE", is that errno could also be ECHILD.  Am I missing
something?

> Presumably Emacs is confused, and is waiting for a process
> that it already waited for.  That's a bug and we should fix it.

Perhaps so, but inflooping in that case is hardly a Good Thing, is it?
And neither is aborting when asserts are enabled.  Perhaps signaling
an error would be better.

> I can't tell from the info so far whether the bug is in
> the w32 implementation of waitpid, or in Emacs proper.

I don't know yet either, but I thought I need to understand the logic
of that loop before I make up my mind, because the w32 waitpid
implementation should behave according to the expectations of its
callers.

> Was this emacs compiled with error-checking turned on
> (./configure --enable-checking, or whatever equivalent is
> use in the Microsoft Windows world)?
> If so, I'm puzzled as to why the 'eassert (errno == EINTR)'
> didn't fire.  If not, can you reproduce the bug with
> error-checking enabled?

Stephen, can you do that, please?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05  7:22 bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126 Stephen Powell
  2012-12-05 16:42 ` Eli Zaretskii
  2012-12-05 19:28 ` Stephen Powell
@ 2012-12-06  5:07 ` Stephen Powell
  2012-12-06 19:10 ` Stephen Powell
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Stephen Powell @ 2012-12-06  5:07 UTC (permalink / raw)
  To: eliz; +Cc: 13086, eggert, stephen_powell

Eli Zaretskii <eliz@gnu.org> writes:
>> Was this emacs compiled with error-checking turned on
>> (./configure --enable-checking, or whatever equivalent is
>> use in the Microsoft Windows world)?
>> If so, I'm puzzled as to why the 'eassert (errno =3D=3D EINTR)'
>> didn't fire.  If not, can you reproduce the bug with
>> error-checking enabled?
>
> Stephen, can you do that, please?

No it wasn't enabled, done now.  After reading mail in gnus for a few
dozen messages this is the new gdb log:

Thread 1 (Thread 7928.0x1010):
#0  terminate_due_to_signal (sig=3D22, backtrace_limit=3D2147483647) at ema=
cs.c:314
No locals.
#1  0x0102232f in die (msg=3D0x156df4c "assertion failed: errno =3D=3D EINT=
R", file=3D0x156df26 "sysdep.c", line=3D294) at alloc.c:6499
No locals.
#2  0x0114a3bb in get_child_status (child=3D6436, status=3D0x0, options=3D1=
, interruptible=3Dfalse) at sysdep.c:294
        pid =3D -1
#3  0x0114a4b3 in child_status_changed (child=3D6436, status=3D0x0, options=
=3D0) at sysdep.c:333
No locals.
#4  0x0102ffaa in handle_child_signal (sig=3D18) at process.c:6236
        deleted_pid =3D 6436
        all_pids_are_fixnums =3D false
        xpid =3D 25744
        tail =3D 95266870
#5  0x0114b4d7 in deliver_process_signal (sig=3D18, handler=3D0x102fe9b <ha=
ndle_child_signal>) at sysdep.c:1579
        old_errno =3D 0
        on_main_thread =3D true
#6  0x01030320 in deliver_child_signal (sig=3D18) at process.c:6277
No locals.
#7  0x01035340 in sys_select (nfds=3D7, rfds=3D0x88f6b4, wfds=3D0x0, efds=
=3D0x0, timeout=3D0x88f6a4, ignored=3D0x0) at w32proc.c:2032
        orfds =3D {
          bits =3D {121, 0}
        }
        timeout_ms =3D 78
        start_time =3D 526288248
        i =3D 7
        nh =3D 6
        nc =3D 2
        nr =3D 1
        active =3D 6
        cp =3D 0x1676b48
        cps =3D {0x1676b48, 0x1676a98, 0x0, 0x4a62f80, 0x5, 0x396636c0, 0x0=
, 0x3867750 <__register_frame_info+59144016>, 0x7, 0x1, 0x88f5f8, 0x10486dc=
 <gobble_input+415>, 0x3745ee0 <__register_frame_info+57958112>, 0x88f5b4, =
0x24, 0x0, 0x19273580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x58637=
b5, 0x0, 0x0, 0x0, 0x0, 0x0}
        wait_hnd =3D {0x1e8, 0x1e4, 0x2e8, 0x304, 0x31c, 0x364, 0x380, 0x32=
0, 0x7, 0x1, 0x88f458, 0x12ecfac <get_next_msg+550>, 0x1679f8c, 0x0, 0x10, =
0x0, 0x88f418, 0x88f44c, 0x88f458, 0x1048af4 <unblock_input+22>, 0x0, 0xa24=
811c3, 0x88f478, 0x0, 0x0, 0x1dcd6500, 0x88f598, 0x12c61ad <w32_read_socket=
+7705>, 0x88f4f8, 0x0, 0x2d9, 0x1060001 <Finsert_file_contents+65>, 0x18, 0=
x359a81a <__register_frame_info+56207386>, 0x88f4a8, 0x0, 0x0, 0x7a120, 0x0=
, 0x0, 0x0, 0x88f510, 0x88f4d8, 0x1041005 <decode_timer+322>, 0x0, 0x4a62f8=
0, 0x4a62f80, 0x76c67193 <msvcrt!_itoa_s+1347>, 0xa2481153, 0x0, 0x88f5c4, =
0x7, 0x1, 0xa2481143, 0x88f4f8, 0x108a38f <gettimeofday+17>, 0x0, 0x1dcd650=
0, 0x88f518, 0x11097fb <decode_time_components+543>, 0x0, 0x1dcd6500, 0x2d9=
, 0x1060001 <Finsert_file_contents+65>, 0x18, 0x359a81a <__register_frame_i=
nfo+56207386>, 0x88f528, 0x0, 0x0, 0x7a120, 0x0, 0x0, 0x0, 0x88f590, 0x88f5=
58, 0x1041005 <decode_timer+322>, 0x0, 0x4a62f80, 0x4a62f80, 0x4a62f80, 0x0=
, 0x0, 0x88f558, 0x114a0f8 <EMACS_TIME_LT+38>, 0x5, 0x396636c0, 0x0, 0x4a62=
f80, 0x7, 0x1, 0x88f5e8, 0x1041502 <timer_check_2+1270>, 0x5, 0x396636c0, 0=
x0, 0x1679f00}
        fdindex =3D {-1, 0, 3, 4, 5, 6, 14089224, 6, 56207386, 64, 8975176,=
 18203154, 0, 525208227, -2, 2003792923, 4, 13369344, 1342179425, 97975520,=
 8975356, 2004228145, 8975224, 1992716691, -1572333837, 0, 8975460, 7, 8975=
460, 571139213, 1992788202, 571139213, 3154319, 10000000, 0, 7, 64, 1992788=
219, 216654055, -146185075, 30266221, 0, 96273952, 97976032, 8975368, 20038=
57877, 8975320, 16895143, 97976096, 2003580050, 14089216, 14089224, 1, 9797=
5520, 97975584, 1992716691, -1572333965, 0, 8975588, 7, 8975588, 571139213,=
 1992788202, 571139213}
#8  0x0102c033 in wait_reading_process_output (time_limit=3D28, nsecs=3D0, =
read_kbd=3D-1, do_display=3Dtrue, wait_for_cell=3D56207386, wait_proc=3D0x0=
, just_wait_proc=3D0) at process.c:4544
        timeout_reduced_for_timers =3D 1
        channel =3D 7
        nfds =3D 1
        Available =3D {
          bits =3D {64, 0}
        }
        Writeok =3D {
          bits =3D {0, 0}
        }
        check_write =3D 0
        check_delay =3D 2
        no_avail =3D 0
        xerrno =3D 0
        proc =3D 95467533
        timeout =3D {
          tv_sec =3D 0,=20
          tv_nsec =3D 78000000
        }
        end_time =3D {
          tv_sec =3D 1354769779,=20
          tv_nsec =3D 307000000
        }
        wait_channel =3D -1
        got_some_input =3D 1
        count =3D 2
#9  0x01100ac5 in sit_for (timeout=3D112, reading=3Dtrue, display_option=3D=
1) at dispnew.c:5907
        sec =3D 28
        nsec =3D 0
        do_display =3D true
#10 0x0103c823 in read_char (commandflag=3D1, nmaps=3D7, maps=3D0x88f960, p=
rev_event=3D56207386, used_mouse_menu=3D0x88fa53, end_time=3D0x0) at keyboa=
rd.c:2656
        tem0 =3D 56207386
        timeout =3D 28
        delay_level =3D 4
        buffer_size =3D 55
        c =3D 56207386
        jmpcount =3D 2
        local_getcjmp =3D {8976632, 0, 0, 0, 8976268, 17023232, 8978372, 0,=
 8976676, 16808689, 56312266, 56207386, 56235610, 56207362, 8976800, 563123=
86}
        save_jump =3D {0 <repeats 16 times>}
        tem =3D 20056477
        save =3D 87159004
        previous_echo_area_message =3D 56207386
        also_record =3D 56207386
        reread =3D false
        gcpro1 =3D {
          next =3D 0x88f800,=20
          var =3D 0x35a165a <__register_frame_info+56235610>,=20
          nvars =3D 8976376
        }
        gcpro2 =3D {
          next =3D 0x35baeea <__register_frame_info+56340202>,=20
          var =3D 0x359a81a <__register_frame_info+56207386>,=20
          nvars =3D 56207386
        }
        polling_stopped_here =3D false
        orig_kboard =3D 0x359d280 <__register_frame_info+56218240>
#11 0x010503ee in read_key_sequence (keybuf=3D0x88fbd0, bufsize=3D30, promp=
t=3D56207386, dont_downcase_last=3Dfalse, can_return_switch_frame=3Dtrue, f=
ix_current_buffer=3Dtrue) at keyboard.c:9201
        interrupted_kboard =3D 0x359d280 <__register_frame_info+56218240>
        interrupted_frame =3D 0x3867750 <__register_frame_info+59144016>
        key =3D 1256
        used_mouse_menu =3D false
        echo_local_start =3D 0
        last_real_key_start =3D 0
        keys_local_start =3D 0
        local_first_binding =3D 0
        from_string =3D 56207386
        count =3D 2
        t =3D 0
        echo_start =3D 0
        keys_start =3D 0
        nmaps =3D 7
        nmaps_allocated =3D 7
        defs =3D 0x88f920
        submaps =3D 0x88f960
        orig_local_map =3D 83881006
        orig_keymap =3D 56207386
        localized_local_map =3D 0
        first_binding =3D 0
        first_unbound =3D 31
        mock_input =3D 0
        fkey =3D {
          parent =3D 60004134,=20
          map =3D 60004134,=20
          start =3D 0,=20
          end =3D 0
        }
        keytran =3D {
          parent =3D 56196806,=20
          map =3D 56196806,=20
          start =3D 0,=20
          end =3D 0
        }
        indec =3D {
          parent =3D 60004126,=20
          map =3D 60004126,=20
          start =3D 0,=20
          end =3D 0
        }
        shift_translated =3D false
        delayed_switch_frame =3D 56207386
        original_uppercase =3D 56276954
        original_uppercase_position =3D -1
        dummyflag =3D false
        starting_buffer =3D 0x38db800 <__register_frame_info+59619328>
        fake_prefixed_keys =3D 56207386
        gcpro1 =3D {
          next =3D 0x3867750 <__register_frame_info+59144016>,=20
          var =3D 0x0,=20
          nvars =3D 56257586
        }
#12 0x010391e7 in command_loop_1 () at keyboard.c:1448
        cmd =3D 94126946
        keybuf =3D {128, 196, 84139489, 22412922, 8812368, 56207386, 562073=
86, 93540705, 8977624, 8977628, 8977448, 17008771, 96967310, 56207410, 8977=
487, 84271466, 56312266, 56207386, 8977448, 59144016, 56218240, 2130567168,=
 8977544, 17008128, 96967366, 8977487, 8977512, 16853932, 2, 57890958}
        i =3D 1
        prev_modiff =3D 7108
        prev_buffer =3D 0x38db800 <__register_frame_info+59619328>
        already_adjusted =3D false
#13 0x01011174 in internal_condition_case (bfun=3D0x1038cf7 <command_loop_1=
>, handlers=3D56257682, hfun=3D0x10384e3 <cmd_error>) at eval.c:1192
        val =3D 57890958
        c =3D {
          tag =3D 56207386,=20
          val =3D 56207386,=20
          next =3D 0x88fd74,=20
          gcpro =3D 0x0,=20
          jmp =3D {8977720, 0, 0, 0, 8977548, 16847137, 8978372, 0, 2, 2, 1=
992732168, 8977624, 8977704, 8977768, 1992740619, 8977704},=20
          backlist =3D 0x0,=20
          handlerlist =3D 0x0,=20
          lisp_eval_depth =3D 0,=20
          pdlcount =3D 2,=20
          poll_suppress_count =3D 0,=20
          interrupt_input_blocked =3D 0,=20
          byte_stack =3D 0x0
        }
        h =3D {
          handler =3D 56257682,=20
          var =3D 56207386,=20
          chosen_clause =3D 56207410,=20
          tag =3D 0x88fcc0,=20
          next =3D 0x0
        }
#14 0x01038963 in command_loop_2 (ignore=3D56207386) at keyboard.c:1163
        val =3D 0
#15 0x01010bc2 in internal_catch (tag=3D56247538, func=3D0x103893f <command=
_loop_2>, arg=3D56207386) at eval.c:963
        c =3D {
          tag =3D 56247538,=20
          val =3D 56207386,=20
          next =3D 0x0,=20
          gcpro =3D 0x0,=20
          jmp =3D {8977896, 2130567168, 0, 0, 8977756, 16845747, 8978372, 0=
, 56207386, 56233472, 10623464, 10618848, 2130567168, 8977896, 16803841, 23=
543164},=20
          backlist =3D 0x0,=20
          handlerlist =3D 0x0,=20
          lisp_eval_depth =3D 0,=20
          pdlcount =3D 2,=20
          poll_suppress_count =3D 0,=20
          interrupt_input_blocked =3D 0,=20
          byte_stack =3D 0x0
        }
#16 0x0103891b in command_loop () at keyboard.c:1142
No locals.
#17 0x01037e98 in recursive_edit_1 () at keyboard.c:774
        count =3D 1
        val =3D 1992664130
#18 0x010381d3 in Frecursive_edit () at keyboard.c:838
        count =3D 0
        buffer =3D 56207386
#19 0x01002a0a in main (argc=3D2, argv=3D0xa22f30) at emacs.c:1560
        dummy =3D 8978152
        stack_bottom_variable =3D 0 '\000'
        do_initial_setlocale =3D true
        dumping =3D false
        skip_args =3D 0
        no_loadup =3D false
        junk =3D 0x0
        dname_arg =3D 0x0
        ch_to_dir =3D 0x0








^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06  3:51       ` Eli Zaretskii
@ 2012-12-06  5:07         ` Paul Eggert
  2012-12-06 18:28           ` Eli Zaretskii
  2012-12-06  7:35         ` Paul Eggert
  1 sibling, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2012-12-06  5:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086, stephen_powell

On 12/05/2012 07:51 PM, Eli Zaretskii wrote:
> My reading of this, and specifically of the last sentence under
> "RETURN VALUE", is that errno could also be ECHILD.

Yes, waitpid can return -1 with errno == ECHILD, just as
it can return -1 with errno == EINVAL.  But if Emacs is
invoking waitpid that way, then Emacs has a bug.

Emacs cannot invoke waitpid on any argument such that waitpid
could return -1 with errno == ECHILD, because if it does so
it will allow race conditions such as the races that led to
Bug#8855.

> inflooping in that case is hardly a Good Thing, is it?
> And neither is aborting when asserts are enabled.  Perhaps signaling
> an error would be better.

If we can't fix the bug, perhaps signaling an error is the
best we can do, but I'd rather fix the bug.  Generally speaking,
if there's an internal programming error, Emacs aborts
rather than signaling an error.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06  3:51       ` Eli Zaretskii
  2012-12-06  5:07         ` Paul Eggert
@ 2012-12-06  7:35         ` Paul Eggert
  2012-12-06 18:41           ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2012-12-06  7:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086, stephen_powell

I found a race condition in delete-process that
looks like it might be relevant, and installed
a patch as trunk bzr 111122.  Could you please
give that a try?  If it doesn't fix the problem,
it might help to see whether the problem is related
to delete-process.  E.g., you might plant a breakpoint
on Fdelete_process and see what sort of calls are
being made to it.  Another possibility is to put
a breakpoint on waitpid, to see what arguments are
being passed to it and what it returns -- maybe
you can insert some code to print out that info
to a file.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06  5:07         ` Paul Eggert
@ 2012-12-06 18:28           ` Eli Zaretskii
  2012-12-06 20:18             ` Paul Eggert
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06 18:28 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 13086, stephen_powell

> Date: Wed, 05 Dec 2012 21:07:33 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
> 
> > inflooping in that case is hardly a Good Thing, is it?
> > And neither is aborting when asserts are enabled.  Perhaps signaling
> > an error would be better.
> 
> If we can't fix the bug, perhaps signaling an error is the
> best we can do, but I'd rather fix the bug.  Generally speaking,
> if there's an internal programming error, Emacs aborts
> rather than signaling an error.

Yes, but we usually do that only if Emacs cannot possibly recover from
that internal error.  If Emacs _can_ continue, then we only abort via
eassert, so that a production version won't crash.  In this case, any
errno except EINTR can simply be ignored.  E.g., if waitpid was
somehow called to wait for a non-existing or wrong process, we don't
care about such a process anyway.

So how about the following change?

=== modified file 'src/sysdep.c'
--- src/sysdep.c	2012-12-03 21:42:12 +0000
+++ src/sysdep.c	2012-12-06 18:25:22 +0000
@@ -287,17 +287,20 @@ get_child_status (pid_t child, int *stat
      so that another thread running glib won't find them.  */
   eassert (0 < child);
 
-  while ((pid = waitpid (child, status, options)) < 0)
-    {
-      /* CHILD must be a child process that has not been reaped, and
-	 STATUS and OPTIONS must be valid.  */
-      eassert (errno == EINTR);
-
-      /* Note: the MS-Windows emulation of waitpid calls QUIT
-	 internally.  */
-      if (interruptible)
-	QUIT;
-    }
+  do {
+    pid = waitpid (child, status, options);
+    if (pid < 0)
+      {
+	/* CHILD must be a child process that has not been reaped, and
+	   STATUS and OPTIONS must be valid.  */
+	eassert (errno == EINTR);
+
+	/* Note: the MS-Windows emulation of waitpid calls QUIT
+	   internally.  */
+	if (errno == EINTR && interruptible)
+	  QUIT;
+      }
+  } while (pid < 0 && errno == EINTR);
 
   /* If successful and status is requested, tell wait_reading_process_output
      that it needs to wake up and look around.  */







^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06  7:35         ` Paul Eggert
@ 2012-12-06 18:41           ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06 18:41 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 13086, stephen_powell

> Date: Wed, 05 Dec 2012 23:35:55 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
> 
> I found a race condition in delete-process that
> looks like it might be relevant, and installed
> a patch as trunk bzr 111122.  Could you please
> give that a try?

I also found some potential problem in w32proc.c, and fixed that in
revision 111132 on the trunk.  Stephen, please try the latest code and
see if the problem persists.

> If it doesn't fix the problem,
> it might help to see whether the problem is related
> to delete-process.  E.g., you might plant a breakpoint
> on Fdelete_process and see what sort of calls are
> being made to it.  Another possibility is to put
> a breakpoint on waitpid, to see what arguments are
> being passed to it and what it returns -- maybe
> you can insert some code to print out that info
> to a file.

Yes, if Emacs still loops where it did in your original report, please
tell what does waitpid set errno to, and through which 'return' line
does it return to its caller.

I find DebPrint useful for this -- is fast and causes GDB to display a
string when you run Emacs under GDB.  You will see examples of its
usage in w32proc.c and elsewhere.

Thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05  7:22 bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126 Stephen Powell
                   ` (2 preceding siblings ...)
  2012-12-06  5:07 ` Stephen Powell
@ 2012-12-06 19:10 ` Stephen Powell
  2012-12-06 19:36   ` Eli Zaretskii
  2012-12-06 20:17 ` Stephen Powell
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Stephen Powell @ 2012-12-06 19:10 UTC (permalink / raw)
  To: eliz; +Cc: 13086, eggert, stephen_powell

Eli Zaretskii <eliz@gnu.org> writes:

> I also found some potential problem in w32proc.c, and fixed that in
> revision 111132 on the trunk.  Stephen, please try the latest code and
> see if the problem persists.

OK, I tried that revision with the same problem.

I don't really know what I'm doing, but one odd thing I noticed while
playing around with Paul's suggestion to set a breakpoint on
delete_process:

1. Set a breakpoint on process.c:808. Print p.pid and do a xbacktrace.
2. Run gnus.
3. Gnus uses imap.el to get mail from my host.
4. Imap.el calls delete-process from imap-close

$1 = 1356
"delete-process" (0x88bf48)
"imap-close" (0x88c248)
"mail-source-fetch-imap" (0x88c584)

5. Imap.el calls delete-process to delete the same pid from
imap-sentinel

$2 = 1356
"delete-process" (0x88b928)
"imap-sentinel" (0x88bc24)
"delete-process" (0x88bf48)
"imap-close" (0x88c248)
"mail-source-fetch-imap" (0x88c584)

6. Let gnus sit for about a minute.  The error is signalled with the
same pid

Breakpoint 1, terminate_due_to_signal (sig=22,
backtrace_limit=2147483647) at emacs.c:314
314	  signal (sig, SIG_DFL);
(gdb) up 2
#2  0x0114a8db in get_child_status (child=1356, status=0x0, options=1,
interruptible=false) at sysdep.c:294
294	      eassert (errno == EINTR);






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 19:10 ` Stephen Powell
@ 2012-12-06 19:36   ` Eli Zaretskii
  2012-12-06 19:48     ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06 19:36 UTC (permalink / raw)
  To: Stephen Powell; +Cc: 13086, eggert, stephen_powell

> Date: Thu, 06 Dec 2012 19:10:54 +0000
> From: Stephen Powell <stephen_powell@optusnet.com.au>
> CC: eggert@cs.ucla.edu, stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I also found some potential problem in w32proc.c, and fixed that in
> > revision 111132 on the trunk.  Stephen, please try the latest code and
> > see if the problem persists.
> 
> OK, I tried that revision with the same problem.
> 
> I don't really know what I'm doing, but one odd thing I noticed while
> playing around with Paul's suggestion to set a breakpoint on
> delete_process:
> 
> 1. Set a breakpoint on process.c:808. Print p.pid and do a xbacktrace.
> 2. Run gnus.
> 3. Gnus uses imap.el to get mail from my host.
> 4. Imap.el calls delete-process from imap-close
> 
> $1 = 1356
> "delete-process" (0x88bf48)
> "imap-close" (0x88c248)
> "mail-source-fetch-imap" (0x88c584)
> 
> 5. Imap.el calls delete-process to delete the same pid from
> imap-sentinel
> 
> $2 = 1356
> "delete-process" (0x88b928)
> "imap-sentinel" (0x88bc24)
> "delete-process" (0x88bf48)
> "imap-close" (0x88c248)
> "mail-source-fetch-imap" (0x88c584)
> 
> 6. Let gnus sit for about a minute.  The error is signalled with the
> same pid
> 
> Breakpoint 1, terminate_due_to_signal (sig=22,
> backtrace_limit=2147483647) at emacs.c:314
> 314	  signal (sig, SIG_DFL);
> (gdb) up 2
> #2  0x0114a8db in get_child_status (child=1356, status=0x0, options=1,
> interruptible=false) at sysdep.c:294
> 294	      eassert (errno == EINTR);

What is the value of errno in frame #2?  Also, can you tell through
which line does waitpid exit in this case?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 19:36   ` Eli Zaretskii
@ 2012-12-06 19:48     ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06 19:48 UTC (permalink / raw)
  To: stephen_powell; +Cc: 13086, eggert

> > I don't really know what I'm doing, but one odd thing I noticed while
> > playing around with Paul's suggestion to set a breakpoint on
> > delete_process:
> > 
> > 1. Set a breakpoint on process.c:808. Print p.pid and do a xbacktrace.
> > 2. Run gnus.
> > 3. Gnus uses imap.el to get mail from my host.
> > 4. Imap.el calls delete-process from imap-close
> > 
> > $1 = 1356
> > "delete-process" (0x88bf48)
> > "imap-close" (0x88c248)
> > "mail-source-fetch-imap" (0x88c584)
> > 
> > 5. Imap.el calls delete-process to delete the same pid from
> > imap-sentinel
> > 
> > $2 = 1356
> > "delete-process" (0x88b928)
> > "imap-sentinel" (0x88bc24)
> > "delete-process" (0x88bf48)
> > "imap-close" (0x88c248)
> > "mail-source-fetch-imap" (0x88c584)

Can you see if the first of these calls to delete-process succeeds to
reap the process?  IOW, is it the second call to delete-process that
causes the trouble, e.g., because it asks Emacs to reap a process that
was already reaped?

To see this, put a breakpoint inside get_child_status, or maybe inside
waitpid, and define commands that just display the child PID and
continue, like this:

  (gdb) break waitpid
  (gdb) commands
   > p pid
   > continue
   > end
  (gdb)

Then see how many times waitpid is invoked for the same PID, and
whether it causes assertion violation on the first or the second time
(if there is a second time).

Thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05  7:22 bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126 Stephen Powell
                   ` (3 preceding siblings ...)
  2012-12-06 19:10 ` Stephen Powell
@ 2012-12-06 20:17 ` Stephen Powell
  2012-12-06 20:36   ` Eli Zaretskii
  2012-12-06 22:51 ` Stephen Powell
  2012-12-07  0:10 ` Stephen Powell
  6 siblings, 1 reply; 36+ messages in thread
From: Stephen Powell @ 2012-12-06 20:17 UTC (permalink / raw)
  To: eliz; +Cc: 13086, eggert, stephen_powell

Eli Zaretskii <eliz@gnu.org> writes:

Answering your first mail:

> What is the value of errno in frame #2?

(gdb) p errno
$25 = 10

> Also, can you tell through which line does waitpid exit in this case?

Breakpoint 5, waitpid (pid=4860, status=0x0, options=1) at w32proc.c:1190
1190		  return -1;
$24 = 4860


> Then see how many times waitpid is invoked for the same PID, and
> whether it causes assertion violation on the first or the second time
> (if there is a second time).

I started a new session for this question so the pids will be differant
from the answer to your first mail.

It only does a waitpid once for the offending pid:

Breakpoint 3, waitpid (pid=4880, status=0x0, options=1) at w32proc.c:1132

Breakpoint 1, terminate_due_to_signal (sig=22,
backtrace_limit=2147483647) at emacs.c:314
314	  signal (sig, SIG_DFL);
(gdb) up 2
#2  0x0114a8db in get_child_status (child=4880, status=0x0, options=1,
interruptible=false) at sysdep.c:294
294	      eassert (errno == EINTR);





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 18:28           ` Eli Zaretskii
@ 2012-12-06 20:18             ` Paul Eggert
  2012-12-06 20:25               ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2012-12-06 20:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086, stephen_powell

On 12/06/12 10:28, Eli Zaretskii wrote:
> Yes, but we usually do that only if Emacs cannot possibly recover from
> that internal error.  If Emacs _can_ continue, then we only abort via
> eassert, so that a production version won't crash.

Well, it depends on what "recovery" means.  In this case, if
Emacs ignores the error and continues, it will have a confused
data structure that can cause it to kill unrelated innocent-victim
processes seemingly at random.  Such a bug would be rare and hard to
track down -- I would not be surprised if users have run across
this bug in the wild but have not known that Emacs was the culprit
and have assumed it was a Heisenbug in the innocent-victim program.

One possibility is for Emacs to fall back into recovery mode, such
as what it does now when it runs out of memory.  That is, Emacs
would tell you that it has had an internal error and that you should
save your work and exit as soon as you can.  This would lessen
the likelihood of the confused Emacs causing further damage.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 20:18             ` Paul Eggert
@ 2012-12-06 20:25               ` Eli Zaretskii
  2012-12-06 20:36                 ` Paul Eggert
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06 20:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 13086, stephen_powell

> Date: Thu, 06 Dec 2012 12:18:36 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
> 
> On 12/06/12 10:28, Eli Zaretskii wrote:
> > Yes, but we usually do that only if Emacs cannot possibly recover from
> > that internal error.  If Emacs _can_ continue, then we only abort via
> > eassert, so that a production version won't crash.
> 
> Well, it depends on what "recovery" means.  In this case, if
> Emacs ignores the error and continues, it will have a confused
> data structure that can cause it to kill unrelated innocent-victim
> processes seemingly at random.

How can that happen, if PID is not our child process?

And even if it does happen, the hypothetical problem you envision
hardly justify losing an Emacs session.

> One possibility is for Emacs to fall back into recovery mode, such
> as what it does now when it runs out of memory.  That is, Emacs
> would tell you that it has had an internal error and that you should
> save your work and exit as soon as you can.  This would lessen
> the likelihood of the confused Emacs causing further damage.

It is much easier to remove the offending process object from the list
of those we expect to be dead.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 20:25               ` Eli Zaretskii
@ 2012-12-06 20:36                 ` Paul Eggert
  2012-12-06 20:39                   ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2012-12-06 20:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086, stephen_powell

On 12/06/12 12:25, Eli Zaretskii wrote:
 
> How can that happen, if PID is not our child process?

Because Emacs regularly kills processes that it thinks are its
children.  If I'm running several applications, one of them
can kill the other even if the applications are otherwise
unrelated -- that's standard Unix semantics.  So if
Emacs mistakenly thinks that it has a child with PID 1234
and does the equivalent of "kill -9 1234", which is something
that delete-process does, Emacs can kill an innocent and
unrelated victim.

> the hypothetical problem you envision
> hardly justify losing an Emacs session.

It depends on how important the other process is to the user.
Sometimes, the other process is more important than Emacs,
and the user would rather lose this Emacs session than lose
the other process.

>> One possibility is for Emacs to fall back into recovery mode
> 
> It is much easier to remove the offending process object from the list
> of those we expect to be dead.

Sure, but there's a good case for being cautious here, even if
it's a bit harder to be cautious.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 20:17 ` Stephen Powell
@ 2012-12-06 20:36   ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06 20:36 UTC (permalink / raw)
  To: Stephen Powell; +Cc: 13086, eggert, stephen_powell

> Date: Thu, 06 Dec 2012 20:17:45 +0000
> From: Stephen Powell <stephen_powell@optusnet.com.au>
> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org, eggert@cs.ucla.edu
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> Answering your first mail:
> 
> > What is the value of errno in frame #2?
> 
> (gdb) p errno
> $25 = 10

That's ECHILD.

> > Also, can you tell through which line does waitpid exit in this case?
> 
> Breakpoint 5, waitpid (pid=4860, status=0x0, options=1) at w32proc.c:1190
> 1190		  return -1;
> $24 = 4860

It seems we are looking for a process that is not on our list of
watched child processes.  How could that happen?

Can you look at the contents of the child_procs array, and see if the
process we are looking for is there?  Also, what is the value of
child_proc_count when waitpid is called?

If the above doesn't show the light, can you please compile w32proc.c
with -DFULL_DEBUG, and see what Emacs says about processes it launches
and reaps when you run Emacs under GDB?

Thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 20:36                 ` Paul Eggert
@ 2012-12-06 20:39                   ` Eli Zaretskii
  2012-12-06 20:48                     ` Eli Zaretskii
  2012-12-06 20:57                     ` Paul Eggert
  0 siblings, 2 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06 20:39 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 13086, stephen_powell

> Date: Thu, 06 Dec 2012 12:36:05 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
> 
> On 12/06/12 12:25, Eli Zaretskii wrote:
>  
> > How can that happen, if PID is not our child process?
> 
> Because Emacs regularly kills processes that it thinks are its
> children.  If I'm running several applications, one of them
> can kill the other even if the applications are otherwise
> unrelated -- that's standard Unix semantics.  So if
> Emacs mistakenly thinks that it has a child with PID 1234
> and does the equivalent of "kill -9 1234", which is something
> that delete-process does, Emacs can kill an innocent and
> unrelated victim.

But that already happened when we are calling waitpid.  It cannot be
helped at that point.

> >> One possibility is for Emacs to fall back into recovery mode
> > 
> > It is much easier to remove the offending process object from the list
> > of those we expect to be dead.
> 
> Sure, but there's a good case for being cautious here, even if
> it's a bit harder to be cautious.

It doesn't make sense to be that cautious in an Emacs compiled without
assertions.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 20:39                   ` Eli Zaretskii
@ 2012-12-06 20:48                     ` Eli Zaretskii
  2012-12-06 21:36                       ` Paul Eggert
  2012-12-06 20:57                     ` Paul Eggert
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-06 20:48 UTC (permalink / raw)
  To: eggert; +Cc: 13086

> Date: Thu, 06 Dec 2012 22:39:33 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 13086@debbugs.gnu.org, stephen_powell@optusnet.com.au
> 
> > Date: Thu, 06 Dec 2012 12:36:05 -0800
> > From: Paul Eggert <eggert@cs.ucla.edu>
> > CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
> > 
> > On 12/06/12 12:25, Eli Zaretskii wrote:
> >  
> > > How can that happen, if PID is not our child process?
> > 
> > Because Emacs regularly kills processes that it thinks are its
> > children.  If I'm running several applications, one of them
> > can kill the other even if the applications are otherwise
> > unrelated -- that's standard Unix semantics.  So if
> > Emacs mistakenly thinks that it has a child with PID 1234
> > and does the equivalent of "kill -9 1234", which is something
> > that delete-process does, Emacs can kill an innocent and
> > unrelated victim.
> 
> But that already happened when we are calling waitpid.  It cannot be
> helped at that point.
> 
> > >> One possibility is for Emacs to fall back into recovery mode
> > > 
> > > It is much easier to remove the offending process object from the list
> > > of those we expect to be dead.
> > 
> > Sure, but there's a good case for being cautious here, even if
> > it's a bit harder to be cautious.
> 
> It doesn't make sense to be that cautious in an Emacs compiled without
> assertions.

Just to make sure I'm clear: I'm saying that inflooping in these cases
will help nothing.  And that is exactly what we will do in an build
without assertions, if waitpid returns ECHILD.  That simply makes no
sense at all.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 20:39                   ` Eli Zaretskii
  2012-12-06 20:48                     ` Eli Zaretskii
@ 2012-12-06 20:57                     ` Paul Eggert
  2012-12-07  6:46                       ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2012-12-06 20:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086, stephen_powell

On 12/06/12 12:39, Eli Zaretskii wrote:

> But that already happened when we are calling waitpid.

Not necessarily.  Sometimes Emacs does a waitpid before
killing the child.  For example, the waitpid of
call_process_cleanup might precede the EMACS_KILLPG
of call_process_kill.

> It doesn't make sense to be that cautious in an Emacs compiled without
> assertions.

Well, as I said, I'd rather that the bug got fixed.
But if it's a common bug on Windows platforms, and we can't
easily fix it, perhaps we should add some #ifdef WINDOWSNT
code that ignores the problem along the lines of your suggestion.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 20:48                     ` Eli Zaretskii
@ 2012-12-06 21:36                       ` Paul Eggert
  2012-12-07  6:22                         ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2012-12-06 21:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086

On 12/06/12 12:48, Eli Zaretskii wrote:
> And that is exactly what we will do in an build
> without assertions, if waitpid returns ECHILD.

One possibility is to replace
"eassert (errno == EINTR);" with
"if (errno != EINTR) emacs_abort ();";
that would prevent the loop.

There are other possibilities,
for example, Emacs could report an error and
refuse to kill any processes thereafter;
this would prevent it from killing innocent-victim
processes.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05  7:22 bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126 Stephen Powell
                   ` (4 preceding siblings ...)
  2012-12-06 20:17 ` Stephen Powell
@ 2012-12-06 22:51 ` Stephen Powell
  2012-12-07  6:43   ` Eli Zaretskii
  2012-12-07  0:10 ` Stephen Powell
  6 siblings, 1 reply; 36+ messages in thread
From: Stephen Powell @ 2012-12-06 22:51 UTC (permalink / raw)
  To: eliz; +Cc: 13086, eggert, stephen_powell

Eli Zaretskii <eliz@gnu.org> writes:

> If the above doesn't show the light, can you please compile w32proc.c
> with -DFULL_DEBUG, and see what Emacs says about processes it launches
> and reaps when you run Emacs under GDB?


OK, its 9:30 my time and I got to do stuff, but here is what I've found
so for.

There are two hosts gnus connects to, Yahoo and HotMail.

1. Gnus uses the command

`c:\cygwin\bin\bash.exe -c "openssl s_client -quiet -ssl3 -connect
imap.mail.yahoo.com:993"'

which runs for a three or four seconds and exits with 0 as the return
code.  Emacs keeps track of this process.  Gnus seems to wait for this
to finish.

From the gdb log:

warning: register_child registered fd 6 with pid 6360
warning: sys_kill.TerminateProcess returned 5 for pid 6360


2. Gnus uses the command

`c:\cygwin\bin\bash.exe -c "gnutls-cli --insecure -p 995 pop3.live.com"'

which runs for fifty seconds and exits with 1 as the return code.  Emacs
loses track of this process.  Gnus does not seem to wait for this to
finish but carries on quickly.

From the gdb log:

warning: register_child registered fd 6 with pid 7020

No terminate process but the process has finished.


I'll try to get back to his latter today.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-05  7:22 bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126 Stephen Powell
                   ` (5 preceding siblings ...)
  2012-12-06 22:51 ` Stephen Powell
@ 2012-12-07  0:10 ` Stephen Powell
  2012-12-07  6:44   ` Eli Zaretskii
                     ` (2 more replies)
  6 siblings, 3 replies; 36+ messages in thread
From: Stephen Powell @ 2012-12-07  0:10 UTC (permalink / raw)
  To: stephen_powell; +Cc: 13086, eggert

I've changed the mail-sources entry for hotmail to use the correct login
parameters so that now it actually works.  It connects quickly rather
than failing to connect and waiting for a timeout.

This has made the emacs abort go away but doesn't solve the problem of
emacs losing track of the child process.  I don't think I know enough to
be able to debug this problem.  Any suggestions about how to proceed are
welcome.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 21:36                       ` Paul Eggert
@ 2012-12-07  6:22                         ` Eli Zaretskii
  2012-12-07 17:54                           ` Paul Eggert
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-07  6:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 13086

> Date: Thu, 06 Dec 2012 13:36:53 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 13086@debbugs.gnu.org
> 
> On 12/06/12 12:48, Eli Zaretskii wrote:
> > And that is exactly what we will do in an build
> > without assertions, if waitpid returns ECHILD.
> 
> One possibility is to replace
> "eassert (errno == EINTR);" with
> "if (errno != EINTR) emacs_abort ();";
> that would prevent the loop.
> 
> There are other possibilities,
> for example, Emacs could report an error and
> refuse to kill any processes thereafter;
> this would prevent it from killing innocent-victim
> processes.

Any of these is better than iflooping.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 22:51 ` Stephen Powell
@ 2012-12-07  6:43   ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-07  6:43 UTC (permalink / raw)
  To: Stephen Powell; +Cc: 13086, eggert

> Date: Thu, 06 Dec 2012 22:51:16 +0000
> From: Stephen Powell <stephen_powell@optusnet.com.au>
> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org, eggert@cs.ucla.edu
> 
> OK, its 9:30 my time and I got to do stuff, but here is what I've found
> so for.

Thanks, we are making good progress, I think.

> 1. Gnus uses the command
> 
> `c:\cygwin\bin\bash.exe -c "openssl s_client -quiet -ssl3 -connect
> imap.mail.yahoo.com:993"'
> 
> which runs for a three or four seconds and exits with 0 as the return
> code.  Emacs keeps track of this process.  Gnus seems to wait for this
> to finish.
> 
> >From the gdb log:
> 
> warning: register_child registered fd 6 with pid 6360
> warning: sys_kill.TerminateProcess returned 5 for pid 6360

I'd like to see a backtrace from where this message comes (in
sys_kill).

> 2. Gnus uses the command
> 
> `c:\cygwin\bin\bash.exe -c "gnutls-cli --insecure -p 995 pop3.live.com"'
> 
> which runs for fifty seconds and exits with 1 as the return code.  Emacs
> loses track of this process.  Gnus does not seem to wait for this to
> finish but carries on quickly.

When you say "keeps track" and "loses track", what specifically do you
mean?  What is different in what Emacs does or displays in these two
cases, evidence do you have that led you to these conclusions?

> >From the gdb log:
> 
> warning: register_child registered fd 6 with pid 7020
> 
> No terminate process but the process has finished.

It would be good to see the series of calls to waitpid in this case,
and also the messages from here:

  #ifdef FULL_DEBUG
		  DebPrint (("select waiting on child %d fd %d\n",
			     cp-child_procs, i));
  #endif

in sys_select, while this process is running and after it exits or
dies.

Thanks again for working on this.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-07  0:10 ` Stephen Powell
@ 2012-12-07  6:44   ` Eli Zaretskii
  2012-12-21 11:56   ` Eli Zaretskii
  2013-01-22 13:38   ` Eli Zaretskii
  2 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-07  6:44 UTC (permalink / raw)
  To: Stephen Powell; +Cc: 13086, eggert, stephen_powell

> Date: Fri, 07 Dec 2012 00:10:01 +0000
> From: Stephen Powell <stephen_powell@optusnet.com.au>
> CC: eliz@gnu.org, 13086@debbugs.gnu.org, eggert@cs.ucla.edu
> 
> I've changed the mail-sources entry for hotmail to use the correct login
> parameters so that now it actually works.  It connects quickly rather
> than failing to connect and waiting for a timeout.
> 
> This has made the emacs abort go away but doesn't solve the problem of
> emacs losing track of the child process.  I don't think I know enough to
> be able to debug this problem.  Any suggestions about how to proceed are
> welcome.

I made some suggestions in another mail.  It could be that it will be
easier to debug this if you go back to the configuration where the
connection was timing out, because this will give you (and GDB) more
time to gather evidence.

Thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-06 20:57                     ` Paul Eggert
@ 2012-12-07  6:46                       ` Eli Zaretskii
  2012-12-07 16:48                         ` Ken Brown
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-07  6:46 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 13086, stephen_powell

> Date: Thu, 06 Dec 2012 12:57:54 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
> 
> Well, as I said, I'd rather that the bug got fixed.
> But if it's a common bug on Windows platforms, and we can't
> easily fix it, perhaps we should add some #ifdef WINDOWSNT
> code that ignores the problem along the lines of your suggestion.

I wasn't thinking about Windows at all, as a matter of fact.  The
current bug on Windows will soon be solved, one way or another.  I was
thinking about Posix platforms inflooping when waitpid returns ECHILD,
when the build is without assertions (which is the default).





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-07  6:46                       ` Eli Zaretskii
@ 2012-12-07 16:48                         ` Ken Brown
  2012-12-07 17:43                           ` Paul Eggert
  0 siblings, 1 reply; 36+ messages in thread
From: Ken Brown @ 2012-12-07 16:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086, Paul Eggert, stephen_powell

On 12/7/2012 1:46 AM, Eli Zaretskii wrote:
>> Date: Thu, 06 Dec 2012 12:57:54 -0800
>> From: Paul Eggert <eggert@cs.ucla.edu>
>> CC: stephen_powell@optusnet.com.au, 13086@debbugs.gnu.org
>>
>> Well, as I said, I'd rather that the bug got fixed.
>> But if it's a common bug on Windows platforms, and we can't
>> easily fix it, perhaps we should add some #ifdef WINDOWSNT
>> code that ignores the problem along the lines of your suggestion.
>
> I wasn't thinking about Windows at all, as a matter of fact.  The
> current bug on Windows will soon be solved, one way or another.  I was
> thinking about Posix platforms inflooping when waitpid returns ECHILD,
> when the build is without assertions (which is the default).

I haven't been following this thread closely, but could any of this 
discussion be relevant to the emacs-24 branch?

I'm asking because two Cygwin users have reported that Emacs 24.2 
sometimes "hangs", and that there is at least one zombie process when 
this happens; see

   http://cygwin.com/ml/cygwin/2012-11/msg00101.html
   http://cygwin.com/ml/cygwin/2012-12/msg00114.html
   http://cygwin.com/ml/cygwin/2012-12/msg00078.html

I've been trying to get more information from these users.  If you think 
this could be related to the current bug (which seems to be about the 
trunk in spite of the subject line), I'll send the information when I 
get it.  Otherwise I'll file a new bug report.

Ken





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-07 16:48                         ` Ken Brown
@ 2012-12-07 17:43                           ` Paul Eggert
  0 siblings, 0 replies; 36+ messages in thread
From: Paul Eggert @ 2012-12-07 17:43 UTC (permalink / raw)
  To: Ken Brown; +Cc: 13086, stephen_powell

On 12/07/2012 08:48 AM, Ken Brown wrote:
> could any of this discussion be relevant to the emacs-24 branch?

I suspect the emacs-24 Cygwin bug is not due to the particular code
we're talking about, which was briefly in the emacs-24 branch but
was reverted at the first sign of trouble.  So I suggest filing a
new bug report.

It could be that this discussion is relevant, as this area
is clearly buggy in the emacs-24 branch, and the bugs could cause
symptoms such as you describe.  So I suggest mentioning Bug#13086
in any new bug report for emacs-24.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-07  6:22                         ` Eli Zaretskii
@ 2012-12-07 17:54                           ` Paul Eggert
  2012-12-07 18:57                             ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Paul Eggert @ 2012-12-07 17:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 13086

On 12/06/2012 10:22 PM, Eli Zaretskii wrote:
> Any of these is better than iflooping.

OK, I changed it to use emacs_abort for now (trunk bzr 111150),
as that's simplest.  We can improve this later as needed.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-07 17:54                           ` Paul Eggert
@ 2012-12-07 18:57                             ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-07 18:57 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 13086

> Date: Fri, 07 Dec 2012 09:54:28 -0800
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 13086@debbugs.gnu.org
> 
> On 12/06/2012 10:22 PM, Eli Zaretskii wrote:
> > Any of these is better than iflooping.
> 
> OK, I changed it to use emacs_abort for now (trunk bzr 111150),
> as that's simplest.

Thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-07  0:10 ` Stephen Powell
  2012-12-07  6:44   ` Eli Zaretskii
@ 2012-12-21 11:56   ` Eli Zaretskii
  2012-12-23 17:12     ` Eli Zaretskii
  2013-01-22 13:38   ` Eli Zaretskii
  2 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-21 11:56 UTC (permalink / raw)
  To: Stephen Powell; +Cc: 13086, Fabrice Popineau

Fabrice Popineau reported a similar problem on his machine, and
debugged it to find the source.  Based on his findings, I committed to
the emacs-24 branch (revision 111050) changes that should avoid losing
track of child processes, and hopefully (after they are merged to the
trunk) also resolve the problem reported here.

Thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-21 11:56   ` Eli Zaretskii
@ 2012-12-23 17:12     ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2012-12-23 17:12 UTC (permalink / raw)
  To: stephen_powell; +Cc: 13086, fabrice.popineau

> Date: Fri, 21 Dec 2012 13:56:38 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> CC: 13086@debbugs.gnu.org, Fabrice Popineau <fabrice.popineau@gmail.com>
> 
> Fabrice Popineau reported a similar problem on his machine, and
> debugged it to find the source.  Based on his findings, I committed to
> the emacs-24 branch (revision 111050) changes that should avoid losing
> track of child processes, and hopefully (after they are merged to the
> trunk) also resolve the problem reported here.

Another commit (trunk revision 111311) should make handling of
subprocess shutdown on MS-Windows even more solid.  As a nice side
effect, the annoying "reader_thread.SetEvent failed with 6 for fd 3"
message no longer appears.






^ permalink raw reply	[flat|nested] 36+ messages in thread

* bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
  2012-12-07  0:10 ` Stephen Powell
  2012-12-07  6:44   ` Eli Zaretskii
  2012-12-21 11:56   ` Eli Zaretskii
@ 2013-01-22 13:38   ` Eli Zaretskii
  2 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2013-01-22 13:38 UTC (permalink / raw)
  To: Stephen Powell; +Cc: 13086-done, eggert

> Date: Fri, 07 Dec 2012 00:10:01 +0000
> From: Stephen Powell <stephen_powell@optusnet.com.au>
> CC: eliz@gnu.org, 13086@debbugs.gnu.org, eggert@cs.ucla.edu
> 
> I've changed the mail-sources entry for hotmail to use the correct login
> parameters so that now it actually works.  It connects quickly rather
> than failing to connect and waiting for a timeout.
> 
> This has made the emacs abort go away but doesn't solve the problem of
> emacs losing track of the child process.  I don't think I know enough to
> be able to debug this problem.  Any suggestions about how to proceed are
> welcome.

No further reports about this for a long time, so I presume the
problem was solved by the related changes on the trunk.

I'm therefore closing this bug.  Feel free to reopen if something
similar surfaces.

Thanks.





^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2013-01-22 13:38 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-05  7:22 bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126 Stephen Powell
2012-12-05 16:42 ` Eli Zaretskii
2012-12-05 19:28 ` Stephen Powell
2012-12-05 21:25   ` Eli Zaretskii
2012-12-06  2:04     ` Paul Eggert
2012-12-06  3:51       ` Eli Zaretskii
2012-12-06  5:07         ` Paul Eggert
2012-12-06 18:28           ` Eli Zaretskii
2012-12-06 20:18             ` Paul Eggert
2012-12-06 20:25               ` Eli Zaretskii
2012-12-06 20:36                 ` Paul Eggert
2012-12-06 20:39                   ` Eli Zaretskii
2012-12-06 20:48                     ` Eli Zaretskii
2012-12-06 21:36                       ` Paul Eggert
2012-12-07  6:22                         ` Eli Zaretskii
2012-12-07 17:54                           ` Paul Eggert
2012-12-07 18:57                             ` Eli Zaretskii
2012-12-06 20:57                     ` Paul Eggert
2012-12-07  6:46                       ` Eli Zaretskii
2012-12-07 16:48                         ` Ken Brown
2012-12-07 17:43                           ` Paul Eggert
2012-12-06  7:35         ` Paul Eggert
2012-12-06 18:41           ` Eli Zaretskii
2012-12-06  5:07 ` Stephen Powell
2012-12-06 19:10 ` Stephen Powell
2012-12-06 19:36   ` Eli Zaretskii
2012-12-06 19:48     ` Eli Zaretskii
2012-12-06 20:17 ` Stephen Powell
2012-12-06 20:36   ` Eli Zaretskii
2012-12-06 22:51 ` Stephen Powell
2012-12-07  6:43   ` Eli Zaretskii
2012-12-07  0:10 ` Stephen Powell
2012-12-07  6:44   ` Eli Zaretskii
2012-12-21 11:56   ` Eli Zaretskii
2012-12-23 17:12     ` Eli Zaretskii
2013-01-22 13:38   ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).