unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#19868: 25.0.50; Compilation eats buffers
@ 2015-02-14 19:30 Richard Copley
  2015-02-15 17:53 ` Eli Zaretskii
  2016-08-12 20:47 ` bug#19868: #19868 " Noam Postavsky
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Copley @ 2015-02-14 19:30 UTC (permalink / raw)
  To: 19868

On Windows, with MinGW gcc.exe installed and on the path, save a file
"c:\temp\bug.c" containing these two lines:

#include <windows.h>
int main () { Sleep (5000); }

Compile with "M-x compile RET", supplying this compile-command:
gcc -mwindows -o bug.exe bug.c && bug.exe

Within 5 seconds, execute "M-x compile" again and answer "yes" to kill
the existing process. The process doesn't respond to the signal,
and Emacs hangs inside the call to `delete-process' in
`compilation-start'.

When the process does eventually die and the `delete-process' call
returns, the current buffer has changed from *compilation* to the buffer
from which the compilation was launched (which will often be a source
code buffer).

`compilation-start' then proceeds to erase the buffer and discard its
undo history. This is potentially very bad news for the user's source
code.

I'm not sure where the buffer gets changed (presumably in a sentinel,
but `compilation-sentinel' looks OK to me). Wrapping the
`delete-process' call inside a `save-excursion' fixes (or hides?) the
problem.
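A minimal C analogue of that workaround, not Emacs code: `struct buffer`, `current_buffer`, `switch_buffer_callback` and `call_preserving_current_buffer` are hypothetical names. It assumes, as the report suspects, that something run during `delete-process` (for example a sentinel) switches the current buffer. Saving the current buffer before the call and restoring it afterwards, as `save-excursion` does, keeps the later erase aimed at *compilation*. The callback still ran, though, which is why the wrapper may only hide the underlying problem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not Emacs internals: a buffer record and a
   global "current buffer" pointer.  */
struct buffer
{
  const char *name;
  int erased;
};

static struct buffer *current_buffer;

/* Hypothetical callback that switches buffers as a side effect, the way
   a process sentinel might while delete-process is running.  */
static void
switch_buffer_callback (void *target)
{
  current_buffer = target;
}

/* Run FN (ARG), then restore whichever buffer was current before the
   call, in the spirit of wrapping the call in save-excursion.  */
static void
call_preserving_current_buffer (void (*fn) (void *), void *arg)
{
  struct buffer *saved = current_buffer;
  fn (arg);
  current_buffer = saved;
}

/* The destructive step that follows in the caller: erase whatever
   buffer is current at that moment.  */
static void
erase_current_buffer (void)
{
  assert (current_buffer != NULL);
  current_buffer->erased = 1;
}
```

Without the wrapper, calling `switch_buffer_callback` directly and then `erase_current_buffer` erases the source buffer instead, which is the data-loss scenario described above.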


In GNU Emacs 25.0.50.1 (x86_64-w64-mingw32)
 of 2015-02-09 on MACHINE
Repository revision: 21d1f8b85eec8fc1f87bb30398e449f6b20b6ecc
Windowing system distributor `Microsoft Corp.', version 6.3.9600
Configured using:
 `configure --prefix /c/emacs/emacs-20150209-192633
 --disable-dependency-tracking
 --enable-locallisppath=%emacs_dir%/../site-lisp --with-wide-int
 --build=x86_64-w64-mingw32 'CPPFLAGS=-I G:/usr/include -I
 C:/GnuWin32/include' 'LDFLAGS=-L G:/usr/lib -L C:/GnuWin32/lib''

Configured features:
XPM JPEG TIFF GIF PNG SOUND NOTIFY ACL GNUTLS LIBXML2 ZLIB


^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#19868: 25.0.50; Compilation eats buffers
  2015-02-14 19:30 bug#19868: 25.0.50; Compilation eats buffers Richard Copley
@ 2015-02-15 17:53 ` Eli Zaretskii
  2015-02-17  0:25   ` Richard Copley
  2016-08-12 20:47 ` bug#19868: #19868 " Noam Postavsky
  1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2015-02-15 17:53 UTC (permalink / raw)
  To: Richard Copley; +Cc: 19868

> Date: Sat, 14 Feb 2015 19:30:45 +0000
> From: Richard Copley <rcopley@gmail.com>
> 
> On Windows, with MinGW gcc.exe installed and on the path, save a file
> "c:\temp\bug.c" containing these two lines:
> 
> #include <windows.h>
> int main () { Sleep (5000); }
> 
> Compile with "M-x compile RET", supplying this compile-command:
> gcc -mwindows -o bug.exe bug.c && bug.exe
> 
> Within 5 seconds, execute "M-x compile" again and answer "yes" to kill
> the existing process. The process doesn't respond to the signal,

There are no signals on Windows.  Emacs simulates SIGINT and SIGKILL
by other means, see sys_kill.

> and Emacs hangs inside the call to `delete-process' in
> `compilation-start'.
> 
> When the process does eventually die and the `delete-process' call
> returns, the current buffer has changed from *compilation* to the buffer
> from which the compilation was launched (which will often be a source
> code buffer).
> 
> `compilation-start' then proceeds to erase the buffer and discard its
> undo history. This is potentially very bad news for the user's source
> code.

I cannot reproduce this: for me, Emacs doesn't hang at all.  As soon
as I answer YES to the kill process question, I see in Process
Explorer that cmdproxy, cmd.exe, and the program that sleeps are all
terminated, and the new compilation begins.  Like I'd expect.

If I instrument the sys_kill function, I see that we first send a
simulated Ctrl-C keystroke to the process, and a second afterwards
terminate it forcefully, which is consistent with the calls to
interrupt-process and delete-process in compilation-start.
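The escalation described above (a gentle interrupt, a grace period, then a forceful kill) can be sketched portably with function pointers. Everything here is hypothetical: `struct proc_ops`, `interrupt_then_terminate` and the fake process exist only for illustration, and no Windows API is called. One deliberate difference: as described, the forceful termination follows about a second later unconditionally, while this sketch checks liveness first. Also keep in mind that the process being signalled is cmdproxy, not the program it launched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operations on a child process.  On Windows, INTERRUPT
   would correspond to the simulated Ctrl-C and TERMINATE to the
   forceful kill; neither is implemented here.  */
struct proc_ops
{
  void (*interrupt) (void *proc);
  bool (*is_alive) (void *proc);
  void (*terminate) (void *proc);
  void (*wait_ms) (unsigned ms);
};

/* Ask PROC to stop, wait GRACE_MS, then kill it forcefully if it is
   still alive.  Returns true if the forceful step was needed.  */
static bool
interrupt_then_terminate (const struct proc_ops *ops, void *proc,
                          unsigned grace_ms)
{
  ops->interrupt (proc);
  ops->wait_ms (grace_ms);
  if (!ops->is_alive (proc))
    return false;
  ops->terminate (proc);
  return true;
}

/* Fake process for illustration: it may or may not honour the
   interrupt.  */
struct fake_proc
{
  bool alive;
  bool honours_interrupt;
};

static void
fake_interrupt (void *p)
{
  struct fake_proc *fp = p;
  if (fp->honours_interrupt)
    fp->alive = false;
}

static bool
fake_is_alive (void *p)
{
  return ((struct fake_proc *) p)->alive;
}

static void
fake_terminate (void *p)
{
  ((struct fake_proc *) p)->alive = false;
}

static void
fake_wait_ms (unsigned ms)
{
  (void) ms;  /* no real sleeping in the sketch */
}

static const struct proc_ops fake_ops = {
  fake_interrupt, fake_is_alive, fake_terminate, fake_wait_ms
};
```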

I tried this on Windows 7 and XP, and both show the same correct
behavior.

It could be that what you see is specific to Windows 8, or to 64-bit
programs, or to how MinGW64 sets up the process in its startup code (I
used MinGW32).

You say above that Emacs hangs inside the delete-process call -- can
you show a backtrace in that state, preferably from an unoptimized
build?  I'd like to see where exactly it hangs.

Also, is the -mwindows compiler switch a factor here, i.e. does the
problem happen with a console application that sleeps?  (I'm not sure
it should matter, because the process that we are killing is cmdproxy,
not the program you compiled.)

In addition, can you look at the relevant processes in Process
Explorer and see if any of them are killed when you answer YES?

> I'm not sure where the buffer gets changed (presumably in a sentinel,
> but `compilation-sentinel' looks OK to me).

Run all this under GDB, put a breakpoint on a low-level function that
switches buffers (e.g., in set_buffer_internal), and you will see in
the backtrace which Lisp function triggers that.  It is advisable to
manually load compile.el in advance, so that xbacktrace shows more
details.

> In GNU Emacs 25.0.50.1 (x86_64-w64-mingw32)
>  of 2015-02-09 on MACHINE
> Repository revision: 21d1f8b85eec8fc1f87bb30398e449f6b20b6ecc
> Windowing system distributor `Microsoft Corp.', version 6.3.9600
> Configured using:
>  `configure --prefix /c/emacs/emacs-20150209-192633
>  --disable-dependency-tracking
>  --enable-locallisppath=%emacs_dir%/../site-lisp --with-wide-int
>  --build=x86_64-w64-mingw32 'CPPFLAGS=-I G:/usr/include -I
>  C:/GnuWin32/include' 'LDFLAGS=-L G:/usr/lib -L C:/GnuWin32/lib''

Any idea why you are building --with-wide-int?  It's supposed to be a
no-op in a 64-bit build.  (This is not related to the bug.)

* bug#19868: 25.0.50; Compilation eats buffers
  2015-02-15 17:53 ` Eli Zaretskii
@ 2015-02-17  0:25   ` Richard Copley
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Copley @ 2015-02-17  0:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 19868

[-- Attachment #1: Type: text/plain, Size: 5005 bytes --]

On 15 February 2015 at 17:53, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> On Windows, with MinGW gcc.exe installed and on the path, save a file
>> "c:\temp\bug.c" containing these two lines:
>>
>> #include <windows.h>
>> int main () { Sleep (5000); }
>>
>> Compile with "M-x compile RET", supplying this compile-command:
>> gcc -mwindows -o bug.exe bug.c && bug.exe
>>
>> Within 5 seconds, execute "M-x compile" again and answer "yes" to kill
>> the existing process. The process doesn't respond to the signal,
>
> There are no signals on Windows.  Emacs simulates SIGINT and SIGKILL
> by other means, see sys_kill.
>
>> and Emacs hangs inside the call to `delete-process' in
>> `compilation-start'.
>>
>> When the process does eventually die and the `delete-process' call
>> returns, the current buffer has changed from *compilation* to the buffer
>> from which the compilation was launched (which will often be a source
>> code buffer).
>>
>> `compilation-start' then proceeds to erase the buffer and discard its
>> undo history. This is potentially very bad news for the user's source
>> code.
>
> I cannot reproduce this: for me, Emacs doesn't hang at all.  As soon
> as I answer YES to the kill process question, I see in Process
> Explorer that cmdproxy, cmd.exe, and the program that sleeps are all
> terminated, and the new compilation begins.  Like I'd expect.
>
> If I instrument the sys_kill function, I see that we first send a
> simulated Ctrl-C keystroke to the process, and a second afterwards
> terminate it forcefully, which is consistent with the calls to
> interrupt-process and delete-process in compilation-start.
>
> I tried this on Windows 7 and XP, and both show the same correct
> behavior.
>
> It could be that what you see is specific to Windows 8, or to 64-bit
> programs, or to how MinGW64 sets up the process in its startup code (I
> used MinGW32).

I see my problem no matter what compiler I use to build "bug.exe"
(old-fashioned MinGW32, and both the 32- and 64-bit MinGW-W64
GCC 4.9.2 toolchains). I'll try on Windows 7, and if I get time,
with 32-bit Emacs.

> You say above that Emacs hangs inside the delete-process call -- can
> you show a backtrace in that state, preferably from an unoptimized
> build?  I'd like to see where exactly it hangs.

I tried to work out how to control the optimization level when building
Emacs but I'm stumped. How do you do that? (If there are configure
flags, can they be mentioned in "configure --help"?)

FWIW, attached is the result of "thread apply all bt full" after typing
Ctrl-C in GDB while debugging an optimized Emacs that was hanging.
Looks like I'm doing something horribly wrong. Sorry about that.

> Also, is the -mwindows compiler switch a factor here, i.e. does the
> problem happen with a console application that sleeps?

Yes, -mwindows is needed. Console applications die as expected.

> (I'm not sure it should matter, because the process that we are
> killing is cmdproxy, not the program you compiled.)

Then I don't understand why a GUI program would ever die in response
to that. (Would runemacs.exe?) Really I didn't expect it to; that's
not the bug I was reporting (though I'm happy to help fix it if it is
a bug).

> In addition, can you look at the relevant processes in Process
> Explorer and see if any of them are killed when you answer YES?

"cmdproxy.exe" and its descendants "cmd.exe" and "conhost.exe"
are killed, leaving just the orphaned "bug.exe".

>> I'm not sure where the buffer gets changed (presumably in a sentinel,
>> but `compilation-sentinel' looks OK to me).
>
> Run all this under GDB, put a breakpoint on a low-level function that
> switches buffers (e.g., in set_buffer_internal), and you will see in
> the backtrace which Lisp function triggers that.  It is advisable to
> manually load compile.el in advance, so that xbacktrace shows more
> details.

I'm sorry to say that, mysteriously, I can no longer reproduce the
effect where the current buffer changes during the `delete-process'
call and causes work to be lost. I can't see what I'm doing differently.
I might have to get back to you another time.

>> In GNU Emacs 25.0.50.1 (x86_64-w64-mingw32)
>>  of 2015-02-09 on MACHINE
>> Repository revision: 21d1f8b85eec8fc1f87bb30398e449f6b20b6ecc
>> Windowing system distributor `Microsoft Corp.', version 6.3.9600
>> Configured using:
>>  `configure --prefix /c/emacs/emacs-20150209-192633
>>  --disable-dependency-tracking
>>  --enable-locallisppath=%emacs_dir%/../site-lisp --with-wide-int
>>  --build=x86_64-w64-mingw32 'CPPFLAGS=-I G:/usr/include -I
>>  C:/GnuWin32/include' 'LDFLAGS=-L G:/usr/lib -L C:/GnuWin32/lib''
>
> Any idea why you are building --with-wide-int?  It's supposed to be a
> no-op in a 64-bit build.  (This is not related to the bug.)

I'll remove it, thanks.

[-- Attachment #2: bt.txt --]
[-- Type: text/plain, Size: 6183 bytes --]

Program received signal SIGINT, Interrupt.
[Switching to Thread 8568.0x1518]
0x00007ff9c5cc3233 in RegLoadMUIStringA ()
   from C:\WINDOWS\system32\KernelBase.dll
(gdb) thread apply all bt full

Thread 7 (Thread 8568.0x1518):
#0  0x00007ff9c5cc3233 in RegLoadMUIStringA ()
   from C:\WINDOWS\system32\KernelBase.dll
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 6 (Thread 8568.0x450):
#0  0x00007ff9c8aa28ca in ntdll!ZwWaitForWorkViaWorkerFactory ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#1  0x00007ff9c8a44d26 in ntdll!RtlFreeUnicodeString ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#2  0x00007ff9c88213d2 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#3  0x00007ff9c8a7eb64 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#4  0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 5 (Thread 8568.0x1814):
#0  0x00007ff9c8aa28ca in ntdll!ZwWaitForWorkViaWorkerFactory ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#1  0x00007ff9c8a44d26 in ntdll!RtlFreeUnicodeString ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#2  0x00007ff9c88213d2 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#3  0x00007ff9c8a7eb64 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#4  0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 4 (Thread 8568.0x1930):
#0  0x00007ff9c8aa0e3a in ntdll!ZwReadFile ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#1  0x00007ff9c5c183a8 in ReadFile () from C:\WINDOWS\system32\KernelBase.dll
No symbol table info available.
#2  0x00007ff9c8981b59 in msvcrt!__crtGetStringTypeW ()
   from C:\WINDOWS\system32\msvcrt.dll
No symbol table info available.
#3  0x00007ff9c8981c79 in msvcrt!_read () from C:\WINDOWS\system32\msvcrt.dll
No symbol table info available.
#4  0x0000000400196135 in _sys_read_ahead (fd=<optimized out>)
    at g:/emacs/repo/emacs/src/w32.c:7990
        cp = 0x100000000
        rc = 0
#5  0x000000040019b815 in reader_thread (arg=0x4017a5940 <child_procs>)
    at g:/emacs/repo/emacs/src/w32proc.c:1017
        rc = <optimized out>
        cp = 0x4017a5940 <child_procs>
#6  0x00007ff9c88213d2 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#7  0x00007ff9c8a7eb64 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#8  0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 3 (Thread 8568.0x2258):
#0  0x00007ff9c60726ca in USER32!GetMessageW ()
   from C:\WINDOWS\system32\user32.dll
No symbol table info available.
#1  0x00007ff9c6072695 in USER32!GetMessageW ()
   from C:\WINDOWS\system32\user32.dll
No symbol table info available.
#2  0x0000000400170068 in w32_msg_pump (msg_buf=0x2ecfef0)
    at g:/emacs/repo/emacs/src/w32fns.c:2526
        msg = {hwnd = 0x1906d0, message = 275, wParam = 1, lParam = 0,
          time = 452290093, pt = {x = 273, y = 1075}}
        focus_window = <optimized out>
#3  0x00000004001705c0 in w32_msg_worker (arg=<optimized out>)
    at g:/emacs/repo/emacs/src/w32fns.c:2747
        msg = {hwnd = 0x0, message = 0, wParam = 0, lParam = 0, time = 0,
          pt = {x = 0, y = 0}}
        dummy_buf = {next = 0x0, w32msg = {msg = {hwnd = 0x0, message = 0,
              wParam = 0, lParam = 0, time = 0, pt = {x = 0, y = 0}},
            dwModifiers = 0, rect = {left = 0, top = 0, right = 0,
              bottom = 0}}, result = 0, completed = 0}
#4  0x00007ff9c88213d2 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#5  0x00007ff9c8a7eb64 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#6  0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 2 (Thread 8568.0x15a4):
#0  0x00007ff9c8aa111a in ntdll!ZwDelayExecution ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#1  0x00007ff9c5c1121a in SleepEx () from C:\WINDOWS\system32\KernelBase.dll
No symbol table info available.
#2  0x000000040019bdab in timer_loop (arg=0x0)
    at g:/emacs/repo/emacs/src/w32proc.c:381
        sleep_time = <optimized out>
        handler = <optimized out>
        now = <optimized out>
        expire = <optimized out>
        reload = <optimized out>
        itimer = 0x0
        which = <optimized out>
        sig = <optimized out>
        crit = <optimized out>
        max_sleep = <optimized out>
        hth = 0x0
#3  0x00007ff9c88213d2 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#4  0x00007ff9c8a7eb64 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#5  0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 1 (Thread 8568.0x10c8):
#0  0x00007ff9c8aa0e1a in ntdll!ZwWaitForSingleObject ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#1  0x00007ff9c8a49a85 in ntdll!RtlImageNtHeaderEx ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
#2  0x00007ff9c8a47f44 in ntdll!RtlEnterCriticalSection ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

* bug#19868: #19868 25.0.50; Compilation eats buffers
  2015-02-14 19:30 bug#19868: 25.0.50; Compilation eats buffers Richard Copley
  2015-02-15 17:53 ` Eli Zaretskii
@ 2016-08-12 20:47 ` Noam Postavsky
  2016-08-13  6:44   ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: Noam Postavsky @ 2016-08-12 20:47 UTC (permalink / raw)
  To: 19868; +Cc: Richard Copley

retitle 19868 [w32] restarting compilation hangs trying to kill process
found 19868 25.1
quit

> I tried this on Windows 7 and XP, and both show the same correct
> behavior.
>
> It could be that what you see is specific to Windows 8, or to 64-bit
> programs, or to how MinGW64 sets up the process in its startup code (I
> used MinGW32).
>
> You say above that Emacs hangs inside the delete-process call -- can
> you show a backtrace in that state, preferably from an unoptimized
> build?  I'd like to see where exactly it hangs.

I reproduced this (the hanging, not the buffer eating) on Windows 10,
Emacs 25.1, MinGW64.  Stepping with gdb I found that the hang occurs in
sys_close where it calls _close (fd).  This is being called from
deactivate_process:

  for (i = 0; i < PROCESS_OPEN_FDS; i++)
    close_process_fd (&p->open_fd[i]); // <-- when i == 2
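A sketch of the bookkeeping that loop performs, assuming (as Noam's later log suggests, where `close_process_fd(-1[i = 0])` produces no `_close` call) that the helper skips slots already set to -1 and marks each closed slot -1. This is not the Emacs source; the names are hypothetical, the slot count of 6 is taken from the `open_fd[0..5]` dump later in the thread, and it uses POSIX `close()`. It shows only the bookkeeping and does not reproduce the Windows hang at i == 2. Resetting the slot is what makes a repeated call harmless.

```c
#include <assert.h>
#include <unistd.h>

/* Number of descriptor slots per process.  Assumed to be 6 here,
   consistent with the open_fd[0..5] dump later in this thread.  */
enum { NUM_OPEN_FDS = 6 };

/* Sketch of a close_process_fd-style helper (not the Emacs source):
   skip slots already marked -1; otherwise mark the slot closed first and
   then close the descriptor, so calling it again is a no-op.  */
static void
sketch_close_process_fd (int *fd_addr)
{
  int fd = *fd_addr;
  if (fd >= 0)
    {
      *fd_addr = -1;
      close (fd);
    }
}

/* The loop quoted above, applied to every slot.  */
static void
sketch_close_all (int open_fd[NUM_OPEN_FDS])
{
  for (int i = 0; i < NUM_OPEN_FDS; i++)
    sketch_close_process_fd (&open_fd[i]);
}
```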

* bug#19868: #19868 25.0.50; Compilation eats buffers
  2016-08-12 20:47 ` bug#19868: #19868 " Noam Postavsky
@ 2016-08-13  6:44   ` Eli Zaretskii
  2016-08-15 22:19     ` Noam Postavsky
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2016-08-13  6:44 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: rcopley, 19868

> From: Noam Postavsky <npostavs@users.sourceforge.net>
> Date: Fri, 12 Aug 2016 16:47:07 -0400
> Cc: Richard Copley <rcopley@gmail.com>
> 
> I reproduced this (the hanging, not the buffer eating) on Windows 10,
> Emacs 25.1, MinGW64.  Stepping with gdb I found that the hang occurs in
> sys_close where it calls _close (fd).  This is being called from
> deactivate_process:
> 
>   for (i = 0; i < PROCESS_OPEN_FDS; i++)
>     close_process_fd (&p->open_fd[i]); // <-- when i == 2

Does it hang in the _close call itself, or somewhere else?

And what is the value of fd?

Can you instrument the relevant code with printf's and see this
happening without stepping through the code with GDB?  Doing the
latter might change the timing of the calls, so we might be trying to
use file descriptors when the process (cmdproxy) is already dead, and
so the other end of the pipe no longer exists.

In any case, this is a tricky situation, because we kill the shell,
not the program it runs.  When the program run from the shell was
built with -mwindows, it is detached from the shell, and the various
Emacs facilities that try to kill subprocesses are likely to fail in
exciting ways.

IOW, running -mwindows programs from the likes of "M-x compile" is not
really supported on MS-Windows, I think.  Of course, if we can figure
out how to avoid the hang in this case, we should.

* bug#19868: #19868 25.0.50; Compilation eats buffers
  2016-08-13  6:44   ` Eli Zaretskii
@ 2016-08-15 22:19     ` Noam Postavsky
  2016-08-16 14:18       ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Noam Postavsky @ 2016-08-15 22:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Richard Copley, 19868

On Sat, Aug 13, 2016 at 2:44 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Noam Postavsky <npostavs@users.sourceforge.net>
>> Date: Fri, 12 Aug 2016 16:47:07 -0400
>> Cc: Richard Copley <rcopley@gmail.com>
>>
>> I reproduced this (the hanging, not the buffer eating) on Windows 10,
>> Emacs 25.1, MinGW64.  Stepping with gdb I found that the hang occurs in
>> sys_close where it calls _close (fd).  This is being called from
>> deactivate_process:
>>
>>   for (i = 0; i < PROCESS_OPEN_FDS; i++)
>>     close_process_fd (&p->open_fd[i]); // <-- when i == 2
>
> Does it hang in the _close call itself, or somewhere else?

It's in the _close call itself.

>
> And what is the value of fd?
>
> Can you instrument the relevant code with printf's and see this
> happening without stepping through the code with GDB?  Doing the
> latter might change the timing of the calls, so we might be trying to
> use file descriptors when the process (cmdproxy) is already dead, and
> so the other end of the pipe no longer exists.

I put fprintf+fflush before close_process_fd and around _close:

close_process_fd(-1[i = 0])
close_process_fd(4[i = 1])
going to _close(4)...done _close(4)
close_process_fd(5[i = 2])
going to _close(5)... // here Emacs hangs until I kill bug.exe
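A minimal sketch of that kind of instrumentation as a reusable wrapper. `traced_close` is a hypothetical name, and the sketch calls POSIX `close()` where the Windows build calls `_close()`. The point of flushing before the call is that a log line still sitting in a stdio buffer would never appear if `close()` never returns, so the log would not show where Emacs stopped.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical tracing wrapper in the spirit of the fprintf+fflush
   instrumentation above.  Each message is flushed before the next step,
   so a hang inside close() still leaves "going to _close(N)..." as the
   last visible output.  */
static int
traced_close (FILE *log, int fd)
{
  assert (log != NULL);
  fprintf (log, "going to _close(%d)...", fd);
  fflush (log);
  int rc = close (fd);
  fprintf (log, "done _close(%d)\n", fd);
  fflush (log);
  return rc;
}
```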

* bug#19868: #19868 25.0.50; Compilation eats buffers
  2016-08-15 22:19     ` Noam Postavsky
@ 2016-08-16 14:18       ` Eli Zaretskii
  2016-08-16 21:17         ` Noam Postavsky
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2016-08-16 14:18 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: rcopley, 19868

> From: Noam Postavsky <npostavs@users.sourceforge.net>
> Date: Mon, 15 Aug 2016 18:19:05 -0400
> Cc: 19868@debbugs.gnu.org, Richard Copley <rcopley@gmail.com>
> 
> On Sat, Aug 13, 2016 at 2:44 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> >> From: Noam Postavsky <npostavs@users.sourceforge.net>
> >> Date: Fri, 12 Aug 2016 16:47:07 -0400
> >> Cc: Richard Copley <rcopley@gmail.com>
> >>
> >> I reproduced this (the hanging, not the buffer eating) on Windows 10,
> >> Emacs 25.1, MinGW64.  Stepping with gdb I found that the hang occurs in
> >> sys_close where it calls _close (fd).  This is being called from
> >> deactivate_process:
> >>
> >>   for (i = 0; i < PROCESS_OPEN_FDS; i++)
> >>     close_process_fd (&p->open_fd[i]); // <-- when i == 2
> >
> > Does it hang in the _close call itself, or somewhere else?
> 
> It's in the _close call itself.

Hm... not so good.

> > And what is the value of fd?
> >
> > Can you instrument the relevant code with printf's and see this
> > happening without stepping through the code with GDB?  Doing the
> > latter might change the timing of the calls, so we might be trying to
> > use file descriptors when the process (cmdproxy) is already dead, and
> > so the other end of the pipe no longer exists.
> 
> I put fprintf+fflush before close_process_fd and around _close:
> 
> close_process_fd(-1[i = 0])
> close_process_fd(4[i = 1])
> going to _close(4)...done _close(4)
> close_process_fd(5[i = 2])
> going to _close(5)... // here Emacs hangs until I kill bug.exe

Can you tell what descriptor 5 is open on?  Is it for input, for
output, for something else?

Also, is "until I kill bug.exe" accurate?  That program just waits for
5 seconds, so after that it should exit by itself.  Are you saying it
doesn't unless killed by external means?

Thanks.

* bug#19868: #19868 25.0.50; Compilation eats buffers
  2016-08-16 14:18       ` Eli Zaretskii
@ 2016-08-16 21:17         ` Noam Postavsky
  2016-08-17 15:15           ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Noam Postavsky @ 2016-08-16 21:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Richard Copley, 19868

On Tue, Aug 16, 2016 at 10:18 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> I put fprintf+fflush before close_process_fd and around _close:
>>
>> close_process_fd(-1[i = 0])
>> close_process_fd(4[i = 1])
>> going to _close(4)...done _close(4)
>> close_process_fd(5[i = 2])
>> going to _close(5)... // here Emacs hangs until I kill bug.exe
>
> Can you tell what descriptor 5 is open on?  Is it for input, for
> output, for something else?

I found this enum which indicates that i=2 would be READ_FROM_SUBPROCESS.
/* Indexes of file descriptors in open_fds.  */
enum
  {
    /* The pipe from Emacs to its subprocess.  */
    SUBPROCESS_STDIN,
    WRITE_TO_SUBPROCESS,

    /* The main pipe from the subprocess to Emacs.  */
    READ_FROM_SUBPROCESS,
    SUBPROCESS_STDOUT,

I confirmed with printfs that open_fd[2] is set to 5 by the
emacs_pipe() calls in create_process (I also double checked with gdb
that nobody else sets it in between).

I printed all open_fd values from deactivate_process, just before the
closing loop, I got

deactivate_process()open_fd[0] = -1, open_fd[1] = 4, open_fd[2] = 5,
open_fd[3] = -1, open_fd[4] = -1, open_fd[5] = -1,

So, only WRITE_TO_SUBPROCESS and READ_FROM_SUBPROCESS are open. When
compiling bug.c without -mwindows, all open_fd values are -1 at that
spot.
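To read dumps like the one above, the quoted enum can be turned into a small decoder. Only the four names quoted above are included (the real enum continues beyond them); `report_open_fds` and `slot_names` are hypothetical helpers, not Emacs code.

```c
#include <assert.h>
#include <stdio.h>

/* Only the four slot names quoted above; the real enum has further
   entries that are not reproduced here.  */
enum
  {
    SUBPROCESS_STDIN,       /* 0 */
    WRITE_TO_SUBPROCESS,    /* 1 */
    READ_FROM_SUBPROCESS,   /* 2 */
    SUBPROCESS_STDOUT       /* 3 */
  };

static const char *const slot_names[] = {
  "SUBPROCESS_STDIN", "WRITE_TO_SUBPROCESS",
  "READ_FROM_SUBPROCESS", "SUBPROCESS_STDOUT"
};

/* Print every open (non-negative) slot among the first N, labelled with
   its name when known, and return how many slots are open.  */
static int
report_open_fds (const int *open_fd, int n)
{
  const int named = (int) (sizeof slot_names / sizeof slot_names[0]);
  int count = 0;
  assert (open_fd != NULL || n == 0);
  for (int i = 0; i < n; i++)
    if (open_fd[i] >= 0)
      {
        printf ("open_fd[%d] = %d  %s\n", i, open_fd[i],
                i < named ? slot_names[i] : "(not named above)");
        count++;
      }
  return count;
}
```

Fed the six values from the -mwindows dump above, it prints:

    open_fd[1] = 4  WRITE_TO_SUBPROCESS
    open_fd[2] = 5  READ_FROM_SUBPROCESS

and returns 2.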

>
> Also, is "until I kill bug.exe" accurate?  That program just waits for
> 5 seconds, so after that it should exit by itself.  Are you saying it
> doesn't unless killed by external means?

Ah, sorry, I upped the waiting time to 5 minutes, because 5 seconds
seemed a bit short for debugging. So I should have said "until bug.exe
terminates" (either by itself, or because I told it to).

Another observation: if I close Emacs while it's running bug.exe,
Emacs closes successfully, but leaves bug.exe running (even though I
answer yes at the prompt to kill it).

* bug#19868: #19868 25.0.50; Compilation eats buffers
  2016-08-16 21:17         ` Noam Postavsky
@ 2016-08-17 15:15           ` Eli Zaretskii
  2016-08-29 21:48             ` Noam Postavsky
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2016-08-17 15:15 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: rcopley, 19868

> From: Noam Postavsky <npostavs@users.sourceforge.net>
> Date: Tue, 16 Aug 2016 17:17:20 -0400
> Cc: 19868@debbugs.gnu.org, Richard Copley <rcopley@gmail.com>
> 
> >> going to _close(5)... // here Emacs hangs until I kill bug.exe
> >
> > Can you tell what descriptor 5 is open on?  Is it for input, for
> > output, for something else?
> 
> I found this enum which indicates that i=2 would be READ_FROM_SUBPROCESS.

OK.

> I printed all open_fd values from deactivate_process, just before the
> closing loop, I got
> 
> deactivate_process()open_fd[0] = -1, open_fd[1] = 4, open_fd[2] = 5,
> open_fd[3] = -1, open_fd[4] = -1, open_fd[5] = -1,
> 
> So, only WRITE_TO_SUBPROCESS and READ_FROM_SUBPROCESS are open. When
> compiling bug.c without -mwindows, all open_fd values are -1 at that
> spot.

The last sentence shows an important difference between the two
cases.  Can you spot the code which makes the 2 handles -1 in the case
of a console (not -mwindows) application?  That might give us a clue
about the reason for Emacs hanging in _close.

One possibility is that the reader thread is trying to read from the
descriptor which we are trying to close.  Maybe that prevents _close
from completing its job.  Then the question is how does this succeed
in the case of a console application?

compilation-start sends SIGINT to the subprocess, then waits for 1
sec, then calls delete-process.  On Windows, interrupting a process is
implemented in w32proc.c:sys_kill.  My guess is that with a console
application, sending the simulated Ctrl-C to cmdproxy kills both
cmdproxy and the application, while in the -mwindows case only
cmdproxy (and perhaps its child cmd.exe) is killed.  But the details
still evade me: how would the above explain the fact that the
descriptors are already -1 when deactivate_process is called, who
closes them, and by what trigger?
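One mechanism that may be relevant, offered as an assumption rather than anything established in this thread: on a pipe, the reader only sees end-of-file once every copy of the write end is closed. If a detached -mwindows program such as bug.exe still held an inherited copy of its output handle after cmdproxy died, anything waiting on that pipe would not see EOF until bug.exe exited, which would fit the observation that the hang lasts until bug.exe terminates. The sketch demonstrates only the POSIX behaviour, using `pipe()`, `dup()` and a non-blocking `read()`; `pipe_at_eof` is a hypothetical helper, the `dup()` copy stands in for the grandchild's inherited handle, and Windows handles and `_close` may behave differently.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Probe READ_FD without blocking.  Returns 1 if the pipe is at end of
   file (every write end closed), 0 if it is not at EOF yet (a read would
   block, or data was available, in which case one byte is consumed), and
   -1 on any other error.  Leaves READ_FD in non-blocking mode.  */
static int
pipe_at_eof (int read_fd)
{
  int flags = fcntl (read_fd, F_GETFL);
  if (flags < 0 || fcntl (read_fd, F_SETFL, flags | O_NONBLOCK) < 0)
    return -1;

  char c;
  ssize_t n = read (read_fd, &c, 1);
  if (n == 0)
    return 1;
  if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK)
    return 0;
  return -1;
}
```

Closing the original write end is not enough: while the `dup()` copy is open, the probe reports "not at EOF", and only after that copy is closed does it report EOF.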

Another thing to try is to set w32-start-process-share-console to a
non-nil value.  I don't know if it will help or make things worse, I
think this option was never seriously used, and I don't know what GUI
applications do on Windows when their control handler function is
called.  MSDN says in
https://msdn.microsoft.com/en-us/library/windows/desktop/ms683155(v=vs.85).aspx:

  All console processes have a default handler function that calls the
  ExitProcess function.

which says nothing about non-console applications.  And I think
non-console applications are not attached to any console anyway, so
this option will probably do nothing useful.  Still, it could give us
some clues about what's going on.

Some more interesting (though very vague) documentation is here:

 https://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx

> Another observation: if I close Emacs while it's running bug.exe,
> Emacs closes successfully, but leaves bug.exe running (even though I
> answer yes at the prompt to kill it).

That's just another manifestation of the fact that we cannot reliably
kill grandchildren processes on MS-Windows, especially when they are
not console applications.  We can only kill the immediate child
process, in this case cmdproxy (and probably its child cmd.exe as
well).

Thanks.

P.S. Don't hesitate to ask questions about how this stuff works, if
something is unclear.  There's a large comment around line 390 in
w32proc.c which provides an overview, so if you didn't already read
it, it could help.

* bug#19868: #19868 25.0.50; Compilation eats buffers
  2016-08-17 15:15           ` Eli Zaretskii
@ 2016-08-29 21:48             ` Noam Postavsky
  0 siblings, 0 replies; 10+ messages in thread
From: Noam Postavsky @ 2016-08-29 21:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Richard Copley, 19868

On Wed, Aug 17, 2016 at 11:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> I printed all open_fd values from deactivate_process, just before the
>> closing loop, I got
>>
>> deactivate_process()open_fd[0] = -1, open_fd[1] = 4, open_fd[2] = 5,
>> open_fd[3] = -1, open_fd[4] = -1, open_fd[5] = -1,
>>
>> So, only WRITE_TO_SUBPROCESS and READ_FROM_SUBPROCESS are open. When
>> compiling bug.c without -mwindows, all open_fd values are -1 at that
>> spot.
>
> The last sentence shows an important difference between the two
> cases.  Can you spot the code which makes the 2 handles -1 in the case
> of a console (not -mwindows) application?  That might give us a clue
> about the reason for Emacs hanging in _close.

Sorry, turns out I lied about this. In the non-mwindows case
deactivate_process() gets called 3 times in total, the latter 2 times
have all the handles closed and set to -1 (and I mistakenly looked at
the values only from the last call). The first time is the same as the
mwindows case (except for the hanging, of course).

> Another thing to try is to set w32-start-process-share-console to a
> non-nil value.

Seems to make no difference.

>> Another observation: if I close Emacs while it's running bug.exe,
>> Emacs closes successfully, but leaves bug.exe running (even though I
>> answer yes at the prompt to kill it).
>
> That's just another manifestation of the fact that we cannot reliably
> kill grandchildren processes on MS-Windows, especially when they are
> not console applications.  We can only kill the immediate child
> process, in this case cmdproxy (and probably its child cmd.exe as
> well).

Right, I recall seeing in #15983 a suggestion to crawl the process
tree in order to be able to do this. Another possibility I found while
searching the web is to use Job Objects for this
(https://msdn.microsoft.com/en-us/library/ms684161(VS.85).aspx).
Though it has a limitation:

  Windows 7, Windows Server 2008 R2, Windows XP with SP3, Windows Server
  2008, Windows Vista and Windows Server 2003:  A process can be
  associated with only one job. Jobs cannot be nested. The ability to
  nest jobs was added in Windows 8 and Windows Server 2012.

So Emacs using Job Objects would prevent the process it calls from
using them (on older Windows OSes).
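A minimal sketch of the process-tree crawl idea from #15983, written against a plain snapshot table so that it runs anywhere. `struct proc_entry` and `collect_descendants` are hypothetical, and the PIDs in any example are made up; on Windows the rows would presumably come from `CreateToolhelp32Snapshot` with `Process32First`/`Process32Next`, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a process snapshot: a PID and its parent's PID.  */
struct proc_entry
{
  unsigned pid;
  unsigned ppid;
};

/* Store the PIDs of all descendants of ROOT found in TAB (N rows) into
   OUT, breadth-first, writing at most CAP entries.  OUT doubles as the
   work queue.  Returns the number of PIDs stored.  */
static size_t
collect_descendants (const struct proc_entry *tab, size_t n,
                     unsigned root, unsigned *out, size_t cap)
{
  size_t count = 0, head = 0;
  unsigned parent = root;

  assert (tab != NULL || n == 0);
  for (;;)
    {
      for (size_t i = 0; i < n; i++)
        if (tab[i].ppid == parent && tab[i].pid != root && count < cap)
          out[count++] = tab[i].pid;
      if (head == count)
        break;
      parent = out[head++];
    }
  return count;
}
```

Two caveats, both assumptions about a real implementation. The crawl has to happen before the intermediate processes are killed: afterwards the orphaned bug.exe still names its dead parent and is no longer reachable from cmdproxy's PID. And since Windows reuses PIDs, a stale parent ID can match an unrelated newer process, so a real version would need an extra check, for example comparing process creation times.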

>
> P.S. Don't hesitate to ask questions about how this stuff works, if
> something is unclear.  There's a large comment around line 390 in
> w32proc.c which provides an overview, so if you didn't already read
> it, it could help.

For the record, the comment seems to be closer to line 790 (after the first ^L).

Thread overview: 10+ messages
2015-02-14 19:30 bug#19868: 25.0.50; Compilation eats buffers Richard Copley
2015-02-15 17:53 ` Eli Zaretskii
2015-02-17  0:25   ` Richard Copley
2016-08-12 20:47 ` bug#19868: #19868 " Noam Postavsky
2016-08-13  6:44   ` Eli Zaretskii
2016-08-15 22:19     ` Noam Postavsky
2016-08-16 14:18       ` Eli Zaretskii
2016-08-16 21:17         ` Noam Postavsky
2016-08-17 15:15           ` Eli Zaretskii
2016-08-29 21:48             ` Noam Postavsky

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git