From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Unexec dumping results in "Segmentation fault" on Windows Msys2
Date: Thu, 15 Apr 2021 09:49:38 +0300
Message-ID: <835z0oyrct.fsf@gnu.org>
References: <83im52ed8b.fsf@gnu.org> <989be2e0-a090-309b-58cb-8064c6bd5aee@gmail.com> <83y2dycmgr.fsf@gnu.org>
To: Nikolay Kudryavtsev
Cc: emacs-devel@gnu.org
In-Reply-To:
  (message from Nikolay Kudryavtsev on Thu, 15 Apr 2021 01:11:53 +0300)

> From: Nikolay Kudryavtsev
> Cc: emacs-devel@gnu.org
> Date: Thu, 15 Apr 2021 01:11:53 +0300
>
> Segfaults are triggered by msys2 binutils version 2.36. If we try to
> debug the segfault with GDB we get this:
>
> Thread 1 received signal SIGSEGV, Segmentation fault.
> 0x00007ff7c7514862 in main (argc=9, argv=0x1f815ad1860)
>     at D:/Emacs/source/repo/src/emacs.c:960
> 960       stack_bottom = (char *) &stack_bottom_variable;

This is very strange. Is this in temacs.exe or in the dumped
emacs.exe?

What happens when you start temacs from the directory where it was
built, like this:

  temacs -Q

Does it crash then? What does this show in GDB at the point of the
crash?

  (gdb) p &stack_bottom_variable

> It seems like the initial bootstrap-emacs.exe is so broken that it fails
> at even the simplest things. The question I keep asking myself is, if we
> assume that it's our build environment that has some problem, why is
> unexec the only place that is harmed by it?

Because unexec produces a new binary executable from the memory of a
running Emacs process, and evidently that executable is broken in
fundamental ways, for some reason. The mere fact that using an older
version of Binutils produces different results points to some change
in the assembler/linker area. Did the MSYS2 folks change anything in
how MinGW64 Binutils are configured? For example, what about LTO
usage?
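For readers less familiar with the line GDB stopped at (emacs.c:960): it only takes the address of a local variable in main and stores it in a global, which Emacs then treats as the base of the C stack (as we understand it, for example when the garbage collector scans the stack conservatively). Because the statement does so little, a crash on it suggests something very basic is wrong with the executable, which is why printing &stack_bottom_variable in GDB is informative. The sketch below illustrates the idiom only; record_stack_bottom and stack_bytes_in_use are hypothetical names, not Emacs functions, and emacs.c does the assignment directly in main.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global, loosely modeled on Emacs's stack_bottom: the
   address of a variable in the outermost frame, which stays live for
   the whole run.  */
static char *stack_bottom;

/* Record the stack base.  Meant to be called from main with the
   address of one of main's own locals.  */
static void
record_stack_bottom (void *near_bottom)
{
  assert (near_bottom != NULL);
  stack_bottom = (char *) near_bottom;
}

/* Approximate how many bytes of stack lie between the recorded base
   and this function's own frame.  Addresses are compared as integers,
   and the absolute difference is taken, so the result does not depend
   on which direction the stack grows.  */
static size_t
stack_bytes_in_use (void)
{
  char here;
  uintptr_t base = (uintptr_t) stack_bottom;
  uintptr_t cur = (uintptr_t) &here;
  return base > cur ? (size_t) (base - cur) : (size_t) (cur - base);
}
```

In a program, main would declare `void *stack_bottom_variable;` and call `record_stack_bottom (&stack_bottom_variable);` before doing anything else; any function called later can then measure its distance from that base with `stack_bytes_in_use ()`.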
> #10 0x00007ff911ef0a9e in ntdll!KiUserExceptionDispatcher ()
>    from /c/WINDOWS/SYSTEM32/ntdll.dll
> #11 0x00007ff9103c43d7 in msvcrt!memmove () from
> /c/WINDOWS/System32/msvcrt.dll
> #12 0x0000000400191e25 in insert_1_both (
>     string=0x4506840 "(fn FILENAME)\377\377\377", nchars=13, nbytes=13,
>     inherit=false, prepare=true, before_markers=false)
>     at D:/Emacs/source/repo/src/insdel.c:915

You don't show enough data for me to come up with ideas. All I can
say is that insert_1_both tried to access memory in some invalid way.
The source line is this:

  memcpy (GPT_ADDR, string, nbytes);

which is intended to insert the text of the STRING argument (13 bytes
of it) into the gap. Why this segfaults I have no idea. You didn't
even show the entire C backtrace, so I don't know whether this is the
original crash or a secondary one that happened while processing the
original exception.

I also don't understand why we see msvcrt!memmove in the backtrace:
Emacs calls memcpy, not memmove, and on my system, if I put a
breakpoint at that source line and step into the call, I find myself
in msvcrt!memcpy, as expected. Maybe it's something that the MinGW64
runtime or your version of GCC does differently, I don't know.

> #13 0x000000040024f689 in Fprin1_to_string (object=..., noescape=...)
>     at D:/Emacs/source/repo/src/print.c:685
> #14 0x000000040020be9a in styled_format (nargs=2, args=0xbf0720,
> message=false)
>     at D:/Emacs/source/repo/src/editfns.c:3322
> #15 0x000000040020b69f in Fformat (nargs=2, args=0xbf0720)
>     at D:/Emacs/source/repo/src/editfns.c:3059
> #16 0x000000040021b946 in eval_sub (form=...)
>     at D:/Emacs/source/repo/src/eval.c:2363
> #17 0x000000040021dbd0 in apply_lambda (fun=..., args=..., count=228)
>     at D:/Emacs/source/repo/src/eval.c:3056
>
> Again, memory related. Since we know that unexec works in emacs26 and
> emacs27 branches, I went for another row of bisecting and traced the
> offending commit to cddf85d256.
> Now this is interesting in that if
> unexec triggers a crash here, is there a way to get this to crash Emacs
> during the normal usage? Also, I have an old msys2 backup from 2017 that
> I've used for testing and I'm getting the same kind of exceptions with
> it, so I don't think we can write this master branch issue off on the
> build environment.

This is not an efficient method of investigating the problem, IME.
Bisecting is not going to help you unless you find a change that is
simple and localized enough to give the "eureka!" moment, and the one
you found isn't.

The way to debug this is to use the debugger and try to understand
what exactly causes the crash. For example, in the above case, what's
wrong with the memcpy call? Is GPT_ADDR invalid, perchance? More
generally, what did Emacs try to do when it crashed? The full
backtrace would help us understand that; it could be that the real
problem is elsewhere, way up the callstack, and this is just the
fallout.

IOW, you need to actively debug the problem where it happens and find
the root cause of the crashes. Only then can we start thinking about
which change broke it and how to repair it.

You could also keep teasing me until I find the time to debug this
myself, but I don't promise this will happen any time soon, given what
I have on my plate. And even if I find the time, there's no guarantee
I will see the problem: I use a different version of GCC (9.2.0), a
different runtime and headers (mingw.org's MinGW, not MinGW64), and I
build my own Binutils from sources, configuring them as I see fit,
which is different from what the MSYS2 folks do.

Sorry I couldn't be of more help.
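To make the question about GPT_ADDR in the memcpy call concrete: Emacs keeps buffer text in a gap buffer, and insert_1_both copies the new bytes to the start of the gap. The following is a heavily simplified sketch of that step, not Emacs's actual code; struct gap_buffer, gap_addr and gap_insert are hypothetical names, and real Emacs also moves and, if needed, enlarges the gap (as we understand it, via move_gap_both and make_gap in insdel.c) before copying. If the gap address pointed into unmapped or read-only memory, the copy is where the process would fault, which would be consistent with the insdel.c:915 frame.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified gap buffer.  Only the text before
   the gap and the gap itself are modeled; Emacs's real layout
   (struct buffer_text, GPT_ADDR, GAP_SIZE, ...) is more involved.  */
struct gap_buffer
{
  char *beg;        /* start of the allocated storage */
  size_t gpt;       /* byte offset where the gap starts */
  size_t gap_size;  /* number of free bytes in the gap */
};

/* Address of the first byte of the gap, playing the role of GPT_ADDR.  */
static char *
gap_addr (const struct gap_buffer *b)
{
  return b->beg + b->gpt;
}

/* Copy NBYTES of STRING into the gap, in the spirit of the memcpy in
   insert_1_both.  Unlike Emacs, this sketch never grows the gap; it
   only checks that the caller left enough room.  */
static void
gap_insert (struct gap_buffer *b, const char *string, size_t nbytes)
{
  assert (b->beg != NULL);
  assert (nbytes <= b->gap_size);
  /* If gap_addr (b) pointed at unmapped or read-only memory, this
     copy is where the process would fault.  */
  memcpy (gap_addr (b), string, nbytes);
  b->gpt += nbytes;       /* the inserted text now precedes the gap */
  b->gap_size -= nbytes;  /* and the gap shrinks by the same amount */
}
```

For example, inserting the 13 bytes of "(fn FILENAME)" into an empty buffer backed by a 32-byte array leaves gpt at 13 and a 19-byte gap.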