From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
Date: Thu, 15 Sep 2022 10:10:59 +0300
Message-ID: <83wna5yuws.fsf@gnu.org>
References: <87h71aix5r.fsf@trouble.defaultvalue.org> <83tu5a3cdw.fsf@gnu.org> <87pmfxhfoz.fsf@trouble.defaultvalue.org>
In-Reply-To: <87pmfxhfoz.fsf@trouble.defaultvalue.org> (message from Rob Browning on Wed, 14 Sep 2022 15:19:24 -0500)
To: Rob Browning, Andrea Corallo, Paul Eggert
Cc: 57789@debbugs.gnu.org

> From: Rob Browning
> Cc: 57789@debbugs.gnu.org
> Date: Wed, 14 Sep 2022 15:19:24 -0500
>
> Eli Zaretskii writes:
>
> > Please run the crashing command under GDB, and when it segfaults,
> > produce the C-level and Lisp-level backtrace, and post them here.
>
> Starting from scratch with the emacs-28.1 commit I can reproduce the
> failure when building via
>
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
>
> It crashes with the same segfault repeatably, i.e. if you run make
> again, it crashes again on the previously mentioned "... -l comp -f
> batch-byte+native-compile international/titdic-cnv.el" invocation.  That
> crash output is attached below.
>
> After adjusting the Makefile.in invocation so I could run it with gdb in
> exactly the same environment once it's failing on that command, I
> captured the backtrace and included it below.

Thanks.  The backtrace indicates that the crash is in GC.  This
probably means we have some fundamental problem on that architecture.
Andrea, any advice for how to investigate?

Does the build of the same code with the same options sans
"--with-native-compilation" succeed, or does it also crash with
similar symptoms?

If the build without native-compilation succeeds, my first question
would be: how mature and stable is libgccjit on that platform?  Perhaps
take this up with GCC's libgccjit developers.

> With respect to the Lisp-level backtrace, I imagined you probably meant
> an xbacktrace?
> If so (and assuming I'm guessing right about how I
> should do that), I haven't figured out how to arrange sourcing the
> src/.gdbinit from the src/Makefile.in command.

You can source it manually from the GDB prompt, when the segfault
happens, and then invoke xbacktrace manually, can't you?

> It looked like it might be because there were no debug symbols, so I
> tried adding a CFLAGS=-g3 to the end of the ./configure, but that caused
> the crash to disappear entirely.

Too bad, it means we have a heisenbug on our hands, which will make it
even harder to debug (as if debugging crashes in GC were not hard
enough already).

What happens if you modify this variable:

  (defcustom native-comp-debug (if (eq 'windows-nt system-type) 1 0)

to have the value 1 or even zero, and then rebuild from scratch?  Does
the build succeed then?

> Finally (and this was just a random guess based on previous experiences,
> particularly with programs like guile that play (normal, traditional)
> tricks with pointers/coercions/etc.) I noticed that emacs doesn't
> specify -fno-strict-aliasing, and unless all the C code has been written
> with that in mind, I assume that might open a window allowing the
> optimizer to introduce undesirable changes.  So I added a
> CFLAGS=-fno-strict-aliasing to the end of the ./configure command, and
> then the build and tests worked fine (twice in a row):
>
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation \
>     CFLAGS=-fno-strict-aliasing
>
> Of course that's not remotely conclusive, but if all of the C code
> wasn't written with strict-aliasing in mind, then I wondered if it might
> make sense to consider adding -fno-strict-aliasing as a default option.

I don't know enough about this.  Perhaps Andrea or Paul could comment.

> Also, even if that ends up being desirable, I'm not sure it'll be
> sufficient.
> That is, I suspect I might want to run the full build/check
> with -fno-strict-aliasing in a loop for a bit to make sure the clean
> build/check is reliable, since I think I may have seen some test crashes
> (not the build crash) on one earlier run with that option, but I'm not
> sure that was a clean attempt.

Yes, running the full test suite would be the logical next step.

> Program received signal SIGSEGV, Segmentation fault.
> mark_object (arg=) at alloc.c:6809
> 6809	  if (symbol_marked_p (ptr))
> (gdb) backtrace
> #0  mark_object (arg=) at alloc.c:6809

Any idea what causes the SIGSEGV here?  Was 'ptr' an invalid pointer for
some reason, and if so, what exactly makes it invalid?

Thanks.
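
On the -fno-strict-aliasing question raised above: the kind of code that
option affects is a read of an object through a pointer cast to an
incompatible type.  Below is a minimal sketch of the pattern and its
well-defined replacement.  The helper name float_bits is hypothetical
and is not taken from the Emacs sources; whether alloc.c or the
native-compiled code contains anything like this is not established in
this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from Emacs): return the bit pattern of a float.

   The tempting version,

       return *(uint32_t *) &f;

   reads a float object through a uint32_t lvalue.  That violates the C
   aliasing rules, so an optimizing compiler may reorder or drop such
   accesses; -fno-strict-aliasing asks GCC not to rely on those rules.

   Copying the bytes with memcpy is well defined regardless of the
   aliasing setting.  */
uint32_t
float_bits (float f)
{
  uint32_t bits;

  /* The copy below only makes sense if both types have the same size.  */
  assert (sizeof bits == sizeof f);
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

On a platform with IEEE 754 single-precision floats, float_bits (1.0f)
should yield 0x3f800000.  A build that only works with
-fno-strict-aliasing suggests, but does not prove, that some code
depends on the cast form.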