* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x @ 2022-09-14 1:04 Rob Browning 2022-09-14 2:42 ` Eli Zaretskii 0 siblings, 1 reply; 21+ messages in thread From: Rob Browning @ 2022-09-14 1:04 UTC (permalink / raw) To: 57789 On zelenka.debian.org https://db.debian.org/machines.cgi?host=zelenka the build crashes with a segfault with current Debian sid (unstable). I can produce the crash like this: git clone --single-branch --branch emacs-28 .../emacs.git cd emacs ./autogen.sh ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation make check The debian package produced a similar failure earlier: https://buildd.debian.org/status/fetch.php?pkg=emacs&arch=s390x&ver=1%3A28.1%2B1-3&stamp=1662863442&raw=0 Here's the final bit of the clone build's log, and I'm happy to help test on the machine if that'd be useful: Loading /home/rlb/emacs/lisp/electric.el (source)... Loading /home/rlb/emacs/lisp/paren.el (source)... Loading /home/rlb/emacs/lisp/emacs-lisp/shorthands.el (source)... Loading /home/rlb/emacs/lisp/emacs-lisp/eldoc.el (source)... Loading /home/rlb/emacs/lisp/cus-start.el (source)... Loading /home/rlb/emacs/lisp/tooltip.el (source)... Loading /home/rlb/emacs/lisp/international/iso-transl.el (source)... Finding pointers to doc strings... 
Finding pointers to doc strings...done Dumping under the name bootstrap-emacs.pdmp Dumping fingerprint: b4b1b9ac4d82ce4537c0e1eb6527b2b7f5831cb6de31c7f9b2fd2a1a0c4531c4 Dump complete Byte counts: header=100 hot=14915588 discardable=175392 cold=10410424 Reloc counts: hot=1048047 discardable=5080 make -C ../lisp compile-first EMACS="../src/bootstrap-emacs" make[2]: Entering directory '/home/rlb/emacs/lisp' ELC+ELN emacs-lisp/macroexp.elc ELC+ELN emacs-lisp/cconv.elc ELC+ELN emacs-lisp/byte-opt.elc ELC+ELN emacs-lisp/bytecomp.elc ELC+ELN emacs-lisp/comp.elc ELC+ELN emacs-lisp/comp-cstr.elc ELC+ELN emacs-lisp/cl-macs.elc ELC+ELN emacs-lisp/rx.elc ELC+ELN emacs-lisp/cl-seq.elc Fatal error 11: Segmentation fault Backtrace: ../src/bootstrap-emacs(+0x15deb6)[0x2aa0a7ddeb6] ../src/bootstrap-emacs(+0x4efc4)[0x2aa0a6cefc4] ../src/bootstrap-emacs(+0x4f1fe)[0x2aa0a6cf1fe] ../src/bootstrap-emacs(+0x15c240)[0x2aa0a7dc240] ../src/bootstrap-emacs(+0x15c2d2)[0x2aa0a7dc2d2] ../src/bootstrap-emacs(+0x6a47d8)[0x2aa0ad247d8] ../src/bootstrap-emacs(+0x1a7de0)[0x2aa0a827de0] ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6] ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6] ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6] ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6] ../src/bootstrap-emacs(+0x1a7c3e)[0x2aa0a827c3e] ../src/bootstrap-emacs(+0x1a9094)[0x2aa0a829094] ../src/bootstrap-emacs(eval_sub+0x410)[0x2aa0a84cc28] ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e] ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10] ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0] ../src/bootstrap-emacs(+0x1cdeb8)[0x2aa0a84deb8] ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0] ../src/bootstrap-emacs(eval_sub+0x2c4)[0x2aa0a84cadc] ../src/bootstrap-emacs(+0x1cd26a)[0x2aa0a84d26a] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a] ../src/bootstrap-emacs(eval_sub+0x4ba)[0x2aa0a84ccd2] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a] ../src/bootstrap-emacs(+0x1ce488)[0x2aa0a84e488] 
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a] ../src/bootstrap-emacs(+0x1cd8cc)[0x2aa0a84d8cc] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a] ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e] ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10] ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0] ../src/bootstrap-emacs(eval_sub+0x2c4)[0x2aa0a84cadc] ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e] ../src/bootstrap-emacs(Ffuncall+0x1f2)[0x2aa0a84a202] ../src/bootstrap-emacs(+0x1cc6a4)[0x2aa0a84c6a4] ../src/bootstrap-emacs(+0x1ce26c)[0x2aa0a84e26c] ../src/bootstrap-emacs(eval_sub+0x638)[0x2aa0a84ce50] ../src/bootstrap-emacs(+0x1ce7ec)[0x2aa0a84e7ec] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a] ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e] ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10] ... make[2]: *** [Makefile:316: emacs-lisp/cl-seq.elc] Segmentation fault make[2]: Leaving directory '/home/rlb/emacs/lisp' make[1]: *** [Makefile:870: bootstrap-emacs.pdmp] Error 2 make[1]: Leaving directory '/home/rlb/emacs/src' make: *** [Makefile:449: src] Error 2 Thanks -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-14 1:04 bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x Rob Browning @ 2022-09-14 2:42 ` Eli Zaretskii 2022-09-14 3:06 ` Rob Browning 2022-09-14 20:19 ` Rob Browning 0 siblings, 2 replies; 21+ messages in thread From: Eli Zaretskii @ 2022-09-14 2:42 UTC (permalink / raw) To: Rob Browning; +Cc: 57789 > From: Rob Browning <rlb@defaultvalue.org> > Date: Tue, 13 Sep 2022 20:04:32 -0500 > > On zelenka.debian.org https://db.debian.org/machines.cgi?host=zelenka > the build crashes with a segfault with current Debian sid (unstable). I > can produce the crash like this: > > git clone --single-branch --branch emacs-28 .../emacs.git If you build the current emacs-28 branch, then it isn't Emacs 28.1, it's Emacs 28.2.50, right? > cd emacs > ./autogen.sh > ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation > make check > > The debian package produced a similar failure earlier: > > https://buildd.debian.org/status/fetch.php?pkg=emacs&arch=s390x&ver=1%3A28.1%2B1-3&stamp=1662863442&raw=0 > > Here's the final bit of the clone build's log, and I'm happy to help > test on the machine if that'd be useful: Please run the crashing command under GDB, and when it segfaults, produce the C-level and Lisp-level backtrace, and post them here. Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-14 2:42 ` Eli Zaretskii @ 2022-09-14 3:06 ` Rob Browning 2022-09-14 3:20 ` Rob Browning 2022-09-14 20:19 ` Rob Browning 1 sibling, 1 reply; 21+ messages in thread From: Rob Browning @ 2022-09-14 3:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 57789 Eli Zaretskii <eliz@gnu.org> writes: > If you build the current emacs-28 branch, then it isn't Emacs 28.1, > it's Emacs 28.2.50, right? Right, sorry, the clone test was the current branch tip, and the buildd log was for (Debian's partially altered) tree, derived from the emacs-28.1 tag. I can easily re-test the 28.1 tag if we like. > Please run the crashing command under GDB, and when it segfaults, > produce the C-level and Lisp-level backtrace, and post them here. Will attempt. Thanks -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-14  3:06 ` Rob Browning
@ 2022-09-14  3:20 ` Rob Browning
  0 siblings, 0 replies; 21+ messages in thread
From: Rob Browning @ 2022-09-14 3:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57789

Rob Browning <rlb@defaultvalue.org> writes:

> Will attempt.

Hmm, so I ran "make V=1" from the same tree and saw the command that
repeatably crashed, which was:

  EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' \
    -l comp -f batch-byte+native-compile international/titdic-cnv.el

I then ran that manually via

  (cd lisp && EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' \
    -l comp -f batch-byte+native-compile international/titdic-cnv.el)

which ran for a bit and succeeded. After that a make worked fine until
bindings.el, where it crashed again, this time with an "Aborted", and
running it manually didn't help.

In any case, I'm going to start over and try to get the backtraces for
the titdic-cnv.el failure.

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-14 2:42 ` Eli Zaretskii 2022-09-14 3:06 ` Rob Browning @ 2022-09-14 20:19 ` Rob Browning 2022-09-14 20:21 ` Rob Browning 2022-09-15 7:10 ` Eli Zaretskii 1 sibling, 2 replies; 21+ messages in thread From: Rob Browning @ 2022-09-14 20:19 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 57789 [-- Attachment #1: Type: text/plain, Size: 2538 bytes --] Eli Zaretskii <eliz@gnu.org> writes: > Please run the crashing command under GDB, and when it segfaults, > produce the C-level and Lisp-level backtrace, and post them here. Starting from scratch with the emacs-28.1 commit I can reproduce the failure when building via ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation It crashes with the same segfault repeatably, i.e. if you run make again, it crashes again on the previously mentioned "... -l comp -f batch-byte+native-compile international/titdic-cnv.el" invocation. That crash output is attached below. After adjusting the Makefile.in invocation so I could run it with gdb in exactly the same environment once it's failing on that command, I captured the backtrace and included it below. With respect to the Lisp-level backtrace, I imagined you probably meant an xbacktrace? If so (and assuming I'm guessing right about how I should do that), I haven't figured out how to arrange sourcing the src/.gdbinit from the src/Makefile.in command. I'm likely doing something wrong, but it doesn't seem to want to load the file. It looked like it might be because there were no debug symbols, so I tried adding a CFLAGS=-g3 to the end of the ./configure, but that caused the crash to disappear entirely. Finally (and this was just a random guess based on previous experiences, particularly with programs like guile that play (normal, traditional) tricks with pointers/coercions/etc.) 
I noticed that emacs doesn't specify -fno-strict-aliasing, and unless all the C code has been written with that in mind, I assume that might open a window allowing the optimizer to introduce undesirable changes. So I added a CFLAGS=-fno-strict-aliasing to the end of the ./configure command, and then the build and tests worked fine (twice in a row): ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation \ CFLAGS=-fno-strict-aliasing Of course that's not remotely conclusive, but if all of the C code wasn't written with strict-aliasing in mind, then I wondered if it might make sense to consider adding -fno-strict-aliasing as a default option. Also, even if that ends up being desirable, I'm not sure it'll be sufficient. That is, I suspect I might want to run the full build/check with -fno-strict-aliasing in a loop for a bit to make sure the clean build/check is reliable, since I think I may have seen some test crashes (not the build crash) on one earlier run with that option, but I'm not sure that was a clean attempt. 
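The strict-aliasing concern raised above can be illustrated with a small C sketch. It is not taken from the Emacs sources and does not claim to show the cause of this crash; `store_then_read` and `float_bits` are hypothetical names chosen for illustration. GCC enables `-fstrict-aliasing` at `-O2` and above, which lets the optimizer assume that pointers to unrelated types never refer to the same object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The sketch assumes a 32-bit float, as on common platforms.  */
static_assert (sizeof (float) == sizeof (uint32_t),
               "this sketch assumes a 32-bit float");

/* Hypothetical example.  With strict aliasing in effect, the compiler
   may assume that the store through FP cannot change *IP, and so may
   return the constant 1 without re-reading memory.  If a caller passes
   two pointers to the same storage (through casts), the result can
   differ between optimization levels.  */
int
store_then_read (int *ip, float *fp)
{
  *ip = 1;
  *fp = 2.0f;
  return *ip;
}

/* A well-defined way to look at the bytes of one type as another:
   copy the object representation instead of casting the pointer.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```

Building with `CFLAGS=-fno-strict-aliasing`, as in the configure line above, tells GCC not to make that assumption, so code that does pass aliased pointers of different types behaves the same with and without optimization.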
The make crash: [-- Attachment #2: emacs-s390x-crash --] [-- Type: text/plain, Size: 2955 bytes --] make[2]: Entering directory '/home/rlb/emacs/lisp' EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' \ -l comp -f batch-byte+native-compile international/titdic-cnv.el Fatal error 11: Segmentation fault Backtrace: ../src/bootstrap-emacs(+0x15deb6)[0x2aa293ddeb6] ../src/bootstrap-emacs(+0x4efc4)[0x2aa292cefc4] ../src/bootstrap-emacs(+0x4f1fe)[0x2aa292cf1fe] ../src/bootstrap-emacs(+0x15c240)[0x2aa293dc240] ../src/bootstrap-emacs(+0x15c2d2)[0x2aa293dc2d2] ../src/bootstrap-emacs(+0x6a47d8)[0x2aa299247d8] ../src/bootstrap-emacs(+0x1a7fa8)[0x2aa29427fa8] ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6] ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6] ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6] ../src/bootstrap-emacs(+0x1a7c3e)[0x2aa29427c3e] ../src/bootstrap-emacs(+0x1a9094)[0x2aa29429094] ../src/bootstrap-emacs(Ffuncall+0x2de)[0x2aa2944a2ee] ../src/bootstrap-emacs(+0x1ca42c)[0x2aa2944a42c] ../src/bootstrap-emacs(+0x1f0c72)[0x2aa29470c72] ../src/bootstrap-emacs(+0x1f7fb0)[0x2aa29477fb0] ../src/bootstrap-emacs(+0x1f8474)[0x2aa29478474] ../src/bootstrap-emacs(eval_sub+0x5e4)[0x2aa2944cdfc] ../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a] ../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a] ../src/bootstrap-emacs(+0x1ce8cc)[0x2aa2944e8cc] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a] ../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a] ../src/bootstrap-emacs(+0x1cd824)[0x2aa2944d824] ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a] ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa2944dc2e] ../src/bootstrap-emacs(Ffuncall+0x1f2)[0x2aa2944a202] ../src/bootstrap-emacs(+0x1ca4b0)[0x2aa2944a4b0] ../src/bootstrap-emacs(+0x1f90e4)[0x2aa294790e4] 
../src/bootstrap-emacs(+0x1f9462)[0x2aa29479462] ../src/bootstrap-emacs(+0x1c9ef0)[0x2aa29449ef0] ../src/bootstrap-emacs(Ffuncall+0x182)[0x2aa2944a192] /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F636f6d702d2d6e61746976652d636f6d70696c65_comp__native_compile_0+0x804)[0x3ff91d6b0d4] ../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e] /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F62617463682d6e61746976652d636f6d70696c65_batch_native_compile_0+0x1d2)[0x3ff91d6c592] ../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e] /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F62617463682d627974652b6e61746976652d636f6d70696c65_batch_bytenative_compile_0+0x108)[0x3ff91d6c728] ../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e] ... make[2]: *** [Makefile:321: international/titdic-cnv.elc] Segmentation fault make[2]: Leaving directory '/home/rlb/emacs/lisp' make[1]: *** [Makefile:845: ../lisp/loaddefs.el] Error 2 make[1]: Leaving directory '/home/rlb/emacs/src' make: *** [Makefile:449: src] Error 2 [-- Attachment #3: Type: text/plain, Size: 21 bytes --] The gdb backtrace: [-- Attachment #4: emacs-s390x-backtrace --] [-- Type: text/plain, Size: 9023 bytes --] Program received signal SIGSEGV, Segmentation fault. mark_object (arg=<optimized out>) at alloc.c:6809 6809 if (symbol_marked_p (ptr)) (gdb) backtrace #0 mark_object (arg=<optimized out>) at alloc.c:6809 #1 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607 #2 mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382 #3 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607 #4 mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382 #5 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607 #6 mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382 #7 0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) 
at alloc.c:5926 #8 0x000002aa001a9094 in garbage_collect () at alloc.c:6132 #9 0x000002aa001a9d0c in maybe_garbage_collect () at alloc.c:6045 #10 0x000002aa001ca2ee in maybe_gc () at lisp.h:5142 #11 Ffuncall (nargs=nargs@entry=3, args=args@entry=0x3ffffffa6a0) at eval.c:3007 #12 0x000002aa001ca42c in call2 (fn=fn@entry=0x155f3675830, arg1=arg1@entry=0x2aa00a75e43, arg2=arg2@entry=0x0) at eval.c:2890 #13 0x000002aa001f0c72 in readevalloop_eager_expand_eval (val=val@entry=0x2aa00a75e43, macroexpand=macroexpand@entry=0x155f3675830) at lread.c:2133 #14 0x000002aa001f7fb0 in readevalloop (readcharfun=readcharfun@entry=0x2aa00aa27b5, infile0=<optimized out>, infile0@entry=0x0, sourcename=sourcename@entry=0x2aa00a7fff4, printflag=printflag@entry=false, unibyte=unibyte@entry=0x0, readfun=0x0, start=0x0, end=<optimized out>) at lread.c:2324 #15 0x000002aa001f8474 in Feval_buffer (buffer=<optimized out>, printflag=0x0, filename=0x2aa00a7fff4, unibyte=0x0, do_allow_print=<optimized out>) at lread.c:2397 #16 0x000002aa001ccdfc in eval_sub (form=<optimized out>) at eval.c:2512 #17 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465 #18 Flet (args=0x3b) at eval.c:1051 #19 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #20 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465 #21 Flet (args=0x36) at eval.c:1051 #22 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #23 0x000002aa001ce8cc in Funwind_protect (args=0x3fff3cf7f0b) at lisp.h:1420 #24 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #25 0x000002aa001ce488 in Fprogn (body=0x3fff3cf7d6b) at eval.c:465 #26 Flet (args=0x2d) at eval.c:1051 #27 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #28 0x000002aa001cd824 in Fprogn (body=0x0) at eval.c:465 #29 Fif (args=<optimized out>) at eval.c:421 #30 Fif (args=<optimized out>) at eval.c:407 #31 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #32 0x000002aa001cdc2e in Fprogn 
(body=0x0) at eval.c:465 #33 funcall_lambda (fun=0x3fff3cf7c9b, nargs=nargs@entry=4, arg_vector=arg_vector@entry=0x3ffffffb650) at eval.c:3305 #34 0x000002aa001ca202 in Ffuncall (nargs=nargs@entry=5, args=args@entry=0x3ffffffb648) at eval.c:3039 #35 0x000002aa001ca4b0 in call4 (fn=<optimized out>, arg1=arg1@entry=0x2aa00a7fff4, arg2=arg2@entry=0x2aa00a7fff4, arg3=arg3@entry=0x0, arg4=arg4@entry=0x30) at eval.c:2905 #36 0x000002aa001f90e4 in Fload (file=file@entry=0x3fff362bcbc, noerror=noerror@entry=0x0, nomessage=nomessage@entry=0x30, nosuffix=nosuffix@entry=0x0, must_suffix=<optimized out>, must_suffix@entry=0x30) at lread.c:1473 #37 0x000002aa001f9462 in save_match_data_load (file=0x3fff362bcbc, noerror=noerror@entry=0x0, nomessage=nomessage@entry=0x30, nosuffix=nosuffix@entry=0x0, must_suffix=must_suffix@entry=0x30) at lread.c:1629 #38 0x000002aa001c9ef0 in Fautoload_do_load (fundef=0x3fff362bc4b, funname=funname@entry=0x155f2f7a340, macro_only=macro_only@entry=0x0) at eval.c:2295 #39 0x000002aa001ca192 in Ffuncall (nargs=2, args=0x3ffffffbba0) at eval.c:3042 #40 0x000003fff306b0d4 in F636f6d702d2d6e61746976652d636f6d70696c65_comp__native_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln #41 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110 #42 0x000003fff306c592 in F62617463682d6e61746976652d636f6d70696c65_batch_native_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln #43 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110 #44 0x000003fff306c728 in F62617463682d627974652b6e61746976652d636f6d70696c65_batch_bytenative_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln #45 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110 #46 0x000002aa001ccfc4 in eval_sub (form=<optimized out>) at eval.c:2470 #47 0x000002aa001cd824 in Fprogn (body=0x0) at 
eval.c:465 #48 Fif (args=<optimized out>) at eval.c:421 #49 Fif (args=<optimized out>) at eval.c:407 #50 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #51 0x000002aa001cd8cc in Fprogn (body=0x0) at eval.c:465 #52 Fcond (args=<optimized out>) at eval.c:445 #53 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #54 0x000002aa001ce732 in Fprogn (body=0x3fff36e1b43) at eval.c:465 #55 FletX (args=0x3fff36e1b03) at eval.c:983 #56 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #57 0x000002aa001cd6ae in Fprogn (body=0x0) at eval.c:465 #58 prog_ignore (body=<optimized out>) at eval.c:476 #59 Fwhile (args=<optimized out>) at eval.c:1072 #60 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #61 0x000002aa001ce732 in Fprogn (body=0x0) at eval.c:465 #62 FletX (args=0x3fff36e1a83) at eval.c:983 #63 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #64 0x000002aa001cd1d6 in Fprogn (body=0x0) at eval.c:465 #65 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #66 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #67 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465 #68 Flet (args=0x12) at eval.c:1051 #69 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #70 0x000002aa001ce488 in Fprogn (body=0x3fff35d3a73) at eval.c:465 #71 Flet (args=0xe) at eval.c:1051 #72 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #73 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465 #74 funcall_lambda (fun=0x3fff35d39e3, fun@entry=0x3fff35d39d3, nargs=nargs@entry=1, arg_vector=arg_vector@entry=0x3ffffffd280) at eval.c:3305 #75 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff35d39d3, args=<optimized out>, count=2929176661299, count@entry=15) at eval.c:3172 #76 0x000002aa001cc9d0 in eval_sub (form=<optimized out>) at eval.c:2575 #77 0x000002aa001ce488 in Fprogn (body=0x3fff37a209b) at eval.c:465 #78 Flet (args=0x8) 
at eval.c:1051 #79 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #80 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465 #81 funcall_lambda (fun=0x3fff37a1e7b, fun@entry=0x3fff37a1e6b, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x3ffffffd740) at eval.c:3305 #82 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff37a1e6b, args=<optimized out>, count=2929176221524, count@entry=11) at eval.c:3172 #83 0x000002aa001cc9d0 in eval_sub (form=<optimized out>) at eval.c:2575 #84 0x000002aa001ce8cc in Funwind_protect (args=0x3fff380e7a3) at lisp.h:1420 #85 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #86 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465 #87 Flet (args=0x3ffffffe658) at eval.c:1051 #88 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #89 0x000002aa001cd824 in Fprogn (body=0x3fff380e233) at eval.c:465 #90 Fif (args=<optimized out>) at eval.c:421 #91 Fif (args=<optimized out>) at eval.c:407 #92 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451 #93 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465 #94 funcall_lambda (fun=0x3fff380e0e3, fun@entry=0x3fff380e0d3, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x3ffffffdf88) at eval.c:3305 #95 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff380e0d3, args=<optimized out>, count=4398046502696, count@entry=4) at eval.c:3172 #96 0x000002aa001cc9d0 in eval_sub (form=form@entry=0x3fff3f3ef1b) at eval.c:2575 #97 0x000002aa001cee52 in Feval (form=0x3fff3f3ef1b, lexical=<optimized out>) at eval.c:2327 #98 0x000002aa001c8fb6 in internal_condition_case (bfun=bfun@entry=0x2aa00142860 <top_level_2>, handlers=handlers@entry=0x90, hfun=hfun@entry=0x2aa00148ca8 <cmd_error>) at eval.c:1450 #99 0x000002aa001435d2 in top_level_1 (ignore=ignore@entry=0x0) at keyboard.c:1150 #100 0x000002aa001c8ed4 in internal_catch (tag=tag@entry=0xe850, func=func@entry=0x2aa001435a0 <top_level_1>, arg=arg@entry=0x0) at eval.c:1181 #101 
0x000002aa001427e0 in command_loop () at keyboard.c:1110 #102 0x000002aa001487bc in recursive_edit_1 () at keyboard.c:720 #103 0x000002aa00148bcc in Frecursive_edit () at keyboard.c:803 #104 0x000002aa00051d7a in main (argc=<optimized out>, argv=0x3ffffffea28) at emacs.c:2358 [-- Attachment #5: Type: text/plain, Size: 205 bytes --] Thanks -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-14 20:19 ` Rob Browning @ 2022-09-14 20:21 ` Rob Browning 2022-09-16 6:04 ` Gerd Möllmann 2022-09-15 7:10 ` Eli Zaretskii 1 sibling, 1 reply; 21+ messages in thread From: Rob Browning @ 2022-09-14 20:21 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 57789 Rob Browning <rlb@defaultvalue.org> writes: > Starting from scratch with the emacs-28.1 commit I can reproduce the > failure when building via Oops, meant the emacs-28.2 commit for all of that testing. -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-14 20:21 ` Rob Browning @ 2022-09-16 6:04 ` Gerd Möllmann 2022-09-17 21:04 ` Rob Browning 0 siblings, 1 reply; 21+ messages in thread From: Gerd Möllmann @ 2022-09-16 6:04 UTC (permalink / raw) To: Rob Browning; +Cc: 57789, Eli Zaretskii Rob Browning <rlb@defaultvalue.org> writes: > Rob Browning <rlb@defaultvalue.org> writes: > >> Starting from scratch with the emacs-28.1 commit I can reproduce the >> failure when building via > > Oops, meant the emacs-28.2 commit for all of that testing. Looking at Rob's backtrace, #0 mark_object (arg=<optimized out>) at alloc.c:6809 #1 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607 #2 mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382 #3 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607 #4 mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382 #5 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607 #6 mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382 #7 0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926 and seeing frame#7, would it be a way forward to determine which staticpro (I assume it is a staticpro) that is? Maybe that can give a clue which one can then use together with a bisect, perhaps? WDYT? ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-16 6:04 ` Gerd Möllmann @ 2022-09-17 21:04 ` Rob Browning 2022-09-18 5:22 ` Gerd Möllmann 2022-09-18 5:33 ` Eli Zaretskii 0 siblings, 2 replies; 21+ messages in thread From: Rob Browning @ 2022-09-17 21:04 UTC (permalink / raw) To: Gerd Möllmann; +Cc: 57789, Eli Zaretskii Gerd Möllmann <gerd.moellmann@gmail.com> writes: > Looking at Rob's backtrace, > > #0 mark_object (arg=<optimized out>) at alloc.c:6809 > #1 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607 > #2 mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382 > #3 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607 > #4 mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382 > #5 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607 > #6 mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382 > #7 0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926 > > and seeing frame#7, would it be a way forward to determine which > staticpro (I assume it is a staticpro) that is? Maybe that can give a > clue which one can then use together with a bisect, perhaps? Not completely sure I followed, but moving up to that frame and printing visitor didn't work: "optimized out". -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-17 21:04 ` Rob Browning
@ 2022-09-18  5:22 ` Gerd Möllmann
  2022-09-18  5:49 ` Eli Zaretskii
  2022-09-18  5:33 ` Eli Zaretskii
  1 sibling, 1 reply; 21+ messages in thread
From: Gerd Möllmann @ 2022-09-18 5:22 UTC (permalink / raw)
  To: Rob Browning; +Cc: 57789, Eli Zaretskii

Rob Browning <rlb@defaultvalue.org> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Looking at Rob's backtrace,
>>
>> #0 mark_object (arg=<optimized out>) at alloc.c:6809
>> #1 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
>> #2 mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
>> #3 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
>> #4 mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
>> #5 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
>> #6 mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
>> #7 0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
>>
>> and seeing frame#7, would it be a way forward to determine which
>> staticpro (I assume it is a staticpro) that is? Maybe that can give a
>> clue which one can then use together with a bisect, perhaps?
>
> Not completely sure I followed, but moving up to that frame and printing
> visitor didn't work: "optimized out".

Sorry, I thought another Emacs developer would chime in when I wrote
that. Let me try to explain what I'm after.

Frame#7, the call to visit_static_gc_roots, shows that we are at the
very beginning of a GC, recursively marking everything that we know
must survive the GC.

  void
  visit_static_gc_roots (struct gc_root_visitor visitor)
  {
    visit_buffer_root (visitor,
                       &buffer_defaults,
                       GC_ROOT_BUFFER_LOCAL_DEFAULT);
    visit_buffer_root (visitor,
                       &buffer_local_symbols,
                       GC_ROOT_BUFFER_LOCAL_NAME);

    for (int i = 0; i < ARRAYELTS (lispsym); i++)
      {
        Lisp_Object sptr = builtin_lisp_symbol (i);
        visitor.visit (&sptr, GC_ROOT_C_SYMBOL, visitor.data);
      }

    for (int i = 0; i < staticidx; i++)
      visitor.visit (staticvec[i], GC_ROOT_STATICPRO, visitor.data);
  }

The first interesting thing would be where in this function we are
when the crash happens. I was assuming it is somewhere in the last
for-loop, for reasons, but that doesn't have to be the case.

If I'm right, we are currently in the process of marking Lisp objects
referenced from C variables that are known to contain Lisp objects.
Such variables are added to staticvec with a call to staticpro. That's
what the staticpro in my last mail meant.

But let's first see where in visit_... we are.

^ permalink raw reply [flat|nested] 21+ messages in thread
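The mechanism described above (C variables registered as roots, then visited at the start of a collection) can be sketched in miniature. All names here (`object`, `register_root`, `root_vec`, `root_count`, `mark_from_roots`) are hypothetical stand-ins for illustration, not the definitions in alloc.c; the real code handles many object types and goes through a visitor structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type: a mark bit and one outgoing reference.  */
typedef struct object
{
  int marked;
  struct object *child;
} object;

enum { MAX_ROOTS = 16 };

/* Addresses of C variables that hold object pointers.  These play the
   role that staticvec and staticidx play in the message above.  */
static object **root_vec[MAX_ROOTS];
static int root_count;

/* Register the address of a C variable so that the collector treats
   whatever it points to as live (the role of staticpro).  */
void
register_root (object **varaddr)
{
  assert (root_count < MAX_ROOTS);
  root_vec[root_count++] = varaddr;
}

/* Mark O and everything reachable from it through `child`.  */
static void
mark (object *o)
{
  for (; o != NULL && !o->marked; o = o->child)
    o->marked = 1;
}

/* Start of a mark phase: visit every registered root in order.
   Returns the number of roots visited.  */
int
mark_from_roots (void)
{
  for (int i = 0; i < root_count; i++)
    mark (*root_vec[i]);
  return root_count;
}
```

In this toy version, a crash inside `mark` while `mark_from_roots` is on the stack would mean that some registered variable, or something reachable from it, holds a bad pointer; knowing the loop index narrows down which variable that is.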
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-18 5:22 ` Gerd Möllmann @ 2022-09-18 5:49 ` Eli Zaretskii 2022-09-18 5:55 ` Gerd Möllmann 0 siblings, 1 reply; 21+ messages in thread From: Eli Zaretskii @ 2022-09-18 5:49 UTC (permalink / raw) To: Gerd Möllmann; +Cc: 57789, rlb > From: Gerd Möllmann <gerd.moellmann@gmail.com> > Cc: Eli Zaretskii <eliz@gnu.org>, 57789@debbugs.gnu.org > Date: Sun, 18 Sep 2022 07:22:45 +0200 > > But let's first see where in visit_... we are. I think the backtrace tells that, if you look at the sources from the emacs-28 branch. See my other message. ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-18 5:49 ` Eli Zaretskii @ 2022-09-18 5:55 ` Gerd Möllmann 0 siblings, 0 replies; 21+ messages in thread From: Gerd Möllmann @ 2022-09-18 5:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 57789, rlb On 22-09-18 7:49 , Eli Zaretskii wrote: >> From: Gerd Möllmann <gerd.moellmann@gmail.com> >> Cc: Eli Zaretskii <eliz@gnu.org>, 57789@debbugs.gnu.org >> Date: Sun, 18 Sep 2022 07:22:45 +0200 >> >> But let's first see where in visit_... we are. > > I think the backtrace tells that, if you look at the sources from the > emacs-28 branch. See my other message. Ah, right, visit_buffer_root. EINSUFFICIENTCOFFEE. ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-17 21:04 ` Rob Browning 2022-09-18 5:22 ` Gerd Möllmann @ 2022-09-18 5:33 ` Eli Zaretskii 2022-09-24 21:06 ` Rob Browning 1 sibling, 1 reply; 21+ messages in thread From: Eli Zaretskii @ 2022-09-18 5:33 UTC (permalink / raw) To: Rob Browning; +Cc: gerd.moellmann, 57789 > From: Rob Browning <rlb@defaultvalue.org> > Cc: Eli Zaretskii <eliz@gnu.org>, 57789@debbugs.gnu.org > Date: Sat, 17 Sep 2022 16:04:31 -0500 > > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > > > Looking at Rob's backtrace, > > > > #0 mark_object (arg=<optimized out>) at alloc.c:6809 > > #1 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607 > > #2 mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382 > > #3 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607 > > #4 mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382 > > #5 0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607 > > #6 mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382 > > #7 0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926 > > > > and seeing frame#7, would it be a way forward to determine which > > staticpro (I assume it is a staticpro) that is? Maybe that can give a > > clue which one can then use together with a bisect, perhaps? > > Not completely sure I followed, but moving up to that frame and printing > visitor didn't work: "optimized out". The code where this happens is this: for (int i = 0; i < staticidx; i++) visitor.visit (staticvec[i], GC_ROOT_STATICPRO, visitor.data); So one way of knowing which staticpro is being handled here is to see what is the value of 'i' and look at staticvec[i]. I'm guessing that 'i' is also "optimized out", though, so 2 possible ways forward: . 
disassemble visit_static_gc_roots, find in which register or where on the stack or in memory 'i' or staticvec[i] is stored, and go from there; or . add a printf to the above loop to show the value of 'i', and re-run the build, fingers crossed, hoping that the additional printf won't make the crash go away. Once you know which staticpro is being processed here, we'd need to examine its contents and try to figure out which parts cause the crash in GC. Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
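The second option in the message above (printing 'i' inside the loop) can be sketched roughly as follows. This is a self-contained toy, not the Emacs source: the visitor struct, the 'long' roots, the four-entry staticvec, and the names visit_static_roots_logged, sum_visitor and last_root_index are all hypothetical stand-ins; only the shape of the loop is taken from the message. The index is written to stderr and flushed before each visit, so that the last line printed should identify the root being processed if the visit crashes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real Emacs definitions; only the loop
   shape below comes from the message quoted above.  */
enum gc_root_type { GC_ROOT_STATICPRO };

struct gc_root_visitor
{
  void (*visit) (const long *root, enum gc_root_type type, void *data);
  void *data;
};

enum { NSTATICS = 4 };
static long roots[NSTATICS] = { 10, 20, 30, 40 };
static const long *staticvec[NSTATICS]
  = { &roots[0], &roots[1], &roots[2], &roots[3] };
static int staticidx = NSTATICS;

/* Index of the root most recently handed to the visitor.  */
static int last_root_index = -1;

/* Same loop as in the message, with the index reported before each
   visit.  The output is flushed so it is not lost if the visit
   segfaults.  */
void
visit_static_roots_logged (struct gc_root_visitor visitor)
{
  for (int i = 0; i < staticidx; i++)
    {
      last_root_index = i;
      fprintf (stderr, "staticpro %d at %p\n", i, (void *) staticvec[i]);
      fflush (stderr);
      visitor.visit (staticvec[i], GC_ROOT_STATICPRO, visitor.data);
    }
}

/* Toy visitor: adds up the values it is shown.  */
void
sum_visitor (const long *root, enum gc_root_type type, void *data)
{
  (void) type;
  *(long *) data += *root;
}
```

Calling visit_static_roots_logged with a visitor built from sum_visitor and a pointer to a zeroed 'long' walks all four toy roots. In the real build the interesting part would be the last index printed before the segfault, with the caveat already given in the message that the extra output may itself change the behavior.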
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-18 5:33 ` Eli Zaretskii @ 2022-09-24 21:06 ` Rob Browning 2023-06-07 21:15 ` Andrea Corallo 0 siblings, 1 reply; 21+ messages in thread From: Rob Browning @ 2022-09-24 21:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gerd.moellmann, 57789 Eli Zaretskii <eliz@gnu.org> writes: > Once you know which staticpro is being processed here, we'd need to > examine its contents and try to figure out which parts cause the crash > in GC. Thanks, and I'll try to look in to this further when I have time. For now I'm changing the debian packages to avoid native compilation on some architectures (currently mips64el[1] and s390x). [1] There ./configure fails at the moment with "Error: -march=mips1 is not compatible with the selected ABI" when testing libgccjit. That's on eller.debian.org (mipsel host in a mips64el schroot). -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-24 21:06 ` Rob Browning @ 2023-06-07 21:15 ` Andrea Corallo 2023-09-11 18:08 ` Stefan Kangas 0 siblings, 1 reply; 21+ messages in thread From: Andrea Corallo @ 2023-06-07 21:15 UTC (permalink / raw) To: Rob Browning; +Cc: gerd.moellmann, 57789, Eli Zaretskii Rob Browning <rlb@defaultvalue.org> writes: > Eli Zaretskii <eliz@gnu.org> writes: > >> Once you know which staticpro is being processed here, we'd need to >> examine its contents and try to figure out which parts cause the crash >> in GC. > > Thanks, and I'll try to look in to this further when I have time. For > now I'm changing the debian packages to avoid native compilation on some > architectures (currently mips64el[1] and s390x). > > [1] There ./configure fails at the moment with "Error: -march=mips1 is > not compatible with the selected ABI" when testing libgccjit. > That's on eller.debian.org (mipsel host in a mips64el schroot). Hi Rob, any progress with this investigation? Is the bug still reproducible with a recent codebase? Thanks Andrea ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2023-06-07 21:15 ` Andrea Corallo @ 2023-09-11 18:08 ` Stefan Kangas 0 siblings, 0 replies; 21+ messages in thread From: Stefan Kangas @ 2023-09-11 18:08 UTC (permalink / raw) To: Andrea Corallo, Rob Browning; +Cc: gerd.moellmann, 57789, Eli Zaretskii tags 57789 + moreinfo thanks Andrea Corallo <acorallo@gnu.org> writes: > Rob Browning <rlb@defaultvalue.org> writes: > >> Eli Zaretskii <eliz@gnu.org> writes: >> >>> Once you know which staticpro is being processed here, we'd need to >>> examine its contents and try to figure out which parts cause the crash >>> in GC. >> >> Thanks, and I'll try to look in to this further when I have time. For >> now I'm changing the debian packages to avoid native compilation on some >> architectures (currently mips64el[1] and s390x). >> >> [1] There ./configure fails at the moment with "Error: -march=mips1 is >> not compatible with the selected ABI" when testing libgccjit. >> That's on eller.debian.org (mipsel host in a mips64el schroot). > > Hi Rob, > > any progress with this investigation? Is the bug still reproducible > with a recent codebase? Ping. Rob, any updates here? ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-14 20:19 ` Rob Browning 2022-09-14 20:21 ` Rob Browning @ 2022-09-15 7:10 ` Eli Zaretskii 2022-09-15 14:51 ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors ` (2 more replies) 1 sibling, 3 replies; 21+ messages in thread From: Eli Zaretskii @ 2022-09-15 7:10 UTC (permalink / raw) To: Rob Browning, Andrea Corallo, Paul Eggert; +Cc: 57789 > From: Rob Browning <rlb@defaultvalue.org> > Cc: 57789@debbugs.gnu.org > Date: Wed, 14 Sep 2022 15:19:24 -0500 > > Eli Zaretskii <eliz@gnu.org> writes: > > > Please run the crashing command under GDB, and when it segfaults, > > produce the C-level and Lisp-level backtrace, and post them here. > > Starting from scratch with the emacs-28.1 commit I can reproduce the > failure when building via > > ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation > > It crashes with the same segfault repeatably, i.e. if you run make > again, it crashes again on the previously mentioned "... -l comp -f > batch-byte+native-compile international/titdic-cnv.el" invocation. That > crash output is attached below. > > After adjusting the Makefile.in invocation so I could run it with gdb in > exactly the same environment once it's failing on that command, I > captured the backtrace and included it below. Thanks. The backtrace indicates that the crash is in GC. This probably means we have some fundamental problem on that architecture. Andrea, any advice for how to investigate? Does the build of the same code with the same options sans "--with-native-compilation" succeed, or does it also crash with similar symptoms? If the build without native-compilation succeeds, my first question would be how mature and stable is libgccjit on that platform? Perhaps take this up with the GCC's libgccjit developers. > With respect to the Lisp-level backtrace, I imagined you probably meant > an xbacktrace? 
If so (and assuming I'm guessing right about how I > should do that), I haven't figured out how to arrange sourcing the > src/.gdbinit from the src/Makefile.in command. You can source it manually from the GDB prompt, when the segfault happens, and then invoke xbacktrace manually, can't you? > It looked like it might be because there were no debug symbols, so I > tried adding a CFLAGS=-g3 to the end of the ./configure, but that caused > the crash to disappear entirely. Too bad, it means we have a heisenbug on our hands, which will make it even harder to debug (as if debugging crashes in GC were not hard enough already). What happens if you modify this variable: (defcustom native-comp-debug (if (eq 'windows-nt system-type) 1 0) to have the value 1 or even zero, and then rebuild from scratch? does the build succeed then? > Finally (and this was just a random guess based on previous experiences, > particularly with programs like guile that play (normal, traditional) > tricks with pointers/coercions/etc.) I noticed that emacs doesn't > specify -fno-strict-aliasing, and unless all the C code has been written > with that in mind, I assume that might open a window allowing the > optimizer to introduce undesirable changes. So I added a > CFLAGS=-fno-strict-aliasing to the end of the ./configure command, and > then the build and tests worked fine (twice in a row): > > ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation \ > CFLAGS=-fno-strict-aliasing > > Of course that's not remotely conclusive, but if all of the C code > wasn't written with strict-aliasing in mind, then I wondered if it might > make sense to consider adding -fno-strict-aliasing as a default option. I don't know enough about this. Perhaps Andrea or Paul could comment. > Also, even if that ends up being desirable, I'm not sure it'll be > sufficient. 
That is, I suspect I might want to run the full build/check > with -fno-strict-aliasing in a loop for a bit to make sure the clean > build/check is reliable, since I think I may have seen some test crashes > (not the build crash) on one earlier run with that option, but I'm not > sure that was a clean attempt. Yes, running the full test suite would be the logical next step. > Program received signal SIGSEGV, Segmentation fault. > mark_object (arg=<optimized out>) at alloc.c:6809 > 6809 if (symbol_marked_p (ptr)) > (gdb) backtrace > #0 mark_object (arg=<optimized out>) at alloc.c:6809 Any idea what caused the SIGSEGV here? Was 'ptr' an invalid pointer for some reason, and if so, what exactly makes it invalid? Thanks. ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-15 7:10 ` Eli Zaretskii @ 2022-09-15 14:51 ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-15 16:26 ` Rob Browning 2022-09-16 8:43 ` Andrea Corallo 2022-09-16 8:39 ` Andrea Corallo 2022-09-17 21:00 ` Rob Browning 2 siblings, 2 replies; 21+ messages in thread From: Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-15 14:51 UTC (permalink / raw) To: Eli Zaretskii, Rob Browning, Andrea Corallo; +Cc: 57789 On 9/15/22 02:10, Eli Zaretskii wrote: >> Of course that's not remotely conclusive, but if all of the C code >> wasn't written with strict-aliasing in mind, then I wondered if it might >> make sense to consider adding -fno-strict-aliasing as a default option. > I don't know enough about this. Perhaps Andrea or Paul could comment. > Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1 into the mix. I'm not surprised it would cause a Heisenbug to vanish; it doesn't mean strict aliasing is the problem. Emacs should work with strict aliasing. At least, that's true in the default build. I suppose it could be possible there's a strict aliasing bug in the native compiler - I'm not that familiar with that code. ^ permalink raw reply [flat|nested] 21+ messages in thread
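For readers unfamiliar with the term, the kind of code that -fno-strict-aliasing affects looks roughly like the sketch below. It is a generic illustration, not taken from Emacs or from the native compiler, and the function names are hypothetical. It assumes 'float' is a 32-bit IEEE 754 type. The first function reads a float object through a uint32_t lvalue, which the C aliasing rules do not permit and which an optimizer may therefore miscompile; the second obtains the same bits with memcpy, which is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Aliasing violation, shown only for contrast and never called here:
   a float object is accessed through an lvalue of type uint32_t.  */
uint32_t
float_bits_punned (float *f)
{
  return *(uint32_t *) f;
}

/* Well-defined alternative: copy the object representation.
   Assumes sizeof (float) == sizeof (uint32_t).  */
uint32_t
float_bits_memcpy (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

As the message above notes, a crash that disappears with -fno-strict-aliasing does not by itself show that code of the first kind is present; the flag changes optimization in general, much like a different -O level would.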
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-15 14:51 ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-15 16:26 ` Rob Browning 2022-09-16 8:43 ` Andrea Corallo 1 sibling, 0 replies; 21+ messages in thread From: Rob Browning @ 2022-09-15 16:26 UTC (permalink / raw) To: Paul Eggert, Eli Zaretskii, Andrea Corallo; +Cc: 57789 Paul Eggert <eggert@cs.ucla.edu> writes: > Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1 into > the mix. I'm not surprised it would cause a Heisenbug to vanish; it > doesn't mean strict aliasing is the problem. Agreed. Thanks -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-15 14:51 ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-15 16:26 ` Rob Browning @ 2022-09-16 8:43 ` Andrea Corallo 1 sibling, 0 replies; 21+ messages in thread From: Andrea Corallo @ 2022-09-16 8:43 UTC (permalink / raw) To: Paul Eggert; +Cc: 57789, Eli Zaretskii, Rob Browning Paul Eggert <eggert@cs.ucla.edu> writes: > On 9/15/22 02:10, Eli Zaretskii wrote: >>> Of course that's not remotely conclusive, but if all of the C code >>> wasn't written with strict-aliasing in mind, then I wondered if it might >>> make sense to consider adding -fno-strict-aliasing as a default option. >> I don't know enough about this. Perhaps Andrea or Paul could comment. >> > Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1 > into the mix. I'm not surprised it would cause a Heisenbug to vanish; > it doesn't mean strict aliasing is the problem. Hi Paul, totally agree with you. The fact that even -g has an impact here clearly shows that initial conditions are not necessarily directly connected with the final symptom we observe. Best Regards Andrea ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-15 7:10 ` Eli Zaretskii 2022-09-15 14:51 ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-16 8:39 ` Andrea Corallo 2022-09-17 21:00 ` Rob Browning 2 siblings, 0 replies; 21+ messages in thread From: Andrea Corallo @ 2022-09-16 8:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 57789, Paul Eggert, Rob Browning Eli Zaretskii <eliz@gnu.org> writes: >> From: Rob Browning <rlb@defaultvalue.org> >> Cc: 57789@debbugs.gnu.org >> Date: Wed, 14 Sep 2022 15:19:24 -0500 >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > Please run the crashing command under GDB, and when it segfaults, >> > produce the C-level and Lisp-level backtrace, and post them here. >> >> Starting from scratch with the emacs-28.1 commit I can reproduce the >> failure when building via >> >> ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation >> >> It crashes with the same segfault repeatably, i.e. if you run make >> again, it crashes again on the previously mentioned "... -l comp -f >> batch-byte+native-compile international/titdic-cnv.el" invocation. That >> crash output is attached below. >> >> After adjusting the Makefile.in invocation so I could run it with gdb in >> exactly the same environment once it's failing on that command, I >> captured the backtrace and included it below. > > Thanks. The backtrace indicates that the crash is in GC. This > probably means we have some fundamental problem on that architecture. > Andrea, any advice for how to investigate? Mmmh one cheap way to maybe gather more info is to have a run under valgrind. Other than that I typically start debugging with GDB and possibly rr. Like what is (or was) the object the GC is crashing on? Why? What's the last piece of code that touched it? Why? IIUC here we have no debug symbols so this makes it very difficult. 
BTW the fact that -g has an impact on the crash is very odd. Andrea ^ permalink raw reply [flat|nested] 21+ messages in thread
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x 2022-09-15 7:10 ` Eli Zaretskii 2022-09-15 14:51 ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-09-16 8:39 ` Andrea Corallo @ 2022-09-17 21:00 ` Rob Browning 2 siblings, 0 replies; 21+ messages in thread From: Rob Browning @ 2022-09-17 21:00 UTC (permalink / raw) To: Eli Zaretskii, Andrea Corallo, Paul Eggert; +Cc: 57789 Eli Zaretskii <eliz@gnu.org> writes: > Rob Browning <rlb@defaultvalue.org> writes: > Does the build of the same code with the same options sans > "--with-native-compilation" succeed, or does it also crash with > similar symptoms? Works fine. > You can source it manually from the GDB prompt, when the segfault > happens, and then invoke xbacktrace manually, can't you? Yep. Breakpoint 1 at 0x2aa0004ef30: file emacs.c, line 400. Breakpoint 2 at 0x2aa0010f168: file xterm.c, line 10291. (gdb) xbacktrace "Automatic GC" (0x0) "internal-macroexpand-for-load" (0xffffa6a8) "eval-buffer" (0xffffaa28) "let" (0xffffac10) "let" (0xffffae28) "unwind-protect" (0xffffaff0) "let" (0xffffb1f8) "if" (0xffffb3c8) "load-with-code-conversion" (0xffffb650) "time-since" (0xffffbba8) "comp--native-compile" (0xffffbd38) "batch-native-compile" (0xffffbef0) "batch-byte+native-compile" (0xffffc080) "funcall" (0xffffc078) "if" (0xffffc268) "cond" (0xffffc438) "let*" (0xffffc618) "while" (0xffffc7e8) "let*" (0xffffc9c8) "progn" (0xffffcb98) "if" (0xffffccc0) "let" (0xffffceb8) "let" (0xffffd0b0) "command-line-1" (0xffffd280) "let" (0xffffd570) "command-line" (0xffffd740) "unwind-protect" (0xffffd9f0) "let" (0xffffdbe8) "if" (0xffffddb8) "normal-top-level" (0xffffdf88) > Too bad, it means we have a heisenbug on our hands, which will make it > even harder to debug (as if debugging crashes in GC were not hard > enough already). 
> > What happens if you modify this variable: > > (defcustom native-comp-debug (if (eq 'windows-nt system-type) 1 0) > > to have the value 1 or even zero, and then rebuild from scratch? does > the build succeed then? No, appears to crash in the same way. > Yes, running the full test suite would be the logical next step. Oh, I had run it, I just meant that I'd likely want to double-check via testing in a loop to try to see if it might be an intermittent failure. Thanks -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ^ permalink raw reply [flat|nested] 21+ messages in thread