unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
@ 2022-09-14  1:04 Rob Browning
  2022-09-14  2:42 ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Rob Browning @ 2022-09-14  1:04 UTC
  To: 57789


On zelenka.debian.org https://db.debian.org/machines.cgi?host=zelenka
the build crashes with a segfault with current Debian sid (unstable).  I
can produce the crash like this:

  git clone --single-branch --branch emacs-28 .../emacs.git
  cd emacs
  ./autogen.sh
  ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
  make check

The debian package produced a similar failure earlier:

  https://buildd.debian.org/status/fetch.php?pkg=emacs&arch=s390x&ver=1%3A28.1%2B1-3&stamp=1662863442&raw=0

Here's the final bit of the clone build's log, and I'm happy to help
test on the machine if that'd be useful:

  Loading /home/rlb/emacs/lisp/electric.el (source)...
  Loading /home/rlb/emacs/lisp/paren.el (source)...
  Loading /home/rlb/emacs/lisp/emacs-lisp/shorthands.el (source)...
  Loading /home/rlb/emacs/lisp/emacs-lisp/eldoc.el (source)...
  Loading /home/rlb/emacs/lisp/cus-start.el (source)...
  Loading /home/rlb/emacs/lisp/tooltip.el (source)...
  Loading /home/rlb/emacs/lisp/international/iso-transl.el (source)...
  Finding pointers to doc strings...
  Finding pointers to doc strings...done
  Dumping under the name bootstrap-emacs.pdmp
  Dumping fingerprint: b4b1b9ac4d82ce4537c0e1eb6527b2b7f5831cb6de31c7f9b2fd2a1a0c4531c4
  Dump complete
  Byte counts: header=100 hot=14915588 discardable=175392 cold=10410424
  Reloc counts: hot=1048047 discardable=5080
  make -C ../lisp compile-first EMACS="../src/bootstrap-emacs"
  make[2]: Entering directory '/home/rlb/emacs/lisp'
    ELC+ELN  emacs-lisp/macroexp.elc
    ELC+ELN  emacs-lisp/cconv.elc
    ELC+ELN  emacs-lisp/byte-opt.elc
    ELC+ELN  emacs-lisp/bytecomp.elc
    ELC+ELN  emacs-lisp/comp.elc
    ELC+ELN  emacs-lisp/comp-cstr.elc
    ELC+ELN  emacs-lisp/cl-macs.elc
    ELC+ELN  emacs-lisp/rx.elc
    ELC+ELN  emacs-lisp/cl-seq.elc
  Fatal error 11: Segmentation fault
  Backtrace:
  ../src/bootstrap-emacs(+0x15deb6)[0x2aa0a7ddeb6]
  ../src/bootstrap-emacs(+0x4efc4)[0x2aa0a6cefc4]
  ../src/bootstrap-emacs(+0x4f1fe)[0x2aa0a6cf1fe]
  ../src/bootstrap-emacs(+0x15c240)[0x2aa0a7dc240]
  ../src/bootstrap-emacs(+0x15c2d2)[0x2aa0a7dc2d2]
  ../src/bootstrap-emacs(+0x6a47d8)[0x2aa0ad247d8]
  ../src/bootstrap-emacs(+0x1a7de0)[0x2aa0a827de0]
  ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6]
  ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6]
  ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6]
  ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6]
  ../src/bootstrap-emacs(+0x1a7c3e)[0x2aa0a827c3e]
  ../src/bootstrap-emacs(+0x1a9094)[0x2aa0a829094]
  ../src/bootstrap-emacs(eval_sub+0x410)[0x2aa0a84cc28]
  ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e]
  ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10]
  ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0]
  ../src/bootstrap-emacs(+0x1cdeb8)[0x2aa0a84deb8]
  ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0]
  ../src/bootstrap-emacs(eval_sub+0x2c4)[0x2aa0a84cadc]
  ../src/bootstrap-emacs(+0x1cd26a)[0x2aa0a84d26a]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(eval_sub+0x4ba)[0x2aa0a84ccd2]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(+0x1ce488)[0x2aa0a84e488]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(+0x1cd8cc)[0x2aa0a84d8cc]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e]
  ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10]
  ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0]
  ../src/bootstrap-emacs(eval_sub+0x2c4)[0x2aa0a84cadc]
  ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e]
  ../src/bootstrap-emacs(Ffuncall+0x1f2)[0x2aa0a84a202]
  ../src/bootstrap-emacs(+0x1cc6a4)[0x2aa0a84c6a4]
  ../src/bootstrap-emacs(+0x1ce26c)[0x2aa0a84e26c]
  ../src/bootstrap-emacs(eval_sub+0x638)[0x2aa0a84ce50]
  ../src/bootstrap-emacs(+0x1ce7ec)[0x2aa0a84e7ec]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e]
  ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10]
  ...
  make[2]: *** [Makefile:316: emacs-lisp/cl-seq.elc] Segmentation fault
  make[2]: Leaving directory '/home/rlb/emacs/lisp'
  make[1]: *** [Makefile:870: bootstrap-emacs.pdmp] Error 2
  make[1]: Leaving directory '/home/rlb/emacs/src'
  make: *** [Makefile:449: src] Error 2
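
The unnamed frames in a glibc backtrace like the one above carry only an
offset into the binary (for example +0x15deb6).  As a rough sketch,
assuming binutils' addr2line is installed and bootstrap-emacs has not
been stripped, those offsets can be mapped back to function names.  The
helper names extract_offsets and resolve_frames, and the log file name
crash.log, are made up for illustration; they are not part of Emacs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Emacs): read a glibc backtrace on stdin
# and print the offset of every bootstrap-emacs frame that has no symbol
# name, one offset per line.  Frames such as "(eval_sub+0x410)" are skipped.
extract_offsets() {
  sed -n 's/.*bootstrap-emacs(+\(0x[0-9a-f]*\)).*/\1/p'
}

# Hypothetical helper: resolve the offsets found in a saved log.
#   $1 = log file containing the backtrace (assumed name, e.g. crash.log)
#   $2 = path to the binary that produced it
# addr2line: -f prints function names, -p prints one line per address,
# -e selects the executable to read symbols from.
resolve_frames() {
  log=$1
  binary=$2
  extract_offsets < "$log" | xargs addr2line -f -p -e "$binary"
}
```

Usage would be something like "resolve_frames crash.log
../src/bootstrap-emacs".  Frames that live in a .eln file would need
the same treatment against that file, and without debug info only
function names (not file:line) should be expected.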

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-14  1:04 bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x Rob Browning
@ 2022-09-14  2:42 ` Eli Zaretskii
  2022-09-14  3:06   ` Rob Browning
  2022-09-14 20:19   ` Rob Browning
  0 siblings, 2 replies; 21+ messages in thread
From: Eli Zaretskii @ 2022-09-14  2:42 UTC
  To: Rob Browning; +Cc: 57789

> From: Rob Browning <rlb@defaultvalue.org>
> Date: Tue, 13 Sep 2022 20:04:32 -0500
> 
> On zelenka.debian.org https://db.debian.org/machines.cgi?host=zelenka
> the build crashes with a segfault with current Debian sid (unstable).  I
> can produce the crash like this:
> 
>   git clone --single-branch --branch emacs-28 .../emacs.git

If you build the current emacs-28 branch, then it isn't Emacs 28.1,
it's Emacs 28.2.50, right?

>   cd emacs
>   ./autogen.sh
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
>   make check
> 
> The debian package produced a similar failure earlier:
> 
>   https://buildd.debian.org/status/fetch.php?pkg=emacs&arch=s390x&ver=1%3A28.1%2B1-3&stamp=1662863442&raw=0
> 
> Here's the final bit of the clone build's log, and I'm happy to help
> test on the machine if that'd be useful:

Please run the crashing command under GDB, and when it segfaults,
produce the C-level and Lisp-level backtrace, and post them here.

Thanks.






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-14  2:42 ` Eli Zaretskii
@ 2022-09-14  3:06   ` Rob Browning
  2022-09-14  3:20     ` Rob Browning
  2022-09-14 20:19   ` Rob Browning
  1 sibling, 1 reply; 21+ messages in thread
From: Rob Browning @ 2022-09-14  3:06 UTC
  To: Eli Zaretskii; +Cc: 57789

Eli Zaretskii <eliz@gnu.org> writes:

> If you build the current emacs-28 branch, then it isn't Emacs 28.1,
> it's Emacs 28.2.50, right?

Right, sorry, the clone test was the current branch tip, and the buildd
log was for (Debian's partially altered) tree, derived from the
emacs-28.1 tag.  I can easily re-test the 28.1 tag if we like.

> Please run the crashing command under GDB, and when it segfaults,
> produce the C-level and Lisp-level backtrace, and post them here.

Will attempt.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-14  3:06   ` Rob Browning
@ 2022-09-14  3:20     ` Rob Browning
  0 siblings, 0 replies; 21+ messages in thread
From: Rob Browning @ 2022-09-14  3:20 UTC
  To: Eli Zaretskii; +Cc: 57789

Rob Browning <rlb@defaultvalue.org> writes:

> Will attempt.

Hmm, so I ran "make V=1" from the same tree and saw the command that
repeatably crashed, which was:

  EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)'  \
    -l comp -f batch-byte+native-compile international/titdic-cnv.el

I then ran that manually via 

  (cd lisp
   && EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)'  \
      -l comp -f batch-byte+native-compile international/titdic-cnv.el)

which ran for a bit and succeeded.  After that a make worked fine until
bindings.el where it crashed again, this time with an "Aborted", and
running it manually didn't help.

In any case, I'm going to start over and try to get the backtraces for
the titdic-cnv.el failure.

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-14  2:42 ` Eli Zaretskii
  2022-09-14  3:06   ` Rob Browning
@ 2022-09-14 20:19   ` Rob Browning
  2022-09-14 20:21     ` Rob Browning
  2022-09-15  7:10     ` Eli Zaretskii
  1 sibling, 2 replies; 21+ messages in thread
From: Rob Browning @ 2022-09-14 20:19 UTC
  To: Eli Zaretskii; +Cc: 57789

[-- Attachment #1: Type: text/plain, Size: 2538 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> Please run the crashing command under GDB, and when it segfaults,
> produce the C-level and Lisp-level backtrace, and post them here.

Starting from scratch with the emacs-28.1 commit I can reproduce the
failure when building via

  ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation

It crashes with the same segfault repeatably, i.e. if you run make
again, it crashes again on the previously mentioned "... -l comp -f
batch-byte+native-compile international/titdic-cnv.el" invocation.  That
crash output is attached below.

After adjusting the Makefile.in invocation so I could run it with gdb in
exactly the same environment once it's failing on that command, I
captured the backtrace and included it below.

With respect to the Lisp-level backtrace, I imagined you probably meant
an xbacktrace?  If so (and assuming I'm guessing right about how I
should do that), I haven't figured out how to arrange sourcing the
src/.gdbinit from the src/Makefile.in command.  I'm likely doing
something wrong, but it doesn't seem to want to load the file.

It looked like it might be because there were no debug symbols, so I
tried adding a CFLAGS=-g3 to the end of the ./configure, but that caused
the crash to disappear entirely.

Finally (and this was just a random guess based on previous experiences,
particularly with programs like guile that play (normal, traditional)
tricks with pointers/coercions/etc.) I noticed that emacs doesn't
specify -fno-strict-aliasing, and unless all the C code has been written
with that in mind, I assume that might open a window allowing the
optimizer to introduce undesirable changes.  So I added a
CFLAGS=-fno-strict-aliasing to the end of the ./configure command, and
then the build and tests worked fine (twice in a row):

  ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation \
    CFLAGS=-fno-strict-aliasing

Of course that's not remotely conclusive, but if all of the C code
wasn't written with strict-aliasing in mind, then I wondered if it might
make sense to consider adding -fno-strict-aliasing as a default option.

Also, even if that ends up being desirable, I'm not sure it'll be
sufficient.  That is, I suspect I might want to run the full build/check
with -fno-strict-aliasing in a loop for a bit to make sure the clean
build/check is reliable, since I think I may have seen some test crashes
(not the build crash) on one earlier run with that option, but I'm not
sure that was a clean attempt.
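
The loop mentioned above could be as simple as the following sketch.
repeat_check is a made-up name; it runs whatever command it is given up
to N times and stops at the first failure, so an unreliable build/check
shows up without watching every run.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly, stopping at the first failure.
#   $1      = maximum number of runs
#   $2...   = the command to run, with its arguments
# Returns 0 if every run succeeded, 1 as soon as one run fails.
repeat_check() {
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    if ! "$@"; then
      echo "run $i of $runs failed" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "all $runs runs passed"
}
```

For example, after configuring with CFLAGS=-fno-strict-aliasing,
"repeat_check 5 make check" would repeat only the test run.  Repeating
a clean build each time would mean passing a script that wraps the
whole configure/make sequence instead.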

The make crash:


[-- Attachment #2: emacs-s390x-crash --]
[-- Type: text/plain, Size: 2955 bytes --]

make[2]: Entering directory '/home/rlb/emacs/lisp'
EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)'  \
-l comp -f batch-byte+native-compile international/titdic-cnv.el
Fatal error 11: Segmentation fault
Backtrace:
../src/bootstrap-emacs(+0x15deb6)[0x2aa293ddeb6]
../src/bootstrap-emacs(+0x4efc4)[0x2aa292cefc4]
../src/bootstrap-emacs(+0x4f1fe)[0x2aa292cf1fe]
../src/bootstrap-emacs(+0x15c240)[0x2aa293dc240]
../src/bootstrap-emacs(+0x15c2d2)[0x2aa293dc2d2]
../src/bootstrap-emacs(+0x6a47d8)[0x2aa299247d8]
../src/bootstrap-emacs(+0x1a7fa8)[0x2aa29427fa8]
../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6]
../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6]
../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6]
../src/bootstrap-emacs(+0x1a7c3e)[0x2aa29427c3e]
../src/bootstrap-emacs(+0x1a9094)[0x2aa29429094]
../src/bootstrap-emacs(Ffuncall+0x2de)[0x2aa2944a2ee]
../src/bootstrap-emacs(+0x1ca42c)[0x2aa2944a42c]
../src/bootstrap-emacs(+0x1f0c72)[0x2aa29470c72]
../src/bootstrap-emacs(+0x1f7fb0)[0x2aa29477fb0]
../src/bootstrap-emacs(+0x1f8474)[0x2aa29478474]
../src/bootstrap-emacs(eval_sub+0x5e4)[0x2aa2944cdfc]
../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1ce8cc)[0x2aa2944e8cc]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1cd824)[0x2aa2944d824]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1cdc2e)[0x2aa2944dc2e]
../src/bootstrap-emacs(Ffuncall+0x1f2)[0x2aa2944a202]
../src/bootstrap-emacs(+0x1ca4b0)[0x2aa2944a4b0]
../src/bootstrap-emacs(+0x1f90e4)[0x2aa294790e4]
../src/bootstrap-emacs(+0x1f9462)[0x2aa29479462]
../src/bootstrap-emacs(+0x1c9ef0)[0x2aa29449ef0]
../src/bootstrap-emacs(Ffuncall+0x182)[0x2aa2944a192]
/home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F636f6d702d2d6e61746976652d636f6d70696c65_comp__native_compile_0+0x804)[0x3ff91d6b0d4]
../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e]
/home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F62617463682d6e61746976652d636f6d70696c65_batch_native_compile_0+0x1d2)[0x3ff91d6c592]
../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e]
/home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F62617463682d627974652b6e61746976652d636f6d70696c65_batch_bytenative_compile_0+0x108)[0x3ff91d6c728]
../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e]
...
make[2]: *** [Makefile:321: international/titdic-cnv.elc] Segmentation fault
make[2]: Leaving directory '/home/rlb/emacs/lisp'
make[1]: *** [Makefile:845: ../lisp/loaddefs.el] Error 2
make[1]: Leaving directory '/home/rlb/emacs/src'
make: *** [Makefile:449: src] Error 2

[-- Attachment #3: Type: text/plain, Size: 21 bytes --]


The gdb backtrace:


[-- Attachment #4: emacs-s390x-backtrace --]
[-- Type: text/plain, Size: 9023 bytes --]

Program received signal SIGSEGV, Segmentation fault.
mark_object (arg=<optimized out>) at alloc.c:6809
6809            if (symbol_marked_p (ptr))
(gdb) backtrace
#0  mark_object (arg=<optimized out>) at alloc.c:6809
#1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
#2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
#3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
#4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
#5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
#6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
#7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
#8  0x000002aa001a9094 in garbage_collect () at alloc.c:6132
#9  0x000002aa001a9d0c in maybe_garbage_collect () at alloc.c:6045
#10 0x000002aa001ca2ee in maybe_gc () at lisp.h:5142
#11 Ffuncall (nargs=nargs@entry=3, args=args@entry=0x3ffffffa6a0) at eval.c:3007
#12 0x000002aa001ca42c in call2 (fn=fn@entry=0x155f3675830, arg1=arg1@entry=0x2aa00a75e43, arg2=arg2@entry=0x0) at eval.c:2890
#13 0x000002aa001f0c72 in readevalloop_eager_expand_eval (val=val@entry=0x2aa00a75e43, macroexpand=macroexpand@entry=0x155f3675830) at lread.c:2133
#14 0x000002aa001f7fb0 in readevalloop (readcharfun=readcharfun@entry=0x2aa00aa27b5, infile0=<optimized out>, 
    infile0@entry=0x0, sourcename=sourcename@entry=0x2aa00a7fff4, printflag=printflag@entry=false, unibyte=unibyte@entry=0x0, readfun=0x0, start=0x0, end=<optimized out>) at lread.c:2324
#15 0x000002aa001f8474 in Feval_buffer (buffer=<optimized out>, printflag=0x0, filename=0x2aa00a7fff4, unibyte=0x0, do_allow_print=<optimized out>) at lread.c:2397
#16 0x000002aa001ccdfc in eval_sub (form=<optimized out>) at eval.c:2512
#17 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465
#18 Flet (args=0x3b) at eval.c:1051
#19 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#20 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465
#21 Flet (args=0x36) at eval.c:1051
#22 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#23 0x000002aa001ce8cc in Funwind_protect (args=0x3fff3cf7f0b) at lisp.h:1420
#24 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#25 0x000002aa001ce488 in Fprogn (body=0x3fff3cf7d6b) at eval.c:465
#26 Flet (args=0x2d) at eval.c:1051
#27 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#28 0x000002aa001cd824 in Fprogn (body=0x0) at eval.c:465
#29 Fif (args=<optimized out>) at eval.c:421
#30 Fif (args=<optimized out>) at eval.c:407
#31 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#32 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465
#33 funcall_lambda (fun=0x3fff3cf7c9b, nargs=nargs@entry=4, arg_vector=arg_vector@entry=0x3ffffffb650) at eval.c:3305
#34 0x000002aa001ca202 in Ffuncall (nargs=nargs@entry=5, args=args@entry=0x3ffffffb648) at eval.c:3039
#35 0x000002aa001ca4b0 in call4 (fn=<optimized out>, arg1=arg1@entry=0x2aa00a7fff4, arg2=arg2@entry=0x2aa00a7fff4, arg3=arg3@entry=0x0, arg4=arg4@entry=0x30) at eval.c:2905
#36 0x000002aa001f90e4 in Fload (file=file@entry=0x3fff362bcbc, noerror=noerror@entry=0x0, nomessage=nomessage@entry=0x30, nosuffix=nosuffix@entry=0x0, must_suffix=<optimized out>, 
    must_suffix@entry=0x30) at lread.c:1473
#37 0x000002aa001f9462 in save_match_data_load (file=0x3fff362bcbc, noerror=noerror@entry=0x0, nomessage=nomessage@entry=0x30, nosuffix=nosuffix@entry=0x0, must_suffix=must_suffix@entry=0x30)
    at lread.c:1629
#38 0x000002aa001c9ef0 in Fautoload_do_load (fundef=0x3fff362bc4b, funname=funname@entry=0x155f2f7a340, macro_only=macro_only@entry=0x0) at eval.c:2295
#39 0x000002aa001ca192 in Ffuncall (nargs=2, args=0x3ffffffbba0) at eval.c:3042
#40 0x000003fff306b0d4 in F636f6d702d2d6e61746976652d636f6d70696c65_comp__native_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln
#41 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110
#42 0x000003fff306c592 in F62617463682d6e61746976652d636f6d70696c65_batch_native_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln
#43 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110
#44 0x000003fff306c728 in F62617463682d627974652b6e61746976652d636f6d70696c65_batch_bytenative_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln
#45 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110
#46 0x000002aa001ccfc4 in eval_sub (form=<optimized out>) at eval.c:2470
#47 0x000002aa001cd824 in Fprogn (body=0x0) at eval.c:465
#48 Fif (args=<optimized out>) at eval.c:421
#49 Fif (args=<optimized out>) at eval.c:407
#50 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#51 0x000002aa001cd8cc in Fprogn (body=0x0) at eval.c:465
#52 Fcond (args=<optimized out>) at eval.c:445
#53 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#54 0x000002aa001ce732 in Fprogn (body=0x3fff36e1b43) at eval.c:465
#55 FletX (args=0x3fff36e1b03) at eval.c:983
#56 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#57 0x000002aa001cd6ae in Fprogn (body=0x0) at eval.c:465
#58 prog_ignore (body=<optimized out>) at eval.c:476
#59 Fwhile (args=<optimized out>) at eval.c:1072
#60 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#61 0x000002aa001ce732 in Fprogn (body=0x0) at eval.c:465
#62 FletX (args=0x3fff36e1a83) at eval.c:983
#63 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#64 0x000002aa001cd1d6 in Fprogn (body=0x0) at eval.c:465
#65 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#66 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#67 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465
#68 Flet (args=0x12) at eval.c:1051
#69 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#70 0x000002aa001ce488 in Fprogn (body=0x3fff35d3a73) at eval.c:465
#71 Flet (args=0xe) at eval.c:1051
#72 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#73 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465
#74 funcall_lambda (fun=0x3fff35d39e3, fun@entry=0x3fff35d39d3, nargs=nargs@entry=1, arg_vector=arg_vector@entry=0x3ffffffd280) at eval.c:3305
#75 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff35d39d3, args=<optimized out>, count=2929176661299, count@entry=15) at eval.c:3172
#76 0x000002aa001cc9d0 in eval_sub (form=<optimized out>) at eval.c:2575
#77 0x000002aa001ce488 in Fprogn (body=0x3fff37a209b) at eval.c:465
#78 Flet (args=0x8) at eval.c:1051
#79 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#80 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465
#81 funcall_lambda (fun=0x3fff37a1e7b, fun@entry=0x3fff37a1e6b, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x3ffffffd740) at eval.c:3305
#82 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff37a1e6b, args=<optimized out>, count=2929176221524, count@entry=11) at eval.c:3172
#83 0x000002aa001cc9d0 in eval_sub (form=<optimized out>) at eval.c:2575
#84 0x000002aa001ce8cc in Funwind_protect (args=0x3fff380e7a3) at lisp.h:1420
#85 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#86 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465
#87 Flet (args=0x3ffffffe658) at eval.c:1051
#88 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#89 0x000002aa001cd824 in Fprogn (body=0x3fff380e233) at eval.c:465
#90 Fif (args=<optimized out>) at eval.c:421
#91 Fif (args=<optimized out>) at eval.c:407
#92 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#93 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465
#94 funcall_lambda (fun=0x3fff380e0e3, fun@entry=0x3fff380e0d3, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x3ffffffdf88) at eval.c:3305
#95 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff380e0d3, args=<optimized out>, count=4398046502696, count@entry=4) at eval.c:3172
#96 0x000002aa001cc9d0 in eval_sub (form=form@entry=0x3fff3f3ef1b) at eval.c:2575
#97 0x000002aa001cee52 in Feval (form=0x3fff3f3ef1b, lexical=<optimized out>) at eval.c:2327
#98 0x000002aa001c8fb6 in internal_condition_case (bfun=bfun@entry=0x2aa00142860 <top_level_2>, handlers=handlers@entry=0x90, hfun=hfun@entry=0x2aa00148ca8 <cmd_error>) at eval.c:1450
#99 0x000002aa001435d2 in top_level_1 (ignore=ignore@entry=0x0) at keyboard.c:1150
#100 0x000002aa001c8ed4 in internal_catch (tag=tag@entry=0xe850, func=func@entry=0x2aa001435a0 <top_level_1>, arg=arg@entry=0x0) at eval.c:1181
#101 0x000002aa001427e0 in command_loop () at keyboard.c:1110
#102 0x000002aa001487bc in recursive_edit_1 () at keyboard.c:720
#103 0x000002aa00148bcc in Frecursive_edit () at keyboard.c:803
#104 0x000002aa00051d7a in main (argc=<optimized out>, argv=0x3ffffffea28) at emacs.c:2358

[-- Attachment #5: Type: text/plain, Size: 205 bytes --]


Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4


* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-14 20:19   ` Rob Browning
@ 2022-09-14 20:21     ` Rob Browning
  2022-09-16  6:04       ` Gerd Möllmann
  2022-09-15  7:10     ` Eli Zaretskii
  1 sibling, 1 reply; 21+ messages in thread
From: Rob Browning @ 2022-09-14 20:21 UTC
  To: Eli Zaretskii; +Cc: 57789

Rob Browning <rlb@defaultvalue.org> writes:

> Starting from scratch with the emacs-28.1 commit I can reproduce the
> failure when building via

Oops, meant the emacs-28.2 commit for all of that testing.

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-14 20:19   ` Rob Browning
  2022-09-14 20:21     ` Rob Browning
@ 2022-09-15  7:10     ` Eli Zaretskii
  2022-09-15 14:51       ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors
                         ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Eli Zaretskii @ 2022-09-15  7:10 UTC
  To: Rob Browning, Andrea Corallo, Paul Eggert; +Cc: 57789

> From: Rob Browning <rlb@defaultvalue.org>
> Cc: 57789@debbugs.gnu.org
> Date: Wed, 14 Sep 2022 15:19:24 -0500
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Please run the crashing command under GDB, and when it segfaults,
> > produce the C-level and Lisp-level backtrace, and post them here.
> 
> Starting from scratch with the emacs-28.1 commit I can reproduce the
> failure when building via
> 
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
> 
> It crashes with the same segfault repeatably, i.e. if you run make
> again, it crashes again on the previously mentioned "... -l comp -f
> batch-byte+native-compile international/titdic-cnv.el" invocation.  That
> crash output is attached below.
> 
> After adjusting the Makefile.in invocation so I could run it with gdb in
> exactly the same environment once it's failing on that command, I
> captured the backtrace and included it below.

Thanks.  The backtrace indicates that the crash is in GC.  This
probably means we have some fundamental problem on that architecture.
Andrea, any advice for how to investigate?

Does the build of the same code with the same options sans
"--with-native-compilation" succeed, or does it also crash with
similar symptoms?  If the build without native-compilation succeeds,
my first question would be how mature and stable is libgccjit on that
platform?  Perhaps take this up with the GCC's libgccjit developers.

> With respect to the Lisp-level backtrace, I imagined you probably meant
> an xbacktrace?  If so (and assuming I'm guessing right about how I
> should do that), I haven't figured out how to arrange sourcing the
> src/.gdbinit from the src/Makefile.in command.

You can source it manually from the GDB prompt, when the segfault
happens, and then invoke xbacktrace manually, can't you?
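
The same steps can also be scripted, as a sketch: GDB's -batch, -ex and
--args options allow the run, the C backtrace, the sourcing of
src/.gdbinit and the xbacktrace to happen in one invocation.  The
function name gdb_backtraces and the DRY_RUN switch are invented here,
and the relative path to .gdbinit assumes the command is started from
the lisp directory, as in the failing make rule.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under GDB and, when it stops (for
# example on SIGSEGV), print the C backtrace, then load Emacs's src/.gdbinit
# and print the Lisp backtrace with its xbacktrace command.
# If DRY_RUN is set to a non-empty value, the gdb command line is only
# echoed, not executed.
gdb_backtraces() {
  ${DRY_RUN:+echo} gdb -batch \
    -ex run \
    -ex backtrace \
    -ex 'source ../src/.gdbinit' \
    -ex xbacktrace \
    --args "$@"
}
```

With the command from earlier in this thread that would be roughly:

  (cd lisp && export EMACSLOADPATH= &&
   gdb_backtraces ../src/bootstrap-emacs -batch --no-site-file --no-site-lisp \
     --eval '(setq load-prefer-newer t)' \
     -l comp -f batch-byte+native-compile international/titdic-cnv.el)

Whether xbacktrace prints anything useful without debug symbols is a
separate question.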

> It looked like it might be because there were no debug symbols, so I
> tried adding a CFLAGS=-g3 to the end of the ./configure, but that caused
> the crash to disappear entirely.

Too bad, it means we have a heisenbug on our hands, which will make it
even harder to debug (as if debugging crashes in GC were not hard
enough already).

What happens if you modify this variable:

  (defcustom native-comp-debug (if (eq 'windows-nt system-type) 1 0)

to have the value 1 or even zero, and then rebuild from scratch? does
the build succeed then?

> Finally (and this was just a random guess based on previous experiences,
> particularly with programs like guile that play (normal, traditional)
> tricks with pointers/coercions/etc.) I noticed that emacs doesn't
> specify -fno-strict-aliasing, and unless all the C code has been written
> with that in mind, I assume that might open a window allowing the
> optimizer to introduce undesirable changes.  So I added a
> CFLAGS=-fno-strict-aliasing to the end of the ./configure command, and
> then the build and tests worked fine (twice in a row):
> 
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation \
>     CFLAGS=-fno-strict-aliasing
> 
> Of course that's not remotely conclusive, but if all of the C code
> wasn't written with strict-aliasing in mind, then I wondered if it might
> make sense to consider adding -fno-strict-aliasing as a default option.

I don't know enough about this.  Perhaps Andrea or Paul could comment.

> Also, even if that ends up being desirable, I'm not sure it'll be
> sufficient.  That is, I suspect I might want to run the full build/check
> with -fno-strict-aliasing in a loop for a bit to make sure the clean
> build/check is reliable, since I think I may have seen some test crashes
> (not the build crash) on one earlier run with that option, but I'm not
> sure that was a clean attempt.

Yes, running the full test suite would be the logical next step.

> Program received signal SIGSEGV, Segmentation fault.
> mark_object (arg=<optimized out>) at alloc.c:6809
> 6809            if (symbol_marked_p (ptr))
> (gdb) backtrace
> #0  mark_object (arg=<optimized out>) at alloc.c:6809

Any idea what caused the SIGSEGV here?  Was 'ptr' an invalid pointer for
some reason, and if so, what exactly makes it invalid?
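
A few standard GDB commands may help answer that once the process has
stopped on the SIGSEGV.  The sketch below only prints them, so that
they can be saved to a file and sourced from the GDB prompt;
fault_inspection_commands is an invented name, and $_siginfo is only
available on targets where GDB supports it (GNU/Linux is assumed here).

```shell
#!/bin/sh
# Hypothetical helper: print a small GDB command list for inspecting a fault.
#   frame 0           select the faulting frame
#   x/i $pc           disassemble the instruction that faulted
#   info registers    show register contents at the time of the fault
#   print $_siginfo...  show the address whose access raised the signal
# The quoted 'EOF' keeps the shell from expanding $pc and $_siginfo.
fault_inspection_commands() {
  cat <<'EOF'
frame 0
x/i $pc
info registers
print $_siginfo._sifields._sigfault.si_addr
EOF
}
```

For instance, "fault_inspection_commands > fault.gdb" followed by
"source fault.gdb" at the (gdb) prompt.  Comparing the faulting address
with the registers used by the instruction should at least show which
value was the bad pointer, even with 'ptr' optimized out.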

Thanks.






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-15  7:10     ` Eli Zaretskii
@ 2022-09-15 14:51       ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-15 16:26         ` Rob Browning
  2022-09-16  8:43         ` Andrea Corallo
  2022-09-16  8:39       ` Andrea Corallo
  2022-09-17 21:00       ` Rob Browning
  2 siblings, 2 replies; 21+ messages in thread
From: Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-09-15 14:51 UTC
  To: Eli Zaretskii, Rob Browning, Andrea Corallo; +Cc: 57789

On 9/15/22 02:10, Eli Zaretskii wrote:
>> Of course that's not remotely conclusive, but if all of the C code
>> wasn't written with strict-aliasing in mind, then I wondered if it might
>> make sense to consider adding -fno-strict-aliasing as a default option.
> I don't know enough about this.  Perhaps Andrea or Paul could comment.
>
Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1 into 
the mix. I'm not surprised it would cause a Heisenbug to vanish; it 
doesn't mean strict aliasing is the problem.

Emacs should work with strict aliasing. At least, that's true in the 
default build. I suppose it could be possible there's a strict aliasing 
bug in the native compiler - I'm not that familiar with that code.







* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-15 14:51       ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-09-15 16:26         ` Rob Browning
  2022-09-16  8:43         ` Andrea Corallo
  1 sibling, 0 replies; 21+ messages in thread
From: Rob Browning @ 2022-09-15 16:26 UTC
  To: Paul Eggert, Eli Zaretskii, Andrea Corallo; +Cc: 57789

Paul Eggert <eggert@cs.ucla.edu> writes:

> Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1 into 
> the mix. I'm not surprised it would cause a Heisenbug to vanish; it 
> doesn't mean strict aliasing is the problem.

Agreed.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-14 20:21     ` Rob Browning
@ 2022-09-16  6:04       ` Gerd Möllmann
  2022-09-17 21:04         ` Rob Browning
  0 siblings, 1 reply; 21+ messages in thread
From: Gerd Möllmann @ 2022-09-16  6:04 UTC (permalink / raw)
  To: Rob Browning; +Cc: 57789, Eli Zaretskii

Rob Browning <rlb@defaultvalue.org> writes:

> Rob Browning <rlb@defaultvalue.org> writes:
>
>> Starting from scratch with the emacs-28.1 commit I can reproduce the
>> failure when building via
>
> Oops, meant the emacs-28.2 commit for all of that testing.

Looking at Rob's backtrace, 

#0  mark_object (arg=<optimized out>) at alloc.c:6809
#1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
#2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
#3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
#4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
#5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
#6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
#7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926

and seeing frame#7, would it be a way forward to determine which
staticpro (I assume it is a staticpro) that is?  Maybe that can give a
clue which one can then use together with a bisect, perhaps?

WDYT?






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-15  7:10     ` Eli Zaretskii
  2022-09-15 14:51       ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-09-16  8:39       ` Andrea Corallo
  2022-09-17 21:00       ` Rob Browning
  2 siblings, 0 replies; 21+ messages in thread
From: Andrea Corallo @ 2022-09-16  8:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57789, Paul Eggert, Rob Browning

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Rob Browning <rlb@defaultvalue.org>
>> Cc: 57789@debbugs.gnu.org
>> Date: Wed, 14 Sep 2022 15:19:24 -0500
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Please run the crashing command under GDB, and when it segfaults,
>> > produce the C-level and Lisp-level backtrace, and post them here.
>> 
>> Starting from scratch with the emacs-28.1 commit I can reproduce the
>> failure when building via
>> 
>>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
>> 
>> It crashes with the same segfault repeatably, i.e. if you run make
>> again, it crashes again on the previously mentioned "... -l comp -f
>> batch-byte+native-compile international/titdic-cnv.el" invocation.  That
>> crash output is attached below.
>> 
>> After adjusting the Makefile.in invocation so I could run it with gdb in
>> exactly the same environment once it's failing on that command, I
>> captured the backtrace and included it below.
>
> Thanks.  The backtrace indicates that the crash is in GC.  This
> probably means we have some fundamental problem on that architecture.
> Andrea, any advice for how to investigate?

Mmmh one cheap way to maybe gather more info is to have a run under
valgrind.

Other than that I typically start debugging with GDB and possibly
rr. Like what is (or was) the object the GC is crashing on?  Why?
What's the last piece of code that touched it? Why?  IIUC here we have
no debug symbols so this makes it very difficult.

BTW, the fact that -g has an impact on the crash is very odd.

  Andrea







* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-15 14:51       ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-15 16:26         ` Rob Browning
@ 2022-09-16  8:43         ` Andrea Corallo
  1 sibling, 0 replies; 21+ messages in thread
From: Andrea Corallo @ 2022-09-16  8:43 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 57789, Eli Zaretskii, Rob Browning

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 9/15/22 02:10, Eli Zaretskii wrote:
>>> Of course that's not remotely conclusive, but if all of the C code
>>> wasn't written with strict-aliasing in mind, then I wondered if it might
>>> make sense to consider adding -fno-strict-aliasing as a default option.
>> I don't know enough about this.  Perhaps Andrea or Paul could comment.
>>
> Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1
> into the mix. I'm not surprised it would cause a Heisenbug to vanish;
> it doesn't mean strict aliasing is the problem.

Hi Paul,

totally agree with you.  The fact that even -g has an impact here
clearly shows that the initial conditions are not necessarily directly
connected with the final symptom we observe.

Best Regards

  Andrea






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-15  7:10     ` Eli Zaretskii
  2022-09-15 14:51       ` Paul Eggert via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-09-16  8:39       ` Andrea Corallo
@ 2022-09-17 21:00       ` Rob Browning
  2 siblings, 0 replies; 21+ messages in thread
From: Rob Browning @ 2022-09-17 21:00 UTC (permalink / raw)
  To: Eli Zaretskii, Andrea Corallo, Paul Eggert; +Cc: 57789


Eli Zaretskii <eliz@gnu.org> writes:

> Rob Browning <rlb@defaultvalue.org> writes:

> Does the build of the same code with the same options sans
> "--with-native-compilation" succeed, or does it also crash with
> similar symptoms?

Works fine.

> You can source it manually from the GDB prompt, when the segfault
> happens, and then invoke xbacktrace manually, can't you?

Yep.

  Breakpoint 1 at 0x2aa0004ef30: file emacs.c, line 400.
  Breakpoint 2 at 0x2aa0010f168: file xterm.c, line 10291.
  (gdb) xbacktrace
  "Automatic GC" (0x0)
  "internal-macroexpand-for-load" (0xffffa6a8)
  "eval-buffer" (0xffffaa28)
  "let" (0xffffac10)
  "let" (0xffffae28)
  "unwind-protect" (0xffffaff0)
  "let" (0xffffb1f8)
  "if" (0xffffb3c8)
  "load-with-code-conversion" (0xffffb650)
  "time-since" (0xffffbba8)
  "comp--native-compile" (0xffffbd38)
  "batch-native-compile" (0xffffbef0)
  "batch-byte+native-compile" (0xffffc080)
  "funcall" (0xffffc078)
  "if" (0xffffc268)
  "cond" (0xffffc438)
  "let*" (0xffffc618)
  "while" (0xffffc7e8)
  "let*" (0xffffc9c8)
  "progn" (0xffffcb98)
  "if" (0xffffccc0)
  "let" (0xffffceb8)
  "let" (0xffffd0b0)
  "command-line-1" (0xffffd280)
  "let" (0xffffd570)
  "command-line" (0xffffd740)
  "unwind-protect" (0xffffd9f0)
  "let" (0xffffdbe8)
  "if" (0xffffddb8)
  "normal-top-level" (0xffffdf88)

> Too bad, it means we have a heisenbug on our hands, which will make it
> even harder to debug (as if debugging crashes in GC were not hard
> enough already).
>
> What happens if you modify this variable:
>
>   (defcustom native-comp-debug (if (eq 'windows-nt system-type) 1 0)
>
> to have the value 1 or even zero, and then rebuild from scratch? does
> the build succeed then?

No, appears to crash in the same way.

> Yes, running the full test suite would be the logical next step.

Oh, I had run it, I just meant that I'd likely want to double-check via
testing in a loop to try to see if it might be an intermittent failure.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-16  6:04       ` Gerd Möllmann
@ 2022-09-17 21:04         ` Rob Browning
  2022-09-18  5:22           ` Gerd Möllmann
  2022-09-18  5:33           ` Eli Zaretskii
  0 siblings, 2 replies; 21+ messages in thread
From: Rob Browning @ 2022-09-17 21:04 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 57789, Eli Zaretskii

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Looking at Rob's backtrace, 
>
> #0  mark_object (arg=<optimized out>) at alloc.c:6809
> #1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
> #2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
> #3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
> #4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
> #5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
> #6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
> #7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
>
> and seeing frame#7, would it be a way forward to determine which
> staticpro (I assume it is a staticpro) that is?  Maybe that can give a
> clue which one can then use together with a bisect, perhaps?

Not completely sure I followed, but moving up to that frame and printing
visitor didn't work: "optimized out".

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-17 21:04         ` Rob Browning
@ 2022-09-18  5:22           ` Gerd Möllmann
  2022-09-18  5:49             ` Eli Zaretskii
  2022-09-18  5:33           ` Eli Zaretskii
  1 sibling, 1 reply; 21+ messages in thread
From: Gerd Möllmann @ 2022-09-18  5:22 UTC (permalink / raw)
  To: Rob Browning; +Cc: 57789, Eli Zaretskii

Rob Browning <rlb@defaultvalue.org> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Looking at Rob's backtrace, 
>>
>> #0  mark_object (arg=<optimized out>) at alloc.c:6809
>> #1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
>> #2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
>> #3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
>> #4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
>> #5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
>> #6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
>> #7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
>>
>> and seeing frame#7, would it be a way forward to determine which
>> staticpro (I assume it is a staticpro) that is?  Maybe that can give a
>> clue which one can then use together with a bisect, perhaps?
>
> Not completely sure I followed, but moving up to that frame and printing
> visitor didn't work: "optimized out".

Sorry, I thought another Emacs developer would chime in, when I wrote
that.

Let me try to explain what I'm after.  Frame#7, the call to
visit_static_gc_roots, shows that we are at the very beginning of a GC,
recursively marking everything that we know must survive the GC.

void
visit_static_gc_roots (struct gc_root_visitor visitor)
{
  visit_buffer_root (visitor,
                     &buffer_defaults,
                     GC_ROOT_BUFFER_LOCAL_DEFAULT);
  visit_buffer_root (visitor,
                     &buffer_local_symbols,
                     GC_ROOT_BUFFER_LOCAL_NAME);

  for (int i = 0; i < ARRAYELTS (lispsym); i++)
    {
      Lisp_Object sptr = builtin_lisp_symbol (i);
      visitor.visit (&sptr, GC_ROOT_C_SYMBOL, visitor.data);
    }

  for (int i = 0; i < staticidx; i++)
    visitor.visit (staticvec[i], GC_ROOT_STATICPRO, visitor.data);
}

First interesting thing would be where in this function we are when the
crash happens.  I was assuming it is somewhere in the last for-loop, for
reasons, but that doesn't have to be the case.

If I'm right, we are currently in the process of marking Lisp objects
referenced from C variables that are known to contain Lisp objects.
Such variables are added to staticvec with a call to staticpro.  That's
what the staticpro in my last mail meant.

But let's first see where in visit_... we are.








* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-17 21:04         ` Rob Browning
  2022-09-18  5:22           ` Gerd Möllmann
@ 2022-09-18  5:33           ` Eli Zaretskii
  2022-09-24 21:06             ` Rob Browning
  1 sibling, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2022-09-18  5:33 UTC (permalink / raw)
  To: Rob Browning; +Cc: gerd.moellmann, 57789

> From: Rob Browning <rlb@defaultvalue.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, 57789@debbugs.gnu.org
> Date: Sat, 17 Sep 2022 16:04:31 -0500
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> > Looking at Rob's backtrace, 
> >
> > #0  mark_object (arg=<optimized out>) at alloc.c:6809
> > #1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
> > #2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
> > #3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
> > #4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
> > #5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
> > #6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
> > #7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
> >
> > and seeing frame#7, would it be a way forward to determine which
> > staticpro (I assume it is a staticpro) that is?  Maybe that can give a
> > clue which one can then use together with a bisect, perhaps?
> 
> Not completely sure I followed, but moving up to that frame and printing
> visitor didn't work: "optimized out".

The code where this happens is this:

  for (int i = 0; i < staticidx; i++)
    visitor.visit (staticvec[i], GC_ROOT_STATICPRO, visitor.data);

So one way of knowing which staticpro is being handled here is to see
what is the value of 'i' and look at staticvec[i].  I'm guessing that
'i' is also "optimized out", though, so 2 possible ways forward:

  . disassemble visit_static_gc_roots, find in which register or where
    on the stack or in memory 'i' or staticvec[i] is stored, and go
    from there; or
  . add a printf to the above loop to show the value of 'i', and
    re-run the build, fingers crossed, hoping that the additional
    printf won't make the crash go away.

Once you know which staticpro is being processed here, we'd need to
examine its contents and try to figure out which parts cause the crash
in GC.

Thanks.






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-18  5:22           ` Gerd Möllmann
@ 2022-09-18  5:49             ` Eli Zaretskii
  2022-09-18  5:55               ` Gerd Möllmann
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2022-09-18  5:49 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: 57789, rlb

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  57789@debbugs.gnu.org
> Date: Sun, 18 Sep 2022 07:22:45 +0200
> 
> But let's first see where in visit_... we are.

I think the backtrace tells that, if you look at the sources from the
emacs-28 branch.  See my other message.






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-18  5:49             ` Eli Zaretskii
@ 2022-09-18  5:55               ` Gerd Möllmann
  0 siblings, 0 replies; 21+ messages in thread
From: Gerd Möllmann @ 2022-09-18  5:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57789, rlb

On 22-09-18 7:49 , Eli Zaretskii wrote:
>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  57789@debbugs.gnu.org
>> Date: Sun, 18 Sep 2022 07:22:45 +0200
>>
>> But let's first see where in visit_... we are.
> 
> I think the backtrace tells that, if you look at the sources from the
> emacs-28 branch.  See my other message.

Ah, right, visit_buffer_root.  EINSUFFICIENTCOFFEE.






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-18  5:33           ` Eli Zaretskii
@ 2022-09-24 21:06             ` Rob Browning
  2023-06-07 21:15               ` Andrea Corallo
  0 siblings, 1 reply; 21+ messages in thread
From: Rob Browning @ 2022-09-24 21:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, 57789

Eli Zaretskii <eliz@gnu.org> writes:

> Once you know which staticpro is being processed here, we'd need to
> examine its contents and try to figure out which parts cause the crash
> in GC.

Thanks, and I'll try to look into this further when I have time.  For
now I'm changing the debian packages to avoid native compilation on some
architectures (currently mips64el[1] and s390x).

[1] There ./configure fails at the moment with "Error: -march=mips1 is
    not compatible with the selected ABI" when testing libgccjit.
    That's on eller.debian.org (mipsel host in a mips64el schroot).

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2022-09-24 21:06             ` Rob Browning
@ 2023-06-07 21:15               ` Andrea Corallo
  2023-09-11 18:08                 ` Stefan Kangas
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Corallo @ 2023-06-07 21:15 UTC (permalink / raw)
  To: Rob Browning; +Cc: gerd.moellmann, 57789, Eli Zaretskii

Rob Browning <rlb@defaultvalue.org> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> Once you know which staticpro is being processed here, we'd need to
>> examine its contents and try to figure out which parts cause the crash
>> in GC.
>
> Thanks, and I'll try to look into this further when I have time.  For
> now I'm changing the debian packages to avoid native compilation on some
> architectures (currently mips64el[1] and s390x).
>
> [1] There ./configure fails at the moment with "Error: -march=mips1 is
>     not compatible with the selected ABI" when testing libgccjit.
>     That's on eller.debian.org (mipsel host in a mips64el schroot).

Hi Rob,

any progress with this investigation?  Is the bug still reproducible
with a recent codebase?

Thanks

  Andrea






* bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
  2023-06-07 21:15               ` Andrea Corallo
@ 2023-09-11 18:08                 ` Stefan Kangas
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Kangas @ 2023-09-11 18:08 UTC (permalink / raw)
  To: Andrea Corallo, Rob Browning; +Cc: gerd.moellmann, 57789, Eli Zaretskii

tags 57789 + moreinfo
thanks

Andrea Corallo <acorallo@gnu.org> writes:

> Rob Browning <rlb@defaultvalue.org> writes:
>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>>> Once you know which staticpro is being processed here, we'd need to
>>> examine its contents and try to figure out which parts cause the crash
>>> in GC.
>>
>> Thanks, and I'll try to look into this further when I have time.  For
>> now I'm changing the debian packages to avoid native compilation on some
>> architectures (currently mips64el[1] and s390x).
>>
>> [1] There ./configure fails at the moment with "Error: -march=mips1 is
>>     not compatible with the selected ABI" when testing libgccjit.
>>     That's on eller.debian.org (mipsel host in a mips64el schroot).
>
> Hi Rob,
>
> any progress with this investigation?  Is the bug still reproducible
> with a recent codebase?

Ping.  Rob, any updates here?




