unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
@ 2020-08-12 17:12 Lars Ingebrigtsen
  2020-08-12 18:22 ` Lars Ingebrigtsen
  2020-08-12 19:26 ` Andreas Schwab
  0 siblings, 2 replies; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 17:12 UTC (permalink / raw)
  To: 42832


I'm getting this on one of my machines:

/bin/bash: line 1: 2759815 Bus error               EMACSLOADPATH= '../src/emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' -f batch-byte-compile cedet/semantic/bovine/c-by.el
make[3]: *** [Makefile:295: cedet/semantic/bovine/c-by.elc] Error 135
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [Makefile:318: compile-main] Error 2
make[1]: *** [Makefile:411: lisp] Error 2
make: *** [Makefile:1126: bootstrap] Error 2

It's reproducible in that I always get this when I say "make", but if I
instead say

./src/emacs -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' -f batch-byte-compile cedet/semantic/bovine/c-by.el

everything works as it should, and it makes the .elc file.

So I'm not sure how to debug this...

On my laptop (which is also Debian bullseye), I'm not seeing any problems.
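
For reference, the "Error 135" that make prints is consistent with the
"Bus error" message: shells conventionally report a command killed by
signal N as exit status 128 + N, and 135 - 128 = 7, which is SIGBUS on
x86-64 GNU/Linux (the gdb backtraces in this thread show sig=7).  A
minimal C sketch of that arithmetic; the function names are made up for
illustration and are not part of Emacs or make:

```c
#include <signal.h>
#include <stdio.h>

/* Return the signal number encoded in a shell-style exit status
   (128 + N for a command killed by signal N), or 0 if STATUS does
   not look like a death-by-signal status.  */
int
signal_from_shell_status (int status)
{
  return (status > 128 && status < 256) ? status - 128 : 0;
}

/* Print a one-line interpretation of STATUS.  SIGBUS comes from
   <signal.h>; its numeric value is platform-specific (7 on x86-64
   GNU/Linux).  */
void
describe_shell_status (int status)
{
  int sig = signal_from_shell_status (status);
  if (sig == 0)
    printf ("status %d: not killed by a signal\n", status);
  else if (sig == SIGBUS)
    printf ("status %d: killed by signal %d (SIGBUS)\n", status, sig);
  else
    printf ("status %d: killed by signal %d\n", status, sig);
}
```

Calling describe_shell_status (135) on the machine in question would
report signal 7.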


In GNU Emacs 28.0.50 (build 51, x86_64-pc-linux-gnu, GTK+ Version 3.24.20, cairo version 1.16.0)
 of 2020-08-09 built on xo
Repository revision: 1a845a672dc73c8e98e6cb9bb734616e168e60ba
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12008000
System Description: Debian GNU/Linux bullseye/sid


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 31+ messages in thread

* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 17:12 bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye Lars Ingebrigtsen
@ 2020-08-12 18:22 ` Lars Ingebrigtsen
  2020-08-12 18:30   ` Lars Ingebrigtsen
  2020-08-12 18:34   ` Eli Zaretskii
  2020-08-12 19:26 ` Andreas Schwab
  1 sibling, 2 replies; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 18:22 UTC (permalink / raw)
  To: 42832

Additional data point:

It's totally repeatable with "make -j2" and up, but with single-threaded
compilation, everything works fine.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no







* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 18:22 ` Lars Ingebrigtsen
@ 2020-08-12 18:30   ` Lars Ingebrigtsen
  2020-08-12 18:50     ` Eli Zaretskii
  2020-08-12 18:34   ` Eli Zaretskii
  1 sibling, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 18:30 UTC (permalink / raw)
  To: 42832

I got a core dump, and gdb says it starts with:

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000564c625c0ad5 in terminate_due_to_signal
    (sig=sig@entry=7, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
#2  0x0000564c625c0f6b in handle_fatal_signal (sig=sig@entry=7)
    at sysdep.c:1782
#3  0x0000564c626bbd9d in deliver_thread_signal
    (sig=7, handler=0x564c625c0f60 <handle_fatal_signal>) at sysdep.c:1756
#4  0x0000564c626bbe89 in deliver_fatal_thread_signal (sig=<optimized out>)
    at sysdep.c:1794
#5  0x00007f13103bc140 in <signal handler called> ()
    at /lib/x86_64-linux-gnu/libpthread.so.0
#6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
#7  mark_object (arg=<optimized out>) at alloc.c:6607
#8  0x0000564c626ffd7e in mark_vectorlike (header=0x564c63246f10)
    at alloc.c:6280
#9  0x0000564c626ffd7e in mark_vectorlike (header=header@entry=0x7f130c63a1a8)
    at alloc.c:6280
#10 0x0000564c626ff68c in mark_hash_table (ptr=0x7f130c63a1a8) at alloc.c:6651
#11 mark_object (arg=<optimized out>) at alloc.c:6651
#12 0x0000564c626ffce7 in mark_memory (end=<optimized out>, 
    end@entry=0x7ffc912d7c20, start=<optimized out>) at alloc.c:4842
#13 mark_stack (bottom=<optimized out>, end=end@entry=0x7ffc912b1840 "")
    at alloc.c:5039
#14 0x0000564c62782e61 in mark_one_thread (thread=0x564c62b50460 <main_thread>)
    at thread.c:630
#15 mark_threads_callback (ignore=<optimized out>) at thread.c:661
#16 0x0000564c627006b7 in garbage_collect () at alloc.c:6068
#17 0x0000564c62700f91 in maybe_garbage_collect () at alloc.c:5975
#18 0x0000564c6271d1d5 in maybe_gc () at lisp.h:5053
#19 Ffuncall (nargs=3, args=args@entry=0x7ffc912b1980) at eval.c:2779
#20 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#21 0x0000564c6271d157 in Ffuncall (nargs=2, args=args@entry=0x7ffc912b23c0)
    at eval.c:2809
#22 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#23 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b26d0)
    at eval.c:2809
#24 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#25 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b2a38)
    at eval.c:2809
#26 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#27 0x0000564c6271d157 in Ffuncall (nargs=2, args=args@entry=0x7ffc912b3450)
    at eval.c:2809
#28 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#29 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b3760)
    at eval.c:2809
#30 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#31 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b3ac8)
    at eval.c:2809
#32 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#33 0x0000564c6271d157 in Ffuncall (nargs=2, args=args@entry=0x7ffc912b44e0)
    at eval.c:2809
#34 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#35 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b47f0)
    at eval.c:2809
#36 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#37 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b4b58)
    at eval.c:2809
#38 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#39 0x0000564c6271d157 in Ffuncall (nargs=2, args=args@entry=0x7ffc912b5570)
    at eval.c:2809
#40 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#41 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b5880)
    at eval.c:2809
#42 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#43 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b5be8)
    at eval.c:2809
#44 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#45 0x0000564c6271d157 in Ffuncall (nargs=2, args=args@entry=0x7ffc912b6600)
    at eval.c:2809
#46 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#47 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b6910)
    at eval.c:2809
#48 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#49 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b6c78)
    at eval.c:2809
#50 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#51 0x0000564c6271d157 in Ffuncall (nargs=2, args=args@entry=0x7ffc912b7690)
    at eval.c:2809
#52 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#53 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b79a0)
    at eval.c:2809
#54 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#55 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b7d08)
    at eval.c:2809
#56 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#57 0x0000564c6271d157 in Ffuncall (nargs=2, args=args@entry=0x7ffc912b8720)
    at eval.c:2809
#58 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#59 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b8a30)
    at eval.c:2809
#60 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#61 0x0000564c6271d157 in Ffuncall (nargs=3, args=args@entry=0x7ffc912b8d98)
    at eval.c:2809
#62 0x0000564c62757768 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>)
    at bytecode.c:632
#63 0x0000564c6271d157 in Ffuncall (nargs=2, args=args@entry=0x7ffc912b97b0)
...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no







* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 18:22 ` Lars Ingebrigtsen
  2020-08-12 18:30   ` Lars Ingebrigtsen
@ 2020-08-12 18:34   ` Eli Zaretskii
  2020-08-12 18:41     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2020-08-12 18:34 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Wed, 12 Aug 2020 20:22:13 +0200
> 
> It's totally repeatable with "make -j2" and up, but with single-threaded
> compilation, everything works fine.

Are you sure it isn't a hardware problem on that machine?






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 18:34   ` Eli Zaretskii
@ 2020-08-12 18:41     ` Lars Ingebrigtsen
  0 siblings, 0 replies; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 18:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 42832

Eli Zaretskii <eliz@gnu.org> writes:

>> It's totally repeatable with "make -j2" and up, but with single-threaded
>> compilation, everything works fine.
>
> Are you sure it isn't a hardware problem on that machine?

Nope, it could well be.  But there's been no other problems on the
machine, and the problem is so repeatable...

I'll try rebooting it, though, and see whether that has any effect.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 18:30   ` Lars Ingebrigtsen
@ 2020-08-12 18:50     ` Eli Zaretskii
  2020-08-12 18:58       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 31+ messages in thread
From: Eli Zaretskii @ 2020-08-12 18:50 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Wed, 12 Aug 2020 20:30:48 +0200
> 
> I got a core dump, and gdb says it starts with:
> 
> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x0000564c625c0ad5 in terminate_due_to_signal
>     (sig=sig@entry=7, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
> #2  0x0000564c625c0f6b in handle_fatal_signal (sig=sig@entry=7)
>     at sysdep.c:1782
> #3  0x0000564c626bbd9d in deliver_thread_signal
>     (sig=7, handler=0x564c625c0f60 <handle_fatal_signal>) at sysdep.c:1756
> #4  0x0000564c626bbe89 in deliver_fatal_thread_signal (sig=<optimized out>)
>     at sysdep.c:1794
> #5  0x00007f13103bc140 in <signal handler called> ()
>     at /lib/x86_64-linux-gnu/libpthread.so.0
> #6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
                       ^^^^^^^^^^^^^^^^^^^^
If you repeat this, do you get the same value in 'v' as above
(assuming it always crashes with the same backtrace)?






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 18:50     ` Eli Zaretskii
@ 2020-08-12 18:58       ` Lars Ingebrigtsen
  2020-08-12 19:28         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 18:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 42832

Eli Zaretskii <eliz@gnu.org> writes:

>> I got a core dump, and gdb says it starts with:
>> 
>> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
>> #1  0x0000564c625c0ad5 in terminate_due_to_signal
>>     (sig=sig@entry=7, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
>> #2  0x0000564c625c0f6b in handle_fatal_signal (sig=sig@entry=7)
>>     at sysdep.c:1782
>> #3  0x0000564c626bbd9d in deliver_thread_signal
>>     (sig=7, handler=0x564c625c0f60 <handle_fatal_signal>) at sysdep.c:1756
>> #4  0x0000564c626bbe89 in deliver_fatal_thread_signal (sig=<optimized out>)
>>     at sysdep.c:1794
>> #5  0x00007f13103bc140 in <signal handler called> ()
>>     at /lib/x86_64-linux-gnu/libpthread.so.0
>> #6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
>                        ^^^^^^^^^^^^^^^^^^^^
> If you repeat this, do you get the same value in 'v' as above
> (assuming it always crashes with the same backtrace)?

Yes, I get the same backtrace every time:

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000559941742ad5 in terminate_due_to_signal
    (sig=sig@entry=7, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
#2  0x0000559941742f6b in handle_fatal_signal (sig=sig@entry=7)
    at sysdep.c:1782
#3  0x000055994183dd9d in deliver_thread_signal
    (sig=7, handler=0x559941742f60 <handle_fatal_signal>) at sysdep.c:1756
#4  0x000055994183de89 in deliver_fatal_thread_signal (sig=<optimized out>)
    at sysdep.c:1794
#5  0x00007f7880ea8140 in <signal handler called> ()
    at /lib/x86_64-linux-gnu/libpthread.so.0
#6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
#7  mark_object (arg=<optimized out>) at alloc.c:6607
#8  0x0000559941881d7e in mark_vectorlike (header=0x559942facf10)
    at alloc.c:6280
#9  0x0000559941881d7e in mark_vectorlike (header=header@entry=0x7f787d1261a8)
    at alloc.c:6280

I've also now rebooted, and I'm still getting the same "bus error".

I've tried building Emacs with -O0, and I don't get the error then.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 17:12 bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye Lars Ingebrigtsen
  2020-08-12 18:22 ` Lars Ingebrigtsen
@ 2020-08-12 19:26 ` Andreas Schwab
  1 sibling, 0 replies; 31+ messages in thread
From: Andreas Schwab @ 2020-08-12 19:26 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832

On Aug 12 2020, Lars Ingebrigtsen wrote:

> I'm getting this on one of my machines:
>
> /bin/bash: line 1: 2759815 Bus error               EMACSLOADPATH= '../src/emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' -f batch-byte-compile cedet/semantic/bovine/c-by.el
> make[3]: *** [Makefile:295: cedet/semantic/bovine/c-by.elc] Error 135
> make[3]: *** Waiting for unfinished jobs....
> make[2]: *** [Makefile:318: compile-main] Error 2
> make[1]: *** [Makefile:411: lisp] Error 2
> make: *** [Makefile:1126: bootstrap] Error 2
>
> It's reproducible in that I always get this when I say "make", but if I
> instead say

A bus error usually means an mmaped file got truncated so that the
mapping now extends beyond the end of the file.  Emacs uses mmap to map
the pdmp file.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
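
The mmap behaviour described above can be reproduced in isolation.  The
following is only a sketch under stated assumptions: a POSIX system, a
writable /tmp, and a made-up function name.  It says nothing about
whether this is what actually happens to the pdmp file in this bug; it
only shows that reading through a mapping whose backing file has been
truncated raises SIGBUS.  The read is done in a child process so the
caller survives.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a one-page temporary file, truncate the file to zero length,
   then read through the now-stale mapping in a child process.
   Return the number of the signal that killed the child, 0 if the
   child survived, or -1 if the setup failed.  */
int
signal_after_truncating_mapped_file (void)
{
  char path[] = "/tmp/mmap-sigbus-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);		/* The open descriptor keeps the file alive.  */

  long page = sysconf (_SC_PAGESIZE);
  if (page <= 0 || ftruncate (fd, page) != 0)
    {
      close (fd);
      return -1;
    }

  void *map = mmap (NULL, (size_t) page, PROT_READ, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED)
    {
      close (fd);
      return -1;
    }

  int result = -1;
  /* Shrink the file: the mapping now extends beyond end of file.  */
  if (ftruncate (fd, 0) == 0)
    {
      pid_t pid = fork ();
      if (pid == 0)
	{
	  signal (SIGBUS, SIG_DFL);
	  volatile const char *p = map;
	  char c = p[0];	/* Expected to fault here.  */
	  _exit (c == 0 ? 0 : 1);
	}
      int status = 0;
      if (pid > 0 && waitpid (pid, &status, 0) == pid)
	result = WIFSIGNALED (status) ? WTERMSIG (status) : 0;
    }

  munmap (map, (size_t) page);
  close (fd);
  return result;
}
```

On GNU/Linux the function is expected to return SIGBUS.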






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 18:58       ` Lars Ingebrigtsen
@ 2020-08-12 19:28         ` Lars Ingebrigtsen
  2020-08-12 19:33           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 19:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 42832, Paul Eggert

I did some bisecting here, and if I 

git checkout 0d0aad213f941efc0fa0ec032e37dc9c2b08c9fb

(i.e., go back to the version of Emacs just before the recent pdumper
hash table stuff), then Emacs builds without this bus error.

So I've added Paul to the Cc's.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no







* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 19:28         ` Lars Ingebrigtsen
@ 2020-08-12 19:33           ` Lars Ingebrigtsen
  2020-08-12 20:40             ` Paul Eggert
  0 siblings, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 19:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 42832, Paul Eggert

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I did some bisecting here, and if I 
>
> git checkout 0d0aad213f941efc0fa0ec032e37dc9c2b08c9fb
>
> (i.e., go back to the version of Emacs just before the recent pdumper
> hash table stuff), then Emacs builds without this bus error.

Yup, that checkout works, but 16a16645f524c62f7906036b0e383e4247b58de7
has the bus error.

Which is:

commit 16a16645f524c62f7906036b0e383e4247b58de7
Author:     Pip Cet <pipcet@gmail.com>
AuthorDate: Tue Aug 11 02:16:53 2020 -0700
Commit:     Paul Eggert <eggert@cs.ucla.edu>
CommitDate: Tue Aug 11 02:27:43 2020 -0700

    Rehash hash tables eagerly after loading a dump
    
-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 19:33           ` Lars Ingebrigtsen
@ 2020-08-12 20:40             ` Paul Eggert
  2020-08-12 20:47               ` Lars Ingebrigtsen
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Eggert @ 2020-08-12 20:40 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

On 8/12/20 12:33 PM, Lars Ingebrigtsen wrote:

> Yup, that checkout works, but 16a16645f524c62f7906036b0e383e4247b58de7
> has the bus error.
> 
> commit 16a16645f524c62f7906036b0e383e4247b58de7
> Author:     Pip Cet <pipcet@gmail.com>
> AuthorDate: Tue Aug 11 02:16:53 2020 -0700
> Commit:     Paul Eggert <eggert@cs.ucla.edu>
> CommitDate: Tue Aug 11 02:27:43 2020 -0700
> 
>      Rehash hash tables eagerly after loading a dump
>      
> 

A quick workaround might be to revert that particular commit; could you try the 
attached patch? It passes "make check" for me.

Obviously it'd be better to have a real fix. I've asked Pip Cet to take a look.
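
As the attached diff shows, the revert goes back to rehashing on first
access: a table whose cached hash vector is nil counts as "rehash
needed", and hash_lookup, hash_put and friends call
hash_rehash_if_needed before touching it.  The toy C sketch below
illustrates only that pattern.  Every name in it (toy_table, toy_hash,
and so on) is hypothetical, the lookup is a plain linear scan, and none
of it is the Emacs implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { TOY_SIZE = 8 };

/* A toy table: HASH caches one hash code per key, and a null HASH
   stands in for the "rehash needed" state (NILP (h->hash) in the
   patch).  */
struct toy_table
{
  int keys[TOY_SIZE];
  size_t count;
  size_t *hash;
  int rehash_calls;		/* Instrumentation for the sketch only.  */
};

static size_t
toy_hash (int key)
{
  return (size_t) key * 2654435761u;
}

bool
toy_rehash_needed_p (const struct toy_table *t)
{
  return t->hash == NULL;
}

void
toy_rehash (struct toy_table *t)
{
  size_t *hash = malloc (TOY_SIZE * sizeof *hash);
  if (!hash)
    abort ();
  for (size_t i = 0; i < t->count; i++)
    hash[i] = toy_hash (t->keys[i]);
  /* Install the vector last, so that an interrupted rehash leaves the
     table still marked as needing one.  */
  t->hash = hash;
  t->rehash_calls++;
}

void
toy_rehash_if_needed (struct toy_table *t)
{
  if (toy_rehash_needed_p (t))
    toy_rehash (t);
}

/* Return the index of KEY in T, or -1 if absent.  Every accessor
   starts with the lazy check.  */
ptrdiff_t
toy_lookup (struct toy_table *t, int key)
{
  toy_rehash_if_needed (t);
  size_t code = toy_hash (key);
  for (size_t i = 0; i < t->count; i++)
    if (t->hash[i] == code && t->keys[i] == key)
      return (ptrdiff_t) i;
  return -1;
}
```

The cost of the lazy scheme is that every accessor has to remember the
check; the eager scheme in the reverted commit instead rehashes all
dumped tables once, right after the dump is loaded.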

[-- Attachment #2: 0001-Revert-2020-08-11T09-16-53Z-pipcet-gmail.com.patch --]
[-- Type: text/x-patch, Size: 18662 bytes --]

From 75e346be0208ec0211f6ae65b90016d4a34e854e Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 12 Aug 2020 13:35:03 -0700
Subject: [PATCH] Revert 2020-08-11T09:16:53Z!pipcet@gmail.com.

---
 src/bytecode.c  |   1 +
 src/composite.c |   1 +
 src/emacs.c     |   1 -
 src/fns.c       |  61 ++++++++++----
 src/lisp.h      |  21 ++++-
 src/minibuf.c   |   3 +
 src/pdumper.c   | 207 ++++++++++++++++++++++++++----------------------
 src/pdumper.h   |   1 -
 8 files changed, 183 insertions(+), 113 deletions(-)

diff --git a/src/bytecode.c b/src/bytecode.c
index 1c3b6eac0d..1913a4812a 100644
--- a/src/bytecode.c
+++ b/src/bytecode.c
@@ -1401,6 +1401,7 @@ #define DEFINE(name, value) LABEL (name) ,
             Lisp_Object v1 = POP;
             ptrdiff_t i;
             struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
+            hash_rehash_if_needed (h);
 
             /* h->count is a faster approximation for HASH_TABLE_SIZE (h)
                here. */
diff --git a/src/composite.c b/src/composite.c
index ec2b8328f7..f96f0b7772 100644
--- a/src/composite.c
+++ b/src/composite.c
@@ -652,6 +652,7 @@ gstring_lookup_cache (Lisp_Object header)
 composition_gstring_put_cache (Lisp_Object gstring, ptrdiff_t len)
 {
   struct Lisp_Hash_Table *h = XHASH_TABLE (gstring_hash_table);
+  hash_rehash_if_needed (h);
   Lisp_Object header = LGSTRING_HEADER (gstring);
   Lisp_Object hash = h->test.hashfn (header, h);
   if (len < 0)
diff --git a/src/emacs.c b/src/emacs.c
index cb04de4aab..e51a14f656 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1536,7 +1536,6 @@ main (int argc, char **argv)
   if (!initialized)
     {
       init_alloc_once ();
-      init_pdumper_once ();
       init_obarray_once ();
       init_eval_once ();
       init_charset_once ();
diff --git a/src/fns.c b/src/fns.c
index 9199178212..fa7d5fa3dc 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -4252,27 +4252,46 @@ maybe_resize_hash_table (struct Lisp_Hash_Table *h)
    the "pdump", because the objects' addresses may have changed, thus
    affecting their hashes.  */
 void
-hash_table_rehash (Lisp_Object hash)
+hash_table_rehash (struct Lisp_Hash_Table *h)
 {
-  struct Lisp_Hash_Table *h = XHASH_TABLE (hash);
-  ptrdiff_t i, count = h->count;
+  ptrdiff_t size = HASH_TABLE_SIZE (h);
+
+  /* These structures may have been purecopied and shared
+     (bug#36447).  */
+  Lisp_Object hash = make_nil_vector (size);
+  h->next = Fcopy_sequence (h->next);
+  h->index = Fcopy_sequence (h->index);
 
   /* Recompute the actual hash codes for each entry in the table.
      Order is still invalid.  */
-  for (i = 0; i < count; i++)
+  for (ptrdiff_t i = 0; i < size; ++i)
     {
       Lisp_Object key = HASH_KEY (h, i);
-      Lisp_Object hash_code = h->test.hashfn (key, h);
-      ptrdiff_t start_of_bucket = XUFIXNUM (hash_code) % ASIZE (h->index);
-      set_hash_hash_slot (h, i, hash_code);
-      set_hash_next_slot (h, i, HASH_INDEX (h, start_of_bucket));
-      set_hash_index_slot (h, start_of_bucket, i);
-      eassert (HASH_NEXT (h, i) != i); /* Stop loops.  */
+      if (!EQ (key, Qunbound))
+        ASET (hash, i, h->test.hashfn (key, h));
     }
 
-  ptrdiff_t size = ASIZE (h->next);
-  for (; i + 1 < size; i++)
-    set_hash_next_slot (h, i, i + 1);
+  /* Reset the index so that any slot we don't fill below is marked
+     invalid.  */
+  Ffillarray (h->index, make_fixnum (-1));
+
+  /* Rebuild the collision chains.  */
+  for (ptrdiff_t i = 0; i < size; ++i)
+    if (!NILP (AREF (hash, i)))
+      {
+        EMACS_UINT hash_code = XUFIXNUM (AREF (hash, i));
+        ptrdiff_t start_of_bucket = hash_code % ASIZE (h->index);
+        set_hash_next_slot (h, i, HASH_INDEX (h, start_of_bucket));
+        set_hash_index_slot (h, start_of_bucket, i);
+        eassert (HASH_NEXT (h, i) != i); /* Stop loops.  */
+      }
+
+  /* Finally, mark the hash table as having a valid hash order.
+     Do this last so that if we're interrupted, we retry on next
+     access. */
+  eassert (hash_rehash_needed_p (h));
+  h->hash = hash;
+  eassert (!hash_rehash_needed_p (h));
 }
 
 /* Lookup KEY in hash table H.  If HASH is non-null, return in *HASH
@@ -4284,6 +4303,8 @@ hash_lookup (struct Lisp_Hash_Table *h, Lisp_Object key, Lisp_Object *hash)
 {
   ptrdiff_t start_of_bucket, i;
 
+  hash_rehash_if_needed (h);
+
   Lisp_Object hash_code = h->test.hashfn (key, h);
   if (hash)
     *hash = hash_code;
@@ -4318,6 +4339,8 @@ hash_put (struct Lisp_Hash_Table *h, Lisp_Object key, Lisp_Object value,
 {
   ptrdiff_t start_of_bucket, i;
 
+  hash_rehash_if_needed (h);
+
   /* Increment count after resizing because resizing may fail.  */
   maybe_resize_hash_table (h);
   h->count++;
@@ -4350,6 +4373,8 @@ hash_remove_from_table (struct Lisp_Hash_Table *h, Lisp_Object key)
   ptrdiff_t start_of_bucket = XUFIXNUM (hash_code) % ASIZE (h->index);
   ptrdiff_t prev = -1;
 
+  hash_rehash_if_needed (h);
+
   for (ptrdiff_t i = HASH_INDEX (h, start_of_bucket);
        0 <= i;
        i = HASH_NEXT (h, i))
@@ -4390,7 +4415,8 @@ hash_clear (struct Lisp_Hash_Table *h)
   if (h->count > 0)
     {
       ptrdiff_t size = HASH_TABLE_SIZE (h);
-      memclear (xvector_contents (h->hash), size * word_size);
+      if (!hash_rehash_needed_p (h))
+	memclear (xvector_contents (h->hash), size * word_size);
       for (ptrdiff_t i = 0; i < size; i++)
 	{
 	  set_hash_next_slot (h, i, i < size - 1 ? i + 1 : -1);
@@ -4426,7 +4452,9 @@ sweep_weak_table (struct Lisp_Hash_Table *h, bool remove_entries_p)
   for (ptrdiff_t bucket = 0; bucket < n; ++bucket)
     {
       /* Follow collision chain, removing entries that don't survive
-         this garbage collection.  */
+         this garbage collection.  It's okay if hash_rehash_needed_p
+         (h) is true, since we're operating entirely on the cached
+         hash values. */
       ptrdiff_t prev = -1;
       ptrdiff_t next;
       for (ptrdiff_t i = HASH_INDEX (h, bucket); 0 <= i; i = next)
@@ -4471,7 +4499,7 @@ sweep_weak_table (struct Lisp_Hash_Table *h, bool remove_entries_p)
                     set_hash_hash_slot (h, i, Qnil);
 
                   eassert (h->count != 0);
-                  h->count--;
+                  h->count += h->count > 0 ? -1 : 1;
                 }
 	      else
 		{
@@ -4895,6 +4923,7 @@ DEFUN ("hash-table-count", Fhash_table_count, Shash_table_count, 1, 1, 0,
   (Lisp_Object table)
 {
   struct Lisp_Hash_Table *h = check_hash_table (table);
+  eassert (h->count >= 0);
   return make_fixnum (h->count);
 }
 
diff --git a/src/lisp.h b/src/lisp.h
index 2962babb4f..c58cd0fa6b 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -2275,7 +2275,11 @@ #define DEFSYM(sym, name) /* empty */
 
 struct Lisp_Hash_Table
 {
-  /* Change pdumper.c if you change the fields here.  */
+  /* Change pdumper.c if you change the fields here.
+
+     IMPORTANT!!!!!!!
+
+     Call hash_rehash_if_needed() before accessing.  */
 
   /* This is for Lisp; the hash table code does not refer to it.  */
   union vectorlike_header header;
@@ -2394,7 +2398,20 @@ HASH_TABLE_SIZE (const struct Lisp_Hash_Table *h)
   return size;
 }
 
-void hash_table_rehash (Lisp_Object);
+void hash_table_rehash (struct Lisp_Hash_Table *h);
+
+INLINE bool
+hash_rehash_needed_p (const struct Lisp_Hash_Table *h)
+{
+  return NILP (h->hash);
+}
+
+INLINE void
+hash_rehash_if_needed (struct Lisp_Hash_Table *h)
+{
+  if (hash_rehash_needed_p (h))
+    hash_table_rehash (h);
+}
 
 /* Default size for hash tables if not specified.  */
 
diff --git a/src/minibuf.c b/src/minibuf.c
index cb302c5a60..9d870ce364 100644
--- a/src/minibuf.c
+++ b/src/minibuf.c
@@ -1212,6 +1212,9 @@ DEFUN ("try-completion", Ftry_completion, Stry_completion, 2, 3, 0,
       bucket = AREF (collection, idx);
     }
 
+  if (HASH_TABLE_P (collection))
+    hash_rehash_if_needed (XHASH_TABLE (collection));
+
   while (1)
     {
       /* Get the next element of the alist, obarray, or hash-table.  */
diff --git a/src/pdumper.c b/src/pdumper.c
index bc41afc7c5..85c8b19949 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -95,6 +95,17 @@ #define VM_MS_WINDOWS 2
 # define VM_SUPPORTED 0
 #endif
 
+/* PDUMPER_CHECK_REHASHING being true causes the portable dumper to
+   check, for each hash table it dumps, that the hash table means the
+   same thing after rehashing.  */
+#ifndef PDUMPER_CHECK_REHASHING
+# if ENABLE_CHECKING
+#  define PDUMPER_CHECK_REHASHING 1
+# else
+#  define PDUMPER_CHECK_REHASHING 0
+# endif
+#endif
+
 /* Require an architecture in which pointers, ptrdiff_t and intptr_t
    are the same size and have the same layout, and where bytes have
    eight bits --- that is, a general-purpose computer made after 1990.
@@ -381,9 +392,6 @@ dump_fingerprint (char const *label,
      The start of the cold region is always aligned on a page
      boundary.  */
   dump_off cold_start;
-
-  /* Offset of a vector of the dumped hash tables.  */
-  dump_off hash_list;
 };
 
 /* Double-ended singly linked list.  */
@@ -541,9 +549,6 @@ dump_fingerprint (char const *label,
      heap objects.  */
   Lisp_Object bignum_data;
 
-  /* List of hash tables that have been dumped.  */
-  Lisp_Object hash_tables;
-
   dump_off number_hot_relocations;
   dump_off number_discardable_relocations;
 };
@@ -2591,65 +2596,79 @@ dump_vectorlike_generic (struct dump_context *ctx,
   return offset;
 }
 
-/* Return a vector of KEY, VALUE pairs in the given hash table H.  The
-   first H->count pairs are valid, and the rest are unbound.  */
-static Lisp_Object
-hash_table_contents (struct Lisp_Hash_Table *h)
+/* Determine whether the hash table's hash order is stable
+   across dump and load.  If it is, we don't have to trigger
+   a rehash on access.  */
+static bool
+dump_hash_table_stable_p (const struct Lisp_Hash_Table *hash)
 {
-  if (h->test.hashfn == hashfn_user_defined)
+  if (hash->test.hashfn == hashfn_user_defined)
     error ("cannot dump hash tables with user-defined tests");  /* Bug#36769 */
-
-  ptrdiff_t size = HASH_TABLE_SIZE (h);
-  Lisp_Object key_and_value = make_uninit_vector (2 * size);
-  ptrdiff_t n = 0;
-
-  /* Make sure key_and_value ends up in the same order; charset.c
-     relies on it by expecting hash table indices to stay constant
-     across the dump.  */
-  for (ptrdiff_t i = 0; i < size; i++)
-    if (!NILP (HASH_HASH (h, i)))
-      {
-	ASET (key_and_value, n++, HASH_KEY (h, i));
-	ASET (key_and_value, n++, HASH_VALUE (h, i));
-      }
-
-  while (n < 2 * size)
+  bool is_eql = hash->test.hashfn == hashfn_eql;
+  bool is_equal = hash->test.hashfn == hashfn_equal;
+  ptrdiff_t size = HASH_TABLE_SIZE (hash);
+  for (ptrdiff_t i = 0; i < size; ++i)
     {
-      ASET (key_and_value, n++, Qunbound);
-      ASET (key_and_value, n++, Qnil);
+      Lisp_Object key =  HASH_KEY (hash, i);
+      if (!EQ (key, Qunbound))
+        {
+	  bool key_stable = (dump_builtin_symbol_p (key)
+			     || FIXNUMP (key)
+			     || (is_equal
+			         && (STRINGP (key) || BOOL_VECTOR_P (key)))
+			     || ((is_equal || is_eql)
+			         && (FLOATP (key) || BIGNUMP (key))));
+          if (!key_stable)
+            return false;
+        }
     }
 
-  return key_and_value;
-}
-
-static dump_off
-dump_hash_table_list (struct dump_context *ctx)
-{
-  if (!NILP (ctx->hash_tables))
-    return dump_object (ctx, CALLN (Fapply, Qvector, ctx->hash_tables));
-  else
-    return 0;
+  return true;
 }
 
-static void
-hash_table_freeze (struct Lisp_Hash_Table *h)
+/* Return a list of (KEY . VALUE) pairs in the given hash table.  The
+   first H->count pairs are valid, and the rest are unbound.  */
+static Lisp_Object
+hash_table_contents (Lisp_Object table)
 {
-  ptrdiff_t npairs = ASIZE (h->key_and_value) / 2;
-  h->key_and_value = hash_table_contents (h);
-  h->next = h->hash = make_fixnum (npairs);
-  h->index = make_fixnum (ASIZE (h->index));
-  h->next_free = (npairs == h->count ? -1 : h->count);
+  Lisp_Object contents = Qnil;
+  struct Lisp_Hash_Table *h = XHASH_TABLE (table);
+  for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
+    {
+      Lisp_Object key = HASH_KEY (h, i);
+      if (!EQ (key, Qunbound))
+        dump_push (&contents, Fcons (key, HASH_VALUE (h, i)));
+    }
+  return Fnreverse (contents);
 }
 
+/* Copy the given hash table, rehash it, and make sure that we can
+   look up all the values in the original.  */
 static void
-hash_table_thaw (Lisp_Object hash)
-{
-  struct Lisp_Hash_Table *h = XHASH_TABLE (hash);
-  h->hash = make_nil_vector (XFIXNUM (h->hash));
-  h->next = Fmake_vector (h->next, make_fixnum (-1));
-  h->index = Fmake_vector (h->index, make_fixnum (-1));
+check_hash_table_rehash (Lisp_Object table_orig)
+{
+  ptrdiff_t count = XHASH_TABLE (table_orig)->count;
+  hash_rehash_if_needed (XHASH_TABLE (table_orig));
+  Lisp_Object table_rehashed = Fcopy_hash_table (table_orig);
+  eassert (!hash_rehash_needed_p (XHASH_TABLE (table_rehashed)));
+  XHASH_TABLE (table_rehashed)->hash = Qnil;
+  eassert (count == 0 || hash_rehash_needed_p (XHASH_TABLE (table_rehashed)));
+  hash_rehash_if_needed (XHASH_TABLE (table_rehashed));
+  eassert (!hash_rehash_needed_p (XHASH_TABLE (table_rehashed)));
+  Lisp_Object expected_contents = hash_table_contents (table_orig);
+  while (!NILP (expected_contents))
+    {
+      Lisp_Object key_value_pair = dump_pop (&expected_contents);
+      Lisp_Object key = XCAR (key_value_pair);
+      Lisp_Object expected_value = XCDR (key_value_pair);
+      Lisp_Object arbitrary = Qdump_emacs_portable__sort_predicate_copied;
+      Lisp_Object found_value = Fgethash (key, table_rehashed, arbitrary);
+      eassert (EQ (expected_value, found_value));
+      Fremhash (key, table_rehashed);
+    }
 
-  hash_table_rehash (hash);
+  eassert (EQ (Fhash_table_count (table_rehashed),
+               make_fixnum (0)));
 }
 
 static dump_off
@@ -2661,11 +2680,51 @@ dump_hash_table (struct dump_context *ctx,
 # error "Lisp_Hash_Table changed. See CHECK_STRUCTS comment in config.h."
 #endif
   const struct Lisp_Hash_Table *hash_in = XHASH_TABLE (object);
+  bool is_stable = dump_hash_table_stable_p (hash_in);
+  /* If the hash table is likely to be modified in memory (either
+     because we need to rehash, and thus toggle hash->count, or
+     because we need to assemble a list of weak tables) punt the hash
+     table to the end of the dump, where we can lump all such hash
+     tables together.  */
+  if (!(is_stable || !NILP (hash_in->weak))
+      && ctx->flags.defer_hash_tables)
+    {
+      if (offset != DUMP_OBJECT_ON_HASH_TABLE_QUEUE)
+        {
+	  eassert (offset == DUMP_OBJECT_ON_NORMAL_QUEUE
+		   || offset == DUMP_OBJECT_NOT_SEEN);
+          /* We still want to dump the actual keys and values now.  */
+          dump_enqueue_object (ctx, hash_in->key_and_value, WEIGHT_NONE);
+          /* We'll get to the rest later.  */
+          offset = DUMP_OBJECT_ON_HASH_TABLE_QUEUE;
+          dump_remember_object (ctx, object, offset);
+          dump_push (&ctx->deferred_hash_tables, object);
+        }
+      return offset;
+    }
+
+  if (PDUMPER_CHECK_REHASHING)
+    check_hash_table_rehash (make_lisp_ptr ((void *) hash_in, Lisp_Vectorlike));
+
   struct Lisp_Hash_Table hash_munged = *hash_in;
   struct Lisp_Hash_Table *hash = &hash_munged;
 
-  hash_table_freeze (hash);
-  dump_push (&ctx->hash_tables, object);
+  /* Remember to rehash this hash table on first access.  After a
+     dump reload, the hash table values will have changed, so we'll
+     need to rebuild the index.
+
+     TODO: for EQ and EQL hash tables, it should be possible to rehash
+     here using the preferred load address of the dump, eliminating
+     the need to rehash-on-access if we can load the dump where we
+     want.  */
+  if (hash->count > 0 && !is_stable)
+    /* Hash codes will have to be recomputed anyway, so let's not dump them.
+       Also set `hash` to nil for hash_rehash_needed_p.
+       We could also refrain from dumping the `next' and `index' vectors,
+       except that `next' is currently used for HASH_TABLE_SIZE and
+       we'd have to rebuild the next_free list as well as adjust
+       sweep_weak_hash_table for the case where there's no `index'.  */
+    hash->hash = Qnil;
 
   START_DUMP_PVEC (ctx, &hash->header, struct Lisp_Hash_Table, out);
   dump_pseudovector_lisp_fields (ctx, &out->header, &hash->header);
@@ -4061,19 +4120,6 @@ DEFUN ("dump-emacs-portable",
 	 || !NILP (ctx->deferred_hash_tables)
 	 || !NILP (ctx->deferred_symbols));
 
-  ctx->header.hash_list = ctx->offset;
-  dump_hash_table_list (ctx);
-
-  do
-    {
-      dump_drain_deferred_hash_tables (ctx);
-      dump_drain_deferred_symbols (ctx);
-      dump_drain_normal_queue (ctx);
-    }
-  while (!dump_queue_empty_p (&ctx->dump_queue)
-	 || !NILP (ctx->deferred_hash_tables)
-	 || !NILP (ctx->deferred_symbols));
-
   dump_sort_copied_objects (ctx);
 
   /* While we copy built-in symbols into the Emacs image, these
@@ -5227,9 +5273,6 @@ dump_do_all_emacs_relocations (const struct dump_header *const header,
    NUMBER_DUMP_SECTIONS,
   };
 
-/* Pointer to a stack variable to avoid having to staticpro it.  */
-static Lisp_Object *pdumper_hashes = &zero_vector;
-
 /* Load a dump from DUMP_FILENAME.  Return an error code.
 
    N.B. We run very early in initialization, so we can't use lisp,
@@ -5376,15 +5419,6 @@ pdumper_load (const char *dump_filename)
   for (int i = 0; i < ARRAYELTS (sections); ++i)
     dump_mmap_reset (&sections[i]);
 
-  Lisp_Object hashes = zero_vector;
-  if (header->hash_list)
-    {
-      struct Lisp_Vector *hash_tables =
-	(struct Lisp_Vector *) (dump_base + header->hash_list);
-      hashes = make_lisp_ptr (hash_tables, Lisp_Vectorlike);
-    }
-
-  pdumper_hashes = &hashes;
   /* Run the functions Emacs registered for doing post-dump-load
      initialization.  */
   for (int i = 0; i < nr_dump_hooks; ++i)
@@ -5455,19 +5489,6 @@ DEFUN ("pdumper-stats", Fpdumper_stats, Spdumper_stats, 0, 0, 0,
 #endif /* HAVE_PDUMPER */
 
 \f
-static void
-thaw_hash_tables (void)
-{
-  Lisp_Object hash_tables = *pdumper_hashes;
-  for (ptrdiff_t i = 0; i < ASIZE (hash_tables); i++)
-    hash_table_thaw (AREF (hash_tables, i));
-}
-
-void
-init_pdumper_once (void)
-{
-  pdumper_do_now_and_after_load (thaw_hash_tables);
-}
 
 void
 syms_of_pdumper (void)
diff --git a/src/pdumper.h b/src/pdumper.h
index c793fb4058..6a99b511f2 100644
--- a/src/pdumper.h
+++ b/src/pdumper.h
@@ -256,7 +256,6 @@ pdumper_clear_marks (void)
    file was loaded.  */
 extern void pdumper_record_wd (const char *);
 
-void init_pdumper_once (void);
 void syms_of_pdumper (void);
 
 INLINE_HEADER_END
-- 
2.25.4



* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 20:40             ` Paul Eggert
@ 2020-08-12 20:47               ` Lars Ingebrigtsen
  2020-08-12 21:42                 ` Pip Cet
  0 siblings, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 20:47 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 42832

Paul Eggert <eggert@cs.ucla.edu> writes:

> A quick workaround might be to revert that particular commit; could
> you try the attached patch? It passes "make check" for me.

Yup; with that patch applied, the bus error goes away.

But it's an odd problem -- I've tried building on three machines now,
and it only fails on one.  The machine it fails on and one it works on
are both Debian bullseye, both with the same compiler version, etc.  And
on the one machine it does fail on, it only fails when saying "make -j2"
or higher.

So for all I know, there is some kind of very strange hardware error on
that machine...  although that's looking kinda unlikely now.


* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 20:47               ` Lars Ingebrigtsen
@ 2020-08-12 21:42                 ` Pip Cet
  2020-08-12 21:54                   ` Lars Ingebrigtsen
  2020-08-12 22:00                   ` Lars Ingebrigtsen
  0 siblings, 2 replies; 31+ messages in thread
From: Pip Cet @ 2020-08-12 21:42 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832, Paul Eggert

On Wed, Aug 12, 2020 at 8:48 PM Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Paul Eggert <eggert@cs.ucla.edu> writes:
>
> > A quick workaround might be to revert that particular commit; could
> > you try the attached patch? It passes "make check" for me.
>
> Yup; with that patch applied, the bus error goes away.

That is strange.

> But it's an odd problem -- I've tried building on three machines now,
> and it only fails on one.  The machine it fails on and one it works on
> are both Debian bullseye, both with the same compiler version, etc.

Same sysctl settings, too? In particular, address randomization
appears to be enabled; does this also happen if you disable it (echo 0
| sudo tee /proc/sys/kernel/randomize_va_space)?

> And
> on the one machine it does fail on, it only fails when saying "make -j2"
> or higher.
>
> So for all I know, there is some kind of very strange hardware error on
> that machine...  although that's looking kinda unlikely now.

I'm thinking it might "simply" be a very timing-sensitive issue, which
would exonerate me :-)






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 21:42                 ` Pip Cet
@ 2020-08-12 21:54                   ` Lars Ingebrigtsen
  2020-08-13 10:05                     ` Pip Cet
  2020-08-12 22:00                   ` Lars Ingebrigtsen
  1 sibling, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 21:54 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

Pip Cet <pipcet@gmail.com> writes:

> Same sysctl settings, too? In particular, address randomization
> appears to be enabled, does this also happen if you disable it (echo 0
> | sudo tee /proc/sys/kernel/randomize_va_space) ?

Tried that now and rebuilt -- bus error in the same place, I think:

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000555555597ad5 in terminate_due_to_signal
    (sig=sig@entry=7, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
#2  0x0000555555597f6b in handle_fatal_signal (sig=sig@entry=7)
    at sysdep.c:1782
#3  0x0000555555692d9d in deliver_thread_signal
    (sig=7, handler=0x555555597f60 <handle_fatal_signal>) at sysdep.c:1756
#4  0x0000555555692e89 in deliver_fatal_thread_signal (sig=<optimized out>)
    at sysdep.c:1794
#5  0x00007ffff5726140 in <signal handler called> ()
    at /lib/x86_64-linux-gnu/libpthread.so.0
#6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
#7  mark_object (arg=<optimized out>) at alloc.c:6607
#8  0x00005555556d6d7e in mark_vectorlike (header=0x555555c34f10)
    at alloc.c:6280
#9  0x00005555556d6d7e in mark_vectorlike (header=header@entry=0x7ffff19a41a8)
    at alloc.c:6280
#10 0x00005555556d668c in mark_hash_table (ptr=0x7ffff19a41a8) at alloc.c:6651
#11 mark_object (arg=<optimized out>) at alloc.c:6651
#12 0x00005555556d6ce7 in mark_memory (end=<optimized out>, 
    end@entry=0x7ffffffee950, start=<optimized out>) at alloc.c:4842



* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 21:42                 ` Pip Cet
  2020-08-12 21:54                   ` Lars Ingebrigtsen
@ 2020-08-12 22:00                   ` Lars Ingebrigtsen
  1 sibling, 0 replies; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-12 22:00 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

Pip Cet <pipcet@gmail.com> writes:

>> So for all I know, there is some kind of very strange hardware error on
>> that machine...  although that's looking kinda unlikely now.
>
> I'm thinking it might "simply" be a very timing-sensitive issue, which
> would exonerate me :-)

:-)

I think we should just leave it as is for now, and see whether anybody
else sees this problem, too (and perhaps we'll find some commonalities
between the setups).  I'll just switch my test builds to a different
machine for now.


* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-12 21:54                   ` Lars Ingebrigtsen
@ 2020-08-13 10:05                     ` Pip Cet
  2020-08-13 10:12                       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 31+ messages in thread
From: Pip Cet @ 2020-08-13 10:05 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832, Paul Eggert

On Wed, Aug 12, 2020 at 9:54 PM Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Pip Cet <pipcet@gmail.com> writes:
>
> > Same sysctl settings, too? In particular, address randomization
> > appears to be enabled, does this also happen if you disable it (echo 0
> > | sudo tee /proc/sys/kernel/randomize_va_space) ?
>
> Tried that now and rebuilt -- bus error in the same place, I think:
>
> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x0000555555597ad5 in terminate_due_to_signal
>     (sig=sig@entry=7, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
> #2  0x0000555555597f6b in handle_fatal_signal (sig=sig@entry=7)
>     at sysdep.c:1782
> #3  0x0000555555692d9d in deliver_thread_signal
>     (sig=7, handler=0x555555597f60 <handle_fatal_signal>) at sysdep.c:1756
> #4  0x0000555555692e89 in deliver_fatal_thread_signal (sig=<optimized out>)
>     at sysdep.c:1794
> #5  0x00007ffff5726140 in <signal handler called> ()
>     at /lib/x86_64-linux-gnu/libpthread.so.0
> #6  vector_marked_p (v=0xc000000018000000) at alloc.c:3859
> #7  mark_object (arg=<optimized out>) at alloc.c:6607
> #8  0x00005555556d6d7e in mark_vectorlike (header=0x555555c34f10)
>     at alloc.c:6280

I'm trying to roughly reproduce your build environment, and while the
addresses don't match up perfectly, that does indeed appear to be an
eagerly-rehashed hash table's ->hash vector.

This is a shot in the dark, but in my case, the table containing that
address is Vdbus_registered_objects_table. Can you check whether
that's true in your case, too? Something like "p
globals.f_Vdbus_registered_objects_table" in gdb (using the core dump
should be fine) should either produce 0x555555c34f15 or something
else. If it is dbus, can you try compiling without it?






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-13 10:05                     ` Pip Cet
@ 2020-08-13 10:12                       ` Lars Ingebrigtsen
  2020-08-13 10:15                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-13 10:12 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

Pip Cet <pipcet@gmail.com> writes:

> This is a shot in the dark, but in my case, the table containing that
> address is Vdbus_registered_objects_table. Can you check whether
> that's true in your case, too? Something like "p
> globals.f_Vdbus_registered_objects_table" in gdb (using the core dump
> should be fine) should either produce 0x555555c34f15 or something
> else. If it is dbus, can you try compiling without it?

Let's see...

(gdb) p globals.f_Vdbus_registered_objects_table
$1 = (Lisp_Object) 0x7ffff19e6665

I'll try compiling without dbus and see what happens.


* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-13 10:12                       ` Lars Ingebrigtsen
@ 2020-08-13 10:15                         ` Lars Ingebrigtsen
  2020-08-13 14:08                           ` Pip Cet
  0 siblings, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-13 10:15 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

Lars Ingebrigtsen <larsi@gnus.org> writes:

> I'll try compiling without dbus and see what happens.

With ./configure --without-dbus, "make bootstrap" doesn't error out for
me.


* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-13 10:15                         ` Lars Ingebrigtsen
@ 2020-08-13 14:08                           ` Pip Cet
  2020-08-14 11:48                             ` Lars Ingebrigtsen
  2020-08-14 14:24                             ` Pip Cet
  0 siblings, 2 replies; 31+ messages in thread
From: Pip Cet @ 2020-08-13 14:08 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832, Paul Eggert

On Thu, Aug 13, 2020 at 10:15 AM Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Lars Ingebrigtsen <larsi@gnus.org> writes:
> > I'll try compiling without dbus and see what happens.
>
> With ./configure --without-dbus, "make bootstrap" doesn't error out for
> me.

So even though the hash table wasn't the dbus hash table, omitting the
dbus code somehow avoids the problem? Odd.

All that sounds to me like we ought to dig down into the core file and
figure out what happened, since the issue is likely to remain present
otherwise and it seems somewhat difficult to track down and reproduce.

The other odd thing is that 0xc000000018000000. That looks like a
GC-marked pseudovector header, but I've checked and can't find
anything that would generate PVEC_COMPILEDs of length 0, which would
be a severe bug.
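
To make that reading concrete, here is a minimal sketch of decoding
such a header word.  The bit layout is an assumption for a 64-bit
build (mark flag in the top bit, pseudovector flag just below it, a
6-bit type code starting at bit 24, size fields in the low 24 bits),
and every SKETCH_/sketch_ name is hypothetical rather than taken from
the Emacs sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed header layout on a 64-bit build; these constants are
   illustrative and are not quoted from the Emacs sources.  */
#define SKETCH_MARK_FLAG (UINT64_C (1) << 63)  /* set while GC-marked */
#define SKETCH_PVEC_FLAG (UINT64_C (1) << 62)  /* pseudovector header */
#define SKETCH_TYPE_SHIFT 24
#define SKETCH_TYPE_MASK (UINT64_C (0x3f) << SKETCH_TYPE_SHIFT)
#define SKETCH_SIZE_MASK ((UINT64_C (1) << SKETCH_TYPE_SHIFT) - 1)

struct sketch_header
{
  bool marked;         /* top bit set */
  bool pseudovector;   /* second-highest bit set */
  unsigned type;       /* pseudovector type code */
  uint64_t size_bits;  /* combined size fields; 0 means length 0 */
};

/* Split a raw header word into the fields assumed above.  */
static struct sketch_header
sketch_decode_header (uint64_t word)
{
  struct sketch_header h;
  h.marked = (word & SKETCH_MARK_FLAG) != 0;
  h.pseudovector = (word & SKETCH_PVEC_FLAG) != 0;
  h.type = (unsigned) ((word & SKETCH_TYPE_MASK) >> SKETCH_TYPE_SHIFT);
  h.size_bits = word & SKETCH_SIZE_MASK;
  return h;
}
```

Under these assumptions, sketch_decode_header (0xc000000018000000)
yields a marked pseudovector with type code 24 and all size bits
zero.  That matches the "PVEC_COMPILED of length 0" reading only if 24
is the PVEC_COMPILED code in this tree, which is likewise assumed here.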

Can you find out which hash table lives at 0x7ffff19a41a8? I'd suggest
something like "find &globals,&globals+1,0x7ffff19a41ad" to get the
offset in globals, if it is a global variable, then looking it up with
"ptype/o globals".

(If you don't have the time, I'd be happy to look at the core file
myself, if we can arrange that).






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-13 14:08                           ` Pip Cet
@ 2020-08-14 11:48                             ` Lars Ingebrigtsen
  2020-08-14 12:05                               ` Pip Cet
  2020-08-14 14:24                             ` Pip Cet
  1 sibling, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-14 11:48 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

Pip Cet <pipcet@gmail.com> writes:

> Can you find out which hash table lives at 0x7ffff19a41a8? I'd suggest
> something like "find &globals,&globals+1,0x7ffff19a41ad" to get the
> offset in globals, if it is a global variable, then looking it up with
> "ptype/o globals".

That's the value from mark_vectorlike?  It's moved a bit:

#9  0x00005555556d6d7e in mark_vectorlike (header=header@entry=0x7ffff19a4190)
    at alloc.c:6280

But it says:

(gdb) find &globals,&globals+1,0x7ffff19a4190
Pattern not found.

> (If you don't have the time, I'd be happy to look at the core file
> myself, if we can arrange that).

The machine is unfortunately deep inside my private network, so there's
no easy way to allow ssh to it...


* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 11:48                             ` Lars Ingebrigtsen
@ 2020-08-14 12:05                               ` Pip Cet
  2020-08-14 12:34                                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 31+ messages in thread
From: Pip Cet @ 2020-08-14 12:05 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832, Paul Eggert

On Fri, Aug 14, 2020 at 11:49 AM Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Pip Cet <pipcet@gmail.com> writes:
> > Can you find out which hash table lives at 0x7ffff19a41a8? I'd suggest
> > something like "find &globals,&globals+1,0x7ffff19a41ad" to get the
> > offset in globals, if it is a global variable, then looking it up with
> > "ptype/o globals".
>
> That's the value from mark_vectorlike?  It's moved a bit:

That's strange, but possible if non-reproducible things happen on the dbus...

> #9  0x00005555556d6d7e in mark_vectorlike (header=header@entry=0x7ffff19a4190)
>     at alloc.c:6280
>
> But it says:
>
> (gdb) find &globals,&globals+1,0x7ffff19a4190
> Pattern not found.

It would probably be 0x7ffff19a4195 that we'd be looking for, stored
as a tagged pointer, but it's possible it's not a global variable at
all, of course.
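
The +5 adjustment can be sketched as follows.  This assumes the type
tag sits in the low three bits of the word and that vectorlike objects
use tag 5, which is consistent with the addresses quoted in this
thread (0x7ffff19a41a8 vs. 0x7ffff19a41ad); the sketch_ names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the low three bits of a Lisp word hold the type tag,
   and vectorlike objects (hash tables included) use tag 5.  */
#define SKETCH_TAG_BITS 3
#define SKETCH_TAG_MASK ((UINT64_C (1) << SKETCH_TAG_BITS) - 1)
#define SKETCH_TAG_VECTORLIKE UINT64_C (5)

/* Turn an 8-byte-aligned object address into a tagged word.  */
static uint64_t
sketch_tag_vectorlike (uint64_t addr)
{
  assert ((addr & SKETCH_TAG_MASK) == 0);
  return addr | SKETCH_TAG_VECTORLIKE;
}

/* Strip the tag to recover the object address.  */
static uint64_t
sketch_untag (uint64_t word)
{
  return word & ~SKETCH_TAG_MASK;
}
```

So a search of globals has to look for sketch_tag_vectorlike
(0x7ffff19a4190), i.e. 0x7ffff19a4195, rather than the raw header
address that mark_vectorlike reports.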

> > (If you don't have the time, I'd be happy to look at the core file
> > myself, if we can arrange that).
>
> The machine is unfortunately deep inside my private network, so there's
> no easy way to allow ssh to it...

If you do have a machine that could serve files, it'd be the core file
and the corresponding emacs executable that would be most interesting.
I expect the core file to be rather large, though.






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 12:05                               ` Pip Cet
@ 2020-08-14 12:34                                 ` Lars Ingebrigtsen
  0 siblings, 0 replies; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-14 12:34 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

Pip Cet <pipcet@gmail.com> writes:

> If you do have a machine that could serve files, it'd be the core file
> and the corresponding emacs executable that would be most interesting.
> I expect the core file to be rather large, though.

Sure, I put them at:

https://quimby.gnus.org/s/emacs
https://quimby.gnus.org/s/core


* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-13 14:08                           ` Pip Cet
  2020-08-14 11:48                             ` Lars Ingebrigtsen
@ 2020-08-14 14:24                             ` Pip Cet
  2020-08-14 15:01                               ` Pip Cet
  1 sibling, 1 reply; 31+ messages in thread
From: Pip Cet @ 2020-08-14 14:24 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832, Paul Eggert

On Thu, Aug 13, 2020 at 2:08 PM Pip Cet <pipcet@gmail.com> wrote:
> All that sounds to me like we ought to dig down into the core file and
> figure out what happened, since the issue is likely to remain present
> otherwise and it seems somewhat difficult to track down and reproduce.

I have a theory, and it sounds like a somewhat silly bug.

- there's a hash table h in the dumper image
- h->hash points to dynamically allocated storage (as it always does
after my patch)
- the last reference to the hash table dies
- garbage_collect is called and collects h->hash
- h->hash's storage is reallocated for a different vector with a
different start position
- a word (re)appears on the stack which looks like it's a pointer to h
(it isn't, actually)
- garbage_collect is called and calls mark_maybe_pointer(h)
- h is recognized as a pdumper object
- h->hash is marked
- we're now marking a word in the middle of the new vector that
occupies the space that h->hash used to occupy
- in our case, this word is 0xc000000018000005, which is interpreted
as a tagged pointer, dereferencing of which leads to SIGBUS

Is there something I'm missing that would prevent this scenario?

If not, any ideas on how to fix it? The obvious fix would be to always
mark all pdumped objects, but that has a performance cost. Less
obvious would be clearing the memory in the pdumper image that belongs
to an object that's being "freed", or keeping track of which pdumper
objects are still valid after GC...
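
The scenario can be modelled with a toy heap.  This is only a sketch
of the hazard under simplified assumptions: slots stand in for heap
words, the toy_ names are hypothetical, and nothing here is actual
Emacs GC code.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_HEAP_SLOTS = 4, TOY_FREE = 0 };

/* Toy heap: owner[i] is the id of the vector occupying slot i, and
   starts_object[i] says whether a vector begins at slot i.  */
struct toy_heap
{
  int owner[TOY_HEAP_SLOTS];
  bool starts_object[TOY_HEAP_SLOTS];
};

/* Toy dumped table: it lives in the image, which is never freed, so
   its hash_slot field keeps its old value after the table dies.  */
struct toy_table
{
  int hash_slot;
};

/* Sweep: release every slot owned by vector ID.  */
static void
toy_free (struct toy_heap *heap, int id)
{
  for (int i = 0; i < TOY_HEAP_SLOTS; i++)
    if (heap->owner[i] == id)
      {
        heap->owner[i] = TOY_FREE;
        heap->starts_object[i] = false;
      }
}

/* Allocate slots [FIRST, FIRST + LEN) for vector ID.  */
static void
toy_alloc (struct toy_heap *heap, int id, int first, int len)
{
  for (int i = first; i < first + len; i++)
    {
      assert (heap->owner[i] == TOY_FREE);
      heap->owner[i] = id;
      heap->starts_object[i] = (i == first);
    }
}

/* A conservative marker that sees a stale word pointing at TABLE
   follows table->hash_slot.  That is only safe if an object still
   starts there.  */
static bool
toy_mark_is_safe (const struct toy_heap *heap, const struct toy_table *table)
{
  return heap->starts_object[table->hash_slot];
}
```

With the table's hash vector at slot 1, toy_mark_is_safe holds.  After
toy_free releases that vector and toy_alloc places a three-slot vector
at slot 0, the table's stale hash_slot points into the middle of the
new vector and the check fails, which corresponds to the marker
treating an arbitrary word as an object.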






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 14:24                             ` Pip Cet
@ 2020-08-14 15:01                               ` Pip Cet
  2020-08-14 15:37                                 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 31+ messages in thread
From: Pip Cet @ 2020-08-14 15:01 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832, Paul Eggert

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

On Fri, Aug 14, 2020 at 2:24 PM Pip Cet <pipcet@gmail.com> wrote:
> If no, any ideas on how to fix it? The obvious fix would be to always
> mark all pdumped objects, but that has a performance cost. Less
> obvious would be clearing the memory in the pdumper image that belongs
> to an object that's being "freed", or keeping track of which pdumper
> objects are still valid after GC...

I've gone with the last idea. This patch should fix things, though
given how difficult the bug is to trigger reliably it might also
merely appear to fix it...
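
The idea of remembering which dumped objects survived the previous GC
can be sketched with two fixed-size bitsets.  This is a simplified
illustration, not the attached patch: the sketch_ names and the fixed
two-word size are assumptions, whereas the patch works on the
dynamically sized dump_bitset in pdumper.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_WORDS = 2 };

struct sketch_marks
{
  uint64_t mark_bits[SKETCH_WORDS];       /* set during the current GC */
  uint64_t last_mark_bits[SKETCH_WORDS];  /* marks as of the previous GC */
};

static bool
sketch_bit_set_p (const uint64_t *bits, int bitno)
{
  return (bits[bitno / 64] >> (bitno % 64)) & 1;
}

/* Right after loading, every dumped object is presumed live.  */
static void
sketch_init (struct sketch_marks *m)
{
  memset (m->mark_bits, 0, sizeof m->mark_bits);
  memset (m->last_mark_bits, 0xff, sizeof m->last_mark_bits);
}

/* Conservative lookup: treat BITNO as an object only if it was
   marked in the previous GC.  */
static bool
sketch_object_live_p (const struct sketch_marks *m, int bitno)
{
  return sketch_bit_set_p (m->last_mark_bits, bitno);
}

static void
sketch_set_marked (struct sketch_marks *m, int bitno)
{
  assert (sketch_object_live_p (m, bitno));
  m->mark_bits[bitno / 64] |= UINT64_C (1) << (bitno % 64);
}

/* End of a GC cycle: remember this cycle's marks, then reset.  */
static void
sketch_clear_marks (struct sketch_marks *m)
{
  memcpy (m->last_mark_bits, m->mark_bits, sizeof m->last_mark_bits);
  memset (m->mark_bits, 0, sizeof m->mark_bits);
}
```

An object that goes unmarked for one full cycle stops being reported
as live, so a stale stack word pointing at it is ignored by later
conservative scans.  Because the arrays here are struct members,
sizeof gives the full byte count; with heap-allocated bits the size
has to be computed as element size times word count.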

[-- Attachment #2: 0001-Try-to-avoid-marking-zombie-pdumper-objects.patch --]
[-- Type: text/x-patch, Size: 2813 bytes --]

From e5e7445625c4727067b6e056bf736a2c8db37602 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Fri, 14 Aug 2020 14:56:19 +0000
Subject: [PATCH] Try to avoid marking zombie pdumper objects.

---
 src/pdumper.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/pdumper.c b/src/pdumper.c
index bc41afc7c5..812eb18de9 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -4871,6 +4871,8 @@ dump_bitset_clear (struct dump_bitset *bitset)
   struct dump_header header;
   /* Mark bits for objects in the dump; used during GC.  */
   struct dump_bitset mark_bits;
+  /* Mark bits as of the previous GC; a clear bit means a dead object.  */
+  struct dump_bitset last_mark_bits;
   /* Time taken to load the dump.  */
   double load_time;
   /* Dump file name.  */
@@ -4995,6 +4997,9 @@ pdumper_find_object_type_impl (const void *obj)
     return PDUMPER_NO_OBJECT;
   const struct dump_reloc *reloc =
     dump_find_relocation (&dump_private.header.object_starts, offset);
+  ptrdiff_t bitno = offset / DUMP_ALIGNMENT;
+  if (!dump_bitset_bit_set_p (&dump_private.last_mark_bits, bitno))
+    return PDUMPER_NO_OBJECT;
   return (reloc != NULL && dump_reloc_get_offset (*reloc) == offset)
     ? reloc->type
     : PDUMPER_NO_OBJECT;
@@ -5021,12 +5026,16 @@ pdumper_set_marked_impl (const void *obj)
   eassert (offset < dump_private.header.cold_start);
   eassert (offset < dump_private.header.discardable_start);
   ptrdiff_t bitno = offset / DUMP_ALIGNMENT;
+  eassert (dump_bitset_bit_set_p (&dump_private.last_mark_bits, bitno));
   dump_bitset_set_bit (&dump_private.mark_bits, bitno);
 }
 
 void
 pdumper_clear_marks_impl (void)
 {
+  memcpy (dump_private.last_mark_bits.bits, dump_private.mark_bits.bits,
+	  sizeof (dump_private.last_mark_bits.bits[0])
+	  * dump_private.mark_bits.number_words);
   dump_bitset_clear (&dump_private.mark_bits);
 }
 
@@ -5244,6 +5253,7 @@ pdumper_load (const char *dump_filename)
   dump_off adj_discardable_start;
 
   struct dump_bitset mark_bits;
+  struct dump_bitset last_mark_bits;
   size_t mark_bits_needed;
 
   struct dump_header header_buf = { 0 };
@@ -5360,12 +5370,18 @@ pdumper_load (const char *dump_filename)
   if (!dump_bitset_init (&mark_bits, mark_bits_needed))
     goto out;
 
+  if (!dump_bitset_init (&last_mark_bits, mark_bits_needed))
+    goto out;
+
+  memset (last_mark_bits.bits, 0xff, sizeof (last_mark_bits.bits[0]) * last_mark_bits.number_words);
+
   /* Point of no return.  */
   err = PDUMPER_LOAD_SUCCESS;
   dump_base = (uintptr_t) sections[DS_HOT].mapping;
   gflags.dumped_with_pdumper_ = true;
   dump_private.header = *header;
   dump_private.mark_bits = mark_bits;
+  dump_private.last_mark_bits = last_mark_bits;
   dump_public.start = dump_base;
   dump_public.end = dump_public.start + dump_size;
 
-- 
2.28.0



* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 15:01                               ` Pip Cet
@ 2020-08-14 15:37                                 ` Lars Ingebrigtsen
  2020-08-14 19:08                                   ` Pip Cet
  0 siblings, 1 reply; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-14 15:37 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

Pip Cet <pipcet@gmail.com> writes:

> I've gone with the last idea. This patch should fix things,

It does!  With that patch, I'm not able to reproduce the bug.

> though given how difficult the bug is to trigger reliably it might
> also merely appear to fix it...

:-/


* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 15:37                                 ` Lars Ingebrigtsen
@ 2020-08-14 19:08                                   ` Pip Cet
  2020-08-14 19:35                                     ` Lars Ingebrigtsen
                                                       ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Pip Cet @ 2020-08-14 19:08 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 42832, Paul Eggert

[-- Attachment #1: Type: text/plain, Size: 520 bytes --]

On Fri, Aug 14, 2020 at 3:37 PM Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Pip Cet <pipcet@gmail.com> writes:
> > I've gone with the last idea. This patch should fix things,
> It does!  With that patch, I'm not able to reproduce the bug.

Oops. There was a bug in the patch which would have resulted in assert
failures had I tested it with assertions. Fixed version attached.

The crash was looking at a hash table created, most likely, by
cl--generic-get-dispatcher; so that makes my theory sound more
plausible, too.

[-- Attachment #2: 0001-Try-to-avoid-marking-zombie-pdumper-objects.patch --]
[-- Type: text/x-patch, Size: 2870 bytes --]

From e9b53aa1c6cdfb8f31bb9de76ae1aa20659752f1 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Fri, 14 Aug 2020 14:56:19 +0000
Subject: [PATCH] Try to avoid marking zombie pdumper objects.

---
 src/pdumper.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/pdumper.c b/src/pdumper.c
index bc41afc7c5..c05b88ea6d 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -4871,6 +4871,8 @@ dump_bitset_clear (struct dump_bitset *bitset)
   struct dump_header header;
   /* Mark bits for objects in the dump; used during GC.  */
   struct dump_bitset mark_bits;
+  /* Mark bits saved from the previous GC cycle.  */
+  struct dump_bitset last_mark_bits;
   /* Time taken to load the dump.  */
   double load_time;
   /* Dump file name.  */
@@ -4995,6 +4997,10 @@ pdumper_find_object_type_impl (const void *obj)
     return PDUMPER_NO_OBJECT;
   const struct dump_reloc *reloc =
     dump_find_relocation (&dump_private.header.object_starts, offset);
+  ptrdiff_t bitno = offset / DUMP_ALIGNMENT;
+  if (offset < dump_private.header.cold_start
+      && !dump_bitset_bit_set_p (&dump_private.last_mark_bits, bitno))
+    return PDUMPER_NO_OBJECT;
   return (reloc != NULL && dump_reloc_get_offset (*reloc) == offset)
     ? reloc->type
     : PDUMPER_NO_OBJECT;
@@ -5021,12 +5027,16 @@ pdumper_set_marked_impl (const void *obj)
   eassert (offset < dump_private.header.cold_start);
   eassert (offset < dump_private.header.discardable_start);
   ptrdiff_t bitno = offset / DUMP_ALIGNMENT;
+  eassert (dump_bitset_bit_set_p (&dump_private.last_mark_bits, bitno));
   dump_bitset_set_bit (&dump_private.mark_bits, bitno);
 }
 
 void
 pdumper_clear_marks_impl (void)
 {
+  memcpy (dump_private.last_mark_bits.bits, dump_private.mark_bits.bits,
+	  sizeof (dump_private.last_mark_bits.bits[0])
+	  * dump_private.mark_bits.number_words);
   dump_bitset_clear (&dump_private.mark_bits);
 }
 
@@ -5244,6 +5254,7 @@ pdumper_load (const char *dump_filename)
   dump_off adj_discardable_start;
 
   struct dump_bitset mark_bits;
+  struct dump_bitset last_mark_bits;
   size_t mark_bits_needed;
 
   struct dump_header header_buf = { 0 };
@@ -5360,12 +5371,19 @@ pdumper_load (const char *dump_filename)
   if (!dump_bitset_init (&mark_bits, mark_bits_needed))
     goto out;
 
+  if (!dump_bitset_init (&last_mark_bits, mark_bits_needed))
+    goto out;
+
+  memset (last_mark_bits.bits, 0xff, sizeof (last_mark_bits.bits[0])
+	  * last_mark_bits.number_words);
+
   /* Point of no return.  */
   err = PDUMPER_LOAD_SUCCESS;
   dump_base = (uintptr_t) sections[DS_HOT].mapping;
   gflags.dumped_with_pdumper_ = true;
   dump_private.header = *header;
   dump_private.mark_bits = mark_bits;
+  dump_private.last_mark_bits = last_mark_bits;
   dump_public.start = dump_base;
   dump_public.end = dump_public.start + dump_size;
 
-- 
2.28.0



* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 19:08                                   ` Pip Cet
@ 2020-08-14 19:35                                     ` Lars Ingebrigtsen
  2020-08-14 21:10                                     ` Lars Ingebrigtsen
  2020-08-14 21:48                                     ` Paul Eggert
  2 siblings, 0 replies; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-14 19:35 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

Pip Cet <pipcet@gmail.com> writes:

> Oops. There was a bug in the patch which would have resulted in assert
> failures had I tested it with assertions. Fixed version attached.

I can confirm that this version of the patch also makes the bus error go
away.

But, curiously enough, doing a "git pull" also makes the bus error go
away.  Rewinding git back 24 hours brought the bus error back again, and
I applied the patch to that version, and that made the bus error go
away.

So something has been committed over the last 24 hours resulting in me
no longer being able to reproduce the bug.  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 19:08                                   ` Pip Cet
  2020-08-14 19:35                                     ` Lars Ingebrigtsen
@ 2020-08-14 21:10                                     ` Lars Ingebrigtsen
  2020-08-14 21:48                                     ` Paul Eggert
  2 siblings, 0 replies; 31+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-14 21:10 UTC (permalink / raw)
  To: Pip Cet; +Cc: 42832, Paul Eggert

I got a segfault again!

This time on FreeBSD (I'm working on setting up some VMs to do some
testing on non-Debian systems), and in a different place:

  ELC      foldout.elc
  ELC      follow.elc
gmake[3]: *** [Makefile:295: find-cmd.elc] Segmentation fault (core dumped)
gmake[3]: *** Waiting for unfinished jobs....
gmake[3]: Leaving directory '/usr/home/larsi/src/emacs/trunk/lisp'
gmake[2]: *** [Makefile:318: compile-main] Error 2
gmake[2]: Leaving directory '/usr/home/larsi/src/emacs/trunk/lisp'
gmake[1]: *** [Makefile:411: lisp] Error 2
gmake[1]: Leaving directory '/usr/home/larsi/src/emacs/trunk'
gmake: *** [GNUmakefile:93: default] Error 2

Backtrace:

#0  0x00000008023ec1ba in thr_kill () at /lib/libc.so.7
#1  0x00000008023ea5e4 in raise () at /lib/libc.so.7
#2  0x000000000041f5a4 in terminate_due_to_signal
    (sig=sig@entry=11, backtrace_limit=backtrace_limit@entry=40) at emacs.c:408
#3  0x000000000041fa0c in handle_fatal_signal (sig=sig@entry=11)
    at sysdep.c:1782
#4  0x00000000005170c0 in deliver_thread_signal
    (sig=sig@entry=11, handler=0x41fa01 <handle_fatal_signal>) at sysdep.c:1756
#5  0x0000000000517126 in deliver_fatal_thread_signal (sig=11) at sysdep.c:1879
#6  handle_sigsegv (sig=11, siginfo=<optimized out>, arg=<optimized out>)
    at sysdep.c:1879
#7  0x00000008014f03ce in  () at /lib/libthr.so.3
#8  0x00000008014ef98f in  () at /lib/libthr.so.3
#9  0x00007ffffffff193 in <signal handler called> ()
#10 0x0000000000558ca2 in cons_marked_p (c=0x50) at pdumper.h:148
#11 mark_object (arg=<optimized out>) at alloc.c:6733
#12 0x000000000055969e in mark_vectorlike (header=0x8043fb620) at alloc.c:6280
#13 0x000000000055969e in mark_vectorlike (header=header@entry=0x803d112d8)
    at alloc.c:6280
#14 0x0000000000558fe8 in mark_hash_table (ptr=0x803d112d8) at alloc.c:6651
#15 mark_object (arg=<optimized out>) at alloc.c:6651
#16 0x000000000055960f in mark_memory (end=<optimized out>, 
    end@entry=0x7ffffffeea60, start=<optimized out>) at alloc.c:4842

And, again, your zombie patch makes the bug disappear.

So I guess it's time to commit it?  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no







* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 19:08                                   ` Pip Cet
  2020-08-14 19:35                                     ` Lars Ingebrigtsen
  2020-08-14 21:10                                     ` Lars Ingebrigtsen
@ 2020-08-14 21:48                                     ` Paul Eggert
  2020-08-14 22:25                                       ` Pip Cet
  2 siblings, 1 reply; 31+ messages in thread
From: Paul Eggert @ 2020-08-14 21:48 UTC (permalink / raw)
  To: Pip Cet, Lars Ingebrigtsen; +Cc: 42832

[-- Attachment #1: Type: text/plain, Size: 359 bytes --]

Thanks for working on this and writing a fix for this obscure bug.

A couple of minor thoughts on the patch.

We can avoid copying bitmaps by swapping pointers to them.
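
For illustration only, and not part of the attached patch: a minimal, self-contained sketch of the pointer swap. The names `struct bitset` and `rotate_mark_bits` are hypothetical stand-ins for pdumper's own types and functions, and the 64-bit word type is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pdumper's struct dump_bitset.  */
struct bitset
{
  uint64_t *bits;
  size_t number_words;
};

/* Make CURRENT's storage the new "last" bitmap, and reuse the old
   "last" storage, zeroed, as the new current bitmap.  Only the two
   pointers are exchanged; no bitmap word is copied.  */
static void
rotate_mark_bits (struct bitset *current, struct bitset *last)
{
  assert (current->number_words == last->number_words);
  uint64_t *swap = last->bits;
  last->bits = current->bits;
  current->bits = swap;
  memset (current->bits, 0,
          current->number_words * sizeof current->bits[0]);
}
```

The swap only works if both bitmaps have the same number of words, which the sketch asserts.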

There are now two opportunities for calloc to fail, and we leak memory if the 
second one fails. A simple fix is to call calloc once and split the result in half.
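
As a rough sketch of that idea, separate from the attached patch: one calloc covers both bitmaps, so there is a single failure point and nothing to free on error. `struct bitset` and `bitsets_init` are hypothetical names, the 64-bit word type is an assumption, and the sketch assumes `number_words` is greater than zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pdumper's struct dump_bitset.  */
struct bitset
{
  uint64_t *bits;
  size_t number_words;
};

/* Allocate storage for both bitmaps with one calloc call.
   PAIR[0] starts all-zero; PAIR[1] starts all-ones.
   Return false, leaving PAIR untouched, if allocation fails.
   Assumes NUMBER_WORDS > 0.  */
static bool
bitsets_init (struct bitset pair[2], size_t number_words)
{
  uint64_t *bits = calloc (number_words, 2 * sizeof *bits);
  if (!bits)
    return false;
  pair[0].bits = bits;
  pair[1].bits = bits + number_words;
  pair[0].number_words = pair[1].number_words = number_words;
  memset (pair[1].bits, 0xff, number_words * sizeof *bits);
  return true;
}
```

Because both halves live in one allocation, only `pair[0].bits` may be passed to `free`.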

Proposed patch attached.

[-- Attachment #2: 0001-Fix-bus-error-on-Debian-bullseye.patch --]
[-- Type: text/x-patch, Size: 4802 bytes --]

From d8d976e1f5c0bed8539419b5263ae4a361501247 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 14 Aug 2020 14:33:21 -0700
Subject: [PATCH] Fix bus error on Debian bullseye
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem reported by Lars Magne Ingebrigtsen, and problem diagnosis
and most of this patch by Pip Cet (Bug#42832).
* src/pdumper.c (dump_bitsets_init): Rename from dump_bitset_init.
All callers changed.  Initialize two bitsets with a single calloc
call.
(struct pdumper_loaded_dump_private): New member last_mark_bits.
(pdumper_find_object_type_impl): Return PDUMPER_NO_OBJECT if
the last_mark_bits’ bit is set.
(pdumper_set_marked_impl): Assert that the last_mark_bits’
bit is set.
(pdumper_clear_marks_impl): Save mark_bits into
last_mark_bits before clearing mark_bits.
Co-authored-by: Pip Cet <pipcet@gmail.com>
---
 src/pdumper.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/src/pdumper.c b/src/pdumper.c
index bc41afc7c5..2d1b19283c 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -4802,14 +4802,19 @@ dump_mmap_contiguous (struct dump_memory_map *maps, int nr_maps)
 };
 
 static bool
-dump_bitset_init (struct dump_bitset *bitset, size_t number_bits)
+dump_bitsets_init (struct dump_bitset bitset[2], size_t number_bits)
 {
-  int xword_size = sizeof (bitset->bits[0]);
+  int xword_size = sizeof (bitset[0].bits[0]);
   int bits_per_word = xword_size * CHAR_BIT;
   ptrdiff_t words_needed = divide_round_up (number_bits, bits_per_word);
-  bitset->number_words = words_needed;
-  bitset->bits = calloc (words_needed, xword_size);
-  return bitset->bits != NULL;
+  dump_bitset_word *bits = calloc (words_needed, 2 * xword_size);
+  if (!bits)
+    return false;
+  bitset[0].bits = bits;
+  bitset[0].number_words = bitset[1].number_words = words_needed;
+  bitset[1].bits = memset (bits + words_needed, UCHAR_MAX,
+			   words_needed * xword_size);
+  return true;
 }
 
 static dump_bitset_word *
@@ -4870,7 +4875,7 @@ dump_bitset_clear (struct dump_bitset *bitset)
   /* Copy of the header we read from the dump.  */
   struct dump_header header;
   /* Mark bits for objects in the dump; used during GC.  */
-  struct dump_bitset mark_bits;
+  struct dump_bitset mark_bits, last_mark_bits;
   /* Time taken to load the dump.  */
   double load_time;
   /* Dump file name.  */
@@ -4993,6 +4998,10 @@ pdumper_find_object_type_impl (const void *obj)
   dump_off offset = ptrdiff_t_to_dump_off ((uintptr_t) obj - dump_public.start);
   if (offset % DUMP_ALIGNMENT != 0)
     return PDUMPER_NO_OBJECT;
+  ptrdiff_t bitno = offset / DUMP_ALIGNMENT;
+  if (offset < dump_private.header.cold_start
+      && !dump_bitset_bit_set_p (&dump_private.last_mark_bits, bitno))
+    return PDUMPER_NO_OBJECT;
   const struct dump_reloc *reloc =
     dump_find_relocation (&dump_private.header.object_starts, offset);
   return (reloc != NULL && dump_reloc_get_offset (*reloc) == offset)
@@ -5021,12 +5030,16 @@ pdumper_set_marked_impl (const void *obj)
   eassert (offset < dump_private.header.cold_start);
   eassert (offset < dump_private.header.discardable_start);
   ptrdiff_t bitno = offset / DUMP_ALIGNMENT;
+  eassert (dump_bitset_bit_set_p (&dump_private.last_mark_bits, bitno));
   dump_bitset_set_bit (&dump_private.mark_bits, bitno);
 }
 
 void
 pdumper_clear_marks_impl (void)
 {
+  dump_bitset_word *swap = dump_private.last_mark_bits.bits;
+  dump_private.last_mark_bits.bits = dump_private.mark_bits.bits;
+  dump_private.mark_bits.bits = swap;
   dump_bitset_clear (&dump_private.mark_bits);
 }
 
@@ -5243,7 +5256,7 @@ pdumper_load (const char *dump_filename)
   int dump_page_size;
   dump_off adj_discardable_start;
 
-  struct dump_bitset mark_bits;
+  struct dump_bitset mark_bits[2];
   size_t mark_bits_needed;
 
   struct dump_header header_buf = { 0 };
@@ -5357,7 +5370,7 @@ pdumper_load (const char *dump_filename)
   err = PDUMPER_LOAD_ERROR;
   mark_bits_needed =
     divide_round_up (header->discardable_start, DUMP_ALIGNMENT);
-  if (!dump_bitset_init (&mark_bits, mark_bits_needed))
+  if (!dump_bitsets_init (mark_bits, mark_bits_needed))
     goto out;
 
   /* Point of no return.  */
@@ -5365,7 +5378,8 @@ pdumper_load (const char *dump_filename)
   dump_base = (uintptr_t) sections[DS_HOT].mapping;
   gflags.dumped_with_pdumper_ = true;
   dump_private.header = *header;
-  dump_private.mark_bits = mark_bits;
+  dump_private.mark_bits = mark_bits[0];
+  dump_private.last_mark_bits = mark_bits[1];
   dump_public.start = dump_base;
   dump_public.end = dump_public.start + dump_size;
 
-- 
2.17.1



* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 21:48                                     ` Paul Eggert
@ 2020-08-14 22:25                                       ` Pip Cet
  2020-08-14 22:52                                         ` Paul Eggert
  0 siblings, 1 reply; 31+ messages in thread
From: Pip Cet @ 2020-08-14 22:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Lars Ingebrigtsen, 42832

On Fri, Aug 14, 2020 at 9:48 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> Thanks for working on this and writing a fix for this obscure bug.

Thank you for having a look at it.

> Proposed patch attached.

LGTM, with one very minor nit:

(pdumper_find_object_type_impl): Return PDUMPER_NO_OBJECT if
the last_mark_bits’ bit is set.

It's actually if it's unset that we return PDUMPER_NO_OBJECT.
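
To make the direction of the test concrete, here is a small standalone sketch (not the pdumper code). `bit_set_p` and `zombie_p` are hypothetical helpers, and `tracked_bits` is an assumed stand-in for the bit index corresponding to the cold_start limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { WORD_BITS = 64 };

/* Return true if bit BITNO is set in BITS.  */
static bool
bit_set_p (const uint64_t *bits, size_t bitno)
{
  return (bits[bitno / WORD_BITS] >> (bitno % WORD_BITS)) & 1;
}

/* Return true if the candidate at BITNO should be treated as
   "no object": it lies inside the tracked region and its bit in
   the previous cycle's bitmap is UNSET, i.e. it was not marked
   last time.  Candidates outside the tracked region are never
   rejected by this test.  */
static bool
zombie_p (const uint64_t *last_mark_bits, size_t bitno,
          size_t tracked_bits)
{
  return bitno < tracked_bits && !bit_set_p (last_mark_bits, bitno);
}
```

With a last-cycle bitmap of 0x5 (bits 0 and 2 set), bit 1 is rejected while bits 0 and 2 are accepted.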






* bug#42832: 28.0.50; "Bus error" when compiling Emacs now on Debian bullseye
  2020-08-14 22:25                                       ` Pip Cet
@ 2020-08-14 22:52                                         ` Paul Eggert
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Eggert @ 2020-08-14 22:52 UTC (permalink / raw)
  To: Pip Cet; +Cc: Lars Ingebrigtsen, 42832-done

On 8/14/20 3:25 PM, Pip Cet wrote:
> It's actually if it's unset that we return PDUMPER_NO_OBJECT.

I fixed that, installed, and am marking the bug as done. Thanks very much.





