unofficial mirror of bug-guile@gnu.org 
* bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
@ 2020-01-24 15:14 Ludovic Courtès
  2020-02-19 13:50 ` Ludovic Courtès
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ludovic Courtès @ 2020-01-24 15:14 UTC (permalink / raw)
  To: bug-Guile

Hello!

While building the “guix-system.drv” derivation on AArch64, I got this
crash (not fully deterministic but quite frequent).  Here the
finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
accessing a one-element weak vector):

--8<---------------cut here---------------start------------->8---
$ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
[…]
loading 'gnu/services/cups.scm'...
Backtrace:
[Switching to Thread 0xffffbebec1d0 (LWP 22464)]

Thread 2 "guile" hit Breakpoint 3, scm_display_backtrace_with_highlights (
    stack=stack@entry="#<struct stack>" = {...}, port=port@entry=#<port #<port-type file 4c3b40> 510040>, 
    first=first@entry=#f, depth=depth@entry=#f, highlights=highlights@entry=()) at backtrace.c:269
269     {
(gdb) bt
#0  scm_display_backtrace_with_highlights (stack=stack@entry="#<struct stack>" = {...}, 
    port=port@entry=#<port #<port-type file 4c3b40> 510040>, first=first@entry=#f, depth=depth@entry=#f, 
    highlights=highlights@entry=()) at backtrace.c:269
#1  0x0000ffffbf5ef8c4 in print_exception_and_backtrace (
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60, tag=wrong-type-arg, 
    port=#<port #<port-type file 4c3b40> 510040>) at continuations.c:409
#2  pre_unwind_handler (error_port=0x510040, tag=wrong-type-arg, 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60) at continuations.c:453
#3  0x0000ffffbf672588 in catch_pre_unwind_handler (data=0xffffbebeb850, 
    exn=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70ced40) at throw.c:135
#4  0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#5  0x0000ffffbf67d10c in scm_call_n (proc=proc@entry=#<unmatched-tag 10045>, argv=<optimized out>, nargs=5)
    at vm.c:1589
#6  0x0000ffffbf5f3c10 in scm_apply_0 (proc=#<unmatched-tag 10045>, args=()) at eval.c:603
#7  0x0000ffffbf5f4654 in scm_apply_1 (proc=<optimized out>, arg1=arg1@entry=wrong-type-arg, 
    args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at eval.c:609
#8  0x0000ffffbf6729e0 in scm_throw (key=key@entry=wrong-type-arg, 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at throw.c:262
#9  0x0000ffffbf672b44 in scm_ithrow (key=key@entry=wrong-type-arg, args=<optimized out>, no_return=no_return@entry=1)
    at throw.c:457
#10 0x0000ffffbf5f1dec in scm_error_scm (key=key@entry=wrong-type-arg, subr=subr@entry="weak-vector-ref", 
    message=<optimized out>, 
    args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0, 
    data=data@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:90
#11 0x0000ffffbf5f1ea0 in scm_error (key=key@entry=wrong-type-arg, 
    subr=subr@entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref", 
    message=message@entry=0xffffbf696e98 "Wrong type argument in position ~A (expecting ~A): ~S", 
    args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0, 
    rest=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:62
#12 0x0000ffffbf5f22cc in scm_wrong_type_arg_msg (
    subr=subr@entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos@entry=1, 
    bad_value=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30ff880, 
    szMessage=szMessage@entry=0xffffbf6a5300 "weak vector") at error.c:282
#13 0x0000ffffbf680050 in scm_c_weak_vector_ref (wv=<optimized out>, k=k@entry=0) at weak-vector.c:193
#14 0x0000ffffbf67eff4 in scm_i_weak_car (
    pair=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830) at weak-list.h:39
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
#16 vacuum_all_weak_tables () at weak-table.c:494
#17 0x0000ffffbf5fda44 in async_gc_finalizer (ptr=0x494ec0, data=0x0) at finalizers.c:316
#18 0x0000ffffbf549f74 in GC_invoke_finalizers ()
   from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#19 0x0000ffffbf5fdf64 in scm_run_finalizers () at finalizers.c:398
#20 0x0000ffffbf5fdff4 in finalization_thread_proc (unused=<optimized out>) at finalizers.c:233
#21 0x0000ffffbf5ef6e0 in c_body (d=0xffffbebeb918) at continuations.c:430
#22 0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#23 0x0000ffffbf67d10c in scm_call_n (proc=#<unmatched-tag 10045>, argv=argv@entry=0xffffbebeb660, 
    nargs=nargs@entry=2) at vm.c:1589
#24 0x0000ffffbf5f3930 in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at eval.c:503
#25 0x0000ffffbf5f4f38 in scm_c_with_exception_handler (type=type@entry=#t, handler=0xffffbebeb670, 
    handler@entry=0xffffbf6724b0 <catch_post_unwind_handler>, handler_data=0x510040, 
    handler_data@entry=0xffffbebeb850, thunk=0x0, thunk@entry=0xffffbf6725f8 <catch_body>, 
    thunk_data=0x1dce42683dff4d67, thunk_data@entry=0xffffbebeb850) at exceptions.c:170
#26 0x0000ffffbf672850 in scm_c_catch (tag=tag@entry=#t, body=body@entry=0xffffbf5ef6c8 <c_body>, 
    body_data=body_data@entry=0xffffbebeb918, handler=handler@entry=0xffffbf5ef970 <c_handler>, 
    handler_data=handler_data@entry=0xffffbebeb918, 
    pre_unwind_handler=pre_unwind_handler@entry=0xffffbf5ef7b8 <pre_unwind_handler>, 
    pre_unwind_handler_data=pre_unwind_handler_data@entry=0x510040) at throw.c:168
#27 0x0000ffffbf5efbf4 in scm_i_with_continuation_barrier (body=body@entry=0xffffbf5ef6c8 <c_body>, 
    body_data=body_data@entry=0xffffbebeb918, handler=handler@entry=0xffffbf5ef970 <c_handler>, 
    handler_data=handler_data@entry=0xffffbebeb918, 
    pre_unwind_handler=pre_unwind_handler@entry=0xffffbf5ef7b8 <pre_unwind_handler>, pre_unwind_handler_data=0x510040)
    at continuations.c:368
#28 0x0000ffffbf5efca0 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
    at continuations.c:464
#29 0x0000ffffbf671148 in with_guile (base=0xffffbebeb988, data=0xffffbebeb9a8) at threads.c:645
#30 0x0000ffffbf551618 in GC_call_with_stack_base ()
   from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#31 0x0000ffffbf6714a8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>, func=<optimized out>)
    at threads.c:688
#32 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
#33 0x0000ffffbf50e7f4 in start_thread ()
   from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libpthread.so.0
#34 0x0000ffffbf136edc in thread_start () from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libc.so.6
(gdb) info threads
  Id   Target Id                                 Frame 
  1    Thread 0xffffbf6f4010 (LWP 22463) "guile" resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
* 2    Thread 0xffffbebec1d0 (LWP 22464) "guile" scm_display_backtrace_with_highlights (
    stack=stack@entry="#<struct stack>" = {...}, port=port@entry=#<port #<port-type file 4c3b40> 510040>, 
    first=first@entry=#f, depth=depth@entry=#f, highlights=highlights@entry=()) at backtrace.c:269
--8<---------------cut here---------------end--------------->8---

The problem appears to be that the type tag of the weak vector got
zeroed:

--8<---------------cut here---------------start------------->8---
(gdb) frame 15
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
49            SCM car = scm_i_weak_car (in);
(gdb) p in
$48 = <error reading variable: ERROR: Cannot access memory at address 0x0>(SCM) <error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830
(gdb) p *(void**)in
$49 = (void *) 0x30ff880
(gdb) p ((void**)$49)[0]@2
$50 = {0x0, 0x30f8840}
(gdb) p (void**)in
$51 = (void **) 0x30f8830
--8<---------------cut here---------------end--------------->8---

There’s normally no disappearing link registered on the first element of
the weak vector (type tag + length), so I don’t know how this can happen.
Here’s the other thread (with surprisingly broken stack frames):

--8<---------------cut here---------------start------------->8---
(gdb) thread 1
[Switching to thread 1 (Thread 0xffffbf6f4010 (LWP 22463))]
#0  resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
272               scm_t_weak_entry *next = entry->next;
(gdb) bt
#0  resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
#1  0x0000ffffbf67ef00 in vacuum_weak_table (table=0x4a2cb0) at weak-table.c:318
#2  0x0000ffffbf67f374 in scm_c_weak_table_ref (table=<optimized out>, raw_hash=2016212919028049524, 
    pred=0xffffbf67eb10 <assq_predicate>, closure=0x72b6a80, dflt=()) at weak-table.c:533
#3  0x0000ffffbf653824 in scm_source_properties (obj=<optimized out>) at srcprop.c:195
#4  0x0000ffffbeebcf14 in ?? ()
#5  0x0000ffffbf03c130 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
--8<---------------cut here---------------end--------------->8---

Thoughts?

This code is the same as in 2.2.

Ludo’.

* bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
  2020-01-24 15:14 bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64) Ludovic Courtès
@ 2020-02-19 13:50 ` Ludovic Courtès
  2020-02-19 14:19   ` Brian Woodcox
  2020-02-29 15:09 ` shtwzrd via Bug reports for GUILE, GNU's Ubiquitous Extension Language
  2020-03-09 14:38 ` Ludovic Courtès
  2 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2020-02-19 13:50 UTC (permalink / raw)
  To: 39266

Ludovic Courtès <ludo@gnu.org> skribis:

> While building the “guix-system.drv” derivation on AArch64, I got this
> crash (not fully deterministic but quite frequent).  Here the
> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
> accessing a one-element weak vector):
>
> $ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
> […]
> loading 'gnu/services/cups.scm'...
> Backtrace:

Apparently this bug does not occur with v3.0.0-23-g7dc90a17e¹.  It may
be that 00fbdfa7345765168e14438eed0b0b8c64c27ab9 reduces GC pressure,
which as a side effect makes the problem vanish.

It’s not satisfactory, but as a stop-gap measure, we could release 3.0.1
like this, which could make Guile 3 usable for Guix on AArch64.

Thoughts?

Ludo’.

¹ Specifically, I tested by (1) building a tarball with “make dist”, (2)
  running “guix build guile-next
  --with-source=guile-next=the-tarball.tar.gz”, and (3) running that
  Guile in the code above.  For some reason, Guile 3.0.0 built “by hand”
  would not reproduce the original bug, which is why I built it through
  Guix.

* bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
  2020-02-19 13:50 ` Ludovic Courtès
@ 2020-02-19 14:19   ` Brian Woodcox
  0 siblings, 0 replies; 7+ messages in thread
From: Brian Woodcox @ 2020-02-19 14:19 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 39266

Well, I would be happy, because right now I can’t run guix pull.

I tried multiple times on AArch64 with no success.

Brian.

* bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
  2020-01-24 15:14 bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64) Ludovic Courtès
  2020-02-19 13:50 ` Ludovic Courtès
@ 2020-02-29 15:09 ` shtwzrd via Bug reports for GUILE, GNU's Ubiquitous Extension Language
  2020-03-09 14:38 ` Ludovic Courtès
  2 siblings, 0 replies; 7+ messages in thread
From: shtwzrd via Bug reports for GUILE, GNU's Ubiquitous Extension Language @ 2020-02-29 15:09 UTC (permalink / raw)
  To: 39266@debbugs.gnu.org, Ludovic Courtès

Seconding what Brian said, I would also be happy with this solution.

Right now, Guix users on AArch64 simply aren't using Guile 3 because they can't pull, so this one bug probably keeps them from running into other potential bugs.

So even though it's not satisfactory, getting AArch64 working again could result in more, and more varied, bug reports for Guile on that platform, so that there may eventually be enough information to find the actual root cause of the problem.

* bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
  2020-01-24 15:14 bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64) Ludovic Courtès
  2020-02-19 13:50 ` Ludovic Courtès
  2020-02-29 15:09 ` shtwzrd via Bug reports for GUILE, GNU's Ubiquitous Extension Language
@ 2020-03-09 14:38 ` Ludovic Courtès
  2020-03-09 22:19   ` Pierre Langlois
  2 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2020-03-09 14:38 UTC (permalink / raw)
  To: 39266

Ludovic Courtès <ludo@gnu.org> skribis:

> While building the “guix-system.drv” derivation on AArch64, I got this
> crash (not fully deterministic but quite frequent).  Here the
> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
> accessing a one-element weak vector):

With 3.0.1, I can reproduce the bug on x86_64.  With rr (thanks, Andy!),
I found this (starting from the point where the type cell of the weak
vector is zeroed, and reverse-continuing until it gets its original
value of 0x10f):

--8<---------------cut here---------------start------------->8---
(rr) frame 40
#40 0x00007ffff7f2e66d in scm_i_weak_car (pair=0x7fffe15af690) at ../libguile/pairs.h:190
190	  return SCM_CAR (x);
(rr) down
#39 0x00007ffff7f2f576 in scm_c_weak_vector_ref (wv=<optimized out>, k=k@entry=0) at weak-vector.c:193
193	  SCM_VALIDATE_WEAK_VECTOR (1, wv);
(rr) 
#38 0x00007ffff7ea7ba0 in scm_wrong_type_arg_msg (
    subr=subr@entry=0x7ffff7f56f00 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos@entry=1, 
    bad_value=0x7fffec472b90, szMessage=szMessage@entry=0x7ffff7f56e80 "weak vector") at error.c:282
282	      scm_error (scm_arg_type_key,
(rr) p *((void**)0x7fffec472b90)
$1 = (void *) 0x0
(rr) watch *((void**)0x7fffec472b90)
Hardware watchpoint 1: *((void**)0x7fffec472b90)
(rr) reverse-cont
Continuing.

Thread 1 received signal SIGCONT, Continued.
[Switching to Thread 27074.27074]
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:101
101	../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(rr) 
Continuing.

Thread 1 hit Hardware watchpoint 1: *((void**)0x7fffec472b90)

Old value = (void *) 0x0
New value = (void *) 0x10f
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
259	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(rr) bt
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
#1  0x00007ffff7f1d499 in set_vtable_access_fields (vtable=vtable@entry=0x7fffeb48ee80) at struct.c:143
#2  0x00007ffff7f1dd8d in scm_i_struct_inherit_vtable_magic (vtable=vtable@entry=0x7ffff4e32fa0, 
    obj=obj@entry=0x7fffeb48ee80) at struct.c:215
#3  0x00007ffff7f1dfea in scm_c_make_structv (vtable=0x7ffff4e32fa0, n_tail=<optimized out>, n_init=8, 
    init=0x7fffffff50d0) at struct.c:364
#4  0x00007ffff7f1e0b9 in scm_make_struct_no_tail (vtable=0x7ffff4e32fa0, init=0x304) at struct.c:491
--8<---------------cut here---------------end--------------->8---

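Bingo!  There’s a mismatch in struct.c: the bitmask is allocated as
‘bitmask_size’ bytes, but ‘memset’ then clears ‘bitmask_size *
sizeof(*unboxed_fields)’ bytes, so it can write past the end of the
allocation and clobber whatever object lives next to it, here the weak
vector’s type tag: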

--8<---------------cut here---------------start------------->8---
  bitmask_size = (nfields + 31U) / 32U;
  unboxed_fields = scm_gc_malloc_pointerless (bitmask_size, "unboxed fields");
  memset (unboxed_fields, 0, bitmask_size * sizeof(*unboxed_fields));
--8<---------------cut here---------------end--------------->8---

Pushed a fix as 7c17655cd3d859bf0c5a86d9782a7788205fc05a.

Thanks, rr!  You made my day!  :-)

Now testing Guix builds on x86_64, i686, ARMv7, and AArch64 to see if
that addresses seemingly related issues.

Ludo’.

* bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
  2020-03-09 14:38 ` Ludovic Courtès
@ 2020-03-09 22:19   ` Pierre Langlois
  2020-03-10 17:25     ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: Pierre Langlois @ 2020-03-09 22:19 UTC (permalink / raw)
  To: 39266

Hi Ludo,

Ludovic Courtès writes:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> While building the “guix-system.drv” derivation on AArch64, I got this
>> crash (not fully deterministic but quite frequent).  Here the
>> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
>> accessing a one-element weak vector):
>
> With 3.0.1, I can reproduce the bug on x86_64.  With rr (thanks, Andy!),
> I found this (starting from the point where the type cell of the weak
> vector is zeroed, and reverse-continuing until it gets its original
> value of 0x10f):
>
> […]
>
> Bingo!  There’s a mismatch in struct.c:
>
> --8<---------------cut here---------------start------------->8---
>   bitmask_size = (nfields + 31U) / 32U;
>   unboxed_fields = scm_gc_malloc_pointerless (bitmask_size, "unboxed fields");
>   memset (unboxed_fields, 0, bitmask_size * sizeof(*unboxed_fields));
> --8<---------------cut here---------------end--------------->8---

Oh wow, scary! That was some nice debugging; these types of bugs can be
really hard to get to the bottom of.

>
> Pushed a fix as 7c17655cd3d859bf0c5a86d9782a7788205fc05a.
>
> Thanks, rr!  You made my day!  :-)
>
> Now testing Guix builds on x86_64, i686, ARMv7, and AArch64 to see if
> that addresses seemingly related issues.

I've tested it on AArch64 and it's looking good: I'm finally running
Guile 3! I tested by running 'guix pull --branch=wip-guile-3.0.1' on a
RockPro64 running Guix System, then reconfigured and rebooted, and it's
all good.

Thanks so much for the fix! Hopefully it'll work on every platform and
that can be the end of it :-).

Pierre

* bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
  2020-03-09 22:19   ` Pierre Langlois
@ 2020-03-10 17:25     ` Ludovic Courtès
  0 siblings, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2020-03-10 17:25 UTC (permalink / raw)
  To: Pierre Langlois; +Cc: 39266-done

Hi Pierre,

Pierre Langlois <pierre.langlois@gmx.com> skribis:

> I've tested it on AArch64 and it's looking good, I'm running Guile 3
> finally! I've tested by running 'guix pull --branch=wip-guile-3.0.1' on
> a rockpro64 running the Guix system, I've then reconfigured and rebooted
> and it's all good.

Thanks for testing!

> Thanks so much for the fix! Hopefully it'll work on every platform and
> that can be the end of it :-).

Yup, I’ve tested ‘guix pull --branch=wip-guile-3.0.1’ and ‘guix build
guile3.0-guix’ on all 4 architectures that Guix supports, and everything
is fine.

I’ve now pushed the upgrade to 3.0.1 + patch to Guix.

Closing!  \o/

The bug appears to be rare for Guile workloads not as intensive as a
Guix build (never reported, never seen), but we should still probably do
a bug-fix 3.0.2 release in the coming weeks, I guess.

Ludo’.

Thread overview: 7+ messages
2020-01-24 15:14 bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64) Ludovic Courtès
2020-02-19 13:50 ` Ludovic Courtès
2020-02-19 14:19   ` Brian Woodcox
2020-02-29 15:09 ` shtwzrd via Bug reports for GUILE, GNU's Ubiquitous Extension Language
2020-03-09 14:38 ` Ludovic Courtès
2020-03-09 22:19   ` Pierre Langlois
2020-03-10 17:25     ` Ludovic Courtès
