Dear Ludovic,

I apologize for the late response. We rebuilt Guile from scratch with the latest GC (7.2f) and the patch you provided. Reproducing the problem with the extra logging produced a 73 GB output file! I am attaching the last 100,000 lines of output as a file, and the stack trace is appended below. Can you kindly take a look when you get a chance?

It appears that len drops to 0 after successive removals, and we then hit the assertion failure with len at 0 and removed at 1.
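
To make the arithmetic concrete, here is a minimal standalone sketch of the bookkeeping as described above. It is not the hashtab.c code: the function name first_inconsistent_bucket and its parameters are hypothetical, and it assumes the failing assertion has the form "removed <= len", with len decremented by the number of entries dropped from each bucket in turn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not Guile source.
   `len` is the table's cached item count; removed[i] is the number of
   dead entries dropped from bucket i during one vacuum pass.
   Returns the index of the first bucket where removed[i] > len, i.e.
   where an assertion of the form `removed <= len` would fail, or -1 if
   the cached count stays consistent for the whole pass. */
static int first_inconsistent_bucket(size_t len, const size_t *removed,
                                     size_t n_buckets)
{
    assert(removed != NULL);
    for (size_t i = 0; i < n_buckets; i++) {
        if (removed[i] > len)
            return (int)i;      /* count would underflow here */
        len -= removed[i];      /* same subtraction the vacuum pass does */
    }
    return -1;
}
```

For example, with a cached count of 3 and per-bucket removals of {2, 1, 1}, len reaches 0 after the second bucket and the third bucket reports removed = 1, which matches the pattern in the log: the cached count is smaller than the number of entries actually removed.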

Thanks,
Anand

#0  0x00e93430 in __kernel_vsyscall ()
#1  0x001e4b01 in raise () from /lib/libc.so.6
#2  0x001e63da in abort () from /lib/libc.so.6
#3  0x001ddddb in __assert_fail_base () from /lib/libc.so.6
#4  0x001dde96 in __assert_fail () from /lib/libc.so.6
#5  0x080e9ad0 in vacuum_weak_hash_table (table=0xa86c840) at hashtab.c:138
#6  0x080e97f7 in weak_gc_callback (hook_data=0x0, fn_data=0x6, data=0x0) at hashtab.c:440
#7  weak_gc_hook (hook_data=0x0, fn_data=0x6, data=0x0) at hashtab.c:449
#8  0x080eac7d in scm_c_hook_run (hook=0x8961ba0, data=0x0) at hooks.c:103
#9  0x080dff76 in run_before_gc_c_hook () at gc.c:240
#10 0x08188ed9 in GC_notify_full_gc (stop_func=0x8188090 <GC_never_stop_func>) at alloc.c:335
#11 GC_try_to_collect_inner (stop_func=0x8188090 <GC_never_stop_func>) at alloc.c:430
#12 0x081894db in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0, retry=0) at alloc.c:1253
#13 0x0818963c in GC_allocobj (gran=2, kind=0) at alloc.c:1340
#14 0x0818c066 in GC_generic_malloc_inner (lb=16, k=0) at malloc.c:121
#15 0x0818ccb6 in GC_generic_malloc_many (lb=16, k=0, result=0xb4c372c) at mallocx.c:423
#16 0x08194ee8 in GC_malloc_atomic (bytes=16) at thread_local_alloc.c:195
#17 0x080df522 in do_gc_malloc_atomic (size=0, what=0x6 <Address 0x6 out of bounds>) at gc-malloc.c:106
#18 0x080f612b in make_bignum () at numbers.c:253
#19 scm_i_mkbig () at numbers.c:267
#20 0x080f8ae5 in scm_difference (x=0xbc8f260, y=0xd3412b0) at numbers.c:7861
#21 0x08143434 in vm_regular_engine (vm=0xc001b50, program=0x0, argv=0xbc8f260, nargs=0) at vm-i-scheme.c:414
#22 0x0813c5b7 in scm_c_vm_run (vm=0xc001b50, program=0xb44f400, argv=0x0, nargs=0) at vm.c:768
#23 0x080d8f7a in scm_call_0 (proc=0xb44f400) at eval.c:480
#24 0x08137c9b in really_launch (d=0xfface2a0) at threads.c:1005
#25 0x08163dc2 in c_body (d=0xf61d620c) at continuations.c:517
#26 0x08143d7d in vm_regular_engine (vm=0xc001b50, program=0x0, argv=0x2, nargs=4) at vm-i-system.c:858
#27 0x0813c5b7 in scm_c_vm_run (vm=0xc001b50, program=0xaa0ddc8, argv=0xf61d6168, nargs=4) at vm.c:768
#28 0x080d8e91 in scm_call_4 (proc=0xaa0ddc8, arg1=0x404, arg2=0xc005ef0, arg3=0xc005ee0, arg4=0xc005ed0) at eval.c:507
#29 0x08138c5e in scm_catch_with_pre_unwind_handler (key=0x404, thunk=0xc005ef0, handler=0xc005ee0, pre_unwind_handler=0xc005ed0) at throw.c:73
#30 0x08164067 in scm_i_with_continuation_barrier (body=0x8163db0 <c_body>, body_data=0xf61d620c, handler=0x8163fb0 <c_handler>, handler_data=0xf61d620c,
    pre_unwind_handler=0x8163e00 <pre_unwind_handler>, pre_unwind_handler_data=0xa9daff8) at continuations.c:455
#31 0x08164102 in scm_c_with_continuation_barrier (func=0x8137c10 <really_launch>, data=0xfface2a0) at continuations.c:551
#32 0x08137a52 in with_guile_and_parent (base=0xf61d626c, data=0xf61d628c) at threads.c:906
#33 0x081900ff in GC_call_with_stack_base (fn=0x8137a10 <with_guile_and_parent>, arg=0xf61d628c) at misc.c:1573
#34 0x081375c2 in scm_i_with_guile_and_parent (func=<value optimized out>, data=0x6, parent=0x2740) at threads.c:949
#35 0x08137625 in launch_thread (d=0xfface2a0) at threads.c:1017
#36 0x08199cdb in GC_inner_start_routine (sb=0xf61d633c, arg=0xbc87fc0) at pthread_start.c:56
#37 0x081900ff in GC_call_with_stack_base (fn=0x8199c80 <GC_inner_start_routine>, arg=0xbc87fc0) at misc.c:1573
#38 0x08195319 in GC_start_routine (arg=0xbc87fc0) at pthread_support.c:1549
#39 0x00146a49 in start_thread () from /lib/libpthread.so.0
#40 0x00298e1e in clone () from /lib/libc.so.6


On Wed, Dec 3, 2014 at 1:17 AM, Ludovic Courtès <ludo@gnu.org> wrote:
Anand Mohanadoss <anand108@gmail.com> skribis:

> #4  0x003b8e96 in __assert_fail () from /lib/libc.so.6
> #5  0x080e93d7 in vacuum_weak_hash_table (table=0x97f8b10) at hashtab.c:137
> #6  0x080e9857 in weak_gc_callback (hook_data=0x0, fn_data=0x6, data=0x0) at hashtab.c:437
> #7  weak_gc_hook (hook_data=0x0, fn_data=0x6, data=0x0) at hashtab.c:446
> #8  0x080eaa2d in scm_c_hook_run (hook=0x895acc0, data=0x0) at hooks.c:103
> #9  0x080dfda6 in run_before_gc_c_hook () at gc.c:240
> #10 0x08187719 in GC_notify_full_gc (stop_func=0x81868e0 <GC_never_stop_func>) at alloc.c:334
> #11 GC_try_to_collect_inner (stop_func=0x81868e0 <GC_never_stop_func>) at alloc.c:429
> #12 0x08187d1b in GC_collect_or_expand (needed_blocks=4, ignore_off_page=0, retry=0) at alloc.c:1242
> #13 0x0818a64f in GC_alloc_large (lb=14080, k=1, flags=0) at malloc.c:63
> #14 0x0818a9c2 in GC_generic_malloc (lb=14076, k=1) at malloc.c:175
> #15 0x0818ace7 in GC_core_malloc (lb=14076) at malloc.c:263
> #16 0x080df572 in do_gc_malloc (size=0, what=0x6 <Address 0x6 out of bounds>) at gc-malloc.c:100
> #17 0x0813a565 in scm_c_make_vector (k=3517, fill=0x304) at vectors.c:408
> #18 0x080e9471 in scm_i_rehash (table=0x97f8b10, hash_fn=0x816c1d0 <scm_ihashq>, closure=0x0, func_name=0x854f8bd "scm_hash_fn_create_handle_x") at hashtab.c:344

Looking at this stack trace, it seems ‘table’ is being concurrently
modified: a GC occurs while it is being rehashed.
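
The re-entrancy pattern in that trace (the same table pointer appears in both scm_i_rehash and vacuum_weak_hash_table) can be sketched in isolation. This is only a toy model under stated assumptions, not Guile or libgc code: toy_table, toy_alloc, toy_gc_hook, and toy_rehash are all hypothetical names, and the "allocator" simply runs the hook on every call to stand in for an allocation that triggers a collection.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a hash table (hypothetical, not Guile's layout). */
struct toy_table {
    int rehashing;               /* nonzero while a rehash is in progress */
    int hook_ran_during_rehash;  /* set if the hook saw rehashing != 0 */
};

/* Table the hook inspects; plays the role of a registered weak table. */
static struct toy_table *hook_target;

/* Stand-in for a before-GC hook that would vacuum the table. */
static void toy_gc_hook(void)
{
    if (hook_target != NULL && hook_target->rehashing)
        hook_target->hook_ran_during_rehash = 1;
}

/* Stand-in for an allocation; here every allocation "collects",
   so the hook runs on the caller's stack. */
static void toy_alloc(void)
{
    toy_gc_hook();
}

/* Stand-in for a rehash: it allocates a new bucket vector while the
   table is only partly updated. */
static void toy_rehash(struct toy_table *t)
{
    assert(t != NULL);
    t->rehashing = 1;
    toy_alloc();        /* the hook can observe the half-updated table */
    t->rehashing = 0;
}
```

Setting hook_target to a table and calling toy_rehash on it leaves hook_ran_during_rehash set, which is the situation the backtrace suggests: the hook touches the table from inside the very call that is still modifying it.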

Could you apply the attached patch (with “patch -p1 < the-patch” run
from the top of the source tree), and report the lines that are printed
before the assertion failure?

Also, please report the corresponding backtrace, as you did above (just
to make sure this is the same scenario.)

Thanks in advance,
Ludo’.