* bug#19883: Smob's mark_smob has become unreliable in Guile 2.x @ 2015-02-16 16:41 ` David Kastrup 2015-02-16 18:13 ` bug#19883: Correction for backtrace David Kastrup ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: David Kastrup @ 2015-02-16 16:41 UTC (permalink / raw) To: 19883 [-- Attachment #1: Type: text/plain, Size: 5346 bytes --] This precludes porting LilyPond to Guile 2 since LilyPond relies extensively on the mark_smob functionality. The symptom likely occurs when an object is collected and consequently its free_smob function gets called. This free_smob function essentially is responsible for freeing all resources claimed by the smob, rendering the underlying data moot. If mark_smob continues to get called, its interpretation of the now dead data is not likely to lead to happy results. Debugging is tricky since the gc phases tend to occur asynchronously. If I start the accompanying program with ./test 2 20000 40 I tend to get a core dump perhaps every fourth call. Running repeatedly in a debugger does not produce core dumps (perhaps one needs to reload the debugger for each try?), so one needs to do ulimit -c unlimited ./test 2 20000 40 and, after the problem triggers gdb ./test core for post-mortem debugging. A traceback looks like (gdb) bt #0 0x00000000 in ?? () #1 0x0804aee0 in ?? () #2 0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22 #3 0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1 #4 0xb72766ca in GC_mark_some () from /usr/lib/i386-linux-gnu/libgc.so.1 #5 0xb726cd16 in GC_stopped_mark () from /usr/lib/i386-linux-gnu/libgc.so.1 #6 0xb726d730 in GC_try_to_collect_inner () from /usr/lib/i386-linux-gnu/libgc.so.1 #7 0xb726e12a in GC_collect_or_expand () from /usr/lib/i386-linux-gnu/libgc.so.1 #8 0xb726e2b1 in GC_allocobj () from /usr/lib/i386-linux-gnu/libgc.so.1 #9 0xb72731a4 in GC_generic_malloc_inner () from /usr/lib/i386-linux-gnu/libgc.so.1 #10 0xb72732be in GC_generic_malloc () from /usr/lib/i386-linux-gnu/libgc.so.1 #11 0xb761b81e in scm_i_new_smob () from /usr/lib/libguile-2.0.so.22 #12 0x0804985a in std::vector<void (*)(), std::allocator<void (*)()> >::_M_insert_aux(__gnu_cxx::__normal_iterator<void (**)(), std::vector<void (*)(), std::allocator<void (*)()> > >, void (* const&)()) () #13 0x0804a6fc in __gnu_cxx::new_allocator<void (*)()>::allocate(unsigned int, void const*) () #14 0x0804a03d in void std::_Destroy<void (**)(), void (*)()>(void (**)(), void (**)(), std::allocator<void (*)()>&) () #15 0x08049a73 in void std::_Destroy<Family**, Family*>(Family**, Family**, std::allocator<Family*>&) () #16 0x0804934d in scm_puts () #17 0x0804938f in Scm_init::Scm_init(void (*)()) () #18 0x0804938f in Scm_init::Scm_init(void (*)()) () #19 0x0804938f in Scm_init::Scm_init(void (*)()) () #20 0x0804938f in Scm_init::Scm_init(void (*)()) () #21 0x0804938f in Scm_init::Scm_init(void (*)()) () #22 0x0804938f in Scm_init::Scm_init(void (*)()) () #23 0x0804938f in Scm_init::Scm_init(void (*)()) () #24 0x0804938f in Scm_init::Scm_init(void (*)()) () #25 0x0804938f in Scm_init::Scm_init(void (*)()) () #26 0x0804938f in Scm_init::Scm_init(void (*)()) () #27 0x0804938f in Scm_init::Scm_init(void (*)()) () #28 0x0804938f in Scm_init::Scm_init(void (*)()) () #29 0x0804938f in Scm_init::Scm_init(void (*)()) () #30 0x080495e7 in std::vector<Family*, std::allocator<Family*> >::operator[](unsigned int) () #31 0xb75b7dfd in ?? () from /usr/lib/libguile-2.0.so.22 #32 0xb76418e7 in ?? () from /usr/lib/libguile-2.0.so.22 #33 0xb761afb9 in ?? () from /usr/lib/libguile-2.0.so.22 #34 0xb7659f20 in ?? () from /usr/lib/libguile-2.0.so.22 #35 0xb765a539 in ?? () from /usr/lib/libguile-2.0.so.22 #36 0xb75c24f3 in scm_call_4 () from /usr/lib/libguile-2.0.so.22 #37 0xb7641acf in scm_catch_with_pre_unwind_handler () from /usr/lib/libguile-2.0.so.22 #38 0xb7641bd4 in scm_c_catch () from /usr/lib/libguile-2.0.so.22 #39 0xb75b85d1 in ?? () from /usr/lib/libguile-2.0.so.22 #40 0xb75b86d3 in scm_c_with_continuation_barrier () from /usr/lib/libguile-2.0.so.22 #41 0xb763ef7e in ?? () from /usr/lib/libguile-2.0.so.22 #42 0xb72782c1 in GC_call_with_stack_base () from /usr/lib/i386-linux-gnu/libgc.so.1 #43 0xb763f3e6 in scm_with_guile () from /usr/lib/libguile-2.0.so.22 #44 0x080496c5 in void __gnu_cxx::__alloc_traits<std::allocator<void (*)()> >::construct<void (*)()>(std::allocator<void (*)()>&, void (**)(), void (* const&)()) () #45 0xb72cfa83 in __libc_start_main (main=0x4, argc=134517216, argv=0x0, init=0x8049201 <main+16>, fini=0x804967e <std::_Vector_base<void (*)(), std::allocator<void (*)()> >::~_Vector_base()+40>, rtld_fini=0x4, stack_end=0xbfca7aa4) at libc-start.c:287 #46 0xb7750000 in ?? () from /lib/ld-linux.so.2 Backtrace stopped: previous frame inner to this frame (corrupt stack?) The frames listed as #17 0x0804938f in Scm_init::Scm_init(void (*)()) () #18 ... appear to be a fluke in address/symbol correlation and do not seem to have an actual relation to the listed function when looking at the code. I append all the requisite source files producing the test binary when running "make". I can send my binary (when desired) which was compiled using 2.0.11 on a i586 platform. However, I would expect the problem to reproduce in some kind of manner on other systems. The compiler was gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6), Target: i686-linux-gnu. The headers are somewhat cooked-down variants (to remove dependencies on other parts of LilyPond) of the Smob organizing classes used in LilyPond. If one wants to compile for Guile v1 for comparison, one probably needs typedef void * scm_t_func; or so somewhere. [-- Attachment #2: guile-bug.tgz --] [-- Type: application/x-gtar-compressed, Size: 6174 bytes --] [-- Attachment #3: Type: text/plain, Size: 19 bytes --] -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-16 16:41 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x David Kastrup @ 2015-02-16 18:13 ` David Kastrup 2015-02-25 22:16 ` Ludovic Courtès 2015-03-01 6:51 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x Mark H Weaver 2016-10-09 7:51 ` bug#19883: The “finalized” SMOB type Artyom Poptsov 2 siblings, 1 reply; 23+ messages in thread From: David Kastrup @ 2015-02-16 18:13 UTC (permalink / raw) To: 19883 This is embarrassing: I used the wrong executable in connection with the core dump. With the matching executable, the coredump makes a lot more sense: #0 0x00000000 in ?? () #1 0x0804aee0 in Smob_base<Family>::mark_trampoline (arg=0x9fbb000) at smobs.tcc:34 #2 0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22 #3 0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1 #4 0xb72766ca in GC_mark_some () from /usr/lib/i386-linux-gnu/libgc.so.1 #5 0xb726cd16 in GC_stopped_mark () from /usr/lib/i386-linux-gnu/libgc.so.1 #6 0xb726d730 in GC_try_to_collect_inner () from /usr/lib/i386-linux-gnu/libgc.so.1 #7 0xb726e12a in GC_collect_or_expand () from /usr/lib/i386-linux-gnu/libgc.so.1 #8 0xb726e2b1 in GC_allocobj () from /usr/lib/i386-linux-gnu/libgc.so.1 #9 0xb72731a4 in GC_generic_malloc_inner () from /usr/lib/i386-linux-gnu/libgc.so.1 #10 0xb72732be in GC_generic_malloc () from /usr/lib/i386-linux-gnu/libgc.so.1 #11 0xb761b81e in scm_i_new_smob () from /usr/lib/libguile-2.0.so.22 #12 0x0804985a in scm_new_smob (tc=7039, data=158987224) at /usr/include/guile/2.0/libguile/smob.h:91 #13 0x0804a6fc in Smob_base<Family>::register_ptr (p=0x979f3d8) at smobs.tcc:53 #14 0x0804a03d in Smob<Family>::unprotected_smobify_self (this=0x979f3dc) at smobs.hh:273 #15 0x08049a73 in Smob<Family>::smobify_self (this=0x979f3dc) at smobs.hh:286 #16 0x0804934d in Family::Family (this=0x979f3d8, totals=0, branch=2) at test.cc:30 #17 0x0804938f in Family::Family (this=0x98b7c98, totals=3, branch=2) at test.cc:35 #18 0x0804938f in Family::Family (this=0x979e598, totals=8, branch=2) at test.cc:35 #19 0x0804938f in Family::Family (this=0x98b65c8, totals=17, branch=2) at test.cc:35 #20 0x0804938f in Family::Family (this=0x9934d78, totals=37, branch=2) at test.cc:35 #21 0x0804938f in Family::Family (this=0x99377c8, totals=76, branch=2) at test.cc:35 #22 0x0804938f in Family::Family (this=0x96921e0, totals=154, branch=2) at test.cc:35 #23 0x0804938f in Family::Family (this=0xa0c3998, totals=311, branch=2) at test.cc:35 #24 0x0804938f in Family::Family (this=0x9d0cc88, totals=623, branch=2) at test.cc:35 #25 0x0804938f in Family::Family (this=0x9b8b0a8, totals=1248, branch=2) at test.cc:35 #26 0x0804938f in Family::Family (this=0x9e84808, totals=2498, branch=2) at test.cc:35 #27 0x0804938f in Family::Family (this=0x9bf3d48, totals=4998, branch=2) at test.cc:35 #28 0x0804938f in Family::Family (this=0x9d05de8, totals=9998, branch=2) at test.cc:35 #29 0x0804938f in Family::Family (this=0x9b8dcb8, totals=19999, branch=2) at test.cc:35 #30 0x080495e7 in workload (avv=0xbfca7aa4) at test.cc:71 #31 0xb75b7dfd in ?? () from /usr/lib/libguile-2.0.so.22 #32 0xb76418e7 in ?? () from /usr/lib/libguile-2.0.so.22 #33 0xb761afb9 in ?? () from /usr/lib/libguile-2.0.so.22 #34 0xb7659f20 in ?? () from /usr/lib/libguile-2.0.so.22 #35 0xb765a539 in ?? () from /usr/lib/libguile-2.0.so.22 #36 0xb75c24f3 in scm_call_4 () from /usr/lib/libguile-2.0.so.22 #37 0xb7641acf in scm_catch_with_pre_unwind_handler () from /usr/lib/libguile-2.0.so.22 #38 0xb7641bd4 in scm_c_catch () from /usr/lib/libguile-2.0.so.22 #39 0xb75b85d1 in ?? () from /usr/lib/libguile-2.0.so.22 #40 0xb75b86d3 in scm_c_with_continuation_barrier () from /usr/lib/libguile-2.0.so.22 #41 0xb763ef7e in ?? () from /usr/lib/libguile-2.0.so.22 #42 0xb72782c1 in GC_call_with_stack_base () from /usr/lib/i386-linux-gnu/libgc.so.1 #43 0xb763f3e6 in scm_with_guile () from /usr/lib/libguile-2.0.so.22 #44 0x080496c5 in main (ac=4, av=0xbfca7aa4) at test.cc:85 (gdb) So the core indeed occurs when trying to call scm_gc_mark on a smob no longer (or not yet?) associated with a valid structure in memory, in a garbage collection apparently triggered during normal allocation of smob data. Sorry for the nonsensical core dump previously. -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-16 18:13 ` bug#19883: Correction for backtrace David Kastrup @ 2015-02-25 22:16 ` Ludovic Courtès 2015-02-25 23:17 ` David Kastrup 0 siblings, 1 reply; 23+ messages in thread From: Ludovic Courtès @ 2015-02-25 22:16 UTC (permalink / raw) To: David Kastrup; +Cc: 19883 David Kastrup <dak@gnu.org> skribis: > This is embarrassing: I used the wrong executable in connection with the > core dump. With the matching executable, the coredump makes a lot more > sense: > > #0 0x00000000 in ?? () > #1 0x0804aee0 in Smob_base<Family>::mark_trampoline (arg=0x9fbb000) > at smobs.tcc:34 > #2 0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22 > #3 0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1 Could you try commenting out all the SMOB mark functions in LilyPond? This doesn’t fix the bug, of course, but it’s probably a good workaround: user-provided mark functions are not needed in Guile 2.0 since libgc scans the whole heap for live pointers. Thanks, Ludo’. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-25 22:16 ` Ludovic Courtès @ 2015-02-25 23:17 ` David Kastrup 2015-02-26 11:35 ` Ludovic Courtès 0 siblings, 1 reply; 23+ messages in thread From: David Kastrup @ 2015-02-25 23:17 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 19883 ludo@gnu.org (Ludovic Courtès) writes: > David Kastrup <dak@gnu.org> skribis: > >> This is embarrassing: I used the wrong executable in connection with the >> core dump. With the matching executable, the coredump makes a lot more >> sense: >> >> #0 0x00000000 in ?? () >> #1 0x0804aee0 in Smob_base<Family>::mark_trampoline (arg=0x9fbb000) >> at smobs.tcc:34 >> #2 0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22 >> #3 0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1 > > Could you try commenting out all the SMOB mark functions in LilyPond? > > This doesn’t fix the bug, of course, but it’s probably a good > workaround: user-provided mark functions are not needed in Guile 2.0 > since libgc scans the whole heap for live pointers. Even the test program crashes at the end (when `count' is called in order to traverse the created hierarchy) when you disable the setting of the mark function in the init method in smobs.tcc. A pointer to a C++ structure does not appear to protect the corresponding SMOB data and free_smob calls the delete operator which calls destructors and clobbers the memory area. Program received signal SIGSEGV, Segmentation fault. 0x08049b0a in std::vector<Family*, std::allocator<Family*> >::size ( this=0x1b8b) at /usr/include/c++/4.9/bits/stl_vector.h:655 655 { return size_type(this->_M_impl._M_finish - this->_M_impl._M_start); } (gdb) bt #0 0x08049b0a in std::vector<Family*, std::allocator<Family*> >::size ( this=0x1b8b) at /usr/include/c++/4.9/bits/stl_vector.h:655 #1 0x08049498 in Family::count (this=0x1b7f) at test.cc:53 #2 0x0804947c in Family::count (this=0x834f350) at test.cc:54 #3 0x0804947c in Family::count (this=0x8297d40) at test.cc:54 #4 0x0804947c in Family::count (this=0x828a9f8) at test.cc:54 #5 0x0804947c in Family::count (this=0x817d768) at test.cc:54 #6 0x0804947c in Family::count (this=0x828d588) at test.cc:54 #7 0x0804947c in Family::count (this=0x83298b8) at test.cc:54 #8 0x0804947c in Family::count (this=0x817fe58) at test.cc:54 #9 0x080495df in workload (avv=0xbffff074) at test.cc:73 #10 0xb7e66dfd in ?? () from /usr/lib/libguile-2.0.so.22 #11 0xb7ef08e7 in ?? () from /usr/lib/libguile-2.0.so.22 #12 0xb7ec9fb9 in ?? () from /usr/lib/libguile-2.0.so.22 #13 0xb7f08f20 in ?? () from /usr/lib/libguile-2.0.so.22 #14 0xb7f09539 in ?? () from /usr/lib/libguile-2.0.so.22 #15 0xb7e714f3 in scm_call_4 () from /usr/lib/libguile-2.0.so.22 #16 0xb7ef0acf in scm_catch_with_pre_unwind_handler () from /usr/lib/libguile-2.0.so.22 #17 0xb7ef0bd4 in scm_c_catch () from /usr/lib/libguile-2.0.so.22 #18 0xb7e675d1 in ?? () from /usr/lib/libguile-2.0.so.22 #19 0xb7e676d3 in scm_c_with_continuation_barrier () from /usr/lib/libguile-2.0.so.22 #20 0xb7eedf7e in ?? () from /usr/lib/libguile-2.0.so.22 #21 0xb7b272c1 in GC_call_with_stack_base () from /usr/lib/i386-linux-gnu/libgc.so.1 #22 0xb7eee3e6 in scm_with_guile () from /usr/lib/libguile-2.0.so.22 #23 0x08049685 in main (ac=4, av=0xbffff074) at test.cc:85 -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-25 23:17 ` David Kastrup @ 2015-02-26 11:35 ` Ludovic Courtès 2015-02-26 12:32 ` David Kastrup 0 siblings, 1 reply; 23+ messages in thread From: Ludovic Courtès @ 2015-02-26 11:35 UTC (permalink / raw) To: David Kastrup; +Cc: 19883 David Kastrup <dak@gnu.org> skribis: > ludo@gnu.org (Ludovic Courtès) writes: > >> David Kastrup <dak@gnu.org> skribis: >> >>> This is embarrassing: I used the wrong executable in connection with the >>> core dump. With the matching executable, the coredump makes a lot more >>> sense: >>> >>> #0 0x00000000 in ?? () >>> #1 0x0804aee0 in Smob_base<Family>::mark_trampoline (arg=0x9fbb000) >>> at smobs.tcc:34 >>> #2 0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22 >>> #3 0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1 >> >> Could you try commenting out all the SMOB mark functions in LilyPond? >> >> This doesn’t fix the bug, of course, but it’s probably a good >> workaround: user-provided mark functions are not needed in Guile 2.0 >> since libgc scans the whole heap for live pointers. > > Even the test program crashes at the end (when `count' is called in > order to traverse the created hierarchy) when you disable the setting of > the mark function in the init method in smobs.tcc. Could you add debugging symbols for libguile? I don’t understand how ‘count’ gets called. Do you know if this is a use-after-free error? Perhaps setting MALLOC_CHECK_=1 would give a hint. If this is the case, Andy had the idea of turning on topological finalization in the GC. This may help for this particular case, but I vaguely recall that this breaks other finalizer-related things. (I would check by myself, but ISTR that building LilyPond “on one’s own” is not recommended. What would you suggest? A Guix recipe would be sweet.) > A pointer to a C++ structure does not appear to protect the > corresponding SMOB data and free_smob calls the delete operator which > calls destructors and clobbers the memory area. Oh, I was mistaken in my previous message. GC scans the stack and the GC-managed heap (stuff allocated with GC_MALLOC/scm_gc_malloc et al.), but it does *not* scan the malloc/new heap. So indeed, C++ objects that hold references to ‘SCM’ objects, such as instances of ‘Smob<X>’, must either have a mark function, or they must be allocated with scm_gc_malloc. Would it be possible to add a ‘new’ operator to ‘Smob’ that uses ‘scm_gc_malloc’, and a ‘delete’ operator that uses ‘scm_gc_free’? Thanks, Ludo’. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-26 11:35 ` Ludovic Courtès @ 2015-02-26 12:32 ` David Kastrup 2015-02-26 14:03 ` Ludovic Courtès 0 siblings, 1 reply; 23+ messages in thread From: David Kastrup @ 2015-02-26 12:32 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 19883 ludo@gnu.org (Ludovic Courtès) writes: > David Kastrup <dak@gnu.org> skribis: > >> ludo@gnu.org (Ludovic Courtès) writes: >> >>> David Kastrup <dak@gnu.org> skribis: >>> >>>> This is embarrassing: I used the wrong executable in connection with the >>>> core dump. With the matching executable, the coredump makes a lot more >>>> sense: >>>> >>>> #0 0x00000000 in ?? () >>>> #1 0x0804aee0 in Smob_base<Family>::mark_trampoline (arg=0x9fbb000) >>>> at smobs.tcc:34 >>>> #2 0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22 >>>> #3 0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1 >>> >>> Could you try commenting out all the SMOB mark functions in LilyPond? >>> >>> This doesn’t fix the bug, of course, but it’s probably a good >>> workaround: user-provided mark functions are not needed in Guile 2.0 >>> since libgc scans the whole heap for live pointers. >> >> Even the test program crashes at the end (when `count' is called in >> order to traverse the created hierarchy) when you disable the setting of >> the mark function in the init method in smobs.tcc. > > Could you add debugging symbols for libguile? I don’t understand how > ‘count’ gets called. Figure me surprised. Here is the recursive walk: int Family::count () { int sum = 1; for (int i = 0; i < kids.size (); i++) sum += kids[i]->count (); return sum; } and here is the starting call in workload(): cout << "last has " << Family::unsmob (k)->count () << endl; > Do you know if this is a use-after-free error? Sure. Nothing else would clobber the kids[] array to contain bad pointers. > If this is the case, Andy had the idea of turning on topological > finalization in the GC. This may help for this particular case, but I > vaguely recall that this breaks other finalizer-related things. I don't see why. Topological finalization might help with mark-after-free. But why would it help if there is not even any mark call involved? This is clearly use-after-free. > (I would check by myself, but ISTR that building LilyPond “on one’s > own” is not recommended. What would you suggest? A Guix recipe would > be sweet.) Is there a reason you are not using the test program provided with this bug report? There is no real point in experimenting with LilyPond's complexity when a simple test program using its memory management classes already crashes. LilyPond's GUILEv2 branch is currently out of order again since 2.0.11 changed encoding mechanisms _again_ in an incompatible manner (what GUILE calls "stable" is anything but). It is becoming harder and harder to work around GUILE's attempts of wresting encoding control from the application, while GUILE has no byte-transparent decoding of UTF-8, does not support strings encoded in UTF-8, and (as of 2.0.11 or 2.0.10) supports _only_ string ports redecoded to UTF-8. So dealing with memory-mapped UTF-8 encoded files which are multiplexed between reading by GUILE and reading by an UTF-8 decoding parser has again been thwarted. While I try figuring out how to repair the damage this time, testing with LilyPond itself is hard to interpret since a number of problems are not related to the memory management. As long as this simple test program can show the memory management related crashes, I don't see the point in throwing people at LilyPond: that has not delivered any results the last several times I tried it. >> A pointer to a C++ structure does not appear to protect the >> corresponding SMOB data and free_smob calls the delete operator which >> calls destructors and clobbers the memory area. > > Oh, I was mistaken in my previous message. GC scans the stack and the > GC-managed heap (stuff allocated with GC_MALLOC/scm_gc_malloc et al.), > but it does *not* scan the malloc/new heap. > > So indeed, C++ objects that hold references to ‘SCM’ objects, such as > instances of ‘Smob<X>’, must either have a mark function, or they must > be allocated with scm_gc_malloc. > > Would it be possible to add a ‘new’ operator to ‘Smob’ that uses > ‘scm_gc_malloc’, and a ‘delete’ operator that uses ‘scm_gc_free’? It would not help since many of the references are stored in STL containers (like std::vector <Grob *>) which have their data allocated/deallocated separately from the memory area of the structure itself. Frankly, I don't get the current strategy of GUILE: basically any use of scm_set_smob_mark will result in a function that can be called with garbage from a smob that has already been deallocated via the function registered with scm_set_smob_free. GUILEv2 developers have resisted fixing this bug for years by trying to stop people from using scm_set_smob_mark and instead telling people to have their entire heap scanned by a conservative garbage collector. For an application like LilyPond which can easily have the heap cover more than half of the available address space and run for half an hour (when generating docs) processing independent files with large individual memory requirements, this strategy will have both considerable performance impacts as well as bleed enough randomly retained memory to run the application into the ground eventually. In my current work on fixing the encoding stuff again I have patched my code to deal with the mark-after-free errors in the free and mark trampolines myself. I need to find a solution for the encoding mess before I can actually indulge in more testing of this workaround. However, due to the intransparency of GUILE's implementation and the multithreaded collector, I have no guarantees that my work on the respective trampolines will reliably prevent all mark-after-free errors. This is something that needs to get fixed in GUILE. It does not make sense to provide a mark callback mechanism that can be called with garbage in GUILE's free store. When GUILE releases/collects memory, it does not make sense to leave the SMOB cells in a state indistinguishable from from valid data. Apart from causing crashes in mark functions, this makes work much harder for the conservative garbage collector. -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-26 12:32 ` David Kastrup @ 2015-02-26 14:03 ` Ludovic Courtès 2015-02-26 15:30 ` David Kastrup 0 siblings, 1 reply; 23+ messages in thread From: Ludovic Courtès @ 2015-02-26 14:03 UTC (permalink / raw) To: David Kastrup; +Cc: 19883 [-- Attachment #1: Type: text/plain, Size: 201 bytes --] David Kastrup <dak@gnu.org> skribis: > Is there a reason you are not using the test program provided with this > bug report? Sorry, I had overlooked it. Please assume good faith. With this patch: [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: Type: text/x-patch, Size: 697 bytes --] diff --git a/smobs.tcc b/smobs.tcc index c8b9dfe..b1d61b7 100644 --- a/smobs.tcc +++ b/smobs.tcc @@ -132,9 +132,6 @@ void Smob_base<Super>::init () // types in order to be able to compare them. The other comparisons // are for static member functions and thus are ordinary function // pointers which work without those contortions. - if (static_cast<SCM (Super::*)()>(&Super::mark_smob) != - static_cast<SCM (Super::*)()>(&Smob_base<Super>::mark_smob)) - scm_set_smob_mark (smob_tag_, Super::mark_trampoline); scm_set_smob_print (smob_tag_, Super::print_trampoline); if (&Super::equal_p != &Smob_base<Super>::equal_p) scm_set_smob_equalp (smob_tag_, Super::equal_p); [-- Attachment #3: Type: text/plain, Size: 1555 bytes --] “./test 1000 1000 10000” works fine. However, it’s not representative since the C++ objects in this test do not hold ‘SCM’ references, AFAICS. > LilyPond's GUILEv2 branch is currently out of order again since 2.0.11 > changed encoding mechanisms _again_ Please submit a different bug report. >>> A pointer to a C++ structure does not appear to protect the >>> corresponding SMOB data and free_smob calls the delete operator which >>> calls destructors and clobbers the memory area. >> >> Oh, I was mistaken in my previous message. GC scans the stack and the >> GC-managed heap (stuff allocated with GC_MALLOC/scm_gc_malloc et al.), >> but it does *not* scan the malloc/new heap. >> >> So indeed, C++ objects that hold references to ‘SCM’ objects, such as >> instances of ‘Smob<X>’, must either have a mark function, or they must >> be allocated with scm_gc_malloc. >> >> Would it be possible to add a ‘new’ operator to ‘Smob’ that uses >> ‘scm_gc_malloc’, and a ‘delete’ operator that uses ‘scm_gc_free’? > > It would not help since many of the references are stored in STL > containers (like std::vector <Grob *>) which have their data > allocated/deallocated separately from the memory area of the structure > itself. Oh, OK. Still, I don’t think this is a problem because each C++ object has a corresponding SMOB, and said SMOB is GC-protected; thus the C++ object itself is also GC-protected until the SMOB is unprotected. Here’s the patch I’ve ended up with: [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #4: Type: text/x-patch, Size: 2789 bytes --] diff --git a/smobs.hh b/smobs.hh index 3701280..a41a645 100644 --- a/smobs.hh +++ b/smobs.hh @@ -263,6 +263,20 @@ private: protected: Smob () : self_scm_ (SCM_UNDEFINED), protection_cons_ (SCM_EOL) { }; public: + static void *operator new (std::size_t size) + { + /* This C++ object is referred to by the corresponding SMOB, which is + itself GC-protected. Thus, no need to protect the C++ object. */ + return scm_gc_malloc (size, "lily-smob"); + } + + static void operator delete (void *thing) + { + /* Nothing to do: the GC will reclaim memory for THING when it deems + appropriate. */ + // printf ("delete %p\n", thing); + } + static size_t free_smob (SCM obj) { delete Smob_base<Super>::unregister_ptr (obj); diff --git a/smobs.tcc b/smobs.tcc index c8b9dfe..0eb0c30 100644 --- a/smobs.tcc +++ b/smobs.tcc @@ -31,7 +31,7 @@ template <class Super> SCM Smob_base<Super>::mark_trampoline (SCM arg) { - return (Super::unsmob (arg))->mark_smob (); + abort (); } template <class Super> @@ -60,14 +60,14 @@ template <class Super> SCM Smob_base<Super>::mark_smob () { - return SCM_UNSPECIFIED; + abort (); } template <class Super> size_t Smob_base<Super>::free_smob (SCM) { - return 0; + abort (); } template <class Super> @@ -125,16 +125,11 @@ void Smob_base<Super>::init () // While that's not a consideration for type_p_name_, it's easier // doing it like the rest. - if (&Super::free_smob != &Smob_base<Super>::free_smob) - scm_set_smob_free (smob_tag_, Super::free_smob); // Old GCC versions get their type lattice for pointers-to-members // tangled up to a degree where we need to typecast _both_ covariant // types in order to be able to compare them. The other comparisons // are for static member functions and thus are ordinary function // pointers which work without those contortions. - if (static_cast<SCM (Super::*)()>(&Super::mark_smob) != - static_cast<SCM (Super::*)()>(&Smob_base<Super>::mark_smob)) - scm_set_smob_mark (smob_tag_, Super::mark_trampoline); scm_set_smob_print (smob_tag_, Super::print_trampoline); if (&Super::equal_p != &Smob_base<Super>::equal_p) scm_set_smob_equalp (smob_tag_, Super::equal_p); diff --git a/test.cc b/test.cc index b5ab476..6bbd435 100644 --- a/test.cc +++ b/test.cc @@ -41,9 +41,7 @@ Family::Family (int totals, int branch) SCM Family::mark_smob () { - for (int i = 0; i < kids.size (); i++) - scm_gc_mark (kids[i]->self_scm ()); - return SCM_EOL; + abort (); } int @@ -51,7 +49,11 @@ Family::count () { int sum = 1; for (int i = 0; i < kids.size (); i++) - sum += kids[i]->count (); + { + sum += kids[i]->count (); + if ((i % 100) == 0) + scm_gc (); + } return sum; } [-- Attachment #5: Type: text/plain, Size: 1671 bytes --] It works on this test case, and, assuming I correctly understand LilyPond’s use case, it may be a viable solution for LilyPond. Could you give it a try and report back? > Frankly, I don't get the current strategy of GUILE: basically any use of > scm_set_smob_mark will result in a function that can be called with > garbage from a smob that has already been deallocated via the function > registered with scm_set_smob_free. That’s not true. For example, the GnuTLS bindings use the exact same code for 1.8 and 2.0 without any problems; it’s surely a simpler example than LilyPond, of course, but still. ~~~~~~~ In the interest of the Guile and LilyPond projects, I would like to improve our communication. Let me explain what my personal expectations are. First, like many others, I’m a volunteer with limited bandwidth. Thus, I really prefer bug reports that are as concise as possible and to-the-point; I can’t afford to read so much. Second, I don’t want to mix issues: this bug report is about a specific GC issue, not about an encoding issue. It’s better for me (and everyone I guess) when a bug report remains focused. Third, we must fix these LilyPond vs. Guile 2 issues. You and others have spent a lot of time on this already. I think it would help to get everyone involved on both sides. Thus, could you Cc: this bug report to the LilyPond developer list, or the corresponding LilyPond bug report? This is really important to me. Please don’t take it personally or anything. I’m really trying hard to see how we can improve the collaboration between the two projects. Thanks, Ludo’. ^ permalink raw reply related [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-26 14:03 ` Ludovic Courtès @ 2015-02-26 15:30 ` David Kastrup 2015-02-26 18:15 ` Ludovic Courtès 2016-06-23 9:50 ` Andy Wingo 0 siblings, 2 replies; 23+ messages in thread From: David Kastrup @ 2015-02-26 15:30 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 19883 ludo@gnu.org (Ludovic Courtès) writes: > David Kastrup <dak@gnu.org> skribis: > >> Is there a reason you are not using the test program provided with this >> bug report? > > Sorry, I had overlooked it. Please assume good faith. > > With this patch: > > diff --git a/smobs.tcc b/smobs.tcc > index c8b9dfe..b1d61b7 100644 > --- a/smobs.tcc > +++ b/smobs.tcc > @@ -132,9 +132,6 @@ void Smob_base<Super>::init () > // types in order to be able to compare them. The other comparisons > // are for static member functions and thus are ordinary function > // pointers which work without those contortions. > - if (static_cast<SCM (Super::*)()>(&Super::mark_smob) != > - static_cast<SCM (Super::*)()>(&Smob_base<Super>::mark_smob)) > - scm_set_smob_mark (smob_tag_, Super::mark_trampoline); > scm_set_smob_print (smob_tag_, Super::print_trampoline); > if (&Super::equal_p != &Smob_base<Super>::equal_p) > scm_set_smob_equalp (smob_tag_, Super::equal_p); > > > “./test 1000 1000 10000” works fine. That's not really surprising: this specifies a "fork factor" of 1000 and 1000 points of data which means that the data structures are nested only one level deep. Try ./test 2 2000 200 which will cause a nesting of about 10 levels, making it much more likely for collection/allocation problems to strike. > However, it’s not representative since the C++ objects in this test do > not hold ‘SCM’ references, AFAICS. Why isn't it representative? That's _very_ much the situation in LilyPond's code base. >> LilyPond's GUILEv2 branch is currently out of order again since >> 2.0.11 changed encoding mechanisms _again_ > > Please submit a different bug report. There were separate bug reports. This is considered a feature. >> It would not help since many of the references are stored in STL >> containers (like std::vector <Grob *>) which have their data >> allocated/deallocated separately from the memory area of the >> structure itself. > > Oh, OK. Still, I don’t think this is a problem because each C++ > object has a corresponding SMOB, and said SMOB is GC-protected; thus > the C++ object itself is also GC-protected until the SMOB is > unprotected. The code given in test.cc is representative for LilyPond: most of the C++ objects refer to other C++ objects via pointers, and the protection of SMOB and C++ objects is managed through the mark callbacks. Complex C++ objects contain their own SCM value as part of the Smob base class. Simple C++ objects (derived from Simple_smob) don't and are only optionally managed by GUILE. The test program currently does not demonstrate use of Simple_smob: at the current point of time getting the crashes of Smob under control is the first target. > Here’s the patch I’ve ended up with: > > diff --git a/smobs.hh b/smobs.hh > index 3701280..a41a645 100644 > --- a/smobs.hh > +++ b/smobs.hh > @@ -263,6 +263,20 @@ private: > protected: > Smob () : self_scm_ (SCM_UNDEFINED), protection_cons_ (SCM_EOL) { }; > public: > + static void *operator new (std::size_t size) > + { > + /* This C++ object is referred to by the corresponding SMOB, which is > + itself GC-protected. Thus, no need to protect the C++ object. */ > + return scm_gc_malloc (size, "lily-smob"); > + } > + > + static void operator delete (void *thing) > + { > + /* Nothing to do: the GC will reclaim memory for THING when it deems > + appropriate. */ > + // printf ("delete %p\n", thing); > + } > + As I stated: this will not help with STL containers which are extensively used in pretty much every data structure of LilyPond. >> Frankly, I don't get the current strategy of GUILE: basically any use >> of scm_set_smob_mark will result in a function that can be called >> with garbage from a smob that has already been deallocated via the >> function registered with scm_set_smob_free. > > That’s not true. Shrug. I posted the backtrace and the example program. > Let me explain what my personal expectations are. First, like many > others, I’m a volunteer with limited bandwidth. Thus, I really prefer > bug reports that are as concise as possible and to-the-point; I can’t > afford to read so much. That's what I consider the primary reason that every offer to help with LilyPond so far has gone dead the moment I provided a branch and recipe to work on. LilyPond is a large and complex application. That's why I went to the pain of providing standalone code demonstrating the memory allocation and management problems so that this could be worked on independently from other problems. > Second, I don’t want to mix issues: this bug report is about a > specific GC issue, not about an encoding issue. It’s better for me > (and everyone I guess) when a bug report remains focused. Which is why I provided a separate test program that does not break whenever the "stable" Guile version changes its encoding APIs again. Several weeks ago I promised RMS to update the GUILEv2 branch to the current version of LilyPond as he got a renewed promise of cooperation. It turned out that in the mean time Ubuntu had updated Guilev2 from 2.0.9 (or something) to 2.0.11, and the encoding issues I got under control for 2.0.9 were back with a vengeance where a speedy solution was not in sight. So I spent the time to create test code _only_ involving the memory management issues. Now the complaint appears to be that the code is not representative for LilyPond. It most certainly is. > Third, we must fix these LilyPond vs. Guile 2 issues. You and others > have spent a lot of time on this already. In the last two years, nobody except myself spent any time on GUILEv2/LilyPond. Previous to that, Ian Hulin spent time on reorganizing our code base and startup code in order not to get the byte compilation, module system and other interpreter/compiler differences of GUILEv2 mess up the basic operation, mostly by disabling a lot of stuff. There will be a lot of followup work to get code management and performance into reasonable shape once it works at all. When it is not possible to run the test suites without random crashes, there is no reasonably effective way for people to work on those aspects however. > I think it would help to get everyone involved on both sides. Thus, > could you Cc: this bug report to the LilyPond developer list, or the > corresponding LilyPond bug report? This is really important to me. Shrug. I'll put a link to this bug report to a suitable LilyPond issue. It will not likely trigger a response or other feedback, however. Nobody but myself is working on GUILEv2 migration. The current C++ template class implementation/abstraction of the GUILE memory management has been written by me. No other currently active developer has had either significant involvement with the current code, or the functionally similar previous code that used C preprocessor macros instead of language features to do essentially the same. > Please don’t take it personally or anything. I’m really trying hard > to see how we can improve the collaboration between the two projects. For better or worse, LilyPond has no other developer to offer than myself for that task. There is nobody else available who even has had exposure to GUILEv2 or even GUILE1 internals. -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-26 15:30 ` David Kastrup @ 2015-02-26 18:15 ` Ludovic Courtès 2015-02-26 19:25 ` David Kastrup 2016-06-23 9:50 ` Andy Wingo 1 sibling, 1 reply; 23+ messages in thread From: Ludovic Courtès @ 2015-02-26 18:15 UTC (permalink / raw) To: David Kastrup; +Cc: 19883 David Kastrup <dak@gnu.org> skribis: > ludo@gnu.org (Ludovic Courtès) writes: > >> David Kastrup <dak@gnu.org> skribis: [...] >>> It would not help since many of the references are stored in STL >>> containers (like std::vector <Grob *>) which have their data >>> allocated/deallocated separately from the memory area of the >>> structure itself. >> >> Oh, OK. Still, I don’t think this is a problem because each C++ >> object has a corresponding SMOB, and said SMOB is GC-protected; thus >> the C++ object itself is also GC-protected until the SMOB is >> unprotected. > > The code given in test.cc is representative for LilyPond: most of the > C++ objects refer to other C++ objects via pointers, and the protection > of SMOB and C++ objects is managed through the mark callbacks. Complex > C++ objects contain their own SCM value as part of the Smob base class. > Simple C++ objects (derived from Simple_smob) don't and are only > optionally managed by GUILE. I don’t think this contradicts what I wrote above, does it? >> Here’s the patch I’ve ended up with: >> >> diff --git a/smobs.hh b/smobs.hh >> index 3701280..a41a645 100644 >> --- a/smobs.hh >> +++ b/smobs.hh >> @@ -263,6 +263,20 @@ private: >> protected: >> Smob () : self_scm_ (SCM_UNDEFINED), protection_cons_ (SCM_EOL) { }; >> public: >> + static void *operator new (std::size_t size) >> + { >> + /* This C++ object is referred to by the corresponding SMOB, which is >> + itself GC-protected. Thus, no need to protect the C++ object. */ >> + return scm_gc_malloc (size, "lily-smob"); >> + } >> + >> + static void operator delete (void *thing) >> + { >> + /* Nothing to do: the GC will reclaim memory for THING when it deems >> + appropriate. */ >> + // printf ("delete %p\n", thing); >> + } >> + > > As I stated: this will not help with STL containers which are > extensively used in pretty much every data structure of LilyPond. Sorry, I don’t understand how it doesn’t help. It would be a problem is ‘Smob’ objects could be copied, thus ending up in non-GC-scanned storage, but IIUC they cannot be copied because their copy constructor is private. What am I missing? At any rate, I don’t see any failure with the test program. >> I think it would help to get everyone involved on both sides. Thus, >> could you Cc: this bug report to the LilyPond developer list, or the >> corresponding LilyPond bug report? This is really important to me. > > Shrug. I'll put a link to this bug report to a suitable LilyPond issue. Thank you. Though I want other LilyPond developers to get involved, and I’m afraid it would be easy for them to just ignore a side bug report. It’s a vital task for LilyPond, it cannot be a one-person side-project on the LilyPond side. Thanks in advance, Ludo’. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-26 18:15 ` Ludovic Courtès @ 2015-02-26 19:25 ` David Kastrup 2015-02-27 9:55 ` Ludovic Courtès 0 siblings, 1 reply; 23+ messages in thread From: David Kastrup @ 2015-02-26 19:25 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 19883 ludo@gnu.org (Ludovic Courtès) writes: > David Kastrup <dak@gnu.org> skribis: > >> Shrug. I'll put a link to this bug report to a suitable LilyPond issue. > > Thank you. Though I want other LilyPond developers to get involved, and > I’m afraid it would be easy for them to just ignore a side bug report. > > It’s a vital task for LilyPond, it cannot be a one-person side-project > on the LilyPond side. Three years ago in August there was a meeting of developers at my house, a lot of information was passed around and in a concerted effort LilyPond-2.16.0 was released. Since then all developments would properly be labelled a one-person project or side-project. Now if you take a look at GUILE 2.1 development, in particular all commits that are _not_ merges from the 2.0 branch, you'll find that quite more than 90% have come from the same person. LilyPond development is quite more diverse, but the various changes in the last years have invariably been one-person projects. Of the three large GUILE-based applications Gnucash, TeXmacs, and LilyPond, the only successful migration to GUILEv2 so far has been Gnucash, and its integration with GUILE is much smaller than that of LilyPond. The one-person projects concerning finding and/or fixing language and compiler problems or restructuring the related code areas have been exclusively mine. That is not necessarily a good match since my productivity drops to near zero when I lose interest with a problem, and without anybody else keeping the ball rolling, it can easily stay there. But I have no resources I could call upon to change that. -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-26 19:25 ` David Kastrup @ 2015-02-27 9:55 ` Ludovic Courtès 2015-02-27 10:18 ` David Kastrup 0 siblings, 1 reply; 23+ messages in thread From: Ludovic Courtès @ 2015-02-27 9:55 UTC (permalink / raw) To: David Kastrup; +Cc: 19883 David Kastrup <dak@gnu.org> skribis: > Three years ago in August there was a meeting of developers at my house, > a lot of information was passed around and in a concerted effort > LilyPond-2.16.0 was released. > > Since then all developments would properly be labelled a one-person > project or side-project. OK. I viewed LilyPond as a project with more people involved. > Now if you take a look at GUILE 2.1 development, in particular all > commits that are _not_ merges from the 2.0 branch, you'll find that > quite more than 90% have come from the same person. > > LilyPond development is quite more diverse, but the various changes in > the last years have invariably been one-person projects. > > Of the three large GUILE-based applications Gnucash, TeXmacs, and > LilyPond, the only successful migration to GUILEv2 so far has been > Gnucash, and its integration with GUILE is much smaller than that of > LilyPond. > > The one-person projects concerning finding and/or fixing language and > compiler problems or restructuring the related code areas have been > exclusively mine. That is not necessarily a good match since my > productivity drops to near zero when I lose interest with a problem, and > without anybody else keeping the ball rolling, it can easily stay there. Sure. There’s quite a number of names showing up in the lilypond-devel archive though. We need them to feel concerned about this. Ludo’. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-27 9:55 ` Ludovic Courtès @ 2015-02-27 10:18 ` David Kastrup 0 siblings, 0 replies; 23+ messages in thread From: David Kastrup @ 2015-02-27 10:18 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 19883 ludo@gnu.org (Ludovic Courtès) writes: > David Kastrup <dak@gnu.org> skribis: > >> Three years ago in August there was a meeting of developers at my house, >> a lot of information was passed around and in a concerted effort >> LilyPond-2.16.0 was released. >> >> Since then all developments would properly be labelled a one-person >> project or side-project. > > OK. I viewed LilyPond as a project with more people involved. There are a number of them involved, but with regard to coding, everybody is working on his own projects. The largest amount of "teamwork" is probably done in the translation team (which translates documentation and web pages to several different languages) and with documentation writers. There are also people compiling the releases and running the test suites. And there is feedback on the bug trackers. > Sure. There’s quite a number of names showing up in the > lilypond-devel archive though. We need them to feel concerned about > this. The only concern you'll get is people piping up and saying they cannot believe that not more progress has been made. Which is not all that dissimilar to what I see with GUILE and other projects even when sending ready-made patches. In this particular case however, the respective people do not have the low-level skills to actually contribute significantly. Which is the reason I try to condense the ugly stuff into well-encapsulated and controlled places instead of keeping them distributed across the code base. The memory management stuff is of that "nobody else treads here" kind. Now the encoding stuff is wildly enfuriating in its pointless erection and shifting around of road blocks for people needing to actually work with strings and buffers in their original encoding, but would possibly be suitable as material to fight with to more C programmers. -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2015-02-26 15:30 ` David Kastrup 2015-02-26 18:15 ` Ludovic Courtès @ 2016-06-23 9:50 ` Andy Wingo 2016-06-23 10:36 ` Andy Wingo 2016-06-23 13:12 ` Ludovic Courtès 1 sibling, 2 replies; 23+ messages in thread From: Andy Wingo @ 2016-06-23 9:50 UTC (permalink / raw) To: David Kastrup; +Cc: Ludovic Courtès, 19883 On Thu 26 Feb 2015 16:30, David Kastrup <dak@gnu.org> writes: > Try ./test 2 2000 200 I can reproduce the crash with your test case, thanks :) The patch below fixes the bug for me. WDYT Ludovic? Andy commit db30120fc3a1727d8f221cbb014314f2babf841e Author: Andy Wingo <wingo@pobox.com> Date: Thu Jun 23 11:47:42 2016 +0200 Fix race between SMOB marking and finalization * libguile/smob.c (clear_smobnum): New helper. (finalize_smob): Re-set the smobnum to the "finalized smob" type before finalizing. Fixes #19883. (scm_smob_prehistory): Pre-register a "finalized smob" type, which has no mark procedure. diff --git a/libguile/smob.c b/libguile/smob.c index 6a97caa..43ea613 100644 --- a/libguile/smob.c +++ b/libguile/smob.c @@ -372,20 +372,43 @@ scm_gc_mark (SCM o) } \f +static void* +clear_smobnum (void *ptr) +{ + SCM smob; + scm_t_bits smobnum; + + smob = SCM_PACK_POINTER (ptr); + + smobnum = SCM_SMOBNUM (smob); + /* Frob the object's type in place, re-setting it to be the "finalized + smob" type. This will prevent other routines from accessing its + internals in a way that assumes that the smob data is valid. This + is notably the case for SMOB's own "mark" procedure, if any; as the + finalizer runs without the alloc lock, it's possible for a GC to + occur while it's running, in which case the object is alive and yet + its data is invalid. */ + SCM_SET_SMOB_DATA_0 (smob, SCM_SMOB_DATA_0 (smob) & ~(scm_t_bits) 0xff00); + + return (void *) smobnum; +} + /* Finalize SMOB by calling its SMOB type's free function, if any. */ static void finalize_smob (void *ptr, void *data) { SCM smob; + scm_t_bits smobnum; size_t (* free_smob) (SCM); smob = SCM_PACK_POINTER (ptr); + smobnum = (scm_t_bits) GC_call_with_alloc_lock (clear_smobnum, ptr); + #if 0 - printf ("finalizing SMOB %p (smobnum: %u)\n", - ptr, SCM_SMOBNUM (smob)); + printf ("finalizing SMOB %p (smobnum: %u)\n", ptr, smobnum); #endif - free_smob = scm_smobs[SCM_SMOBNUM (smob)].free; + free_smob = scm_smobs[smobnum].free; if (free_smob) free_smob (smob); } @@ -460,6 +483,7 @@ void scm_smob_prehistory () { long i; + scm_t_bits finalized_smob_tc16; scm_i_pthread_key_create (¤t_mark_stack_pointer, NULL); scm_i_pthread_key_create (¤t_mark_stack_limit, NULL); @@ -483,6 +507,9 @@ scm_smob_prehistory () scm_smobs[i].apply = 0; scm_smobs[i].apply_trampoline = SCM_BOOL_F; } + + finalized_smob_tc16 = scm_make_smob_type ("finalized smob", 0); + if (SCM_TC2SMOBNUM (finalized_smob_tc16) != 0) abort (); } /* ^ permalink raw reply related [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2016-06-23 9:50 ` Andy Wingo @ 2016-06-23 10:36 ` Andy Wingo 2016-06-23 13:13 ` Ludovic Courtès 2016-06-23 13:12 ` Ludovic Courtès 1 sibling, 1 reply; 23+ messages in thread From: Andy Wingo @ 2016-06-23 10:36 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 19883-done, David Kastrup [-- Attachment #1: Type: text/plain, Size: 878 bytes --] On Thu 23 Jun 2016 11:50, Andy Wingo <wingo@pobox.com> writes: > On Thu 26 Feb 2015 16:30, David Kastrup <dak@gnu.org> writes: > >> Try ./test 2 2000 200 > > I can reproduce the crash with your test case, thanks :) The patch below > fixes the bug for me. WDYT Ludovic? Here's a patch with a test case. I'm going to apply as it seems to be obviously the right thing and the test case does reproduce what I think is the bug (racing mark and finalize procedures, even if it's only happening in one thread, finalizers and mark procedures do introduce concurrency). We trigger the concurrency in a simple way, via allocation in the finalizer. The patch does fix the original test. GC could happen due to another thread of course. I'm actually not sure where the concurrency is coming from in David's test though :/ I'm very interested in any feedback you might have! Andy [-- Attachment #2: 0001-Fix-race-between-SMOB-marking-and-finalization.patch --] [-- Type: text/plain, Size: 5429 bytes --] From 8dff3af087c6eaa83ae0d72aa8b22aef5c65d65d Mon Sep 17 00:00:00 2001 From: Andy Wingo <wingo@pobox.com> Date: Thu, 23 Jun 2016 11:47:42 +0200 Subject: [PATCH] Fix race between SMOB marking and finalization * libguile/smob.c (clear_smobnum): New helper. (finalize_smob): Re-set the smobnum to the "finalized smob" type before finalizing. Fixes #19883. (scm_smob_prehistory): Pre-register a "finalized smob" type, which has no mark procedure. * test-suite/standalone/test-smob-mark-race.c: New file. * test-suite/standalone/Makefile.am: Arrange to build and run the new test. --- libguile/smob.c | 33 +++++++++++++-- test-suite/standalone/Makefile.am | 6 +++ test-suite/standalone/test-smob-mark-race.c | 65 +++++++++++++++++++++++++++++ 3 files changed, 101 insertions(+), 3 deletions(-) create mode 100644 test-suite/standalone/test-smob-mark-race.c diff --git a/libguile/smob.c b/libguile/smob.c index 90849a8..ed9d91a 100644 --- a/libguile/smob.c +++ b/libguile/smob.c @@ -374,20 +374,43 @@ scm_gc_mark (SCM o) } \f +static void* +clear_smobnum (void *ptr) +{ + SCM smob; + scm_t_bits smobnum; + + smob = PTR2SCM (ptr); + + smobnum = SCM_SMOBNUM (smob); + /* Frob the object's type in place, re-setting it to be the "finalized + smob" type. This will prevent other routines from accessing its + internals in a way that assumes that the smob data is valid. This + is notably the case for SMOB's own "mark" procedure, if any; as the + finalizer runs without the alloc lock, it's possible for a GC to + occur while it's running, in which case the object is alive and yet + its data is invalid. */ + SCM_SET_SMOB_DATA_0 (smob, SCM_SMOB_DATA_0 (smob) & ~(scm_t_bits) 0xff00); + + return (void *) smobnum; +} + /* Finalize SMOB by calling its SMOB type's free function, if any. */ static void finalize_smob (void *ptr, void *data) { SCM smob; + scm_t_bits smobnum; size_t (* free_smob) (SCM); smob = PTR2SCM (ptr); + smobnum = (scm_t_bits) GC_call_with_alloc_lock (clear_smobnum, ptr); + #if 0 - printf ("finalizing SMOB %p (smobnum: %u)\n", - ptr, SCM_SMOBNUM (smob)); + printf ("finalizing SMOB %p (smobnum: %u)\n", ptr, smobnum); #endif - free_smob = scm_smobs[SCM_SMOBNUM (smob)].free; + free_smob = scm_smobs[smobnum].free; if (free_smob) free_smob (smob); } @@ -470,6 +493,7 @@ void scm_smob_prehistory () { long i; + scm_t_bits finalized_smob_tc16; scm_i_pthread_key_create (¤t_mark_stack_pointer, NULL); scm_i_pthread_key_create (¤t_mark_stack_limit, NULL); @@ -493,6 +517,9 @@ scm_smob_prehistory () scm_smobs[i].apply = 0; scm_smobs[i].apply_trampoline_objcode = SCM_BOOL_F; } + + finalized_smob_tc16 = scm_make_smob_type ("finalized smob", 0); + if (SCM_TC2SMOBNUM (finalized_smob_tc16) != 0) abort (); } /* diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am index 712418a..aec7418 100644 --- a/test-suite/standalone/Makefile.am +++ b/test-suite/standalone/Makefile.am @@ -275,4 +275,10 @@ test_smob_mark_LDADD = $(LIBGUILE_LDADD) check_PROGRAMS += test-smob-mark TESTS += test-smob-mark +test_smob_mark_race_SOURCES = test-smob-mark-race.c +test_smob_mark_race_CFLAGS = ${test_cflags} +test_smob_mark_race_LDADD = $(LIBGUILE_LDADD) +check_PROGRAMS += test-smob-mark-race +TESTS += test-smob-mark-race + EXTRA_DIST += ${check_SCRIPTS} diff --git a/test-suite/standalone/test-smob-mark-race.c b/test-suite/standalone/test-smob-mark-race.c new file mode 100644 index 0000000..eca0325 --- /dev/null +++ b/test-suite/standalone/test-smob-mark-race.c @@ -0,0 +1,65 @@ +/* Copyright (C) 2016 Free Software Foundation, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public License + * as published by the Free Software Foundation; either version 3 of + * the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301 USA + */ + +#if HAVE_CONFIG_H +#include <config.h> +#endif + +#undef NDEBUG + +#include <libguile.h> +#include <assert.h> + +static SCM +mark_smob (SCM smob) +{ + assert (SCM_SMOB_DATA (smob) == 1); + return SCM_BOOL_F; +} + +static size_t +finalize_smob (SCM smob) +{ + assert (SCM_SMOB_DATA (smob) == 1); + SCM_SET_SMOB_DATA (smob, 0); + /* Allocate a bit in the hopes of triggering a new GC, making the + marker race with the finalizer. */ + scm_cons (SCM_BOOL_F, SCM_BOOL_F); + return 0; +} + +static void +tests (void *data, int argc, char **argv) +{ + scm_t_bits tc16; + int i; + + tc16 = scm_make_smob_type ("smob with finalizer", 0); + scm_set_smob_mark (tc16, mark_smob); + scm_set_smob_free (tc16, finalize_smob); + + for (i = 0; i < 1000 * 1000; i++) + scm_new_smob (tc16, 1); +} + +int +main (int argc, char *argv[]) +{ + scm_boot_guile (argc, argv, tests, NULL); + return 0; +} -- 2.8.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2016-06-23 10:36 ` Andy Wingo @ 2016-06-23 13:13 ` Ludovic Courtès 0 siblings, 0 replies; 23+ messages in thread From: Ludovic Courtès @ 2016-06-23 13:13 UTC (permalink / raw) To: Andy Wingo; +Cc: 19883-done, David Kastrup Andy Wingo <wingo@pobox.com> skribis: > On Thu 23 Jun 2016 11:50, Andy Wingo <wingo@pobox.com> writes: > >> On Thu 26 Feb 2015 16:30, David Kastrup <dak@gnu.org> writes: >> >>> Try ./test 2 2000 200 >> >> I can reproduce the crash with your test case, thanks :) The patch below >> fixes the bug for me. WDYT Ludovic? > > Here's a patch with a test case. Perfect, you can ignore my previous message. :-) Ludo’. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Correction for backtrace 2016-06-23 9:50 ` Andy Wingo 2016-06-23 10:36 ` Andy Wingo @ 2016-06-23 13:12 ` Ludovic Courtès 1 sibling, 0 replies; 23+ messages in thread From: Ludovic Courtès @ 2016-06-23 13:12 UTC (permalink / raw) To: Andy Wingo; +Cc: 19883, David Kastrup Andy Wingo <wingo@pobox.com> skribis: > On Thu 26 Feb 2015 16:30, David Kastrup <dak@gnu.org> writes: > >> Try ./test 2 2000 200 > > I can reproduce the crash with your test case, thanks :) The patch below > fixes the bug for me. WDYT Ludovic? > > Andy > > commit db30120fc3a1727d8f221cbb014314f2babf841e > Author: Andy Wingo <wingo@pobox.com> > Date: Thu Jun 23 11:47:42 2016 +0200 > > Fix race between SMOB marking and finalization > > * libguile/smob.c (clear_smobnum): New helper. > (finalize_smob): Re-set the smobnum to the "finalized smob" type > before finalizing. Fixes #19883. > (scm_smob_prehistory): Pre-register a "finalized smob" type, which has > no mark procedure. This LGTM, nice hack! Do you think the test case could be added to the test suite somehow? Thank you, Ludo’. ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Smob's mark_smob has become unreliable in Guile 2.x 2015-02-16 16:41 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x David Kastrup 2015-02-16 18:13 ` bug#19883: Correction for backtrace David Kastrup @ 2015-03-01 6:51 ` Mark H Weaver 2015-03-01 10:09 ` David Kastrup 2015-03-01 15:00 ` David Kastrup 2016-10-09 7:51 ` bug#19883: The “finalized” SMOB type Artyom Poptsov 2 siblings, 2 replies; 23+ messages in thread From: Mark H Weaver @ 2015-03-01 6:51 UTC (permalink / raw) To: David Kastrup; +Cc: 19883 Hi David, Thank you for the minimal test case. This is enormously helpful. David Kastrup <dak@gnu.org> writes: > The symptom likely occurs when an object is collected and consequently > its free_smob function gets called. After some investigation, I believe I've confirmed that this is exactly what's happening. I added a field to the Smob class that is set to 12345678 when the object is created (and 21345678 at a later point), and then 87654321 when free_smob is called. I then observed that the segfault happens when the smob_mark function is called on a smob whose tag field is valid, but whose corresponding C++ object has 87654321 in the special field, indicating that its smob_free method was already called. This happened with GC_MARKERS=1. I believe this is a bug in BDWGC. gc_mark.h warns: --8<---------------cut here---------------start------------->8--- /* WARNING: Such a mark procedure may be invoked on an unused object */ /* residing on a free list. Such objects are cleared, except for a */ /* free list link field in the first word. Thus mark procedures may */ /* not count on the presence of a type descriptor, and must handle this */ /* case correctly somehow. */ --8<---------------cut here---------------end--------------->8--- and protect Guile users from this issue by checking that the tag on the smob cell (first word) is correct before calling the user-defined smob_mark function. However, in this case, the mark procedure was called on a smob cell whose first word was the correct tag and whose second word was non-zero and clearly a pointer to an already deleted instance of the C++ Smob class. I think the next step is to remove libguile from this test program, and turn it into a minimal demonstration of this apparent BDWGC bug, for submission to the BDWGC developers. FYI, here are the changes I made to the test program and the results I observed. First, some changes that I think might be needed for GC safety: --8<---------------cut here---------------start------------->8--- --- smobs.hh.ORIG 2015-02-15 13:35:03.000000000 -0500 +++ smobs.hh 2015-02-28 23:42:06.346803668 -0500 @@ -270,8 +271,9 @@ } SCM unprotected_smobify_self () { - self_scm_ = Smob_base<Super>::register_ptr (static_cast<Super *> (this)); - return self_scm_; + SCM s = Smob_base<Super>::register_ptr (static_cast<Super *> (this)); + self_scm_ = s; + return s; } void protect () { @@ -279,12 +281,12 @@ } SCM unprotect () { - unprotect_smob (self_scm_, &protection_cons_); - return self_scm_; + SCM s = self_scm_; + unprotect_smob (s, &protection_cons_); + return s; } void smobify_self () { - self_scm_ = unprotected_smobify_self (); - protect (); + protect_smob (unprotected_smobify_self (), &protection_cons_); } SCM self_scm () const { return self_scm_; } }; --8<---------------cut here---------------end--------------->8--- The changes to 'unprotected_smobify_self' and 'smobify_self" are probably only needed for a multithreaded program. The idea is to try to ensure that the smob is protected from GC by being on the stack (and not merely in the malloc/free heap) until it is protected. The change to 'unprotect' is along the same lines, and might be needed even for single-threaded programs. I don't know that 'scm_gc_unprotect_object' (called from 'unprotect_smob') will guarantee not to free the object passed to it if there are no other visible references to it. However, none of these changes fixed the problem here. BTW, one other thing to keep in mind: it will never be enough if the only GC-visible reference to an object is the C++ Smob object. There must always be a GC-visible reference to the SCM value. --8<---------------cut here---------------start------------->8--- --- smobs.hh.ORIG 2015-02-15 13:35:03.000000000 -0500 +++ smobs.hh 2015-02-28 23:42:06.346803668 -0500 @@ -263,6 +263,7 @@ protected: Smob () : self_scm_ (SCM_UNDEFINED), protection_cons_ (SCM_EOL) { }; public: + volatile int mhw_valid; static size_t free_smob (SCM obj) { delete Smob_base<Super>::unregister_ptr (obj); --- smobs.tcc.ORIG 2015-02-16 08:29:02.000000000 -0500 +++ smobs.tcc 2015-02-28 23:30:19.563298917 -0500 @@ -50,6 +50,7 @@ // announce the memory as being GC-controlled before even // allocating the controlling smob. SCM s = SCM_UNDEFINED; + p->mhw_valid = 12345678; SCM_NEWSMOB (s, smob_tag (), p); scm_gc_register_collectable_memory (p, sizeof (*p), smob_name_.c_str ()); return s; @@ -94,6 +95,9 @@ Smob_base<Super>::unregister_ptr (SCM obj) { Super *p = Super::unchecked_unsmob (obj); + if (p->mhw_valid != 12345678 && p->mhw_valid != 21345678) + abort (); + p->mhw_valid = 87654321; scm_gc_unregister_collectable_memory (p, sizeof (*p), smob_name_.c_str ()); return p; } --- test.cc.ORIG 2015-02-16 10:33:24.000000000 -0500 +++ test.cc 2015-02-28 23:28:34.690778883 -0500 @@ -28,6 +28,7 @@ Family::Family (int totals, int branch) { smobify_self (); + mhw_valid = 21345678; for (int l=0; l<branch; l++) { if (l >= totals) --8<---------------cut here---------------end--------------->8--- Here's what I observed: --8<---------------cut here---------------start------------->8--- mhw@jojen:~/guile-bug$ GC_MARKERS=1 ./test 2 20000 40 Iteration 0 Iteration 1 Iteration 2 Iteration 3 Segmentation fault (core dumped) mhw@jojen:~/guile-bug$ gdb ./test core GNU gdb (GDB) 7.8.2 Copyright (C) 2014 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i686-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./test...done. [New LWP 30663] warning: Could not load shared library symbols for linux-gate.so.1. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/gnu/store/3g20rdmnavpblsmgppyl8jhg67nidhjk-glibc-2.20/lib/libthread_db.so.1". Core was generated by `./test 2 20000 40'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0a2abde3 in ?? () warning: File "/gnu/store/7a5b7fqnxzbwhx7k2bhbc4i1nliq53rl-gcc-4.9.2-lib/lib/libstdc++.so.6.0.20-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/gnu/store/3lhr8q28q6f59774di9av7ncy09jd55d-guile-2.0.11/lib/libguile-2.0.so.22.7.2-gdb.scm". To enable execution of this file add add-auto-load-safe-path /gnu/store/7a5b7fqnxzbwhx7k2bhbc4i1nliq53rl-gcc-4.9.2-lib/lib/libstdc++.so.6.0.20-gdb.py line to your configuration file "/home/mhw/.gdbinit". To completely disable this security protection add set auto-load safe-path / line to your configuration file "/home/mhw/.gdbinit". For more information about this security protection see the "Auto-loading safe path" section in the GDB manual. E.g., run from the shell: info "(gdb)Auto-loading safe path" (gdb) bt #0 0x0a2abde3 in ?? () #1 0xb766dc4a in smob_mark (addr=0xa15f6e0, mark_stack_ptr=0x9e5a320, mark_stack_limit=0x9e62000, env=0) at smob.c:335 #2 0xb758664a in GC_mark_from (mark_stack_top=0x9e5a320, mark_stack=0x9e5a000, mark_stack_limit=0x9e62000) at mark.c:737 #3 0xb7587c07 in GC_mark_some (cold_gc_frame=0xbf874ed8 "pVZ\267M\332W\267\330N\207\277") at mark.c:386 #4 0xb757da4d in GC_stopped_mark (stop_func=stop_func@entry=0xb757d5b0 <GC_never_stop_func>) at alloc.c:637 #5 0xb757e4a9 in GC_try_to_collect_inner (stop_func=0xb757d5b0 <GC_never_stop_func>) at alloc.c:456 #6 0xb757ee00 in GC_collect_or_expand (needed_blocks=needed_blocks@entry=1, ignore_off_page=ignore_off_page@entry=0, retry=retry@entry=0) at alloc.c:1266 #7 0xb757effc in GC_allocobj (gran=gran@entry=3, kind=kind@entry=1) at alloc.c:1355 #8 0xb758440b in GC_generic_malloc_inner (lb=lb@entry=24, k=k@entry=1) at malloc.c:133 #9 0xb7581e8a in GC_register_finalizer_inner (obj=obj@entry=0xa005720, fn=fn@entry=0xb766d970 <finalize_smob>, cd=cd@entry=0x0, ofn=ofn@entry=0xbf875098, ocd=ocd@entry=0xbf87509c, mp=mp@entry=0xb75811c0 <GC_null_finalize_mark_proc>) at finalize.c:524 #10 0xb75821e5 in GC_register_finalizer_no_order (obj=obj@entry=0xa005720, fn=fn@entry=0xb766d970 <finalize_smob>, cd=cd@entry=0x0, ofn=ofn@entry=0xbf875098, ocd=ocd@entry=0xbf87509c) at finalize.c:575 #11 0xb761bfbb in scm_i_set_finalizer (obj=obj@entry=0xa005720, proc=proc@entry=0xb766d970 <finalize_smob>, data=data@entry=0x0) at finalizers.c:44 #12 0xb766e1f8 in scm_i_new_smob (tc=7039, data=171214120) at smob.c:416 #13 0x08049b12 in scm_new_smob (tc=7039, data=171214120) at /gnu/store/3lhr8q28q6f59774di9av7ncy09jd55d-guile-2.0.11/include/guile/2.0/libguile/smob.h:91 #14 0x0804a8dc in Smob_base<Family>::register_ptr (p=0xa348528) at smobs.tcc:54 #15 0x0804a281 in Smob<Family>::unprotected_smobify_self (this=0xa34852c) at smobs.hh:274 #16 0x08049d0a in Smob<Family>::smobify_self (this=0xa34852c) at smobs.hh:289 #17 0x0804960c in Family::Family (this=0xa348528, totals=8, branch=2) at test.cc:30 #18 0x08049658 in Family::Family (this=0xa2a1638, totals=17, branch=2) at test.cc:36 #19 0x08049658 in Family::Family (this=0xa2a1758, totals=37, branch=2) at test.cc:36 #20 0x08049658 in Family::Family (this=0xa349478, totals=76, branch=2) at test.cc:36 #21 0x08049658 in Family::Family (this=0xa349008, totals=154, branch=2) at test.cc:36 #22 0x08049658 in Family::Family (this=0xa347318, totals=310, branch=2) at test.cc:36 #23 0x08049658 in Family::Family (this=0xa29d938, totals=623, branch=2) at test.cc:36 #24 0x08049658 in Family::Family (this=0xa2998b8, totals=1248, branch=2) at test.cc:36 #25 0x08049658 in Family::Family (this=0xa27f818, totals=2498, branch=2) at test.cc:36 #26 0x08049658 in Family::Family (this=0xa343608, totals=4998, branch=2) at test.cc:36 #27 0x08049658 in Family::Family (this=0xa1e2a88, totals=9998, branch=2) at test.cc:36 #28 0x08049658 in Family::Family (this=0x9f9a3b8, totals=19999, branch=2) at test.cc:36 #29 0x0804989f in workload (avv=0xbf8759f4) at test.cc:72 #30 0xb7608480 in c_body (d=0xbf875884) at continuations.c:517 #31 0xb76955de in apply_catch_closure (clo=#<smob catch-closure 9ffab20>, args=()) at throw.c:140 #32 0xb766d8cf in apply_1 (smob=#<smob catch-closure 9ffab20>, a=()) at smob.c:142 #33 0xb76ac662 in vm_regular_engine (vm=#<vm 9ee2060>, program=#<program 9ee04c0>, argv=<optimized out>, nargs=<optimized out>) at vm-i-system.c:858 #34 0xb76ae621 in scm_c_vm_run (vm=#<vm 9ee2060>, program=program@entry=#<program 9f5ddc8>, argv=argv@entry=0xbf875790, nargs=nargs@entry=4) at vm.c:768 #35 0xb76135b4 in scm_call_4 (proc=#<program 9f5ddc8>, arg1=arg1@entry=#t, arg2=arg2@entry=#<smob catch-closure 9ffab20>, arg3=arg3@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>, arg4=arg4@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>) at eval.c:507 #36 0xb76957ee in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=thunk@entry=#<smob catch-closure 9ffab20>, handler=handler@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>, pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x0>) at throw.c:73 #37 0xb7695920 in scm_c_catch (tag=tag@entry=#t, body=body@entry=0xb7608470 <c_body>, body_data=body_data@entry=0xbf875884, handler=handler@entry=0xb7608930 <c_handler>, handler_data=handler_data@entry=0xbf875884, pre_unwind_handler=pre_unwind_handler@entry=0xb7608630 <pre_unwind_handler>, pre_unwind_handler_data=pre_unwind_handler_data@entry=0x9ee2020) at throw.c:207 #38 0xb7608cd7 in scm_i_with_continuation_barrier (body=body@entry=0xb7608470 <c_body>, body_data=body_data@entry=0xbf875884, handler=handler@entry=0xb7608930 <c_handler>, handler_data=handler_data@entry=0xbf875884, pre_unwind_handler=pre_unwind_handler@entry=0xb7608630 <pre_unwind_handler>, pre_unwind_handler_data=0x9ee2020) at continuations.c:455 #39 0xb7608da6 in scm_c_with_continuation_barrier (func=0x80497a4 <workload(void*)>, data=0xbf8759f4) at continuations.c:551 #40 0xb7692b32 in with_guile_and_parent (base=0xbf8758ec, data=0xbf875914) at threads.c:906 #41 0xb7589ac8 in GC_call_with_stack_base (fn=fn@entry=0xb7692ae0 <with_guile_and_parent>, arg=arg@entry=0xbf875914) at misc.c:1840 #42 0xb7692fcf in scm_i_with_guile_and_parent (parent=<optimized out>, data=0xbf8759f4, func=0x80497a4 <workload(void*)>) at threads.c:949 #43 scm_with_guile (func=0x80497a4 <workload(void*)>, data=0xbf8759f4) at threads.c:955 #44 0x0804997d in main (ac=4, av=0xbf8759f4) at test.cc:86 (gdb) dir /home/mhw/gc-7.4.2 Source directories searched: /home/mhw/gc-7.4.2:$cdir:$cwd (gdb) dir /home/mhw/guile-2.0.11/libguile Source directories searched: /home/mhw/guile-2.0.11/libguile:/home/mhw/gc-7.4.2:$cdir:$cwd (gdb) up #1 0xb766dc4a in smob_mark (addr=0xa15f6e0, mark_stack_ptr=0x9e5a320, mark_stack_limit=0x9e62000, env=0) at smob.c:335 warning: Source file is more recent than executable. 335 obj = scm_smobs[smobnum].mark (cell); (gdb) list 330 scm_i_pthread_setspecific (current_mark_stack_pointer, mark_stack_ptr); 331 scm_i_pthread_setspecific (current_mark_stack_limit, mark_stack_limit); 332 333 /* Invoke the SMOB's mark procedure, which will in turn invoke 334 `scm_gc_mark', which may modify `current_mark_stack_pointer'. */ 335 obj = scm_smobs[smobnum].mark (cell); 336 337 mark_stack_ptr = scm_i_pthread_getspecific (current_mark_stack_pointer); 338 339 if (SCM_NIMP (obj)) (gdb) p *(Family *)0x9f9a3b8 $1 = {<Smob<Family>> = {<Smob_base<Family>> = {static smob_tag_ = 7039, static scm_init_ = {<No data fields>}, static smob_name_ = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x9ea1fd4 "Family"}}, static type_p_name_ = <optimized out>, static smob_proc = <optimized out>, static smob_proc_signature_ = <optimized out>}, self_scm_ = #<smob Family a3b28a0>, protection_cons_ = (), mhw_valid = 21345678}, _vptr.Family = 0x804b668 <vtable for Family+8>, kids = {<std::_Vector_base<Family*, std::allocator<Family*> >> = { _M_impl = {<std::allocator<Family*>> = {<__gnu_cxx::new_allocator<Family*>> = {<No data fields>}, <No data fields>}, _M_start = 0x9f9a788, _M_finish = 0x9f9a78c, _M_end_of_storage = 0x9f9a78c}}, <No data fields>}} (gdb) p ((Family *)0x9f9a3b8)->mhw_valid $2 = 21345678 (gdb) p ((Family *)0xa348528)->mhw_valid $3 = 12345678 (gdb) p ((Family *)0xa2a1638)->mhw_valid $4 = 21345678 (gdb) x/2xw addr 0xa15f6e0: 0x00001b7f 0x0a352c58 (gdb) p *(Family *)0x0a352c58 $5 = {<Smob<Family>> = {<Smob_base<Family>> = {static smob_tag_ = 7039, static scm_init_ = {<No data fields>}, static smob_name_ = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x9ea1fd4 "Family"}}, static type_p_name_ = <optimized out>, static smob_proc = <optimized out>, static smob_proc_signature_ = <optimized out>}, self_scm_ = #<smob Family a15f6e0>, protection_cons_ = (), mhw_valid = 87654321}, _vptr.Family = 0xa202ca8, kids = {<std::_Vector_base<Family*, std::allocator<Family*> >> = { _M_impl = {<std::allocator<Family*>> = {<__gnu_cxx::new_allocator<Family*>> = {<No data fields>}, <No data fields>}, _M_start = 0xa352cf8, _M_finish = 0xa352d00, _M_end_of_storage = 0xa352d00}}, <No data fields>}} (gdb) p/x 7039 $6 = 0x1b7f (gdb) --8<---------------cut here---------------end--------------->8--- Regards, Mark ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Smob's mark_smob has become unreliable in Guile 2.x 2015-03-01 6:51 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x Mark H Weaver @ 2015-03-01 10:09 ` David Kastrup 2015-03-01 17:02 ` Mark H Weaver 2015-03-01 15:00 ` David Kastrup 1 sibling, 1 reply; 23+ messages in thread From: David Kastrup @ 2015-03-01 10:09 UTC (permalink / raw) To: Mark H Weaver; +Cc: 19883 Mark H Weaver <mhw@netris.org> writes: > Hi David, > > Thank you for the minimal test case. This is enormously helpful. > > David Kastrup <dak@gnu.org> writes: > >> The symptom likely occurs when an object is collected and consequently >> its free_smob function gets called. > > After some investigation, I believe I've confirmed that this is exactly > what's happening. > > I added a field to the Smob class that is set to 12345678 when the > object is created (and 21345678 at a later point), and then 87654321 > when free_smob is called. I then observed that the segfault happens > when the smob_mark function is called on a smob whose tag field is > valid, but whose corresponding C++ object has 87654321 in the special > field, indicating that its smob_free method was already called. This > happened with GC_MARKERS=1. There are a few remedies for the test case that do not necessarily apply to the full-blown code. I was able to avoid the crashes here by declaring the mark_smob function non-virtual. Declaring it virtual was actually more an oversight than anything else. Not routing the scm_mark call through a pointer fetched from the no longer existent virtual function table will stop the crashes in the example code. It's not an option for LilyPond proper, however, since the type lattice is complex enough that routing some of the marking process through a virtual member function is not really avoidable. Another more likely effective patch (which already made it into LilyPond's code base) _clears_ the SCM data (where the pointer to the C++ structure is stored) in unregister_ptr. Since all calls to the member functions mark_smob (and print_smob accordingly) are routed through the same mark_trampoline, letting this trampoline check for a non-null pointer avoids the mark-after-free situation in the general case. However, this is still a probabilistic fix. For one thing, there might be race conditions inside of GUILE. For another, the basic premise is that calling the mark function with garbage must not result in crashes. Since LilyPond relies on C++ data structures and containers for its operation and those are also involved in the mark walks, "can be called on arbitrary garbage" (of which already freed smobs are just one special case, so catering for them is not sufficient) is not an option that can be implemented with reasonable amount of effort in the mark_smob routines of LilyPond. However, from the GUILE side the mark_smob calls originate from objects detected to be SCM cells, and GUILE should be able to manage areas filled with SCM cells such that it can _reliably_ distinguish allocated from unallocated cells and there is no such thing as random garbage. When that is the case, it may be sufficient to clear out the tag field in the course of calling the free_smob hook, avoiding any subsequent mark calls to occur on a cell that has lost its type consistency anyway. Again, this might involve race conditions: no idea. Because of potential race conditions, a reliable fix needs to be in GUILE or BDWGC and cannot be done at the application end. The best we will be able to achieve there is "crashes almost never randomly". Which is what we should get in LilyPond with <URL:https://codereview.appspot.com/197690043> (already committed to the code base as <URL:http://git.savannah.gnu.org/cgit/lilypond.git/commit/?id=60eeb4a4ff217e3abbfcb6b4de94f0a6ef6d437d>. > I believe this is a bug in BDWGC. gc_mark.h warns: > > /* WARNING: Such a mark procedure may be invoked on an unused object */ > /* residing on a free list. Such objects are cleared, except for a */ > /* free list link field in the first word. Thus mark procedures may */ > /* not count on the presence of a type descriptor, and must handle this */ > /* case correctly somehow. */ > > and protect Guile users from this issue by checking that the tag on the > smob cell (first word) is correct before calling the user-defined > smob_mark function. However, in this case, the mark procedure was > called on a smob cell whose first word was the correct tag and whose > second word was non-zero and clearly a pointer to an already deleted > instance of the C++ Smob class. > > I think the next step is to remove libguile from this test program, and > turn it into a minimal demonstration of this apparent BDWGC bug, for > submission to the BDWGC developers. I don't know about the interplay of GUILE and BDWGC. It may be that the general C freestore operation that BDWGC provides cannot prevent the "can be called on garbage" effect and functions need to be robust against that. However, the specific "garbage" here resulted directly from marking an already freed smob, and with a frequency that is far above random. When that happens, it is extremely unlikely that memory is being either successfully retained or freed. And that sounds like a bad idea in BDWGC itself. > FYI, here are the changes I made to the test program and the results I > observed. > > First, some changes that I think might be needed for GC safety: > > --- smobs.hh.ORIG 2015-02-15 13:35:03.000000000 -0500 > +++ smobs.hh 2015-02-28 23:42:06.346803668 -0500 > @@ -270,8 +271,9 @@ > } > SCM unprotected_smobify_self () > { > - self_scm_ = Smob_base<Super>::register_ptr (static_cast<Super *> (this)); > - return self_scm_; > + SCM s = Smob_base<Super>::register_ptr (static_cast<Super *> (this)); > + self_scm_ = s; > + return s; Don't see the intended difference here. It's conceivable that with several "volatile" declarations the compiler may generate different code that would protect against self_scm_ being modified before the function returns. Is that supposed to help against non-atomic changes of self_scm_? > @@ -279,12 +281,12 @@ > } > SCM unprotect () > { > - unprotect_smob (self_scm_, &protection_cons_); > - return self_scm_; > + SCM s = self_scm_; > + unprotect_smob (s, &protection_cons_); > + return s; > } I see how this one would be advisable in multi-threaded applications. > void smobify_self () { > - self_scm_ = unprotected_smobify_self (); > - protect (); > + protect_smob (unprotected_smobify_self (), &protection_cons_); > } > SCM self_scm () const { return self_scm_; } > }; > > > The changes to 'unprotected_smobify_self' and 'smobify_self" are > probably only needed for a multithreaded program. The idea is to try to > ensure that the smob is protected from GC by being on the stack (and not > merely in the malloc/free heap) until it is protected. > > The change to 'unprotect' is along the same lines, and might be needed > even for single-threaded programs. I don't know that > 'scm_gc_unprotect_object' (called from 'unprotect_smob') will guarantee > not to free the object passed to it if there are no other visible > references to it. > > However, none of these changes fixed the problem here. I'll put them in as there is no sane reason not to. However, they appear to mostly concern low-probability races whereas we still have an elephant in the room. > BTW, one other thing to keep in mind: it will never be enough if the > only GC-visible reference to an object is the C++ Smob object. There > must always be a GC-visible reference to the SCM value. That's why we need the mark functions in LilyPond. It would be a really large PITA to have to route everything through SCM since there is a really large corpus of code and non-SCM data structures involved. > (gdb) bt > #0 0x0a2abde3 in ?? () > #1 0xb766dc4a in smob_mark (addr=0xa15f6e0, mark_stack_ptr=0x9e5a320, mark_stack_limit=0x9e62000, env=0) at smob.c:335 > #2 0xb758664a in GC_mark_from (mark_stack_top=0x9e5a320, mark_stack=0x9e5a000, mark_stack_limit=0x9e62000) at mark.c:737 > #3 0xb7587c07 in GC_mark_some (cold_gc_frame=0xbf874ed8 "pVZ\267M\332W\267\330N\207\277") at mark.c:386 > #4 0xb757da4d in GC_stopped_mark (stop_func=stop_func@entry=0xb757d5b0 <GC_never_stop_func>) at alloc.c:637 > #5 0xb757e4a9 in GC_try_to_collect_inner (stop_func=0xb757d5b0 <GC_never_stop_func>) at alloc.c:456 > #6 0xb757ee00 in GC_collect_or_expand (needed_blocks=needed_blocks@entry=1, ignore_off_page=ignore_off_page@entry=0, > retry=retry@entry=0) at alloc.c:1266 > #7 0xb757effc in GC_allocobj (gran=gran@entry=3, kind=kind@entry=1) at alloc.c:1355 > #8 0xb758440b in GC_generic_malloc_inner (lb=lb@entry=24, k=k@entry=1) at malloc.c:133 > #9 0xb7581e8a in GC_register_finalizer_inner (obj=obj@entry=0xa005720, fn=fn@entry=0xb766d970 <finalize_smob>, > cd=cd@entry=0x0, ofn=ofn@entry=0xbf875098, ocd=ocd@entry=0xbf87509c, > mp=mp@entry=0xb75811c0 <GC_null_finalize_mark_proc>) at finalize.c:524 > #10 0xb75821e5 in GC_register_finalizer_no_order (obj=obj@entry=0xa005720, fn=fn@entry=0xb766d970 <finalize_smob>, > cd=cd@entry=0x0, ofn=ofn@entry=0xbf875098, ocd=ocd@entry=0xbf87509c) at finalize.c:575 > #11 0xb761bfbb in scm_i_set_finalizer (obj=obj@entry=0xa005720, proc=proc@entry=0xb766d970 <finalize_smob>, > data=data@entry=0x0) at finalizers.c:44 > #12 0xb766e1f8 in scm_i_new_smob (tc=7039, data=171214120) at smob.c:416 > #13 0x08049b12 in scm_new_smob (tc=7039, data=171214120) > at /gnu/store/3lhr8q28q6f59774di9av7ncy09jd55d-guile-2.0.11/include/guile/2.0/libguile/smob.h:91 > #14 0x0804a8dc in Smob_base<Family>::register_ptr (p=0xa348528) at smobs.tcc:54 > #15 0x0804a281 in Smob<Family>::unprotected_smobify_self (this=0xa34852c) at smobs.hh:274 > #16 0x08049d0a in Smob<Family>::smobify_self (this=0xa34852c) at smobs.hh:289 That's when smobify_self tries to create a smob pointing to the C++-allocated data structure. This calls scm_new_smob, and in the course of that, garbage is apparently collected. And smob_mark is called on an already freed C++ structure with a bad virtual table, so when mark_trampoline tries routing the virtual table call to mark_smob, it crashes. With the already checked-in changes and your suggested changes, we'll likely get the mark-after-free crashes under control. They still seem to warrant fixing at the BDWGC level as they, crashes aside, appear to cause memory to be in some half-released state good for neither retaining nor reallocation. But the mark-on-general-garbage problem seems sort of system-immanent to conservative garbage collection while making the management of C++ data structures (often relying on some internal consistency outside of user control) extremely awkward. Fixing that at the SCM level seems both desirable and doable and will likely require GUILE to not let garbage (rather than recognizably unallocated cells) into its managed cell store. -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Smob's mark_smob has become unreliable in Guile 2.x 2015-03-01 10:09 ` David Kastrup @ 2015-03-01 17:02 ` Mark H Weaver 2015-03-01 18:38 ` David Kastrup 0 siblings, 1 reply; 23+ messages in thread From: Mark H Weaver @ 2015-03-01 17:02 UTC (permalink / raw) To: David Kastrup; +Cc: 19883 David Kastrup <dak@gnu.org> writes: > Since LilyPond relies on C++ data structures and containers for its > operation and those are also involved in the mark walks, "can be called > on arbitrary garbage" Who are you quoting here? > But the mark-on-general-garbage problem seems sort of system-immanent to > conservative garbage collection [...] Marking arbitrary garbage is certainly not inherent (is that the word you meant to use?) to conservative garbage collection. There may be bugs, but BDWGC is designed to prevent this. See the "Allocation" and "Mark phase" sections of <http://www.hboehm.info/gc/gcdescr.html>. In particular, from the "Allocation" section: The collector uses a two level allocator. A large block is defined to be one larger than half of HBLKSIZE, which is a power of 2, typically on the order of the page size. Large block sizes are rounded up to the next multiple of HBLKSIZE and then allocated by GC_allochblk. Recent versions of the collector use an approximate best fit algorithm by keeping free lists for several large block sizes. The actual implementation of GC_allochblk is significantly complicated by black-listing issues (see below). Small blocks are allocated in chunks of size HBLKSIZE. Each chunk is dedicated to only one object size and kind. From the "Mark phase" section: We determine whether a candidate pointer is actually the address of a heap block. This is done in the following steps: The candidate pointer is checked against rough heap bounds. These heap bounds are maintained such that all actual heap objects fall between them. In order to facilitate black-listing (see below) we also include address regions that the heap is likely to expand into. Most non-pointers fail this initial test. The candidate pointer is divided into two pieces; the most significant bits identify a HBLKSIZE-sized page in the address space, and the least significant bits specify an offset within that page. (A hardware page may actually consist of multiple such pages. HBLKSIZE is usually the page size divided by a small power of two.) The page address part of the candidate pointer is looked up in a table. Each table entry contains either 0, indicating that the page is not part of the garbage collected heap, a small integer n, indicating that the page is part of large object, starting at least n pages back, or a pointer to a descriptor for the page. In the first case, the candidate pointer i not a true pointer and can be safely ignored. In the last two cases, we can obtain a descriptor for the page containing the beginning of the object. The starting address of the referenced object is computed. The page descriptor contains the size of the object(s) in that page, the object kind, and the necessary mark bits for those objects. The size information can be used to map the candidate pointer to the object starting address. To accelerate this process, the page header also contains a pointer to a precomputed map of page offsets to displacements from the beginning of an object. The use of this map avoids a potentially slow integer remainder operation in computing the object start address. The mark bit for the target object is checked and set. If the object was previously unmarked, the object is pushed on the mark stack. The descriptor is read from the page descriptor. (This is computed from information GC_obj_kinds when the page is first allocated.) In Guile, there is a dedicated "kind" for SMOBs that have user-defined mark functions. This means that they are segregrated into their own HBLKSIZE-sized blocks, and so BDWGC is able to reliably determine whether our general SMOB marker ('smob_mark' in smob.c) should be called, without relying on some unreliable tag inside the block itself. So, while it is true that a conservative GC cannot avoid possibly retaining some garbage that should be freed, it is possible to reliably avoid calling mark functions on arbitrary garbage, and BDWGC is designed to do this. BTW, a few more points to mention: I vaguely recall that you may have wondered in the past if Boehm GC "parallel marking" means that it is marking while the mutator (user code) is running. That is not the case. BDWGC first stops all user threads (on POSIX I believe this is done by sending signals to all threads and keeping them in the signal handlers during GC) and then runs multiple of its own marker threads in parallel. I guess this means that user-specified mark functions may be called in multiple threads. However, this can be avoided by setting the GC_MARKERS environment variable to "1", and I set this while running your test code to eliminate this possible complication for now. Another document worth looking at is: http://www.hboehm.info/gc/finalization.html > However, from the GUILE side the mark_smob calls originate from objects > detected to be SCM cells, and GUILE should be able to manage areas > filled with SCM cells such that it can _reliably_ distinguish allocated > from unallocated cells and there is no such thing as random garbage. As described above, BDWGC already handles this, and so far I haven't seen evidence of 'smob_mark' being called on random garbage. However, it seems that 'smob_mark' is being called on objects that have already been finalized. > When that is the case, it may be sufficient to clear out the tag field > in the course of calling the free_smob hook, avoiding any subsequent > mark calls to occur on a cell that has lost its type consistency anyway. > Again, this might involve race conditions: no idea. Yes, this sounds like a work-around worth considering. I'll think about whether it can be done reliably in Guile and/or LilyPond. It occurs to me that during times of heavy allocation, it might be possible for a GC to be triggered while a finalizer is being run. Since the object being finalized would be on the stack, it would be considered live by the GC, even if most or all of the finalizer had already been run. It warrants investigation. Also, while reading the GC design documentation, a particular warning raised some alarms. From "Finalization" in gcdescr.html: Any objects remaining unmarked at the end of this process are added to a queue of objects whose finalizers can be run. Depending on collector configuration, finalizers are dequeued and run either implicitly during allocation calls, or explicitly in response to a user request. (Note that the former is unfortunately both the default and not generally safe. If finalizers perform synchronization, it may result in deadlocks. Nontrivial finalizers generally need to perform synchronization, and thus require a different collector configuration.) and from finalization.html: Finalization code may be run anyplace an allocation or other call to the collector takes place. In multithreaded programs, finalizers have to obey the normal locking conventions to ensure safety. Code run directly from finalizers should not acquire locks that may be held during allocation. This restriction can be easily circumvented by registering a finalizer which enqueues the real action for execution in a separate thread. In single-threaded code, it is also often easiest to have finalizers queue actions, which are then explicitly run during an explicit call by the user's program. So it may be that we need to be careful of this. Here's more recommended reading on the difficulties of implementing finalizers properly: http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html Finalizers are a huge can of worms. I suspect this is why Ludovic was encouraging you to avoid them entirely, but I agree that we need to find a way to make them work for when they are needed. > Because of potential race conditions, a reliable fix needs to be in > GUILE or BDWGC and cannot be done at the application end. The best we > will be able to achieve there is "crashes almost never randomly". I'm not sure this is true. It will require more thought. >> First, some changes that I think might be needed for GC safety: >> >> --- smobs.hh.ORIG 2015-02-15 13:35:03.000000000 -0500 >> +++ smobs.hh 2015-02-28 23:42:06.346803668 -0500 >> @@ -270,8 +271,9 @@ >> } >> SCM unprotected_smobify_self () >> { >> - self_scm_ = Smob_base<Super>::register_ptr (static_cast<Super *> (this)); >> - return self_scm_; >> + SCM s = Smob_base<Super>::register_ptr (static_cast<Super *> (this)); >> + self_scm_ = s; >> + return s; > > Don't see the intended difference here. It's conceivable that with > several "volatile" declarations the compiler may generate different code > that would protect against self_scm_ being modified before the function > returns. Is that supposed to help against non-atomic changes of > self_scm_? No, the idea is to ensure the existence of a GC-visible copy of the SCM value until 'scm_gc_protect_object' is called. For objects allocated in the malloc/free heap, self_scm_ is not visible to the GC. >> BTW, one other thing to keep in mind: it will never be enough if the >> only GC-visible reference to an object is the C++ Smob object. There >> must always be a GC-visible reference to the SCM value. > > That's why we need the mark functions in LilyPond. The mark functions can ensure that in your Family class, the children will be marked even though the parent->child pointers point directly to the C++ object. However, whenever the root of the tree is not protected by 'scm_gc_protect_object', the SCM value of that root must be visible to the GC, in order to cause the user-specified 'mark' function to be called in the first place. It should be noted that for any particular C++ class, another alternative is to arrange for instances of that class to be allocated using 'scm_gc_malloc', as Ludovic suggested. If you do that, then it would suffice to have a GC-visible pointer to instances of that class, which might make your life easier. Regards, Mark ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Smob's mark_smob has become unreliable in Guile 2.x 2015-03-01 17:02 ` Mark H Weaver @ 2015-03-01 18:38 ` David Kastrup 0 siblings, 0 replies; 23+ messages in thread From: David Kastrup @ 2015-03-01 18:38 UTC (permalink / raw) To: Mark H Weaver; +Cc: 19883 Mark H Weaver <mhw@netris.org> writes: > Finalizers are a huge can of worms. I suspect this is why Ludovic was > encouraging you to avoid them entirely, but I agree that we need to > find a way to make them work for when they are needed. Again, LilyPond does not really have an option to avoid finalizers (free_smob action): the C++ class instances associated with smobs contain STL containers, and the storage for those is allocated separately on the heap. Running the destructor of the C++ class instance (which is the basic finalization action done when calling the delete operator on the C++ class instance) will then also destruct the STL container, freeing all memory associated with _that_. We don't have control over the storage layout of the STL templates. While one could allocate them using a separate allocation template parameter, that may be a C++11 feature. And it would really complicate the code. Moving all of the smobification stuff into a few template classes has actually locked up our release process for almost a year: 2.19.15 was released in September, 2.19.16 is currently being uploaded. The release-building crosscompiling environment had to be updated to a newer GCC version to accept the templated code, and that GCC reason had new dependencies and needed to be bootstrapped with a C++ rather than a C compiler. And the person who has run the release process the last few years was not actually a programmer. > The mark functions can ensure that in your Family class, the children > will be marked even though the parent->child pointers point directly > to the C++ object. However, whenever the root of the tree is not > protected by 'scm_gc_protect_object', the SCM value of that root must > be visible to the GC, in order to cause the user-specified 'mark' > function to be called in the first place. > > It should be noted that for any particular C++ class, another > alternative is to arrange for instances of that class to be allocated > using 'scm_gc_malloc', as Ludovic suggested. If you do that, then it > would suffice to have a GC-visible pointer to instances of that class, > which might make your life easier. Again: a lot of smobified C++ structures contain STL containers that have their separate allocation that can easily have pointers to C++ structures associated with smobs again that need to be marked. At any rate: since LilyPond can eat a significant ratio of the entire memory space on 32bit machines, I consider it prudent not to have basically all of that scanned by the conservative garbage collector since that would significantly increase the number of false positives when marking. And we are talking about an advertised GUILE feature after all. With regard to "making our life easier": it's a sunk cost since all the work has obviously been done for GUILEv1 already. -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: Smob's mark_smob has become unreliable in Guile 2.x 2015-03-01 6:51 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x Mark H Weaver 2015-03-01 10:09 ` David Kastrup @ 2015-03-01 15:00 ` David Kastrup 1 sibling, 0 replies; 23+ messages in thread From: David Kastrup @ 2015-03-01 15:00 UTC (permalink / raw) To: Mark H Weaver; +Cc: 19883 Mark H Weaver <mhw@netris.org> writes: [...] The race-condition related patch suggestions have been submitted to LilyPond's tracker. Tracker issue: 4304 (http://code.google.com/p/lilypond/issues/detail?id=4304) Rietveld issue: 210810043 (https://codereview.appspot.com/210810043) Issue description: Smob code changes mostly relevant for multithreading These changes have been suggested by Mark H Weaver in <URL:http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19883#41> -- David Kastrup ^ permalink raw reply [flat|nested] 23+ messages in thread
* bug#19883: The “finalized” SMOB type 2015-02-16 16:41 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x David Kastrup 2015-02-16 18:13 ` bug#19883: Correction for backtrace David Kastrup 2015-03-01 6:51 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x Mark H Weaver @ 2016-10-09 7:51 ` Artyom Poptsov 2 siblings, 0 replies; 23+ messages in thread From: Artyom Poptsov @ 2016-10-09 7:51 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 19883, guile-devel [-- Attachment #1: Type: text/plain, Size: 980 bytes --] Hello Ludovic, thanks for pointing that out. IIRC, I saw this kind of errors during testing but I thought I fixed them. What Guile-SSH version do you use? Here's 'free_session' procedure from Guile-SSH version 0.10.0: --8<---------------cut here---------------start------------->8--- size_t free_session (SCM session_smob) { if (! SCM_SMOB_PREDICATE (session_tag, session_smob)) { _ssh_log (SSH_LOG_FUNCTIONS, "free_session", "%s", "already freed"); return 0; } struct session_data *data = _scm_to_session_data (session_smob); [...] return 0; } --8<---------------cut here---------------end--------------->8--- As you can see, there's additional smob check before accessing the smob's data. Please let me know if the problem manifest itself in the latest Guile-SSH version. Thanks! - Artyom -- Artyom V. Poptsov <poptsov.artyom@gmail.com>; GPG Key: 0898A02F Home page: http://poptsov-artyom.narod.ru/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 472 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <87vax2klj6.fsf@elephant.savannah>]
[parent not found: <87fuo5hc88.fsf@gnu.org>]
* bug#19883: The “finalized” SMOB type [not found] ` <87fuo5hc88.fsf@gnu.org> @ 2016-10-09 19:00 ` Artyom Poptsov 0 siblings, 0 replies; 23+ messages in thread From: Artyom Poptsov @ 2016-10-09 19:00 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 19883, David Kastrup, guile-devel [-- Attachment #1: Type: text/plain, Size: 420 bytes --] Hello Ludovic and David, thanks for all the comments on the smob freeing callbacks. Apparently I misread the documentation so the bug was introduced. I've changed the callback procedures as was suggested; the patches included in Guile-SSH version 0.10.1 released a few moments ago. - Artyom -- Artyom V. Poptsov <poptsov.artyom@gmail.com>; GPG Key: 0898A02F Home page: http://poptsov-artyom.narod.ru/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 472 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2016-10-09 19:00 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <87mvieh7mu.fsf@gnu.org> 2015-02-16 16:41 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x David Kastrup 2015-02-16 18:13 ` bug#19883: Correction for backtrace David Kastrup 2015-02-25 22:16 ` Ludovic Courtès 2015-02-25 23:17 ` David Kastrup 2015-02-26 11:35 ` Ludovic Courtès 2015-02-26 12:32 ` David Kastrup 2015-02-26 14:03 ` Ludovic Courtès 2015-02-26 15:30 ` David Kastrup 2015-02-26 18:15 ` Ludovic Courtès 2015-02-26 19:25 ` David Kastrup 2015-02-27 9:55 ` Ludovic Courtès 2015-02-27 10:18 ` David Kastrup 2016-06-23 9:50 ` Andy Wingo 2016-06-23 10:36 ` Andy Wingo 2016-06-23 13:13 ` Ludovic Courtès 2016-06-23 13:12 ` Ludovic Courtès 2015-03-01 6:51 ` bug#19883: Smob's mark_smob has become unreliable in Guile 2.x Mark H Weaver 2015-03-01 10:09 ` David Kastrup 2015-03-01 17:02 ` Mark H Weaver 2015-03-01 18:38 ` David Kastrup 2015-03-01 15:00 ` David Kastrup 2016-10-09 7:51 ` bug#19883: The “finalized” SMOB type Artyom Poptsov [not found] ` <87vax2klj6.fsf@elephant.savannah> [not found] ` <87fuo5hc88.fsf@gnu.org> 2016-10-09 19:00 ` Artyom Poptsov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).