From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Mark H Weaver Newsgroups: gmane.lisp.guile.bugs Subject: bug#19883: Smob's mark_smob has become unreliable in Guile 2.x Date: Sun, 01 Mar 2015 12:02:41 -0500 Message-ID: <87vbik647i.fsf@netris.org> References: <87twyln70f.fsf@fencepost.gnu.org> <87zj7x5hxr.fsf@netris.org> <87385p9ggl.fsf@fencepost.gnu.org> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: ger.gmane.org 1425229406 17621 80.91.229.3 (1 Mar 2015 17:03:26 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Sun, 1 Mar 2015 17:03:26 +0000 (UTC) Cc: 19883@debbugs.gnu.org To: David Kastrup Original-X-From: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Sun Mar 01 18:03:16 2015 Return-path: Envelope-to: guile-bugs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1YS7Gq-0003Qw-Lm for guile-bugs@m.gmane.org; Sun, 01 Mar 2015 18:03:12 +0100 Original-Received: from localhost ([::1]:53298 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YS7Gp-0008Ja-EH for guile-bugs@m.gmane.org; Sun, 01 Mar 2015 12:03:11 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:49660) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YS7Gk-0008Hj-7g for bug-guile@gnu.org; Sun, 01 Mar 2015 12:03:08 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YS7Gg-00075T-UK for bug-guile@gnu.org; Sun, 01 Mar 2015 12:03:06 -0500 Original-Received: from debbugs.gnu.org ([140.186.70.43]:58130) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YS7Gg-00075O-Qq for bug-guile@gnu.org; Sun, 01 Mar 2015 12:03:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80) (envelope-from ) id 1YS7Gg-0007h3-GH for bug-guile@gnu.org; Sun, 01 Mar 2015 12:03:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Mark H Weaver Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Sun, 01 Mar 2015 17:03:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 19883 X-GNU-PR-Package: guile X-GNU-PR-Keywords: Original-Received: via spool by 19883-submit@debbugs.gnu.org id=B19883.142522935929540 (code B ref 19883); Sun, 01 Mar 2015 17:03:02 +0000 Original-Received: (at 19883) by debbugs.gnu.org; 1 Mar 2015 17:02:39 +0000 Original-Received: from localhost ([127.0.0.1]:33495 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1YS7GI-0007gN-S2 for submit@debbugs.gnu.org; Sun, 01 Mar 2015 12:02:39 -0500 Original-Received: from world.peace.net ([50.252.239.5]:60624 ident=hope8) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1YS7GC-0007gA-I6 for 19883@debbugs.gnu.org; Sun, 01 Mar 2015 12:02:36 -0500 Original-Received: from [10.1.10.78] (helo=jojen) by world.peace.net with esmtpsa (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from ) id 1YS7G0-0004fC-Ch; Sun, 01 Mar 2015 12:02:20 -0500 In-Reply-To: <87385p9ggl.fsf@fencepost.gnu.org> (David Kastrup's message of "Sun, 01 Mar 2015 11:09:46 +0100") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 140.186.70.43 X-BeenThere: bug-guile@gnu.org List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Original-Sender: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.bugs:7738 Archived-At: David Kastrup writes: > Since LilyPond relies on C++ data structures and containers for its > operation and those are also involved in the mark walks, "can be called > on arbitrary garbage" Who are you quoting here? > But the mark-on-general-garbage problem seems sort of system-immanent to > conservative garbage collection [...] Marking arbitrary garbage is certainly not inherent (is that the word you meant to use?) to conservative garbage collection. There may be bugs, but BDWGC is designed to prevent this. See the "Allocation" and "Mark phase" sections of . In particular, from the "Allocation" section: The collector uses a two level allocator. A large block is defined to be one larger than half of HBLKSIZE, which is a power of 2, typically on the order of the page size. Large block sizes are rounded up to the next multiple of HBLKSIZE and then allocated by GC_allochblk. Recent versions of the collector use an approximate best fit algorithm by keeping free lists for several large block sizes. The actual implementation of GC_allochblk is significantly complicated by black-listing issues (see below). Small blocks are allocated in chunks of size HBLKSIZE. Each chunk is dedicated to only one object size and kind. >From the "Mark phase" section: We determine whether a candidate pointer is actually the address of a heap block. This is done in the following steps: The candidate pointer is checked against rough heap bounds. These heap bounds are maintained such that all actual heap objects fall between them. In order to facilitate black-listing (see below) we also include address regions that the heap is likely to expand into. Most non-pointers fail this initial test. The candidate pointer is divided into two pieces; the most significant bits identify a HBLKSIZE-sized page in the address space, and the least significant bits specify an offset within that page. (A hardware page may actually consist of multiple such pages. HBLKSIZE is usually the page size divided by a small power of two.) The page address part of the candidate pointer is looked up in a table. Each table entry contains either 0, indicating that the page is not part of the garbage collected heap, a small integer n, indicating that the page is part of large object, starting at least n pages back, or a pointer to a descriptor for the page. In the first case, the candidate pointer i not a true pointer and can be safely ignored. In the last two cases, we can obtain a descriptor for the page containing the beginning of the object. The starting address of the referenced object is computed. The page descriptor contains the size of the object(s) in that page, the object kind, and the necessary mark bits for those objects. The size information can be used to map the candidate pointer to the object starting address. To accelerate this process, the page header also contains a pointer to a precomputed map of page offsets to displacements from the beginning of an object. The use of this map avoids a potentially slow integer remainder operation in computing the object start address. The mark bit for the target object is checked and set. If the object was previously unmarked, the object is pushed on the mark stack. The descriptor is read from the page descriptor. (This is computed from information GC_obj_kinds when the page is first allocated.) In Guile, there is a dedicated "kind" for SMOBs that have user-defined mark functions. This means that they are segregrated into their own HBLKSIZE-sized blocks, and so BDWGC is able to reliably determine whether our general SMOB marker ('smob_mark' in smob.c) should be called, without relying on some unreliable tag inside the block itself. So, while it is true that a conservative GC cannot avoid possibly retaining some garbage that should be freed, it is possible to reliably avoid calling mark functions on arbitrary garbage, and BDWGC is designed to do this. BTW, a few more points to mention: I vaguely recall that you may have wondered in the past if Boehm GC "parallel marking" means that it is marking while the mutator (user code) is running. That is not the case. BDWGC first stops all user threads (on POSIX I believe this is done by sending signals to all threads and keeping them in the signal handlers during GC) and then runs multiple of its own marker threads in parallel. I guess this means that user-specified mark functions may be called in multiple threads. However, this can be avoided by setting the GC_MARKERS environment variable to "1", and I set this while running your test code to eliminate this possible complication for now. Another document worth looking at is: http://www.hboehm.info/gc/finalization.html > However, from the GUILE side the mark_smob calls originate from objects > detected to be SCM cells, and GUILE should be able to manage areas > filled with SCM cells such that it can _reliably_ distinguish allocated > from unallocated cells and there is no such thing as random garbage. As described above, BDWGC already handles this, and so far I haven't seen evidence of 'smob_mark' being called on random garbage. However, it seems that 'smob_mark' is being called on objects that have already been finalized. > When that is the case, it may be sufficient to clear out the tag field > in the course of calling the free_smob hook, avoiding any subsequent > mark calls to occur on a cell that has lost its type consistency anyway. > Again, this might involve race conditions: no idea. Yes, this sounds like a work-around worth considering. I'll think about whether it can be done reliably in Guile and/or LilyPond. It occurs to me that during times of heavy allocation, it might be possible for a GC to be triggered while a finalizer is being run. Since the object being finalized would be on the stack, it would be considered live by the GC, even if most or all of the finalizer had already been run. It warrants investigation. Also, while reading the GC design documentation, a particular warning raised some alarms. From "Finalization" in gcdescr.html: Any objects remaining unmarked at the end of this process are added to a queue of objects whose finalizers can be run. Depending on collector configuration, finalizers are dequeued and run either implicitly during allocation calls, or explicitly in response to a user request. (Note that the former is unfortunately both the default and not generally safe. If finalizers perform synchronization, it may result in deadlocks. Nontrivial finalizers generally need to perform synchronization, and thus require a different collector configuration.) and from finalization.html: Finalization code may be run anyplace an allocation or other call to the collector takes place. In multithreaded programs, finalizers have to obey the normal locking conventions to ensure safety. Code run directly from finalizers should not acquire locks that may be held during allocation. This restriction can be easily circumvented by registering a finalizer which enqueues the real action for execution in a separate thread. In single-threaded code, it is also often easiest to have finalizers queue actions, which are then explicitly run during an explicit call by the user's program. So it may be that we need to be careful of this. Here's more recommended reading on the difficulties of implementing finalizers properly: http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html Finalizers are a huge can of worms. I suspect this is why Ludovic was encouraging you to avoid them entirely, but I agree that we need to find a way to make them work for when they are needed. > Because of potential race conditions, a reliable fix needs to be in > GUILE or BDWGC and cannot be done at the application end. The best we > will be able to achieve there is "crashes almost never randomly". I'm not sure this is true. It will require more thought. >> First, some changes that I think might be needed for GC safety: >> >> --- smobs.hh.ORIG 2015-02-15 13:35:03.000000000 -0500 >> +++ smobs.hh 2015-02-28 23:42:06.346803668 -0500 >> @@ -270,8 +271,9 @@ >> } >> SCM unprotected_smobify_self () >> { >> - self_scm_ = Smob_base::register_ptr (static_cast (this)); >> - return self_scm_; >> + SCM s = Smob_base::register_ptr (static_cast (this)); >> + self_scm_ = s; >> + return s; > > Don't see the intended difference here. It's conceivable that with > several "volatile" declarations the compiler may generate different code > that would protect against self_scm_ being modified before the function > returns. Is that supposed to help against non-atomic changes of > self_scm_? No, the idea is to ensure the existence of a GC-visible copy of the SCM value until 'scm_gc_protect_object' is called. For objects allocated in the malloc/free heap, self_scm_ is not visible to the GC. >> BTW, one other thing to keep in mind: it will never be enough if the >> only GC-visible reference to an object is the C++ Smob object. There >> must always be a GC-visible reference to the SCM value. > > That's why we need the mark functions in LilyPond. The mark functions can ensure that in your Family class, the children will be marked even though the parent->child pointers point directly to the C++ object. However, whenever the root of the tree is not protected by 'scm_gc_protect_object', the SCM value of that root must be visible to the GC, in order to cause the user-specified 'mark' function to be called in the first place. It should be noted that for any particular C++ class, another alternative is to arrange for instances of that class to be allocated using 'scm_gc_malloc', as Ludovic suggested. If you do that, then it would suffice to have a GC-visible pointer to instances of that class, which might make your life easier. Regards, Mark