From: Andy Wingo
To: "Mark H. Weaver"
Cc: guile-devel@gnu.org
Newsgroups: gmane.lisp.guile.devel
Subject: Mark procedures
Date: Wed, 04 Nov 2015 10:10:46 +0000
Message-ID: <87vb9ihy6x.fsf@igalia.com>

Greetings Mark! And hello to guile-devel :)

Sorry for being incommunicado in these last months. The good news is that we're finally ready to release 2.1.1, yay! I'll be easing myself back into the mailing list soon but I wanted to expand on a topic we discussed over IRC yesterday: mark procedures.

To recap, since the ages of yore, Guile has allowed SMOB implementations to specify "mark procedures".
Indeed in the olden days, these procedures were practically required, as the GC wouldn't otherwise trace any field of a SMOB. When we started using the BDW conservative collector, mark procedures became less necessary. At least in master, Guile itself doesn't specify any SMOB mark procedures, and the only markers set via the lower-level "gc_kind" mechanism from BDW-GC are to implement weak tables, and to precisely mark VM stacks.

I would ideally like a world in which mark procedures don't exist.

Mark procedures have many really negative things about them: they are (1) insufficient for what we want to do with them, (2) extraordinarily error-prone, (3) (mostly) unnecessary, and (4) slow. Let's take them one by one.

Firstly, if we consider things we would like to do with our GC, mark procedures don't help us get there. They don't help with a moving GC; that prototype would rather be "void (*mark) (SCM* scm_loc)", to allow Guile to relocate a pointer, and even that doesn't quite cut it because we allow for non-SCM-valued pointers as well. What you need is a slot visitor; something like https://chromium.googlesource.com/v8/v8.git/+/master/src/objects-inl.h#1533. This is the same facility you need to do heap profiling as well -- you need both objects, and you get that by working on void** instead of void*.

Secondly, mark procedures are an ***amazing*** foot-gun. They run concurrently with the user's program -- on random threads, if the collector is doing parallel marking, but even without parallel marking, a mark procedure can run from any allocation site -- or indeed any site, if the program has multiple threads. The mutator can hold almost any lock at the time the mark procedure is called. With finalizers we have been able to avoid this via `scm_set_automatic_finalization_enabled' and `scm_run_finalizers' but with mark procedures this isn't possible.
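
To make the slot-visitor point from "Firstly" a bit more concrete, here is a minimal sketch in C of the difference between marking a value and visiting a slot. Everything in it is hypothetical and for illustration only: `struct obj`, `mark_fn`, `visit_slot_fn`, `relocate_slot`, `visit_slots` and `demo_relocate` are made-up names, not Guile or BDW-GC API, and the "collector" is just a forwarding pointer set by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object with one traced field.  Not a Guile type. */
struct obj {
  struct obj *forward;   /* new location, set once the object has moved */
  struct obj *field;     /* reference the collector has to trace */
  int payload;
};

/* Mark-procedure style: the collector sees only the value, so it can
   keep the referent alive but cannot update the field that holds it. */
typedef void (*mark_fn) (struct obj *value, void *data);

/* Slot-visitor style: the collector is handed the address of the field. */
typedef void (*visit_slot_fn) (struct obj **slot, void *data);

/* A visitor such as a moving collector might use: if the referent has
   been copied elsewhere, rewrite the slot to point at the copy. */
static void
relocate_slot (struct obj **slot, void *data)
{
  (void) data;
  if (*slot && (*slot)->forward)
    *slot = (*slot)->forward;
}

/* Visit every traced slot of O.  With a single field this is trivial. */
static void
visit_slots (struct obj *o, visit_slot_fn visit, void *data)
{
  visit (&o->field, data);
}

/* Returns the payload seen through a.field after "relocating" B. */
static int
demo_relocate (void)
{
  struct obj old_b = { NULL, NULL, 1 };
  struct obj new_b = { NULL, NULL, 2 };
  struct obj a = { NULL, &old_b, 0 };

  assert (a.field == &old_b);
  old_b.forward = &new_b;       /* pretend the collector moved B */
  visit_slots (&a, relocate_slot, NULL);
  return a.field->payload;
}
```

Calling `demo_relocate ()` from a `main` shows the field of A following B to its new location. A callback with the `mark_fn` shape could not do that, since it never learns where the pointer it was given is stored.
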

It's also really hard to write a good mark procedure, once you start trying anything interesting; and if you're not doing anything interesting, you surely don't need a mark procedure, and can just rely on the builtin tracer. Additionally when you mess up somehow with your mark procedure, the errors you get will appear far from their cause.

Thirdly, mark procedures are usually unnecessary. By far the simplest way around a mark procedure is to just let the conservative tracer handle it, and just make sure your SMOB or whatever object has fields to the things you need to trace. However thinking forward to a re-introduction of precise GC so that we can relocate objects, maybe this is unsatisfactory; and yet mark procedures are still insufficient for this case as I noted.

There are a few options we have today and a few more things we could build in the future:

  * If your SMOB A holds on to field B and there is no path from B to A, you can just scm_gc_protect_object on B, and arrange for A's finalizer to scm_gc_unprotect_object on B. Finalizers are safer than mark procedures (and that's saying something).

  * Or you could add an association from A to B to a weak-key table.

  * If there is a possibility of a path from B to A, you need an ephemeron table, and Guile doesn't do that right now. But it should!

  * If your object A has a fixed shape, in that it will always hold on to an object B and that object will be in word N of the SMOB, then the SMOB can have an associated descriptor indicating how the object is laid out. Guile's GC could use this descriptor to know what to mark, whether the values are tagged or not (if we adopted a different tagging system that made this necessary), and so on. This is already possible with GC_make_descriptor.

  * If your object A does not have a fixed shape and you are using mark procedures to punch through other C/C++ data structures to grab references, it is probable that you have already gotten your mark function wrong due to concurrency.
    The C/C++ data structures are not necessarily in a suitable state for access. This is the case in which mark procedures actually do something for you, but probably it means randomly crashing your program, so there's that.

Finally, mark procedures are slow. They call out from the hot path in GC and they recurse back into the library via scm_gc_mark. scm_gc_mark actually communicates state via thread-local variables, which are slow as well. Their existence can prevent us as hackers from making changes to the GC that would improve performance for users that don't use mark procedures. Probably the fastest way to use them is for each object kind with mark procedures to be allocated out of its own pool, meaning that objects with mark procedures increase fragmentation of the system as a whole.

In summary, mark procedures have a lot of points against them, especially as currently implemented! There's only one case that I am aware of in which mark procedures actually bring anything to the table -- the case in which you punch through complicated data structures that the GC can't trace -- but (a) I'm not sure that case wouldn't be served just as well via ephemerons, and (b) I seriously doubt that any such mark procedure implementation is actually correct. And that's besides the fact that the formulation of mark procedures as they are doesn't serve our future purposes :P

Not even the JNI makes the mistake of exporting mark functions, nor do any of the high-performance JavaScript implementations, nor does Lua, etc etc. At the most we should be making our foreign objects define type descriptors and letting Guile itself handle the tracing implementation.
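
As a rough illustration of what "define type descriptors and let the runtime do the tracing" could look like, here is a small self-contained sketch in C. It is only a toy under stated assumptions: `struct layout`, `trace_object`, `count_slot` and `demo_count_references` are hypothetical names, not Guile API, and the one-bit-per-word bitmap is merely similar in spirit to the bitmap that BDW-GC's GC_make_descriptor takes. The point is that the object's author supplies data describing the layout, and no user code runs during tracing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a fixed-shape object: bit i of
   pointer_bitmap is set when word i holds a reference to trace. */
struct layout {
  size_t n_words;
  uintptr_t pointer_bitmap;
};

/* What the collector does with each traced slot (mark it, relocate it,
   record an edge for a heap profile, ...). */
typedef void (*visit_slot_fn) (void **slot, void *data);

/* Generic tracer owned by the runtime: walks the words of an object
   and visits only those the descriptor declares to be references. */
static void
trace_object (void **words, const struct layout *layout,
              visit_slot_fn visit, void *data)
{
  for (size_t i = 0; i < layout->n_words; i++)
    if (layout->pointer_bitmap & ((uintptr_t) 1 << i))
      visit (&words[i], data);
}

/* Example visitor: count the non-null references that were visited. */
static void
count_slot (void **slot, void *data)
{
  if (*slot)
    ++*(size_t *) data;
}

/* Builds a four-word object and returns how many references the
   generic tracer finds in it. */
static size_t
demo_count_references (void)
{
  int b = 0, c = 0;
  /* Word 0: a type tag, word 1: reference to b,
     word 2: a raw integer, word 3: reference to c. */
  void *a[4] = { (void *) (uintptr_t) 0x7f, &b,
                 (void *) (uintptr_t) 42, &c };
  struct layout a_layout = { 4, ((uintptr_t) 1 << 1) | ((uintptr_t) 1 << 3) };
  size_t n = 0;

  trace_object (a, &a_layout, count_slot, &n);
  return n;
}
```

Here the tag in word 0 and the raw integer in word 2 are skipped because the descriptor says so, not because they happen not to look like pointers. Swapping `count_slot` for a visitor that relocates or records edges would need no change to the object's definition.
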

At the end of our IRC conversation yesterday you put the burden on me to argue against mark procedures, which was fair, but at this point I think we would need good arguments for keeping them, at least in the long run :)

In the short run as we add new APIs like the foreign object facility, I think it makes sense to *avoid* adding mark functions to the new APIs, at least for now. We don't have the use cases right now and so we would surely specify the wrong thing. We can always add something later.

Regards,

Andy