unofficial mirror of guile-devel@gnu.org 
* Mark procedures
From: Andy Wingo @ 2015-11-04 10:10 UTC (permalink / raw)
  To: Mark H. Weaver; +Cc: guile-devel

Greetings Mark!  And hello to guile-devel :)  Sorry for being
incommunicado in these last months.  The good news is that we're finally
ready to release 2.1.1, yay!  I'll be easing myself back into the
mailing list soon but I wanted to expand on a topic we discussed over
IRC yesterday: mark procedures.

To recap, since the ages of yore, Guile has allowed SMOB implementations
to specify "mark procedures".  Indeed in the olden days, these
procedures were practically required, as the GC wouldn't otherwise trace
any field of a SMOB.  When we started using the BDW conservative
collector, mark procedures became less necessary.  At least in master,
Guile itself doesn't specify any SMOB mark procedures, and the only
markers set via the lower-level "gc_kind" mechanism from BDW-GC are used
to implement weak tables and to mark VM stacks precisely.

I would ideally like a world in which mark procedures don't exist.  Mark
procedures have many really negative things about them: they are (1)
insufficient for what we want to do with them, (2) extraordinarily
error-prone, (3) (mostly) unnecessary, and (4) slow.  Let's take them
one by one.

Firstly, if we consider things we would like to do with our GC, mark
procedures don't help us get there.  They don't help with a moving GC;
for that, the prototype would need to be "void (*mark) (SCM* scm_loc)",
to allow Guile to relocate a pointer, and even that doesn't quite cut it
because we allow for non-SCM-valued pointers as well.  What you need is a
slot visitor, something like
https://chromium.googlesource.com/v8/v8.git/+/master/src/objects-inl.h#1533.
This is the same facility you need for heap profiling as well: there you
need both objects (the one holding the reference and the one it refers
to), and you get that by working on void** instead of void*.
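To make the void* versus void** distinction concrete, here is a minimal, self-contained C sketch. It is not Guile or V8 code: `struct pair`, `visit_fn`, `trace_pair`, `relocate_slot` and `record_edge` are hypothetical names chosen for illustration. The type's tracing routine hands the visitor the *address* of each field, so the same routine can serve a toy moving collector (which overwrites the slot) and a toy heap profiler (which records both the holder and the referent).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object with two pointer-valued fields. */
struct pair {
  void *car;
  void *cdr;
};

/* A slot visitor receives the object holding the field and the *address*
   of the field, so it can both read and overwrite the reference. */
typedef void (*visit_fn) (void *holder, void **slot, void *ctx);

/* Per-type tracing routine: report every pointer-valued slot. */
static void
trace_pair (struct pair *p, visit_fn visit, void *ctx)
{
  visit (p, &p->car, ctx);
  visit (p, &p->cdr, ctx);
}

/* Visitor 1: a toy moving collector that has copied an object from
   `from` to `to` and now fixes up every reference to the old address. */
struct relocation {
  void *from;
  void *to;
};

static void
relocate_slot (void *holder, void **slot, void *ctx)
{
  struct relocation *r = ctx;
  (void) holder;                 /* relocation does not need the holder */
  if (*slot == r->from)
    *slot = r->to;               /* impossible with a void* mark procedure */
}

/* Visitor 2: a toy heap profiler that logs (holder, referent) edges. */
#define MAX_EDGES 8

struct edge_log {
  size_t count;
  void *holder[MAX_EDGES];
  void *target[MAX_EDGES];
};

static void
record_edge (void *holder, void **slot, void *ctx)
{
  struct edge_log *log = ctx;
  if (*slot == NULL)
    return;                      /* empty slots are not edges */
  assert (log->count < MAX_EDGES);
  log->holder[log->count] = holder;
  log->target[log->count] = *slot;
  log->count++;
}
```

With this shape, the knowledge of an object's layout lives in one place (`trace_pair`), and the collector decides what to do with each slot; a mark procedure that only receives values offers neither the relocation nor the holder.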

Secondly, mark procedures are an ***amazing*** foot-gun.  They run
concurrently with the user's program: on arbitrary threads if the
collector is doing parallel marking, and even without parallel marking,
a mark procedure can run from any allocation site (or indeed from any
point, if the program has multiple threads).  The mutator can hold almost
any lock at the time the mark procedure is called.  With finalizers we
have been able to avoid this via `scm_set_automatic_finalization_enabled'
and `scm_run_finalizers', but with mark procedures this isn't possible.
It's also really hard to write a good mark procedure once you start
trying anything interesting; and if you're not doing anything
interesting, you surely don't need a mark procedure and can just rely on
the builtin tracer.  Additionally, when you get a mark procedure wrong,
the resulting errors appear far from their cause.
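The reason finalizers can be tamed while mark procedures cannot is that finalization can be postponed: the collector only has to *enqueue* the work, and the program can drain the queue later from a point where it holds no locks (which is what disabling automatic finalization and calling `scm_run_finalizers` let a Guile program do). Marking cannot be postponed, because the collector needs its result before it can free anything. The plain-C sketch below models only that deferral pattern; `finalizer_queue`, `gc_enqueue_finalizer`, `run_pending_finalizers` and `decrement_live_count` are hypothetical names, and this is not how Guile or BDW-GC implement finalization internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending finalizer: a callback plus the object it cleans up. */
struct finalizer {
  void (*proc) (void *obj);
  void *obj;
};

#define MAX_PENDING 16

/* Hypothetical queue filled by the collector, drained by the program. */
struct finalizer_queue {
  struct finalizer pending[MAX_PENDING];
  size_t count;
};

/* Called from inside the collector, possibly while the mutator holds
   arbitrary locks: it only records work and never runs user code. */
static void
gc_enqueue_finalizer (struct finalizer_queue *q, void (*proc) (void *),
                      void *obj)
{
  assert (q->count < MAX_PENDING);
  q->pending[q->count].proc = proc;
  q->pending[q->count].obj = obj;
  q->count++;
}

/* Called explicitly by the program at a point it knows is safe (no locks
   held, data structures consistent).  Runs finalizers in LIFO order for
   simplicity and returns how many ran. */
static size_t
run_pending_finalizers (struct finalizer_queue *q)
{
  size_t ran = 0;
  while (q->count > 0)
    {
      struct finalizer f = q->pending[--q->count];
      f.proc (f.obj);
      ran++;
    }
  return ran;
}

/* Example finalizer: decrement a live-object counter owned by the program. */
static void
decrement_live_count (void *obj)
{
  int *live = obj;
  (*live)--;
}
```

Nothing comparable exists for marking: a mark procedure that waited for a "safe point" would stall the collection it is part of.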

Thirdly, mark procedures are usually unnecessary.  By far the simplest
way around a mark procedure is to let the conservative tracer handle it:
just make sure your SMOB (or whatever object) has fields pointing to the
things you need to trace.  However, thinking ahead to a re-introduction
of precise GC so that we can relocate objects, this may be
unsatisfactory; and yet, as I noted, mark procedures are insufficient for
that case too.  There are a few options we have today, and a few more
things we could build in the future:

  * If your SMOB A holds on to an object B and there is no path from B
    to A, you can just call scm_gc_protect_object on B and arrange for
    A's finalizer to call scm_gc_unprotect_object on B.  Finalizers are
    safer than mark procedures (and that's saying something).

  * Or you could add an association from A to B in a weak-key table.

  * If there is a possibility of a path from B to A, you need an
    ephemeron table, and Guile doesn't do that right now.  But it
    should!

  * If your object A has a fixed shape, in that it will always hold on
    to an object B and that object will be in word N of the SMOB, then
    the SMOB can have an associated descriptor indicating how the object
    is laid out.  Guile's GC could use this descriptor to know what to
    mark, whether the values are tagged or not (if we adopted a
    different tagging system that made this necessary), and so on.  This
    is already possible with GC_make_descriptor.

  * If your object A does not have a fixed shape and you are using mark
    procedures to punch through other C/C++ data structures to grab
    references, it is probable that you have already gotten your mark
    function wrong due to concurrency: the C/C++ data structures may not
    be in a consistent state when the mark procedure reads them.  This is
    the one case in which mark procedures actually do something for you,
    but what they most likely do is crash your program at random, so
    there's that.
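The fixed-shape case needs no per-type mark code at all. In the plain-C sketch below, each object type publishes a layout bitmap (bit N set means word N holds a traceable pointer) and one generic routine does all the tracing. BDW-GC's real facility for this is `GC_make_descriptor` from `gc_typed.h`; the `layout` struct, `trace_words` and `count_pointer` here are hypothetical stand-ins that only model the idea, and they assume objects no wider than 64 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-type layout descriptor: bit N of `pointer_bits` is set
   when word N of the object holds a pointer the collector must trace. */
struct layout {
  size_t n_words;             /* object size in words, at most 64 here */
  uint64_t pointer_bits;
};

typedef void (*slot_visitor) (void **slot, void *ctx);

/* Generic tracer: no type-specific code, only the descriptor.  Words whose
   bit is clear (tags, raw integers, C data) are never treated as pointers. */
static void
trace_words (void **object, const struct layout *l, slot_visitor visit,
             void *ctx)
{
  assert (l->n_words <= 64);
  for (size_t i = 0; i < l->n_words; i++)
    if (l->pointer_bits & ((uint64_t) 1 << i))
      visit (&object[i], ctx);
}

/* Example visitor: count the non-NULL pointer slots found. */
static void
count_pointer (void **slot, void *ctx)
{
  size_t *n = ctx;
  if (*slot != NULL)
    (*n)++;
}
```

For a three-word object laid out as {type tag, pointer to B, raw integer}, the descriptor has only bit 1 set, so the tracer visits exactly one slot. Because the tracer receives `void **`, a moving collector could reuse it with a visitor that rewrites `*slot`.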

Finally, mark procedures are slow.  They call out from the hot path in
GC and they recurse back into the library via scm_gc_mark.  scm_gc_mark
actually communicates state via thread-local variables, which are slow
as well.  Their existence can prevent us as hackers from making changes
to the GC that would improve performance for users that don't use mark
procedures.  Probably the fastest way to use them is for each object
kind with mark procedures to be allocated out of its own pool, meaning
that objects with mark procedures increase fragmentation of the system
as a whole.

In summary, mark procedures have a lot of points against them,
especially as currently implemented!  There's only one case I am aware
of in which mark procedures actually bring anything to the table (the
case in which you punch through complicated data structures that the GC
can't trace), but (a) I'm not sure that case wouldn't be served just as
well by ephemerons, and (b) I seriously doubt that any such mark
procedure implementation is actually correct.  That's on top of the fact
that mark procedures as currently formulated don't serve our future
purposes :P

Not even the JNI makes the mistake of exporting mark functions, nor do
any of the high-performance JavaScript implementations, nor does Lua,
and so on.  At most, we should have our foreign objects define type
descriptors and let Guile itself handle the tracing implementation.

At the end of our IRC conversation yesterday you put the burden on me to
argue against mark procedures, which was fair, but at this point I think
we would need good arguments for keeping them, at least in the long run
:)  In the short run, as we add new APIs like the foreign object
facility, I think it makes sense to *avoid* adding mark functions to the
new APIs, at least for now.  We don't have the use cases right now and
so we would surely specify the wrong thing.  We can always add something
later.

Regards,

Andy




