Re: Mark procedures - Stefan Israelsson Tampe

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

From: Stefan Israelsson Tampe <stefan.itampe@gmail.com>
To: Andy Wingo <wingo@pobox.com>
Cc: "Mark H. Weaver" <mhw@netris.org>, guile-devel <guile-devel@gnu.org>
Subject: Re: Mark procedures
Date: Wed, 4 Nov 2015 13:01:50 +0100	[thread overview]
Message-ID: <CAGua6m0M9aSi3Kzho_H87FfFYho8dcS98VxcLGHmA4XfyMvMDw@mail.gmail.com> (raw)
In-Reply-To: <87vb9ihy6x.fsf@igalia.com>

[-- Attachment #1: Type: text/plain, Size: 11098 bytes --]

Good post!

I first like to second the need for our own gc.

Regarding mark procedures. I prefer that we collect the use cases and make
sure that the framework can handle them in
a robust and effective way. I don't like to write mark procedures at all
and your fixes seam to address quite a lot of my own
hack using mark procedures.

So let's see if we can support guile-log's logical programs. I can run
quite advanced programs from the
prolog world like constraint programming in finite domains. To run these
programs you need gc'ing of
logical variables. The problem with an assoc datastructure to represent
prolog variables are that they
keep a reference to all assigned variables and you loose the ability to gc
them. kanren has a few trick
that can help but they are of the kind "if your lucky you will not blow the
stack" so I can't use that. What
I do is to maintain all logical asignement in a global stack and work with
that hands on.

One of the essential features is to detect when a variable in the stack is
ready for garbage collection. How to solve this?

This is my plan
finalizers:
if v is in the stack, then I would mark them as in the stack. In the
finilizer I would look if the variable is in the stack
and then mark it as ready for reclaim, and then keep them as referenced and
also clear the data. It would then be good
if we could get hook to the sweep phase so that we can scan the prolog
stack, and remove the variables not reachable but in the stack.
This means that we could benefit from a secondary reclaim feature of memory
else the variables would stay in memory until
the next gc.

The problem is that these stacks needs to mark some fields and not other
fields. This can be fixed by protecting added stuff and finaliziers
but it will slow down the handling. But this pattern can be coded what you
need is a function that tells if a field should be marked or not
we could for example add a pair of a vector and a bitvector where the
bitvector is telling about gc or not similarly to your struct suggestion

Another interesting gc feature is that when you set a variable multiple
times you can clear all data untill the last one after
a backtracking point until the next one, but the backtracking point can be
garbage collected as well. So I need to scan the stack to see if there is
data
that can be reclaimed at each sweep phase.This means it can take multiple
gc's to reclaim all potential memory and I'm not happy
as it is now - When working on this I would probably introduce a doubly
linked list between the backtracking points and maintain them
so essentially when you mark the backtracking point I need to scan the
stack to the next backtracking point and mark all the last reseted
values. Doing this will be a good solution that does not need multiple gc.

Another issue is that a backtracking point can be located in a state tree.
Here again I need to maintain links correctly so that I can scan
the list and mark reseted values just as explained for the stack.

One way around this issue is maintain a library in guile that implements a
general datastructure that can be used as a prolog stack. Or some
other infrastructure that can be used. The pattern is indeed interesting to
try to generalize.

What I do fear though is that the gc of prolog programs with lot's of state
stored will lead to quite a big rucksack of finilizer logic. The state is
stored
in trees and I need to maintain them on the basis of special gc treatment.
A solution again would be to store a bit tree that tells about the marking
of the
elements in the tree so we would have a third data-structure where we would
like to conditionally mark elements. This means that we could use a
finalizer
on the backtracking object.

Finally a question.
If in a finilizer I mark objects. Would I then be able mark what they
reference in turn and so on?

Thinking about this maybe we could use structs to shield of gc but it can
have an awful overhead. I would love to have a bit in SCM
values that tell if they need special gc or no special gc e.g. if this bit
is marked you do your work in the finilizer where you can force
a mark onto the variable although the bit is set.

Regards
Stefan

On Wed, Nov 4, 2015 at 11:10 AM, Andy Wingo <wingo@pobox.com> wrote:

> Greetings Mark!  And hello to guile-devel :)  Sorry for being
> incommunicado in these last months.  The good news is that we're finally
> ready to release 2.1.1, yay!  I'll be easing myself back into the
> mailing list soon but I wanted to expand on a topic we discussed over
> IRC yesterday: mark procedures.
>
> To recap, since the ages of yore, Guile has allowed SMOB implementations
> to specify "mark procedures".  Indeed in the olden days, these
> procedures were practically required, as the GC wouldn't otherwise trace
> any field of a SMOB.  When we started using the BDW conservative
> collector, mark procedures became less necessary.  At least in master,
> Guile itself doesn't specify any SMOB mark procedures, and the only
> markers set via the lower-level "gc_kind" mechanism from BDW-GC are to
> implement weak tables, and to precisely mark VM stacks.
>
> I would ideally like a world in which mark procedures don't exist.  Mark
> procedures have many really negative things about them: they are (1)
> insufficient for what we want to do with them, (2) extraordinarily
> error-prone, (3) (mostly) unnecessary, and (4) slow.  Let's take them
> one by one.
>
> Firstly, if we consider things we would like to do with our GC, mark
> procedures don't help us get there.  They don't help with a moving GC;
> that prototype would rather be "void (*mark) (SCM* scm_loc)", to allow
> Guile to relocate a pointer, and even that doesn't quite cut it because
> we allow for non-SCM-valued pointers as well.  What you need is a slot
> visitor; something like
> https://chromium.googlesource.com/v8/v8.git/+/master/src/objects-inl.h#1533
> .
> This is the same facility you need to do heap profiling as well -- you
> need both objects, and you get that by working on void** instead of
> void*.
>
> Secondly, mark procedures are an ***amazing*** foot-gun.  They run
> concurrently with the user's program -- on random threads, if the
> collector is doing parallel marking, but even without parallel marking,
> a mark procedure can run from any allocation site -- or indeed any site,
> if the program has multiple threads.  The mutator can hold almost any
> lock at the time the mark procedure is called.  With finalizers we have
> been able to avoid this via `scm_set_automatic_finalization_enabled' and
> `scm_run_finalizers' but with mark procedures this isn't possible.  It's
> also really hard to write a good mark procedure, once you start trying
> anything interesting; and if you're not doing anything interesting, you
> surely don't need a mark procedure, and can just rely on the builtin
> tracer.  Additionally when you mess up somehow with your mark procedure,
> the errors you get will appear far from their cause.
>
> Thirdly, mark procedures are usually unnecessary.  By far the simplest
> way around a mark procedure is to just let the conservative tracer
> handle it, and just make sure your SMOB or whatever object has fields to
> the things you need to trace.  However thinking forward to a
> re-introduction of precise GC so that we can relocate objects, maybe
> this is unsatisfactory; and yet mark procedures are still insufficient
> for this case as I noted.  There are a few options we have today and a
> few more things we could build in the future:
>
>   * If your SMOB A holds on to field B and there is no path from B to A,
>     you can just scm_gc_protect_object on B, and arrange for A's
>     finalizer to scm_gc_unprotect_object on B.  Finalizers are safer
>     than mark procedures (and that's saying something).
>
>   * Or you could add an association from A to B to a weak-key table.
>
>   * If there is a possibility of a path from B to A, you need an
>     ephemeron table, and Guile doesn't do that right now.  But it
>     should!
>
>   * If your object A has a fixed shape, in that it will always hold on
>     to an object B and that object will be in word N of the SMOB, then
>     the SMOB can have an associated descriptor indicating how the object
>     is laid out.  Guile's GC could use this descriptor to know what to
>     mark, whether the values are tagged or not (if we adopted a
>     different tagging system that made this necessary), and so on.  This
>     is already possible with GC_make_descriptor.
>
>   * If your object A does not have a fixed shape and you are using mark
>     procedures to punch through other C/C++ data structures to grab
>     references, it is probable that you have already gotten your mark
>     function wrong due to concurrency.  The C/C++ data structures are
>     not necessarily in a suitable state for access.  This is the case in
>     which mark procedures actually do something for you, but probably it
>     means randomly crashing your program, so there's that.
>
> Finally, mark procedures are slow.  They call out from the hot path in
> GC and they recurse back into the library via scm_gc_mark.  scm_gc_mark
> actually communicates state via thread-local variables, which are slow
> as well.  Their existence can prevent us as hackers from making changes
> to the GC that would improve performance for users that don't use mark
> procedures.  Probably the fastest way to use them is for each object
> kind with mark procedures to be allocated out of its own pool, meaning
> that objects with mark procedures increase fragmentation of the system
> as a whole.
>
> In summary, mark procedures have a lot of points against them,
> especially as currently implemented!  There's only one case that I am
> aware of in which mark procedures actually bring anything to the
> table -- the case in which you punch through complicated data structures
> that the GC can't trace -- but (a) I'm not sure that case wouldn't be
> served just as well via ephemerons (b) I seriously doubt that any such
> mark procedure implementation is actually correct.  Besides the fact
> that the formulation of mark procedures as they are doesn't serve our
> future purposes :P
>
> Not even the JNI makes the mistake of exporting mark functions, nor do
> any of the high-performance JavaScript implementations, nor does Lua,
> etc etc.  At the most we should be making our foreign objects define
> type descriptors and letting Guile itself handle the tracing
> implementation.
>
> At the end of our IRC conversation yesterday you put the burden on me to
> argue against mark procedures, which was fair, but at this point I think
> we would need good arguments for keeping them, at least in the long run
> :)  In the short run as we add new APIs like the foreign object
> facility, I think it makes sense to *avoid* adding mark functions to the
> new APIs, at least for now.  We don't have the use cases right now and
> so we would surely specify the wrong thing.  We can always add something
> later.
>
> Regards,
>
> Andy
>
>

[-- Attachment #2: Type: text/html, Size: 13158 bytes --]

next prev parent reply	other threads:[~2015-11-04 12:01 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-04 10:10 Mark procedures Andy Wingo
2015-11-04 12:01 ` Stefan Israelsson Tampe [this message]
2015-11-04 17:01 ` Mark procedures, LilyPond, and backward compatibility Mark H Weaver
2015-11-05 10:16   ` Foreign object API Ludovic Courtès
2016-02-01 22:50     ` Foreign objects removed from ‘stable-2.0’ Ludovic Courtès
2015-11-05 14:46   ` Mark procedures, LilyPond, and backward compatibility Andy Wingo
2015-11-05 10:29 ` Mark procedures Ludovic Courtès
2015-11-05 13:11   ` Andy Wingo
2015-11-05 14:17     ` Ludovic Courtès
2015-11-06 12:32     ` Mark procedures and LilyPond Mark H Weaver
2015-11-06 13:50       ` Ludovic Courtès
2015-11-06 15:05       ` Stefan Monnier
2016-01-24  8:58       ` Hans Åberg
2016-06-20 10:34     ` Mark procedures Andy Wingo
2016-06-20 12:15       ` Ludovic Courtès
2021-05-18 15:46 ` Ephemerons, self-referentality in weak hashtables Christopher Lemmer Webber
2021-06-20 15:01   ` Maxime Devos
2021-06-21 17:15     ` Maxime Devos
2021-09-08 16:18       ` Christine Lemmer-Webber
2021-09-08 20:11         ` Maxime Devos
2021-09-08 20:50           ` Christine Lemmer-Webber

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGua6m0M9aSi3Kzho_H87FfFYho8dcS98VxcLGHmA4XfyMvMDw@mail.gmail.com \
    --to=stefan.itampe@gmail.com \
    --cc=guile-devel@gnu.org \
    --cc=mhw@netris.org \
    --cc=wingo@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).