Hello,

This reminds me of a related but distinct issue - region-based allocation.
Hypothetically, you could use regions to implement a limited form of this
accounting, if you could ask the question, "what is the size of this memory
region?"

It certainly wouldn't do as much as custodians, but it would serve for the
simple purpose of a sandbox. It would also have another nice effect,
because you could allocate all of the sandbox's objects together and get
memory locality when running in the sandbox.

I don't know if this is the right solution, but I am curious what other
people think.

Best,
Noah


On Sun, Mar 9, 2014 at 6:48 AM, Andy Wingo <wingo-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org> wrote:

> Hi,
>
> I was thinking this morning about memory accounting in libgc.  There are
> a couple of use cases for this.
>
> One is simply asking "how much memory does this object use?"
>
> Another case is the common "implement an IRC bot that people can use to
> evaluate expressions in a sandbox, but with a limit on the amount of
> time and space those programs can use".  Let's assume that we've already
> arranged that the program can't write to files or do other things we
> wouldn't like it to do, and also that we have given it a time limit
> through some means.
>
> Both of these cases should be handled by GC, since GC is what can see
> the whole object graph in a program.  The rest of this mail describes
> what Racket does in their GC, and how we might be able to do it in
> libgc.
>
> Racket has a type of objects called "custodians".  Custodians nest.
> There is one root custodian for the whole program, and creating a new
> custodian adds a new kid of the current custodian.  There is a separate
> form to make a custodian current.  New threads are created "within" the
> current custodian.  (Racket threads are not pthreads, but this concept
> generalizes to pthreads.)  There is also a form to "enter" a new
> custodian, effectively partitioning a thread's stack into a range owned
> by the old custodian, and a range owned by the new custodian.
>
> If there are multiple custodians in a program -- something that isn't
> always the case -- then after GC runs, their GC calls
> BTC_do_accounting() here:
>
>
> http://git.racket-lang.org/plt/blob/c27930f9399564497208eb31a44f7091d02ab7be:/racket/src/racket/gc2/mem_account.c#l417
>
> This does a mark of the program's objects, but in a different mode (the
> "accounting" flag is set).  Custodians and threads have a separate GC
> "kind", in BDW-GC terms.  When marking a custodian or a thread in
> accounting mode, the mark procedure only recurses into sub-objects if
> the custodian is "current".  First, all the program static roots are
> marked, excluding stacks, and memory use from those roots is attributed
> to the root custodian.  Then BTC_do_accounting traverses the custodian
> tree, in pre-order, for each custodian setting it "current" and running
> its mark procedure, and marking stack segments for any threads created
> by that custodian.  Any visited unmarked object is accounted to the
> current custodian (i.e., the size of the object is added to the current
> custodian's size counter).  Then the tree is traversed in post-order,
> and child memory use is also accounted to parents.  This is O(n) in live
> heap size and doesn't require any additional memory.
>
> Finally if any custodian is over its limit, an after-GC hook sets that
> custodian as scheduled for shutdown, which in Racket will close
> resources associated with the custodian (for example, files opened when
> the custodian was current), arrange to kill threads spawned by the
> custodian, arrange to longjmp() out of any entered custodian, etc.
>
> How does this affect libgc?
>
> First of all, it gives an answer to the question of "how much memory
> does an object use" -- simply stop the world, mark the heap in two parts
> (the first time ignoring the object in question, the second time
> starting from the object), and subtract the live heap size of the former
> from the latter.  Libgc could do this without too much problem, it seems
> to me, on objects of any kind.  It would be a little extra code but it
> could be useful.  Or not?  Dunno.
>
> On the question of limiting memory usage -- clearly, arranging to free
> memory is out of libgc's purview, as that's an application issue.  But
> allowing an application to impose a memory limitation on a part of the
> program does need libgc help, and it would be most usefully done via
> something like a custodian tree.  Custodians could be implemented by a
> new GC kind, so that they get a mark procedure; but then we would need a
> hook to be able to do accounting after GC is done, and really it seems
> this is best done in libgc somehow (even if its implementation is
> layered on top of gc kinds, mark procedures, and such).
>
> Accounting a thread's stack to multiple owners is tricky, but doable --
> we could do it with a new version of GC_call_with_gc_active or so.
>
> There is also the issue of objects reachable by custodians that are
> "peers" -- where one does not dominate the other.  In that case the
> object would be charged randomly to one or the other.  That's probably
> OK.
>
> Also, you would probably want to maintain a general set of per-custodian
> data -- file descriptors, for example.  One would also want to maintain
> an estimate of non-GC allocations per-custodian (for example for
> third-party image processing libraries), but I don't know how that could
> be done; perhaps it is only possible if you register a separate GC kind
> for those objects, perhaps with a mark procedure.
>
> Anyway, just some thoughts.  I don't know when I could implement this,
> but I thought I'd put this out there in case someone had any ideas about
> this, or pointers to literature.  Cheers.
>
> Regards,
>
> Andy
> --
> http://wingolog.org/
>
>