From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Noah Lavine Newsgroups: gmane.comp.programming.garbage-collection.boehmgc,gmane.lisp.guile.devel Subject: Re: [Gc] Memory accounting in libgc Date: Tue, 18 Mar 2014 11:10:31 -0400 Message-ID: References: <87k3c33awa.fsf@pobox.com> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0389672580==" X-Trace: ger.gmane.org 1395155457 31166 80.91.229.3 (18 Mar 2014 15:10:57 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 18 Mar 2014 15:10:57 +0000 (UTC) Cc: bdwgc , guile-devel To: Andy Wingo Original-X-From: bdwgc-bounces-ZwoEplunGu1I4Lznb4ZCK0B+6BGkLq7r@public.gmane.org Tue Mar 18 16:11:05 2014 Return-path: Envelope-to: gcpgb-gc-Uylq5CNFT+jYtjvyW6yDsg@public.gmane.org Original-Received: from tomlinson.opendylan.org ([217.115.14.9]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WPvfV-000495-9p for gcpgb-gc-Uylq5CNFT+jYtjvyW6yDsg@public.gmane.org; Tue, 18 Mar 2014 16:11:05 +0100 Original-Received: from tomlinson.opendylan.org (localhost [127.0.0.1]) by tomlinson.opendylan.org (Postfix) with ESMTP id B8718443F0; Tue, 18 Mar 2014 16:11:03 +0100 (CET) Original-Received: from tomlinson.opendylan.org (localhost [127.0.0.1]) by spamd.opendylan.org (Postfix) with ESMTP id 81A1E444CB for ; Tue, 18 Mar 2014 16:10:58 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on tomlinson.opendylan.org X-Spam-Level: X-Spam-Status: No, score=-1.5 required=5.0 tests=BAYES_00,FREEMAIL_FROM, HTML_MESSAGE autolearn=ham version=3.3.1 Received-SPF: pass (gmail.com ... _spf.google.com: Sender is authorized to use 'noah549-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org' in 'mfrom' identity (mechanism 'include:_netblocks2.google.com' matched)) receiver=tomlinson.opendylan.org; identity=mailfrom; envelope-from="noah549-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"; helo=mail-pd0-x233.google.com; client-ip="2607:f8b0:400e:c02::233" Original-Received: from mail-pd0-x233.google.com (mail-pd0-x233.google.com [IPv6:2607:f8b0:400e:c02::233]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by tomlinson.opendylan.org (Postfix) with ESMTPS id F2262443F0 for ; Tue, 18 Mar 2014 16:10:57 +0100 (CET) Original-Received: by mail-pd0-f179.google.com with SMTP id w10so7180642pde.24 for ; Tue, 18 Mar 2014 08:10:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=9QAuB4C7rIkZhsIBoiu0JILevj53PmGNf+T2um6xk3w=; b=qi6/nfBp+4GJbGA40S01YCslx7c+qKqyIMHLfP1OUAbNX4o4Z9iRounDj1RhCx0i/J VbLeLBqizQIgMzyR5TbHYC6qBjEyivuxCN37l9apR2TKyhq3VlsLfh/tdiuHFHkUxJAc bRXKpQhzd+rjBx5Qk2PeZNXshTzrdDxpOknp13TSWmU0PC7SU4q+4aSoutm6Xa5syPxU fAmYsQJkOE5WsZ8wM/njl6OjQ+5Yu5KO+j47MHyKpYplgz8Iuai18SQi0yHceQvQphPu hjd4UUBwE+kc4HlGCPVsYac8kIQHZC1lxuD6VnPaFRDoRGkVSwwTYTie1UEikkwr5JXr vaiw== X-Received: by 10.66.136.229 with SMTP id qd5mr33588288pab.118.1395155451937; Tue, 18 Mar 2014 08:10:51 -0700 (PDT) Original-Received: by 10.68.101.101 with HTTP; Tue, 18 Mar 2014 08:10:31 -0700 (PDT) In-Reply-To: <87k3c33awa.fsf-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org> X-Google-Sender-Auth: d0VkBgdYDjec_nzyknmF30RrOEI X-BeenThere: bdwgc-ZwoEplunGu1I4Lznb4ZCK0B+6BGkLq7r@public.gmane.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: Boehm-Demers-Weiser Garbage Collector bugs/discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: bdwgc-bounces-ZwoEplunGu1I4Lznb4ZCK0B+6BGkLq7r@public.gmane.org Errors-To: bdwgc-bounces-ZwoEplunGu1I4Lznb4ZCK0B+6BGkLq7r@public.gmane.org Xref: news.gmane.org gmane.comp.programming.garbage-collection.boehmgc:5835 gmane.lisp.guile.devel:16983 Archived-At: --===============0389672580== Content-Type: multipart/alternative; boundary=001a113380b41e8a0704f4e2f1e1 --001a113380b41e8a0704f4e2f1e1 Content-Type: text/plain; charset=ISO-8859-1 Hello, This reminds me of a related but distinct issue - region-based allocation. Hypothetically, you could use regions to implement a limited form of this accounting, if you could ask the question, "what is the size of this memory region?" It certainly wouldn't do as much as custodians, but it would serve for the simple purpose of a sandbox. It would also have another nice effect, because you could allocate all of the sandbox's objects together and get memory locality when running in the sandbox. I don't know if this is the right solution, but I am curious what other people think. Best, Noah On Sun, Mar 9, 2014 at 6:48 AM, Andy Wingo wrote: > Hi, > > I was thinking this morning about memory accounting in libgc. There are > a couple of use cases for this. > > One is simply asking "how much memory does this object use?" > > Another case is the common "implement an IRC bot that people can use to > evaluate expressions in a sandbox, but with a limit on the amount of > time and space those programs can use". Let's assume that we've already > arranged that the program can't write to files or do other things we > wouldn't like it to do, and also that we have given it a time limit > through some means. > > Both of these cases should be handled by GC, since GC is what can see > the whole object graph in a program. The rest of this mail describes > what Racket does in their GC, and how we might be able to do it in > libgc. > > Racket has a type of objects called "custodians". Custodians nest. > There is one root custodian for the whole program, and creating a new > custodian adds a new kid of the current custodian. There is a separate > form to make a custodian current. New threads are created "within" the > current custodian. (Racket threads are not pthreads, but this concept > generalizes to pthreads.) There is also a form to "enter" a new > custodian, effectively partitioning a thread's stack into a range owned > by the old custodian, and a range owned by the new custodian. > > If there are multiple custodians in a program -- something that isn't > always the case -- then after GC runs, their GC calls > BTC_do_accounting() here: > > > http://git.racket-lang.org/plt/blob/c27930f9399564497208eb31a44f7091d02ab7be:/racket/src/racket/gc2/mem_account.c#l417 > > This does a mark of the program's objects, but in a different mode (the > "accounting" flag is set). Custodians and threads have a separate GC > "kind", in BDW-GC terms. When marking a custodian or a thread in > accounting mode, the mark procedure only recurses into sub-objects if > the custodian is "current". First, all the program static roots are > marked, excluding stacks, and memory use from those roots is attributed > to the root custodian. Then BTC_do_accounting traverses the custodian > tree, in pre-order, for each custodian setting it "current" and running > its mark procedure, and marking stack segments for any threads created > by that custodian. Any visited unmarked object is accounted to the > current custodian (i.e., the size of the object is added to the current > custodian's size counter). Then the tree is traversed in post-order, > and child memory use is also accounted to parents. This is O(n) in live > heap size and doesn't require any additional memory. > > Finally if any custodian is over its limit, an after-GC hook sets that > custodian as scheduled for shutdown, which in Racket will close > resources associated with the custodian (for example, files opened when > the custodian was current), arrange to kill threads spawned by the > custodian, arrange to longjmp() out of any entered custodian, etc. > > How does this affect libgc? > > First of all, it gives an answer to the question of "how much memory > does an object use" -- simply stop the world, mark the heap in two parts > (the first time ignoring the object in question, the second time > starting from the object), and subtract the live heap size of the former > from the latter. Libgc could do this without too much problem, it seems > to me, on objects of any kind. It would be a little extra code but it > could be useful. Or not? Dunno. > > On the question of limiting memory usage -- clearly, arranging to free > memory is out of libgc's purview, as that's an application issue. But > allowing an application to impose a memory limitation on a part of the > program does need libgc help, and it would be most usefully done via > something like a custodian tree. Custodians could be implemented by a > new GC kind, so that they get a mark procedure; but then we would need a > hook to be able to do accounting after GC is done, and really it seems > this is best done in libgc somehow (even if its implementation is > layered on top of gc kinds, mark procedures, and such). > > Accounting a thread's stack to multiple owners is tricky, but doable -- > we could do it with a new version of GC_call_with_gc_active or so. > > There is also the issue of objects reachable by custodians that are > "peers" -- where one does not dominate the other. In that case the > object would be charged randomly to one or the other. That's probably > OK. > > Also, you would probably want to maintain a general set of per-custodian > data -- file descriptors, for example. One would also want to maintain > an estimate of non-GC allocations per-custodian (for example for > third-party image processing libraries), but I don't know how that could > be done; perhaps it is only possible if you register a separate GC kind > for those objects, perhaps with a mark procedure. > > Anyway, just some thoughts. I don't know when I could implement this, > but I thought I'd put this out there in case someone had any ideas about > this, or pointers to literature. Cheers. > > Regards, > > Andy > -- > http://wingolog.org/ > > --001a113380b41e8a0704f4e2f1e1 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Hello,

This reminds me of a related but= distinct issue - region-based allocation. Hypothetically, you could use re= gions to implement a limited form of this accounting, if you could ask the = question, "what is the size of this memory region?"

It certainly wouldn't do as much as custodians, but= it would serve for the simple purpose of a sandbox. It would also have ano= ther nice effect, because you could allocate all of the sandbox's objec= ts together and get memory locality when running in the sandbox.

I don't know if this is the right solution, but I a= m curious what other people think.

Best,
Noah


On Sun, Mar 9, 2014 at 6:48 AM, Andy Wingo <wingo-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org> w= rote:
Hi,

I was thinking this morning about memory accounting in libgc. =A0There are<= br> a couple of use cases for this.

One is simply asking "how much memory does this object use?"

Another case is the common "implement an IRC bot that people can use t= o
evaluate expressions in a sandbox, but with a limit on the amount of
time and space those programs can use". =A0Let's assume that we= 9;ve already
arranged that the program can't write to files or do other things we wouldn't like it to do, and also that we have given it a time limit
through some means.

Both of these cases should be handled by GC, since GC is what can see
the whole object graph in a program. =A0The rest of this mail describes
what Racket does in their GC, and how we might be able to do it in
libgc.

Racket has a type of objects called "custodians". =A0Custodians n= est.
There is one root custodian for the whole program, and creating a new
custodian adds a new kid of the current custodian. =A0There is a separate form to make a custodian current. =A0New threads are created "within&q= uot; the
current custodian. =A0(Racket threads are not pthreads, but this concept generalizes to pthreads.) =A0There is also a form to "enter" a ne= w
custodian, effectively partitioning a thread's stack into a range owned=
by the old custodian, and a range owned by the new custodian.

If there are multiple custodians in a program -- something that isn't always the case -- then after GC runs, their GC calls
BTC_do_accounting() here:

=A0 http://git.racket-lang.org/plt/blob/c27930f9399564497208eb31a44f7091d02a= b7be:/racket/src/racket/gc2/mem_account.c#l417

This does a mark of the program's objects, but in a different mode (the=
"accounting" flag is set). =A0Custodians and threads have a separ= ate GC
"kind", in BDW-GC terms. =A0When marking a custodian or a thread = in
accounting mode, the mark procedure only recurses into sub-objects if
the custodian is "current". =A0First, all the program static root= s are
marked, excluding stacks, and memory use from those roots is attributed
to the root custodian. =A0Then BTC_do_accounting traverses the custodian tree, in pre-order, for each custodian setting it "current" and r= unning
its mark procedure, and marking stack segments for any threads created
by that custodian. =A0Any visited unmarked object is accounted to the
current custodian (i.e., the size of the object is added to the current
custodian's size counter). =A0Then the tree is traversed in post-order,=
and child memory use is also accounted to parents. =A0This is O(n) in live<= br> heap size and doesn't require any additional memory.

Finally if any custodian is over its limit, an after-GC hook sets that
custodian as scheduled for shutdown, which in Racket will close
resources associated with the custodian (for example, files opened when
the custodian was current), arrange to kill threads spawned by the
custodian, arrange to longjmp() out of any entered custodian, etc.

How does this affect libgc?

First of all, it gives an answer to the question of "how much memory does an object use" -- simply stop the world, mark the heap in two par= ts
(the first time ignoring the object in question, the second time
starting from the object), and subtract the live heap size of the former from the latter. =A0Libgc could do this without too much problem, it seems<= br> to me, on objects of any kind. =A0It would be a little extra code but it could be useful. =A0Or not? =A0Dunno.

On the question of limiting memory usage -- clearly, arranging to free
memory is out of libgc's purview, as that's an application issue. = =A0But
allowing an application to impose a memory limitation on a part of the
program does need libgc help, and it would be most usefully done via
something like a custodian tree. =A0Custodians could be implemented by a new GC kind, so that they get a mark procedure; but then we would need a hook to be able to do accounting after GC is done, and really it seems
this is best done in libgc somehow (even if its implementation is
layered on top of gc kinds, mark procedures, and such).

Accounting a thread's stack to multiple owners is tricky, but doable --=
we could do it with a new version of GC_call_with_gc_active or so.

There is also the issue of objects reachable by custodians that are
"peers" -- where one does not dominate the other. =A0In that case= the
object would be charged randomly to one or the other. =A0That's probabl= y
OK.

Also, you would probably want to maintain a general set of per-custodian data -- file descriptors, for example. =A0One would also want to maintain an estimate of non-GC allocations per-custodian (for example for
third-party image processing libraries), but I don't know how that coul= d
be done; perhaps it is only possible if you register a separate GC kind
for those objects, perhaps with a mark procedure.

Anyway, just some thoughts. =A0I don't know when I could implement this= ,
but I thought I'd put this out there in case someone had any ideas abou= t
this, or pointers to literature. =A0Cheers.

Regards,

Andy
--
http://wingolog.org/=


--001a113380b41e8a0704f4e2f1e1-- --===============0389672580== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ bdwgc mailing list bdwgc-ZwoEplunGu1I4Lznb4ZCK0B+6BGkLq7r@public.gmane.org https://lists.opendylan.org/mailman/listinfo/bdwgc --===============0389672580==--