From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Stefan Israelsson Tampe Newsgroups: gmane.lisp.guile.devel Subject: Re: Imporved cons representatoin in guile Date: Tue, 14 Jul 2015 15:18:52 +0200 Message-ID: References: NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=047d7bd755eefb6a22051ad5ade0 X-Trace: ger.gmane.org 1436879954 12288 80.91.229.3 (14 Jul 2015 13:19:14 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 14 Jul 2015 13:19:14 +0000 (UTC) Cc: guile-devel To: Stefan Monnier Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Tue Jul 14 15:19:11 2015 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1ZF075-0004c8-B3 for guile-devel@m.gmane.org; Tue, 14 Jul 2015 15:19:11 +0200 Original-Received: from localhost ([::1]:59606 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZF074-0001Ki-I4 for guile-devel@m.gmane.org; Tue, 14 Jul 2015 09:19:10 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:44040) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZF06x-0001KK-Ft for guile-devel@gnu.org; Tue, 14 Jul 2015 09:19:08 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZF06p-0005j5-IZ for guile-devel@gnu.org; Tue, 14 Jul 2015 09:19:03 -0400 Original-Received: from mail-pd0-x236.google.com ([2607:f8b0:400e:c02::236]:33005) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZF06p-0005hl-1U for guile-devel@gnu.org; Tue, 14 Jul 2015 09:18:55 -0400 Original-Received: by pdbqm3 with SMTP id qm3so6006162pdb.0 for ; Tue, 14 Jul 2015 06:18:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=0PDYII5/qbD/a0yjULiJNchhcu4ejvtinTEoKaROPgc=; b=sgMJ1hXQykl6ZKx24hbkdtkiYJXPyXg/QytD8QtZBf2N6u+VaPH/TZDA+P3dNXxHgp j6kIWjC3ULdLwaTApYNueS7IbeS50o8SxUobbyT+bg5OCtCFbfikkl6MS/2KYcrPVIgV BbpPU7ea6CDb/XWoyyQD68Db+lp972sgTLKH2MNnQfurBfKoK/FfPid5OFRCDREqY0HT +KtZ9jfM+TPTd+8aas+39dDLjIfiCnOww6adS5titS6r3xhY0FwBFd09dxnu09ogUSRy rJ9Ls2Vqy0ziTBZdqfF+mM5VVdyEPpoSEoQgnWCnmT3aQU3VoNt9yv4yhAioxcy7xQ06 AeEQ== X-Received: by 10.68.167.197 with SMTP id zq5mr80229780pbb.85.1436879932842; Tue, 14 Jul 2015 06:18:52 -0700 (PDT) Original-Received: by 10.70.78.132 with HTTP; Tue, 14 Jul 2015 06:18:52 -0700 (PDT) In-Reply-To: X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2607:f8b0:400e:c02::236 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:17760 Archived-At: --047d7bd755eefb6a22051ad5ade0 Content-Type: text/plain; charset=UTF-8 I think that getting cons cells to work with this is the easiest. With this one can try to benchmark the change and investigate. I think that I know enough of guile internals to make this work in a branch. I can then post for anybody to try out. If it is a good idea to try this let me know and I take a stab at it. I think that getting this to work on more datastructures is a bit tricky. One could for example use the current markings only on the proxied datastructures but then one needs a huge commit to make it work. Maybe you suggested this, But I think that the best would be to use two gc memory regions one for the current mark system and one for the compressed data structures where relative references is correctly followed. The drawback with this is that we need to double the thread caches of lists of free memory blocks. I know that this works cause I dabbled with multiple memory types in guile-log when making the prolog variables gc-able. This approach also enables an incremental development transitions of datastructures suitable for compression. > you'll want the GC to know about them, so it tries to keep/bring > related objects near each other. does bdw-gc do this on current cons cells? > I imagined it working by dividing the heap into zones whose size > corresponds to the "reachability" of an SCM/2 relative pointer (so an > SCM/2 field can point to any other object within the same zone without > going through a proxy), and have each zone come with an array > of proxies. To make this work in a good way is quite difficult. You need bdw-gc to group the free memory in relation to ther allocation block in order to simulate the demon of Maxwell if this is not currently done you need quite a hack into the bdw-gc code to make it work. Especially constants are loaded via installing pages and here I suspect that we can get problems with a lot of references going out of bounds. When it comes with the heap memory from the gc would that not mostly be within 30 bit relative adresses? anyhow I agree for large programs of aroung 1GB we would start to see more and more proxied datastructures ending with somthing that is unoptimal. I imagined that any solution to guile that incorporated a compresion feature would need the possibility via a configuration option to use the old uncompressed system. Especially it would be questionable to have large datastructures like vectors to be compressed cause the although small but positive probability of referencing constants. >I was thinking about this mostly in the context of 64bit boxes where the >SCM/2 references would basically only ever need to go through proxies once >the process size passes the 4GB limit. But for small-memory systems >using 32bit pointers, the same idea might work as well. Yeah, the information slack is quite high on these Boxes we need to make sure not to force this feature on the 32bit ones. > Interestingly, this split into zones+proxies would also allow the GC to > operate on each zone independently (you don't need to trace the objects > in the other zones, only their proxies). Not sure I follow, but marking a proxy implies marking in the other zone so I can't see this atm. What is not too difficult is to be able to have say 10 memory regions and then when you work with scheme code of large applications you could do, (with-mem-region 8 thunk) With the meaning that all allocation comes from region 8. This is a complexity code wise, but could be something that can be profiled out and just used on the hot code paths and datastructures. Now I'm not sure how well bdw-gc handles this but I think that controlling the adress ranges of allocated blocks is a relative small change to the code base, the issue here is probably portability. Ok That's all for now, Cheers! On Fri, Jul 10, 2015 at 7:31 PM, Stefan Monnier wrote: > > To compress even further we need a way to could use > > x ->[SCM/2/SCM/2], witt SCM/2 the same tagging half the size as the > normal > > Note that this is not specific to cons-cells. I've been meaning to try > and experiment with such a box-compression scheme for several years, but > it never got high enough on the todo list (tho I did get a student to > look at it, but he gave up his MSc before getting far enough). > > Note that because of set-car! and set-cdr! you need those SCM/2 fields > to be able to hold *any* value. So an SCM/2 field can hold 3 different > things: > - an immediate value. > - a compressed reference, typically using relative addressing, where > you'll want the GC to know about them, so it tries to keep/bring > related objects near each other. > - a "proxy reference", which identifies a full SCM cell which contains > the actual value. It can be a relative pointer or an index in some > table. > > I imagined it working by dividing the heap into zones whose size > corresponds to the "reachability" of an SCM/2 relative pointer (so an > SCM/2 field can point to any other object within the same zone without > going through a proxy), and have each zone come with an array > of proxies. > > I was thinking about this mostly in the context of 64bit boxes where the > SCM/2 references would basically only ever need to go through proxies once > the process size passes the 4GB limit. But for small-memory systems > using 32bit pointers, the same idea might work as well. > > Interestingly, this split into zones+proxies would also allow the GC to > operate on each zone independently (you don't need to trace the objects > in the other zones, only their proxies). > > > Stefan > > > --047d7bd755eefb6a22051ad5ade0 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

I think that getting cons cells to work wit= h this is the easiest. With this one can try to benchmark the change and in= vestigate.

I think that I know enough of guile internals= to make this work in a branch. I can then post for anybody to try out. If = it is a good idea to try this let me know and I take a stab at it.

I think that getting this to work on more datastructures i= s a bit tricky. One could for example use the current markings only on the = proxied datastructures but then one needs a huge commit to make it work.

Maybe you suggested this, But I think that the best = would be to use two gc memory regions one for the current mark system and o= ne for the compressed data structures where relative references is correctl= y followed. The drawback with this is that we need to double the thread cac= hes of lists of free memory blocks. I know that this works cause I dabbled = with multiple memory types in guile-log when making the prolog variables gc= -able. This approach also enables an incremental development transitions of= datastructures suitable for compression.

> you'll want the GC to know ab= out them, so it tries to keep/bring
> related objects= near each other.

does bdw-gc do this on current cons cells?

> I= imagined it working by dividing the heap into zones whose size
> corresponds to the "reachability" of an SCM/2 relati= ve pointer (so an
> SCM/2 field can point to any othe= r object within the same zone without
> going through= a proxy), and have each zone come with an array
> of= proxies.

<= /div>
To make this work in a good way is quite difficult. You need bdw-= gc to group the free memory in relation to ther allocation block in order t= o simulate the demon of Maxwell if this is not currently done you need quit= e a hack into the bdw-gc code to make it work.

Especially constants are loaded via installing pages and= here I suspect that we can get problems with a lot of references going out= of bounds. When it comes with the heap memory from the gc would that not m= ostly be within 30 bit relative adresses? anyhow I agree for large programs= of aroung 1GB we would start to see more and more proxied datastructures e= nding with somthing that is unoptimal. I imagined that any solution to guil= e that incorporated a compresion feature would need the possibility via a c= onfiguration option to use the old uncompressed system. Especially it would= be questionable to have large datastructures like vectors to be compressed= cause the although small but positive probability of referencing constants= .

=

>= I was thinking about this mostly in the context of 64bit boxes where the
>SCM/2 references would basically only ever need to go = through proxies once
>the process size passes the 4GB= limit.=C2=A0 But for small-memory systems

>using 32b= it pointers, the same idea might work as well.

Yeah, the information slack is quite high on these Boxes we need to = make sure not to force this feature on the 32bit ones.

> Interestingly, this split into zones+proxies would also= allow the GC to
> operate on each zone independently= (you don't need to trace the objects
> in the ot= her zones, only their proxies).
=
Not sure= I follow, but marking a proxy implies marking in the other zone so I can&#= 39;t see this atm.

Wh= at is not too difficult is to be able to have say 10 memory regions and the= n when you work with scheme code of large applications you could do,=

(with-mem-region 8 thunk)

With the meaning that = all allocation comes from region 8. This is a complexity code wise, but cou= ld be something that can be profiled out and just used on the hot code path= s and datastructures. Now I'm not sure how well bdw-gc handles this but= I think that controlling the adress ranges of allocated blocks is a relati= ve small change to the code base, the issue here is probably portability.

Ok That's all for = now,=C2=A0
Ch= eers!



On Fri, J= ul 10, 2015 at 7:31 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>= To compress even further we need a way to could use
> x ->[SCM/2/SCM/2], witt SCM/2 the same tagging half the size as the= normal

Note that this is not specific to cons-cells.=C2=A0 I've been me= aning to try
and experiment with such a box-compression scheme for several years, but it never got high enough on the todo list (tho I did get a student to
look at it, but he gave up his MSc before getting far enough).

Note that because of set-car! and set-cdr! you need those SCM/2 fields
to be able to hold *any* value.=C2=A0 So an SCM/2 field can hold 3 differen= t
things:
- an immediate value.
- a compressed reference, typically using relative addressing, where
=C2=A0 you'll want the GC to know about them, so it tries to keep/bring=
=C2=A0 related objects near each other.
- a "proxy reference", which identifies a full SCM cell which con= tains
=C2=A0 the actual value.=C2=A0 It can be a relative pointer or an index in = some
=C2=A0 table.

I imagined it working by dividing the heap into zones whose size
corresponds to the "reachability" of an SCM/2 relative pointer (s= o an
SCM/2 field can point to any other object within the same zone without
going through a proxy), and have each zone come with an array
of proxies.

I was thinking about this mostly in the context of 64bit boxes where the SCM/2 references would basically only ever need to go through proxies once<= br> the process size passes the 4GB limit.=C2=A0 But for small-memory systems using 32bit pointers, the same idea might work as well.

Interestingly, this split into zones+proxies would also allow the GC to
operate on each zone independently (you don't need to trace the objects=
in the other zones, only their proxies).


=C2=A0 =C2=A0 =C2=A0 =C2=A0 Stefan



--047d7bd755eefb6a22051ad5ade0--