unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Stefan Israelsson Tampe <stefan.itampe@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: guile-devel <guile-devel@gnu.org>
Subject: Re: Imporved cons representatoin in guile
Date: Tue, 14 Jul 2015 15:18:52 +0200	[thread overview]
Message-ID: <CAGua6m1uwN9zqD-X9SLKvfrdu7Wf+_VEp267G7ZpU98qwbXbdw@mail.gmail.com> (raw)
In-Reply-To: <jwv7fq8q6f6.fsf-monnier+gmane.lisp.guile.devel@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 5619 bytes --]

I think that getting cons cells to work with this is the easiest. With this
one can try to benchmark the change and investigate.

I think that I know enough of guile internals to make this work in a
branch. I can then post for anybody to try out. If it is a good idea to try
this let me know and I take a stab at it.

I think that getting this to work on more datastructures is a bit tricky.
One could for example use the current markings only on the proxied
datastructures but then one needs a huge commit to make it work.

Maybe you suggested this, But I think that the best would be to use two gc
memory regions one for the current mark system and one for the compressed
data structures where relative references is correctly followed. The
drawback with this is that we need to double the thread caches of lists of
free memory blocks. I know that this works cause I dabbled with multiple
memory types in guile-log when making the prolog variables gc-able. This
approach also enables an incremental development transitions of
datastructures suitable for compression.

> you'll want the GC to know about them, so it tries to keep/bring
> related objects near each other.

does bdw-gc do this on current cons cells?

> I imagined it working by dividing the heap into zones whose size
> corresponds to the "reachability" of an SCM/2 relative pointer (so an
> SCM/2 field can point to any other object within the same zone without
> going through a proxy), and have each zone come with an array
> of proxies.

To make this work in a good way is quite difficult. You need bdw-gc to
group the free memory in relation to ther allocation block in order to
simulate the demon of Maxwell if this is not currently done you need quite
a hack into the bdw-gc code to make it work.

Especially constants are loaded via installing pages and here I suspect
that we can get problems with a lot of references going out of bounds. When
it comes with the heap memory from the gc would that not mostly be within
30 bit relative adresses? anyhow I agree for large programs of aroung 1GB
we would start to see more and more proxied datastructures ending with
somthing that is unoptimal. I imagined that any solution to guile that
incorporated a compresion feature would need the possibility via a
configuration option to use the old uncompressed system. Especially it
would be questionable to have large datastructures like vectors to be
compressed cause the although small but positive probability of referencing
constants.


>I was thinking about this mostly in the context of 64bit boxes where the
>SCM/2 references would basically only ever need to go through proxies once
>the process size passes the 4GB limit.  But for small-memory systems
>using 32bit pointers, the same idea might work as well.

Yeah, the information slack is quite high on these Boxes we need to make
sure not to force this feature on the 32bit ones.

> Interestingly, this split into zones+proxies would also allow the GC to
> operate on each zone independently (you don't need to trace the objects
> in the other zones, only their proxies).

Not sure I follow, but marking a proxy implies marking in the other zone so
I can't see this atm.

What is not too difficult is to be able to have say 10 memory regions and
then when you work with scheme code of large applications you could do,

(with-mem-region 8 thunk)

With the meaning that all allocation comes from region 8. This is a
complexity code wise, but could be something that can be profiled out and
just used on the hot code paths and datastructures. Now I'm not sure how
well bdw-gc handles this but I think that controlling the adress ranges of
allocated blocks is a relative small change to the code base, the issue
here is probably portability.

Ok That's all for now,
Cheers!



On Fri, Jul 10, 2015 at 7:31 PM, Stefan Monnier <monnier@iro.umontreal.ca>
wrote:

> > To compress even further we need a way to could use
> > x ->[SCM/2/SCM/2], witt SCM/2 the same tagging half the size as the
> normal
>
> Note that this is not specific to cons-cells.  I've been meaning to try
> and experiment with such a box-compression scheme for several years, but
> it never got high enough on the todo list (tho I did get a student to
> look at it, but he gave up his MSc before getting far enough).
>
> Note that because of set-car! and set-cdr! you need those SCM/2 fields
> to be able to hold *any* value.  So an SCM/2 field can hold 3 different
> things:
> - an immediate value.
> - a compressed reference, typically using relative addressing, where
>   you'll want the GC to know about them, so it tries to keep/bring
>   related objects near each other.
> - a "proxy reference", which identifies a full SCM cell which contains
>   the actual value.  It can be a relative pointer or an index in some
>   table.
>
> I imagined it working by dividing the heap into zones whose size
> corresponds to the "reachability" of an SCM/2 relative pointer (so an
> SCM/2 field can point to any other object within the same zone without
> going through a proxy), and have each zone come with an array
> of proxies.
>
> I was thinking about this mostly in the context of 64bit boxes where the
> SCM/2 references would basically only ever need to go through proxies once
> the process size passes the 4GB limit.  But for small-memory systems
> using 32bit pointers, the same idea might work as well.
>
> Interestingly, this split into zones+proxies would also allow the GC to
> operate on each zone independently (you don't need to trace the objects
> in the other zones, only their proxies).
>
>
>         Stefan
>
>
>

[-- Attachment #2: Type: text/html, Size: 9002 bytes --]

      reply	other threads:[~2015-07-14 13:18 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-10 11:20 Imporved cons representatoin in guile Stefan Israelsson Tampe
2015-07-10 11:37 ` Stefan Israelsson Tampe
2015-07-10 17:31 ` Stefan Monnier
2015-07-14 13:18   ` Stefan Israelsson Tampe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGua6m1uwN9zqD-X9SLKvfrdu7Wf+_VEp267G7ZpU98qwbXbdw@mail.gmail.com \
    --to=stefan.itampe@gmail.com \
    --cc=guile-devel@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).