I think that getting cons cells to work with this is the easiest. With this one can try to benchmark the change and investigate.

I think that I know enough of guile internals to make this work in a branch. I can then post for anybody to try out. If it is a good idea to try this let me know and I take a stab at it.

I think that getting this to work on more datastructures is a bit tricky. One could for example use the current markings only on the proxied datastructures but then one needs a huge commit to make it work.

Maybe you suggested this, But I think that the best would be to use two gc memory regions one for the current mark system and one for the compressed data structures where relative references is correctly followed. The drawback with this is that we need to double the thread caches of lists of free memory blocks. I know that this works cause I dabbled with multiple memory types in guile-log when making the prolog variables gc-able. This approach also enables an incremental development transitions of datastructures suitable for compression.

> you'll want the GC to know about them, so it tries to keep/bring
> related objects near each other.

does bdw-gc do this on current cons cells?

> I imagined it working by dividing the heap into zones whose size
> corresponds to the "reachability" of an SCM/2 relative pointer (so an
> SCM/2 field can point to any other object within the same zone without
> going through a proxy), and have each zone come with an array
> of proxies.

To make this work in a good way is quite difficult. You need bdw-gc to group the free memory in relation to ther allocation block in order to simulate the demon of Maxwell if this is not currently done you need quite a hack into the bdw-gc code to make it work.

Especially constants are loaded via installing pages and here I suspect that we can get problems with a lot of references going out of bounds. When it comes with the heap memory from the gc would that not mostly be within 30 bit relative adresses? anyhow I agree for large programs of aroung 1GB we would start to see more and more proxied datastructures ending with somthing that is unoptimal. I imagined that any solution to guile that incorporated a compresion feature would need the possibility via a configuration option to use the old uncompressed system. Especially it would be questionable to have large datastructures like vectors to be compressed cause the although small but positive probability of referencing constants.

>I was thinking about this mostly in the context of 64bit boxes where the
>SCM/2 references would basically only ever need to go through proxies once
>the process size passes the 4GB limit. But for small-memory systems
>using 32bit pointers, the same idea might work as well.

Yeah, the information slack is quite high on these Boxes we need to make sure not to force this feature on the 32bit ones.

> Interestingly, this split into zones+proxies would also allow the GC to
> operate on each zone independently (you don't need to trace the objects
> in the other zones, only their proxies).

Not sure I follow, but marking a proxy implies marking in the other zone so I can't see this atm.

What is not too difficult is to be able to have say 10 memory regions and then when you work with scheme code of large applications you could do,

(with-mem-region 8 thunk)

With the meaning that all allocation comes from region 8. This is a complexity code wise, but could be something that can be profiled out and just used on the hot code paths and datastructures. Now I'm not sure how well bdw-gc handles this but I think that controlling the adress ranges of allocated blocks is a relative small change to the code base, the issue here is probably portability.

Ok That's all for now,

Cheers!

On Fri, Jul 10, 2015 at 7:31 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> To compress even further we need a way to could use
> x ->[SCM/2/SCM/2], witt SCM/2 the same tagging half the size as the normal

Note that this is not specific to cons-cells. I've been meaning to try
and experiment with such a box-compression scheme for several years, but
it never got high enough on the todo list (tho I did get a student to
look at it, but he gave up his MSc before getting far enough).

Note that because of set-car! and set-cdr! you need those SCM/2 fields
to be able to hold *any* value. So an SCM/2 field can hold 3 different
things:
- an immediate value.
- a compressed reference, typically using relative addressing, where
you'll want the GC to know about them, so it tries to keep/bring
related objects near each other.
- a "proxy reference", which identifies a full SCM cell which contains
the actual value. It can be a relative pointer or an index in some
table.

I imagined it working by dividing the heap into zones whose size
corresponds to the "reachability" of an SCM/2 relative pointer (so an
SCM/2 field can point to any other object within the same zone without
going through a proxy), and have each zone come with an array
of proxies.

I was thinking about this mostly in the context of 64bit boxes where the
SCM/2 references would basically only ever need to go through proxies once
the process size passes the 4GB limit. But for small-memory systems
using 32bit pointers, the same idea might work as well.

Interestingly, this split into zones+proxies would also allow the GC to
operate on each zone independently (you don't need to trace the objects
in the other zones, only their proxies).

Stefan