unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* compile-rtl
@ 2012-10-14 15:13 Stefan Israelsson Tampe
  2013-01-21 17:32 ` compile-rtl Andy Wingo
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Israelsson Tampe @ 2012-10-14 15:13 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 4035 bytes --]

Hi,

I'm right now trying to port compile-glil to compile-rtl and I would say
that what's hindering me is
what design choices to take?

1.
The main problem is that we do not have a stack pointer the same way as
before. Of cause we still need
to store temporary values and we will simply have to use predefined local
register indices, that's not hard to implement though.
The problem is that we will have memory not cleared that's not going to be
GC:ed until the
frame is changed. This will cause some potential memory leaks. To let it be
as it is designed right now, will mean that it is
very difficult looking at the scheme code to see what will be protected
from gc or not. And I fear that we will need to handle
this in some smart way. The solutions are either, keep a stack pointer in a
register and updated it for each push and pop
as before. Or do what we do after let forms, clear the variable slots  -
expensive! What I think is
that we need to have the equivalent of a stack pointer in order to properly
handle gc'ing. We could still have a register approach
and gain quite some speed bump (2x) from using direct addressing in stead
of using the middle layer of the stack for everything.

My vote is introduce a register sp again!

2.
Now when we are tail-calling in rtl, we just fill the argument slots with
the new function arguments directly and then tail-call by filling in
number of arguments and function. This is very smart and just some simple
features added would mean that a lot of translation
from one memory location to the other is skipped. I really like how the rtl
code handle this but there is an expense. It complicates life a little
when we overwrite arguments that are used in the calculation of other
arguments. I'm working on how to handle this but I just wanted to point
out how nice this design choice is.

3.
call's, here currently we send a list of register positions to the call
function and the call function itself will copy those to the argument
fields. This is the opposite
of the tail-call, in a case like (f 1 3) you will create two temporary
variables and copy them later on to the argument position. Here I would
like to change the
semantic so that we fill in the arguments directly just like in the
tail-call. Also, I would like to use the end of the stack region to compute
the function frame. Well even if we use the end of the full frame we will
save one move per argument which is nice.

4.
Return values.
We talked about this before and there is some concerns how to make this
fast. I would like to skip the complication by simply put the return values
in the
argument positions to the function. Also the number of arguments is
mediated to the reciever. I would like to skip the mv-return slot and add a
rmove function
like,

(call     pos proc)
(rmove pos (i ...))

And it is rmove's responsibility to move the arguments to it's return
positions, also it's possible for functions calling functions
to just clear the function slot and keep the values for later use by simply
increasing the stack to contain the return value(s).
this means that we can keep the copying to the minimal (which is one of the
big costs). Also keeping it this way lead to quite some smooth
handling of code that evaluates a function on the return values, one just
need to fill in the function and return ip and evaluate. cool.
The downside is for native code where we may transfer things back in
computer registers. But the overhead of function call's are of such dignity
that
the copying of one value is not of importance even for natively compiled
functions, The upfront cost of handling functions is pretty high. Note that
the improvement's that I suggest have two properties, 1. they are
composable, 2. they scale well in the number of arguments and return values.
My feeling is that speed bumps due to function overhead can be adressed by
a) Inlining
b) A well designed interface of user defined instructions
c) Develop separate function call mechanisms that a compiler can deduce and
use.

WDYT
/Stefan

[-- Attachment #2: Type: text/html, Size: 4279 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: compile-rtl
  2012-10-14 15:13 compile-rtl Stefan Israelsson Tampe
@ 2013-01-21 17:32 ` Andy Wingo
  2013-01-21 19:01   ` compile-rtl Stefan Israelsson Tampe
  2013-01-26 12:28   ` compile-rtl Stefan Israelsson Tampe
  0 siblings, 2 replies; 5+ messages in thread
From: Andy Wingo @ 2013-01-21 17:32 UTC (permalink / raw)
  To: Stefan Israelsson Tampe; +Cc: guile-devel

On Sun 14 Oct 2012 17:13, Stefan Israelsson Tampe <stefan.itampe@gmail.com> writes:

> potential memory leaks. To let it be as it is designed right now, will
> mean that it is very difficult looking at the scheme code to see what
> will be protected from gc or not.

Explicit clearing is much better than a stack pointer.  We can be much
smarter about it.

> Now when we are tail-calling in rtl, we just fill the argument slots
> with the new function arguments directly and then tail-call by filling
> in
> number of arguments and function. This is very smart and just some
> simple features added would mean that a lot of translation
> from one memory location to the other is skipped. I really like how the
> rtl code handle this but there is an expense. It complicates life a
> little
> when we overwrite arguments that are used in the calculation of other
> arguments. I'm working on how to handle this but I just wanted to point
> out how nice this design choice is.

Thanks!  In general at the end you have a parallel register move (or
parallel assignment), which has a number of standard solutions out
there.

> call's, here currently we send a list of register positions to the call
> function and the call function itself will copy those to the argument
> fields. This is the opposite
> of the tail-call, in a case like (f 1 3) you will create two temporary
> variables and copy them later on to the argument position. Here I would
> like to change the 
> semantic so that we fill in the arguments directly just like in the
> tail-call.

We can't do this unless we reintroduce a stack pointer, I don't think;
and anyway reopening the hole for the in-progress frame is pretty nasty
for the runtime (the backtrace code in particular).  Dunno.  I'm
hesitant to change this.

> Also, I would like to use the end of the stack region to compute the
> function frame. Well even if we use the end of the full frame we will
> save one move per argument which is nice.

This is possible I guess, but again makes tooling a bit difficult.

> 4.
> Return values.
> We talked about this before and there is some concerns how to make this
> fast. I would like to skip the complication by simply put the return
> values in the
> argument positions to the function. Also the number of arguments is
> mediated to the reciever. I would like to skip the mv-return slot

Have you been working off of wip-rtl after the commit, "remove
return_loc, instead put mv returns on the stack" ?  Does that not solve
the issue for you?  After that commit, there will be no more MVRA
(though it is still there while the VM is in a transitional state).

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: compile-rtl
  2013-01-21 17:32 ` compile-rtl Andy Wingo
@ 2013-01-21 19:01   ` Stefan Israelsson Tampe
  2013-01-26 12:28   ` compile-rtl Stefan Israelsson Tampe
  1 sibling, 0 replies; 5+ messages in thread
From: Stefan Israelsson Tampe @ 2013-01-21 19:01 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 4494 bytes --]

On Mon, Jan 21, 2013 at 6:32 PM, Andy Wingo <wingo@pobox.com> wrote:

> On Sun 14 Oct 2012 17:13, Stefan Israelsson Tampe <stefan.itampe@gmail.com>
> writes:
>
> > potential memory leaks. To let it be as it is designed right now, will
> > mean that it is very difficult looking at the scheme code to see what
> > will be protected from gc or not.
>
> Explicit clearing is much better than a stack pointer.  We can be much
> smarter about it.


You may be right here, But still I had a try of this and coded some
examples with a different
kind of stack pointer that I expect to be really interesting. Please dig a
little further in the backlog
to find an email discussing this. We can take up the discussion then.

>

> Now when we are tail-calling in rtl, we just fill the argument slots
> > with the new function arguments directly and then tail-call by filling
> > in
> > number of arguments and function. This is very smart and just some
> > simple features added would mean that a lot of translation
> > from one memory location to the other is skipped. I really like how the
> > rtl code handle this but there is an expense. It complicates life a
> > little
> > when we overwrite arguments that are used in the calculation of other
> > arguments. I'm working on how to handle this but I just wanted to point
> > out how nice this design choice is.
>
> Thanks!  In general at the end you have a parallel register move (or
> parallel assignment), which has a number of standard solutions out
> there.


I have coded allocation that allocates directly to the tail call slots e.g.
inf needed move away the
affected slots to some safe address, so the generated RTL code will short
circut the move to the tail
call addresses. I must also mention that my compile-rtl code is built ontop
of compile-glil (In order to
get an understanding of all tree-il finesses) And is a bit hacky. I expect
a purer verions to be coded in
the end. Noha have started a try to do this. Meanwhile I can port over any
good ideas from my endavour.



> call's, here currently we send a list of register positions to the call
> > function and the call function itself will copy those to the argument
> > fields. This is the opposite
> > of the tail-call, in a case like (f 1 3) you will create two temporary
> > variables and copy them later on to the argument position. Here I would
> > like to change the
> > semantic so that we fill in the arguments directly just like in the
> > tail-call.
>
> We can't do this unless we reintroduce a stack pointer, I don't think;
> and anyway reopening the hole for the in-progress frame is pretty nasty
> for the runtime (the backtrace code in particular).  Dunno.  I'm
> hesitant to change this.


In my current code I follow your approach but have some flags where I can
add
this feature if we like to investigate this.  To note is that I simulate
the stack pointer in
the rtl register generation, but it never enters explicitly in the code. I
do use some kind of
lightweight bursty stack register mechanism that is pretty cool concept.
But as said before
that is a mail further down in the backlog. Unfourtunately I'm pretty
ignorant of the runtime
issues and Have to educate myself on this issue before I can answer any
better. Good point!



> > Also, I would like to use the end of the stack region to compute the
> > function frame. Well even if we use the end of the full frame we will
> > save one move per argument which is nice.
>
> This is possible I guess, but again makes tooling a bit difficult.


I did not continue further down this path



> > 4.
> > Return values.
> > We talked about this before and there is some concerns how to make this
> > fast. I would like to skip the complication by simply put the return
> > values in the
> > argument positions to the function. Also the number of arguments is
> > mediated to the reciever. I would like to skip the mv-return slot
>
> Have you been working off of wip-rtl after the commit, "remove
> return_loc, instead put mv returns on the stack" ?  Does that not solve
> the issue for you?  After that commit, there will be no more MVRA
> (though it is still there while the VM is in a transitional state).
>

I think I might have taking your idea and implemented it fully in a patched
rtl vm.


I really would like to stress that if you though the aschm stuff is craze,
the compile-rtl
code is crazy^2. But it's a good testbed to try out some context ANd I
learned a lot!


>
> Andy
> --
> http://wingolog.org/
>

[-- Attachment #2: Type: text/html, Size: 6667 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: compile-rtl
  2013-01-21 17:32 ` compile-rtl Andy Wingo
  2013-01-21 19:01   ` compile-rtl Stefan Israelsson Tampe
@ 2013-01-26 12:28   ` Stefan Israelsson Tampe
  2013-01-27 10:28     ` compile-rtl Andy Wingo
  1 sibling, 1 reply; 5+ messages in thread
From: Stefan Israelsson Tampe @ 2013-01-26 12:28 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> writes:

>> Now when we are tail-calling in rtl, we just fill the argument slots
>> with the new function arguments directly and then tail-call by filling
>> in
>> number of arguments and function. This is very smart and just some
>> simple features added would mean that a lot of translation
>> from one memory location to the other is skipped. I really like how the
>> rtl code handle this but there is an expense. It complicates life a
>> little
>> when we overwrite arguments that are used in the calculation of other
>> arguments. I'm working on how to handle this but I just wanted to point
>> out how nice this design choice is.
>
> Thanks!  In general at the end you have a parallel register move (or
> parallel assignment), which has a number of standard solutions out
> there.
>

This is quite a natural first step. But especially for loops there is
a similar tail pattern that probably needs to be optimized better w.r.t.
register slots when we compile nativly

/Stefan



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: compile-rtl
  2013-01-26 12:28   ` compile-rtl Stefan Israelsson Tampe
@ 2013-01-27 10:28     ` Andy Wingo
  0 siblings, 0 replies; 5+ messages in thread
From: Andy Wingo @ 2013-01-27 10:28 UTC (permalink / raw)
  To: Stefan Israelsson Tampe; +Cc: guile-devel

On Sat 26 Jan 2013 13:28, Stefan Israelsson Tampe <stefan.itampe@gmail.com> writes:

> Andy Wingo <wingo@pobox.com> writes:
>
>>> Now when we are tail-calling in rtl, we just fill the argument slots
>>> with the new function arguments directly and then tail-call by filling
>>> in
>>> number of arguments and function. This is very smart and just some
>>> simple features added would mean that a lot of translation
>>> from one memory location to the other is skipped. I really like how the
>>> rtl code handle this but there is an expense. It complicates life a
>>> little
>>> when we overwrite arguments that are used in the calculation of other
>>> arguments. I'm working on how to handle this but I just wanted to point
>>> out how nice this design choice is.
>>
>> Thanks!  In general at the end you have a parallel register move (or
>> parallel assignment), which has a number of standard solutions out
>> there.
>
> This is quite a natural first step. But especially for loops there is
> a similar tail pattern that probably needs to be optimized better w.r.t.
> register slots when we compile nativly

There are two things that can help here:

  (1) Tight allocation of registers (VM or native).  Registers should be
      allocated not just in terms of scope but in terms of where they
      are used -- after their last use they are dead and so the register
      is available for some other purpose.  This can allow the loop
      variables in a loop can be updated in-place, without shuffling.

  (2) For the native case, native register allocation.  I've worked with
      linear scan before and it seems pretty good, and we could do it
      directly on the CPS form; Wimmer et al are able to do it on SSA
      form, so I don't see why not.

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-01-27 10:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-14 15:13 compile-rtl Stefan Israelsson Tampe
2013-01-21 17:32 ` compile-rtl Andy Wingo
2013-01-21 19:01   ` compile-rtl Stefan Israelsson Tampe
2013-01-26 12:28   ` compile-rtl Stefan Israelsson Tampe
2013-01-27 10:28     ` compile-rtl Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).