Hi Mark, interesting thoughts!

The below refers to the (very) simplistic VM compiler I'm working on right now. The current overhead in function calls, function setup, and function return means that it's hard to factor in the lowest level of optimizations. I consider the call overhead so large that function calls are dispatched to gosub routines rather than inlined, i.e. it is very close in spirit to a VM call. On the other hand, the compiler can be improved, and at some point in the future function calls might be so fast (especially for functions where we can prove certain properties) that your ideas become a real boon.

I will do my best to implement some of your ideas in a second rewrite of the compiler. At the first step, though, I want to make sure that e.g. +, -, ash etc. are inlined so that they are fast on fixnums, that branching is done natively, that list processing is fast, and that a large enough set of translations of VM operations is ready so that we can translate most of the code in Guile. This could increase speed at the first step by a factor of maybe 3-5. We can then continue to work from there.

I would also like to say that the current RTL call overhead is less than in the old stable-2.0 versions, so the plain RTL VM will be faster in this respect. Also note that, by the nature of the new VM, a simple compilation might yield less of an advantage than for the stable-2.0 VM. The reason is that it looks like many operations in the RTL VM do more things per operation; that is a boon for its speed, but it also means that we don't gain as much from a native compilation of the RTL VM as from the stable-2.0 VM.

A thought: regarding

  assert_nargs_ee
  reserve_locals
  assert_nargs_ee_locals
  bind_rest
  bind_kwargs

could we not implement this logic in the call instructions?

/Stefan

On Fri, Aug 3, 2012 at 4:29 AM, Mark H Weaver wrote:
> Hi Andy, thanks for the update!
> Exciting times for Guile :)
>
> On 08/02/2012 10:29 AM, Andy Wingo wrote:
>> Instead I'd rather just use Dybvig's suggestion: every call instruction
>> is preceded by an MV return address. For e.g. (values (f)), calling `f'
>> would be:
>>
>>   ...
>>   goto CALL
>> MVRA:
>>   truncate-and-jump RA
>> CALL:
>>   call f
>> RA:
>>   return
>>
>> So the overhead of multiple values in the normal single-value case is
>> one jump per call. When we do native compilation, this cost will be
>> negligible. OTOH for MV returns, we return to a different address than
>> the one on the stack, which will cause a branch misprediction (google
>> "return stack buffers" for more info).
>
> I wonder if it might be better to avoid this branch misprediction by
> always returning to the same address. Upon return, a special register
> would contain N-1, where N is the number of return values. The first few
> return values would also be stored in registers (hopefully at least two),
> and if necessary the remaining values would be stored elsewhere, perhaps
> on the stack or in a list or vector pointed to by another register.
>
> In the common case where a given call site expects a small constant
> number of return values, the compiler could emit a statically-predicted
> conditional branch to verify that N-1 is the expected value (usually
> zero), and then generate code that expects to find the return values in
> the appropriate registers.
>
> On some architectures, it might also make sense for the callee to set
> the processor's "zero?" condition code as if N-1 had been tested, to
> allow for a shorter check in the common single-value case.
>
> Of course, the calling convention can be chosen independently for each
> instruction set architecture / ABI.
>
> What do you think?
>
>     Mark