From: Noah Lavine
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Register VM WIP
Date: Wed, 16 May 2012 10:54:37 -0400
To: Mark H Weaver
Cc: Andy Wingo, Ludovic Courtès, guile-devel@gnu.org
Hi Mark,

You are thinking along lines very similar to how I used to think. But I have a different way of thinking about it that might make the design seem better.

In our current VM, we have two stacks: the local-variable stack, which holds frames for the different function calls and is generally what you'd think of as a stack, and the temporary-variable stack, which is literally a stack in the sense that you only ever operate on its top.
The temporary-variable stack makes us do a lot of unnecessary work, because we have to load things from the local-variable stack onto the temporary-variable stack. I think what Andy is proposing is to get rid of the temporary-variable stack and operate directly on the local-variable stack. We shouldn't think of these registers as being like machine registers, and in fact maybe "registers" is not a good name for these objects. They are really just variables in the topmost stack frame. This should only reduce memory usage, because the local-variable stack stays the same and the temporary-variable stack goes away (some temporaries might move to the local-variable stack, but there can't be more of them than were on the temporary-variable stack, so that's still a win).

The machine I was initially thinking of, and I imagine you were too, is different. I had imagined a machine where the number of registers was limited, ideally to the length of a processor cache line, and was separate from the local-variable stack. In such a machine, the registers are used as a cache for the local variables, and you get to deal with all the register-allocation problems that a standard compiler would. That would accomplish the goal of keeping more things in cache.

The "registers as cache" idea may result in faster code than the "directly addressing local variables" idea, but it's also more complicated to implement. So it makes sense to me that we would try directly addressing local variables first, and maybe later move to using a fixed-size cache of registers.

It also occurs to me that the RTL intermediate language, which is really just a language for directly addressing an arbitrary number of local variables, is a standard sort of compiler intermediate language. So it might be useful to have around anyway, because we could more easily feed its output into, for instance, GCC.

Andy, is this an accurate description of the register VM?
And Mark and everyone else, does it seem better when you look at it this way?

Noah

On Wed, May 16, 2012 at 9:44 AM, Mark H Weaver wrote:
> Hi Andy!
>
> Andy Wingo writes:
>> On Wed 16 May 2012 06:23, Mark H Weaver writes:
>>
>>> It's surprising to me for another reason: in order to make the
>>> instructions reasonably compact, only a limited number of bits are
>>> available in each instruction to specify which registers to use.
>>
>> It turns out that being reasonably compact isn't terribly important --
>> more important is the number of opcodes it takes to get something done,
>> which translates to the number of dispatches.  Have you seen the "direct
>> threading" VM implementation strategy?  In that case the opcode is not
>> an index into a jump table, it's a word that encodes the pointer
>> directly.  So it's a word wide, just for the opcode.  That's what
>> JavaScriptCore does, for example.  The opcode is a word wide, and each
>> operand is a word as well.
>>
>> The design of the wip-rtl VM is to allow 16M registers (24-bit
>> addressing).  However many instructions can just address 2**8 registers
>> (8-bit addressing) or 2**12 registers (12-bit addressing).  We will
>> reserve registers 253 to 255 as temporaries.  If you have so many
>> registers as to need more than that, then you have to shuffle operands
>> down into the temporaries.  That's the plan, anyway.
>
> I'm very concerned about this design, for the same reason that I was
> concerned about NaN-boxing on 32-bit platforms.  Efficient use of memory
> is extremely important on modern architectures, because of the vast (and
> increasing) disparity between cache speed and RAM speed.  If you can fit
> the active set into the cache, that often makes a profound difference in
> the speed of a program.
>
> I agree that with VMs, minimizing the number of dispatches is crucial,
> but beyond a certain point, having more registers is not going to save
> you any dispatches, because they will almost never be used anyway.
> 2^12 registers is _far_ beyond that point.
>
> As I wrote before concerning NaN-boxing, I suspect that the reason these
> memory-bloated designs are so successful in the JavaScript world is that
> they are specifically optimized for use within a modern web browser,
> which is already a memory hog anyway.  Therefore, if the language
> implementation wastes yet more memory it will hardly be noticed.
>
> If I were designing this VM, I'd work hard to allow as many loops as
> possible to run completely in the cache.  That means that three things
> have to fit into the cache together: the VM itself, the user loop code,
> and the user data.  IMO, the sum of these three things should be made as
> small as possible.
>
> I certainly agree that we should have a generous number of registers,
> but I suspect that the sweet spot for a VM is 256, because it enables
> more compact dispatching code in the VM, and yet is more than enough to
> allow a decent register allocator to generate good code.
>
> That's my educated guess anyway.  Feel free to prove me wrong :)
>
>     Regards,
>       Mark