From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: indirect threading for bytecode interpreter
Date: Thu, 17 Sep 2009 20:59:38 -0400
To: Tom Tromey
Cc: Helmut Eller, emacs-devel@gnu.org

Helmut> 5% doesn't sound like a lot to some people.

> Shrug.  Obviously I think the tradeoff is worth it, or I would not have
> sent the patch.  I don't think the result is all that ugly.  And,
> importantly, it is very low-hanging fruit.

I agree it doesn't seem that ugly.  Looking at
http://lists.gnu.org/archive/html/emacs-devel/2004-05/txt1OKi7Cs5BI.txt
again, I like his use of

    # define OPLABL(X) [X] = &&lbl_ ##X

to initialize the table, making sure that it's initialized correctly (no
need for your sanity checks).

Helmut> vmgen sounds like a good idea, but I fear that it makes the build
Helmut> process quite a bit more complicated.

> You can check in the generated code.

IIUC the code it generates may depend on the platform (tho, it can
probably output platform-independent code as well, I guess).

> vmgen is a nice idea.

Yes, and it could bring yet more optimization tricks, for free.

> I rejected writing this as a direct-threaded interpreter because
> I assumed that the added memory use would be a bad tradeoff.
> But, if you are interested in that, perhaps I could take a stab at it.

I think a direct-threaded interpreter would take a bit more work,
because you need to replace the bytecode with "word-code".  I don't know
how much of an impact it would have on memory use.

Helmut> I'm wondering why gcc can't perform this transformation from the
Helmut> switch based code.  Is there no compiler setting to skip the
Helmut> range check in the switch statement?

> It isn't about range checking but about eliminating a jump during the
> dispatch.

Actually, IIRC Anton Ertl (vmgen's author) has some articles which
indicate that a big part of the win isn't just the removal of some
instructions, but more importantly the multiplication of the "jump to
next target": instead of having only 1 computed jump to the next
byte-code target (plus N jumps back to the starting point), you have
N computed jumps, so each one can be predicted independently.

The single computed jump in gcc's output code is terribly difficult for
the CPU to predict, leading to lots and lots of cycles wasted due to
mispredictions.  The N computed jumps aren't very easy to predict
either, but some of them at least are a bit easier, because some
sequences of byte-code are more common than others, so the CPU's jump
prediction can fail a bit less often, leading to fewer wasted cycles.

Anton had some experiments where he duplicated some byte-codes, and
showed that it could also improve performance (again, by making the
jumps more predictable: the actual executed instructions were exactly
identical).

> GCC could be taught to do this.  I imagine that it has always been
> simpler for people to just update their interpreter than it has been to
> try to fix GCC.

IIRC, some people experimented with teaching gcc to do this kind of
transformation (copying the initial jump to the end of each block), but
it was difficult for gcc to tell automatically when it was a good idea
and when it wasn't.
After all, by duplicating this code, you increase the code size, and if
each branch's prediction is pretty much identical, you might be better
off with a single jump so the prediction data from one branch helps the
other branches as well (and so it doesn't use up as much space in the
jump prediction table).  For interpreters it's almost always a good
thing to do, because a lot of execution time will be spent in this
loop+switch, but in general it's not that clear cut.

> I don't think that some possible future GCC change should affect whether
> this patch goes in.

No, indeed.


        Stefan
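
To make the "1 computed jump" versus "N computed jumps" point concrete,
here is a rough sketch.  It is a toy stack machine, not Emacs's
bytecode interpreter and not the patch under discussion: the opcode
names and the functions run_switch and run_threaded are made up for
illustration, and the code assumes GNU C (gcc or clang) for
labels-as-values.  The table is initialized with the same OPLABL idea
as quoted above.

```c
/* Sketch only: hypothetical toy VM, assumes GNU C labels-as-values. */
#include <assert.h>

enum { OP_PUSH1, OP_ADD, OP_DUP, OP_HALT, OP_COUNT };  /* made-up opcodes */
enum { STACK_MAX = 64 };

/* Switch version: every opcode goes through the one shared dispatch. */
long run_switch(const unsigned char *pc)
{
    long stack[STACK_MAX];
    long *sp = stack;                 /* one past the top element */

    for (;;) {
        switch (*pc++) {              /* the single dispatch point */
        case OP_PUSH1: assert(sp < stack + STACK_MAX); *sp++ = 1; break;
        case OP_ADD:   sp--; sp[-1] += sp[0]; break;
        case OP_DUP:   assert(sp < stack + STACK_MAX); *sp = sp[-1]; sp++; break;
        case OP_HALT:  return sp > stack ? sp[-1] : 0;
        default:       return -1;     /* unknown opcode */
        }
    }
}

/* A designated initializer ties each slot to the label of the same name. */
#define OPLABL(X) [X] = &&lbl_##X
/* Each handler ends with its own copy of the computed jump. */
#define NEXT() goto *table[*pc++]

/* Threaded version: N handlers, N separate indirect jumps. */
long run_threaded(const unsigned char *pc)
{
    static const void *const table[OP_COUNT] = {
        OPLABL(OP_PUSH1), OPLABL(OP_ADD), OPLABL(OP_DUP), OPLABL(OP_HALT),
    };
    long stack[STACK_MAX];
    long *sp = stack;                 /* one past the top element */

    NEXT();

lbl_OP_PUSH1:
    assert(sp < stack + STACK_MAX);
    *sp++ = 1;
    NEXT();
lbl_OP_ADD:
    sp--;
    sp[-1] += sp[0];
    NEXT();
lbl_OP_DUP:
    assert(sp < stack + STACK_MAX);
    *sp = sp[-1];
    sp++;
    NEXT();
lbl_OP_HALT:
    return sp > stack ? sp[-1] : 0;
}
```

Both functions trust the program to be well formed (valid opcodes in
the threaded case, enough operands on the stack in both), which a real
interpreter could not do.  Calling either one on the byte sequence
OP_PUSH1 OP_PUSH1 OP_ADD OP_DUP OP_ADD OP_HALT should give the same
result; the only difference is where the indirect jumps live.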