From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Tom Tromey <tromey@redhat.com>
Cc: Helmut Eller <eller.helmut@gmail.com>, emacs-devel@gnu.org
Subject: Re: indirect threading for bytecode interpreter
Date: Thu, 17 Sep 2009 20:59:38 -0400
Message-ID: <jwvtyz1yth1.fsf-monnier+emacs@gnu.org>
In-Reply-To: <m3ljkdi8m1.fsf@fleche.redhat.com> (Tom Tromey's message of "Thu, 17 Sep 2009 15:06:46 -0600")
Helmut> 5% doesn't sound like a lot to some people.
> Shrug. Obviously I think the tradeoff is worth it, or I would not have
> sent the patch. I don't think the result is all that ugly. And,
> importantly, it is very low-hanging fruit.
I agree it doesn't seem that ugly. Looking at
http://lists.gnu.org/archive/html/emacs-devel/2004-05/txt1OKi7Cs5BI.txt
again, I like his use of
# define OPLABL(X) [X] = &&lbl_ ##X
to initialize the table, making sure that it's initialized correctly
(no need for your sanity checks).
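As a minimal sketch of why that macro removes the need for sanity
checks: with designated initializers each table slot is keyed by the
opcode itself, so the order of the entries cannot get out of sync with
the opcode numbering. The opcodes and the function `run_table_demo`
below are hypothetical, not taken from the Emacs sources or from either
patch, and the code assumes a compiler with the GNU C labels-as-values
extension (GCC or Clang).

```c
/* Hypothetical opcodes for illustration; not the Emacs bytecode set. */
enum { OP_PUSH1, OP_ADD, OP_HALT, OP_COUNT };

/* Key each slot by its opcode and point it at the label lbl_<opcode>. */
#define OPLABL(X) [X] = &&lbl_##X

/* Runs a tiny stack program; assumes well-formed code ending in OP_HALT. */
int run_table_demo(const unsigned char *pc)
{
    /* Entries are deliberately listed out of order: the designated
       initializers still put every address in the right slot. */
    static void *const targets[OP_COUNT] = {
        OPLABL(OP_HALT),
        OPLABL(OP_PUSH1),
        OPLABL(OP_ADD),
    };
    int stack[16];
    int sp = 0;

    goto *targets[*pc++];

lbl_OP_PUSH1:
    stack[sp++] = 1;
    goto *targets[*pc++];
lbl_OP_ADD:
    sp--;
    stack[sp - 1] += stack[sp];
    goto *targets[*pc++];
lbl_OP_HALT:
    return stack[sp - 1];
}
```

A misspelled opcode in the table fails at compile time (undefined label
or identifier) rather than silently shifting later entries.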
Helmut> vmgen sounds like a good idea, but I fear that it makes the build
Helmut> process quite a bit more complicated.
> You can check in the generated code.
IIUC the code it generates may depend on the platform (though it can
probably output platform-independent code as well, I guess).
> vmgen is a nice idea.
Yes, and it could bring yet more optimization tricks, for free.
> I rejected writing this as a direct-threaded interpreter because
> I assumed that the added memory use would be a bad tradeoff. But, if
> you are interested in that, perhaps I could take a stab at it.
I think a direct-threaded interpreter would take a bit more work,
because you need to replace the bytecode with "word-code". I don't know
how much of an impact it would have on memory use.
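To make the "word-code" idea concrete, here is a rough sketch under
stated assumptions: a hypothetical three-opcode VM with no operands, a
fixed 64-word buffer, and the GNU C labels-as-values extension. The
names `run_direct` and `word_code_size` are made up for this example.
Each one-byte opcode is translated once into the address of its handler,
so dispatch no longer needs a table lookup, at the cost of one pointer
per instruction.

```c
#include <stddef.h>

/* Hypothetical opcodes for illustration only. */
enum { BC_INC, BC_DEC, BC_RET };

#define MAX_WORDS 64

/* Size of the translated program: one pointer per one-byte opcode. */
size_t word_code_size(size_t len)
{
    return len * sizeof(void *);
}

/* Translates byte-code to word-code, then runs it.
   Returns the accumulator, or -1 if the program does not fit.
   Assumes valid opcodes and a program that ends in BC_RET. */
int run_direct(const unsigned char *code, size_t len)
{
    static void *const targets[] = {
        [BC_INC] = &&do_inc,
        [BC_DEC] = &&do_dec,
        [BC_RET] = &&do_ret,
    };
    void *words[MAX_WORDS];
    void **wp = words;
    int acc = 0;
    size_t i;

    if (len > MAX_WORDS)
        return -1;

    /* Translation pass: replace each opcode by its handler address. */
    for (i = 0; i < len; i++)
        words[i] = targets[code[i]];

    /* Dispatch is now a load and an indirect jump, with no table. */
    goto **wp++;

do_inc:
    acc++;
    goto **wp++;
do_dec:
    acc--;
    goto **wp++;
do_ret:
    return acc;
}
```

Where pointers are 8 bytes, the opcode stream alone grows eightfold in
this sketch; how that plays out for real byte-code with inline operands
is not something the sketch answers.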
Helmut> I'm wondering why gcc can't perform this transformation from the
Helmut> switch based code. Is there no compiler setting to skip the
Helmut> range check in the switch statement?
> It isn't about range checking but about eliminating a jump during the
> dispatch.
Actually, IIRC Anton Ertl (vmgen's author) has some articles which
indicate that a big part of the win isn't just the removal of some
instructions, but more importantly the replication of the "jump to next
target": instead of having only 1 computed jump to the next byte-code
target (plus N jumps back to the starting point), you have N computed
jumps, so each one can be predicted independently. The single computed
jump in gcc's output code is terribly difficult for the CPU to predict,
leading to lots and lots of cycles wasted due to mispredictions.
The N computed jumps aren't very easy to predict either, but some of
them at least are a bit easier, because some sequences of byte-code are
more common than others, so the CPU's jump prediction can fail a bit
less often, leading to fewer wasted cycles.
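The difference between the two dispatch shapes can be sketched as
follows. This is an illustration only, using a hypothetical
three-opcode VM and assuming the GNU C labels-as-values extension; the
function names `run_switch` and `run_threaded` are made up here. In the
first function every handler returns to one shared dispatch point; in
the second, each handler ends with its own computed jump, which gives
the CPU one prediction site per handler.

```c
/* Hypothetical opcodes for illustration only. */
enum { OP_INC, OP_DEC, OP_RET };

/* Switch dispatch: every handler jumps back to the top of the loop,
   and the switch is the one shared point that selects the next
   handler.  Assumes valid code ending in OP_RET. */
int run_switch(const unsigned char *pc)
{
    int acc = 0;

    for (;;) {
        switch (*pc++) {
        case OP_INC: acc++; break;
        case OP_DEC: acc--; break;
        case OP_RET: return acc;
        }
    }
}

/* Indirect-threaded dispatch: the computed goto is repeated at the
   end of every handler, so there are N separate indirect jumps. */
int run_threaded(const unsigned char *pc)
{
    static void *const targets[] = {
        [OP_INC] = &&do_inc,
        [OP_DEC] = &&do_dec,
        [OP_RET] = &&do_ret,
    };
    int acc = 0;

#define NEXT() goto *targets[*pc++]
    NEXT();

do_inc:
    acc++;
    NEXT();
do_dec:
    acc--;
    NEXT();
do_ret:
    return acc;
#undef NEXT
}
```

Both functions compute the same result for the same program; only the
placement of the indirect jumps differs. Whether that yields a speedup
depends on the CPU's branch predictor and on the byte-code mix, so any
gain needs to be measured rather than assumed.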
Anton had some experiments where he duplicated some byte-codes, and
showed that it could also improve performance (again, by making the
jumps more predictable: the actual executed instructions were exactly
identical).
> GCC could be taught to do this. I imagine that it has always been
> simpler for people to just update their interpreter than it has been to
> try to fix GCC.
IIRC, some people experimented with teaching gcc to do this kind of
transformation (copying the initial jump to the end of each block), but
IIRC it was difficult for gcc to tell automatically when it was a good
idea and when it wasn't. After all, by duplicating this code, you
increase the code size, and if each branch's prediction is pretty much
identical, you might be better off with a single jump so the prediction
data from one branch helps the other branches as well (and so it doesn't
use up as much space in the jump prediction table).
For interpreters it's almost always a good thing to do, because a lot of
execution time will be spent in this loop+switch, but in general it's
not that clear cut.
> I don't think that some possible future GCC change should affect whether
> this patch goes in.
No, indeed.
Stefan