Storing Bytecode Offset

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Storing Bytecode Offset
@ 2020-06-30  0:20 Zach Shaftel
  2020-07-11 15:29 ` Tom Tromey
  0 siblings, 1 reply; 4+ messages in thread
From: Zach Shaftel @ 2020-06-30  0:20 UTC (permalink / raw)
  To: emacs-devel; +Cc: Rocky Bernstein

Hello all,

Rocky Bernstein and I have been working on the bytecode interpreter
to save the offset as functions are executed. The plan is to then map
the new information to source code locations once that's in place, for
more helpful backtrace information; but first the offset recording needs
to be done right. We've tested[1] a number of different approaches and
are still weighing the pros and cons of them, so we'd like to hear
others' input.

Rocky has pushed the code to Savannah. In the
feature/zach-soc-funcall-from-bytecode branch, I made a crude transplant
of code from Ffuncall and a few others into exec_byte_code, so that
compiled function calls don't need to pass through Ffuncall. In the case
of lexically scoped compiled functions, exec_byte_code is called
directly. The benefit here is that the offset doesn't have to be made
available in the thread struct (it can be passed as an argument to a
record_in_backtrace_with_offset function), and cutting out the Ffuncall
speeds up bytecode execution in general. According to tests, this
version is faster than the existing byte code interpreter, while still
storing the offset for every frame. But the code on that branch is just
a rough proof-of-concept and isn't 100% accurate. Doing it right would
require a lot of refactoring that could ultimately lead to a dead end.

In the feature/zach-soc-bytecode-in-traceback branch, the offset is
stored in the thread_state struct. Prior to the most recent commit, the
offset was then stashed in the backtrace specbinding frame from
record_in_backtrace, so each bytecode call in the backtrace buffer was
annotated with the offset when the error occurred. This is the ultimate
goal, but the current implementation is flawed and a significant source
of slowdown. Even without that, the code is slow, which is why we hope
there are other ideas or ways to improve. We've tried other changes,
like storing the next backtrace frame in the thread_state to eliminate a
loop to find the pertinent frame, but so far we haven't been able to
achieve acceptable performance in this implementation.

-Zach

------------------------------

[1] We've been testing with scripts at
https://gitlab.com/Zach_S/bench-compare.el. We've collected some data in
the `results` and `data` directories. The compare-benches.el script
concurrently runs elisp-benchmarks.el on supplied Emacs executables and
collects all the results into an org-mode file, like the ones in
`results`. If you try it out, let me know what goes wrong, I'm sure
there are bugs.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Storing Bytecode Offset
  2020-06-30  0:20 Storing Bytecode Offset Zach Shaftel
@ 2020-07-11 15:29 ` Tom Tromey
  2020-07-13  3:27   ` Rocky Bernstein
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Tromey @ 2020-07-11 15:29 UTC (permalink / raw)
  To: Zach Shaftel; +Cc: Rocky Bernstein, emacs-devel

>>>>> "Zach" == Zach Shaftel <zshaftel@gmail.com> writes:

Zach> Rocky has pushed the code to Savannah. In the
Zach> feature/zach-soc-funcall-from-bytecode branch
[...]
Zach> But the code on that branch is just
Zach> a rough proof-of-concept and isn't 100% accurate. Doing it right would
Zach> require a lot of refactoring that could ultimately lead to a dead end.

Could you say more specifically what is wrong on that branch?  What kind
of refactoring is needed?

I think other interpreters out there take this "self-call" approach, and
it seems like a decent idea to try in Emacs.

Zach> In the feature/zach-soc-bytecode-in-traceback branch, the offset is
Zach> stored in the thread_state struct. Prior to the most recent commit, the
Zach> offset was then stashed in the backtrace specbinding frame from
Zach> record_in_backtrace, so each bytecode call in the backtrace buffer was
Zach> annotated with the offset when the error occurred. This is the ultimate
Zach> goal, but the current implementation is flawed and a significant source
Zach> of slowdown.

I took a brief look at it.  One thing I noticed is that the assignment
is done in NEXT:

+#define NEXT UPDATE_OFFSET; goto *(targets[op = FETCH])

However, it seems to me that there cases where this is not needed -- any
bytecode that cannot possibly cause an exception doesn't need to record
the offset.  For example:

	CASE (Bvarref6):
	  op = FETCH;
	varref:
	  {
	    Lisp_Object v1 = vectorp[op], v2;
	    if (!SYMBOLP (v1)
		|| XSYMBOL (v1)->u.s.redirect != SYMBOL_PLAINVAL
		|| (v2 = SYMBOL_VAL (XSYMBOL (v1)), EQ (v2, Qunbound)))
	      v2 = Fsymbol_value (v1);
	    PUSH (v2);
	    NEXT;
	  }

Here, I think the updating only has to be done before the call to
Fsymbol_value.

Or:

	CASE (Bgotoifnil):
	  {
	    Lisp_Object v1 = POP;
	    op = FETCH2;
	    if (NILP (v1))
	      goto op_branch;
	    NEXT;
	  }

This opcode doesn't need to update the offset at all.

This matters because the change is introducing an extra store into
performance-sensitive code.

I had a couple of other ideas for how to micro-optimize this store.
I don't know if they will work, but you might try them.

The updating looks like this:

+#define UPDATE_OFFSET (backtrace_byte_offset = pc - bytestr_data)

If you also stored bytestr_data in the structure, then instead of doing
the subtraction in the hot loop, you could do the subtraction when
computing the location -- which happens more rarely, and which is not
performance-sensitive.

Also, as you said, backtrace_byte_offset actually references a field in
the thread state:

+  /* The offset of the current op of the byte-code function being
+     executed. */
+  int m_backtrace_byte_offset;
+#define backtrace_byte_offset (current_thread->m_backtrace_byte_offset)

You might try changing this to be a pointer, and have it point to a
local variable that is updated by UPDATE_OFFSET.  This might be a bit
faster due to cache effects.

Note that you do *not* want to have it point to the actual "pc" variable
-- if you take the address of "pc", the compiler will probably not put
it in a register.  I mean, you might try it, but I would expect a
performance loss in this case.

Tom

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Storing Bytecode Offset
  2020-07-11 15:29 ` Tom Tromey
@ 2020-07-13  3:27   ` Rocky Bernstein
  2020-07-13  8:01     ` Andrea Corallo via Emacs development discussions.
  0 siblings, 1 reply; 4+ messages in thread
From: Rocky Bernstein @ 2020-07-13  3:27 UTC (permalink / raw)
  To: Tom Tromey, Andrea Corallo; +Cc: Zach Shaftel, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 5948 bytes --]

Thanks for looking over the changes and making suggestions.

On Sat, Jul 11, 2020 at 11:29 AM Tom Tromey <tom@tromey.com> wrote:

> >>>>> "Zach" == Zach Shaftel <zshaftel@gmail.com> writes:
>
> Zach> Rocky has pushed the code to Savannah. In the
> Zach> feature/zach-soc-funcall-from-bytecode branch
> [...]
> Zach> But the code on that branch is just
> Zach> a rough proof-of-concept and isn't 100% accurate. Doing it right
> would
> Zach> require a lot of refactoring that could ultimately lead to a dead
> end.
>
> Could you say more specifically what is wrong on that branch?  What kind
> of refactoring is needed?
>
> I think other interpreters out there take this "self-call" approach, and
> it seems like a decent idea to try in Emacs.
>

Zach: the code added to exec_byte_code in that branch is almost identical to
the contents of Ffuncall, with the exception of the lexical bytecode
special case. That part is just a copy of some of funcall_lambda and
fetch_and_exec_byte_code. That doesn't seem to be a problem in terms of
performance or functionality (so far) but the code duplication is
obviously not ideal. I don't see any way to avoid this without some big
changes to the whole funcall call graph.

That being said, Zach would love to have this in the bytecode interpreter,
so he'll
would gladly make those changes if this ends up being the best way
forward.  However unless it is necessary now, it would probably be
done after the Summer of Code project when we hope to have more of the
complete solution done.

>
> Zach> In the feature/zach-soc-bytecode-in-traceback branch, the offset is
> Zach> stored in the thread_state struct. Prior to the most recent commit,
> the
> Zach> offset was then stashed in the backtrace specbinding frame from
> Zach> record_in_backtrace, so each bytecode call in the backtrace buffer
> was
> Zach> annotated with the offset when the error occurred. This is the
> ultimate
> Zach> goal, but the current implementation is flawed and a significant
> source
> Zach> of slowdown.
>
> I took a brief look at it.  One thing I noticed is that the assignment
> is done in NEXT:
>
> +#define NEXT UPDATE_OFFSET; goto *(targets[op = FETCH])
>
> However, it seems to me that there cases where this is not needed -- any
> bytecode that cannot possibly cause an exception doesn't need to record
> the offset.  For example:
>
>         CASE (Bvarref6):
>           op = FETCH;
>         varref:
>           {
>             Lisp_Object v1 = vectorp[op], v2;
>             if (!SYMBOLP (v1)
>                 || XSYMBOL (v1)->u.s.redirect != SYMBOL_PLAINVAL
>                 || (v2 = SYMBOL_VAL (XSYMBOL (v1)), EQ (v2, Qunbound)))
>               v2 = Fsymbol_value (v1);
>             PUSH (v2);
>             NEXT;
>           }
>
> Here, I think the updating only has to be done before the call to
> Fsymbol_value.
>
> Or:
>
>         CASE (Bgotoifnil):
>           {
>             Lisp_Object v1 = POP;
>             op = FETCH2;
>             if (NILP (v1))
>               goto op_branch;
>             NEXT;
>           }
>
> This opcode doesn't need to update the offset at all.
>
> This matters because the change is introducing an extra store into
> performance-sensitive code.
>

Absolutely correct. There is about a 7% slowdown and we've tried various
approaches, some that you've mentioned below and some that were
mentioned by Andrea Corallo.

In particular, storing bytstr_data in the frame. As a fallback position
Andrea also suggested giving up on the most recent offset and keeping
offsets in the backtrace.

We think that this is bad in that the most-recent position is the one
that is most desired.

No matter what is done, there is going to be a loss of performance
unless we gain it back somewhere else, e.g. bytecode-to-bytecode call
which is going to be complicated

*Please developers, we need some feedback here.*

1. What would a maximum acceptable slowdown be?

2. If we follow the update only as needed there are going to be a
   massive number of changes which will make review harder. Will that
   diminish the chances of this being merged into master?

Note that if the code saved offsets initially (or was written less well)
then a 7% slowdown
probably wouldn't be an issue. If you were to take your other favorite
programming
language that reports bytecode offsets, and announce that you can speed up
the
interpreter by 7% but you lose backtrace locations, I suspect the community
wouldn't
go for that.

>
> I had a couple of other ideas for how to micro-optimize this store.
> I don't know if they will work, but you might try them.
>
> The updating looks like this:
>
> +#define UPDATE_OFFSET (backtrace_byte_offset = pc - bytestr_data)
>
> If you also stored bytestr_data in the structure, then instead of doing
> the subtraction in the hot loop, you could do the subtraction when
> computing the location -- which happens more rarely, and which is not
> performance-sensitive.
>

Andrea suggested this too. We will do this.

>
>
> Also, as you said, backtrace_byte_offset actually references a field in
> the thread state:
>
> +  /* The offset of the current op of the byte-code function being
> +     executed. */
> +  int m_backtrace_byte_offset;
> +#define backtrace_byte_offset (current_thread->m_backtrace_byte_offset)
>
> You might try changing this to be a pointer, and have it point to a
> local variable that is updated by UPDATE_OFFSET.  This might be a bit
> faster due to cache effects.
>

 That's a great idea, We will try that out.

> Note that you do *not* want to have it point to the actual "pc" variable
> -- if you take the address of "pc", the compiler will probably not put
> it in a register.  I mean, you might try it, but I would expect a
> performance loss in this case.
>

We saw this firsthand when we kept the pointers in the
specbinding frame. I'm glad you clarified what was going on because we
were puzzled.

>
> Tom
>

Rocky and Zach

[-- Attachment #2: Type: text/html, Size: 8031 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Storing Bytecode Offset
  2020-07-13  3:27   ` Rocky Bernstein
@ 2020-07-13  8:01     ` Andrea Corallo via Emacs development discussions.
  0 siblings, 0 replies; 4+ messages in thread
From: Andrea Corallo via Emacs development discussions. @ 2020-07-13  8:01 UTC (permalink / raw)
  To: Rocky Bernstein; +Cc: Tom Tromey, Zach Shaftel, emacs-devel

Rocky Bernstein <rocky@gnu.org> writes:

> Andrea also suggested giving up on the most recent offset and keeping
> offsets in the backtrace.

To be precise this is somenthing I mentioned as a possibility if none of
the other perf impact mitigations works out sufficiently.

> We think that this is bad in that the most-recent position is the one
> that is most desired.

Anyway IMO would be probably more fruitful to focus on the byte compiler
preserving location topic as first.  This is has a value by its own
allowing for emitting correct diagnostic *and* is a pre for having the
bytecode offset thing useful.

  Andrea

-- 
akrl@sdf.org

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-07-13  8:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-30  0:20 Storing Bytecode Offset Zach Shaftel
2020-07-11 15:29 ` Tom Tromey
2020-07-13  3:27   ` Rocky Bernstein
2020-07-13  8:01     ` Andrea Corallo via Emacs development discussions.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).