* Storing Bytecode Offset @ 2020-06-30 0:20 Zach Shaftel 2020-07-11 15:29 ` Tom Tromey 0 siblings, 1 reply; 4+ messages in thread From: Zach Shaftel @ 2020-06-30 0:20 UTC (permalink / raw) To: emacs-devel; +Cc: Rocky Bernstein Hello all, Rocky Bernstein and I have been working on the bytecode interpreter to save the offset as functions are executed. The plan is to then map the new information to source code locations once that's in place, for more helpful backtrace information; but first the offset recording needs to be done right. We've tested[1] a number of different approaches and are still weighing the pros and cons of them, so we'd like to hear others' input. Rocky has pushed the code to Savannah. In the feature/zach-soc-funcall-from-bytecode branch, I made a crude transplant of code from Ffuncall and a few others into exec_byte_code, so that compiled function calls don't need to pass through Ffuncall. In the case of lexically scoped compiled functions, exec_byte_code is called directly. The benefit here is that the offset doesn't have to be made available in the thread struct (it can be passed as an argument to a record_in_backtrace_with_offset function), and cutting out the Ffuncall speeds up bytecode execution in general. According to tests, this version is faster than the existing byte code interpreter, while still storing the offset for every frame. But the code on that branch is just a rough proof-of-concept and isn't 100% accurate. Doing it right would require a lot of refactoring that could ultimately lead to a dead end. In the feature/zach-soc-bytecode-in-traceback branch, the offset is stored in the thread_state struct. Prior to the most recent commit, the offset was then stashed in the backtrace specbinding frame from record_in_backtrace, so each bytecode call in the backtrace buffer was annotated with the offset when the error occurred. This is the ultimate goal, but the current implementation is flawed and a significant source of slowdown. Even without that, the code is slow, which is why we hope there are other ideas or ways to improve. We've tried other changes, like storing the next backtrace frame in the thread_state to eliminate a loop to find the pertinent frame, but so far we haven't been able to achieve acceptable performance in this implementation. -Zach ------------------------------ [1] We've been testing with scripts at https://gitlab.com/Zach_S/bench-compare.el. We've collected some data in the `results` and `data` directories. The compare-benches.el script concurrently runs elisp-benchmarks.el on supplied Emacs executables and collects all the results into an org-mode file, like the ones in `results`. If you try it out, let me know what goes wrong, I'm sure there are bugs. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Storing Bytecode Offset 2020-06-30 0:20 Storing Bytecode Offset Zach Shaftel @ 2020-07-11 15:29 ` Tom Tromey 2020-07-13 3:27 ` Rocky Bernstein 0 siblings, 1 reply; 4+ messages in thread From: Tom Tromey @ 2020-07-11 15:29 UTC (permalink / raw) To: Zach Shaftel; +Cc: Rocky Bernstein, emacs-devel >>>>> "Zach" == Zach Shaftel <zshaftel@gmail.com> writes: Zach> Rocky has pushed the code to Savannah. In the Zach> feature/zach-soc-funcall-from-bytecode branch [...] Zach> But the code on that branch is just Zach> a rough proof-of-concept and isn't 100% accurate. Doing it right would Zach> require a lot of refactoring that could ultimately lead to a dead end. Could you say more specifically what is wrong on that branch? What kind of refactoring is needed? I think other interpreters out there take this "self-call" approach, and it seems like a decent idea to try in Emacs. Zach> In the feature/zach-soc-bytecode-in-traceback branch, the offset is Zach> stored in the thread_state struct. Prior to the most recent commit, the Zach> offset was then stashed in the backtrace specbinding frame from Zach> record_in_backtrace, so each bytecode call in the backtrace buffer was Zach> annotated with the offset when the error occurred. This is the ultimate Zach> goal, but the current implementation is flawed and a significant source Zach> of slowdown. I took a brief look at it. One thing I noticed is that the assignment is done in NEXT: +#define NEXT UPDATE_OFFSET; goto *(targets[op = FETCH]) However, it seems to me that there cases where this is not needed -- any bytecode that cannot possibly cause an exception doesn't need to record the offset. For example: CASE (Bvarref6): op = FETCH; varref: { Lisp_Object v1 = vectorp[op], v2; if (!SYMBOLP (v1) || XSYMBOL (v1)->u.s.redirect != SYMBOL_PLAINVAL || (v2 = SYMBOL_VAL (XSYMBOL (v1)), EQ (v2, Qunbound))) v2 = Fsymbol_value (v1); PUSH (v2); NEXT; } Here, I think the updating only has to be done before the call to Fsymbol_value. Or: CASE (Bgotoifnil): { Lisp_Object v1 = POP; op = FETCH2; if (NILP (v1)) goto op_branch; NEXT; } This opcode doesn't need to update the offset at all. This matters because the change is introducing an extra store into performance-sensitive code. I had a couple of other ideas for how to micro-optimize this store. I don't know if they will work, but you might try them. The updating looks like this: +#define UPDATE_OFFSET (backtrace_byte_offset = pc - bytestr_data) If you also stored bytestr_data in the structure, then instead of doing the subtraction in the hot loop, you could do the subtraction when computing the location -- which happens more rarely, and which is not performance-sensitive. Also, as you said, backtrace_byte_offset actually references a field in the thread state: + /* The offset of the current op of the byte-code function being + executed. */ + int m_backtrace_byte_offset; +#define backtrace_byte_offset (current_thread->m_backtrace_byte_offset) You might try changing this to be a pointer, and have it point to a local variable that is updated by UPDATE_OFFSET. This might be a bit faster due to cache effects. Note that you do *not* want to have it point to the actual "pc" variable -- if you take the address of "pc", the compiler will probably not put it in a register. I mean, you might try it, but I would expect a performance loss in this case. Tom ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Storing Bytecode Offset 2020-07-11 15:29 ` Tom Tromey @ 2020-07-13 3:27 ` Rocky Bernstein 2020-07-13 8:01 ` Andrea Corallo via Emacs development discussions. 0 siblings, 1 reply; 4+ messages in thread From: Rocky Bernstein @ 2020-07-13 3:27 UTC (permalink / raw) To: Tom Tromey, Andrea Corallo; +Cc: Zach Shaftel, emacs-devel [-- Attachment #1: Type: text/plain, Size: 5948 bytes --] Thanks for looking over the changes and making suggestions. On Sat, Jul 11, 2020 at 11:29 AM Tom Tromey <tom@tromey.com> wrote: > >>>>> "Zach" == Zach Shaftel <zshaftel@gmail.com> writes: > > Zach> Rocky has pushed the code to Savannah. In the > Zach> feature/zach-soc-funcall-from-bytecode branch > [...] > Zach> But the code on that branch is just > Zach> a rough proof-of-concept and isn't 100% accurate. Doing it right > would > Zach> require a lot of refactoring that could ultimately lead to a dead > end. > > Could you say more specifically what is wrong on that branch? What kind > of refactoring is needed? > > I think other interpreters out there take this "self-call" approach, and > it seems like a decent idea to try in Emacs. > Zach: the code added to exec_byte_code in that branch is almost identical to the contents of Ffuncall, with the exception of the lexical bytecode special case. That part is just a copy of some of funcall_lambda and fetch_and_exec_byte_code. That doesn't seem to be a problem in terms of performance or functionality (so far) but the code duplication is obviously not ideal. I don't see any way to avoid this without some big changes to the whole funcall call graph. That being said, Zach would love to have this in the bytecode interpreter, so he'll would gladly make those changes if this ends up being the best way forward. However unless it is necessary now, it would probably be done after the Summer of Code project when we hope to have more of the complete solution done. > > Zach> In the feature/zach-soc-bytecode-in-traceback branch, the offset is > Zach> stored in the thread_state struct. Prior to the most recent commit, > the > Zach> offset was then stashed in the backtrace specbinding frame from > Zach> record_in_backtrace, so each bytecode call in the backtrace buffer > was > Zach> annotated with the offset when the error occurred. This is the > ultimate > Zach> goal, but the current implementation is flawed and a significant > source > Zach> of slowdown. > > I took a brief look at it. One thing I noticed is that the assignment > is done in NEXT: > > +#define NEXT UPDATE_OFFSET; goto *(targets[op = FETCH]) > > However, it seems to me that there cases where this is not needed -- any > bytecode that cannot possibly cause an exception doesn't need to record > the offset. For example: > > CASE (Bvarref6): > op = FETCH; > varref: > { > Lisp_Object v1 = vectorp[op], v2; > if (!SYMBOLP (v1) > || XSYMBOL (v1)->u.s.redirect != SYMBOL_PLAINVAL > || (v2 = SYMBOL_VAL (XSYMBOL (v1)), EQ (v2, Qunbound))) > v2 = Fsymbol_value (v1); > PUSH (v2); > NEXT; > } > > Here, I think the updating only has to be done before the call to > Fsymbol_value. > > Or: > > CASE (Bgotoifnil): > { > Lisp_Object v1 = POP; > op = FETCH2; > if (NILP (v1)) > goto op_branch; > NEXT; > } > > This opcode doesn't need to update the offset at all. > > This matters because the change is introducing an extra store into > performance-sensitive code. > Absolutely correct. There is about a 7% slowdown and we've tried various approaches, some that you've mentioned below and some that were mentioned by Andrea Corallo. In particular, storing bytstr_data in the frame. As a fallback position Andrea also suggested giving up on the most recent offset and keeping offsets in the backtrace. We think that this is bad in that the most-recent position is the one that is most desired. No matter what is done, there is going to be a loss of performance unless we gain it back somewhere else, e.g. bytecode-to-bytecode call which is going to be complicated *Please developers, we need some feedback here.* 1. What would a maximum acceptable slowdown be? 2. If we follow the update only as needed there are going to be a massive number of changes which will make review harder. Will that diminish the chances of this being merged into master? Note that if the code saved offsets initially (or was written less well) then a 7% slowdown probably wouldn't be an issue. If you were to take your other favorite programming language that reports bytecode offsets, and announce that you can speed up the interpreter by 7% but you lose backtrace locations, I suspect the community wouldn't go for that. > > I had a couple of other ideas for how to micro-optimize this store. > I don't know if they will work, but you might try them. > > The updating looks like this: > > +#define UPDATE_OFFSET (backtrace_byte_offset = pc - bytestr_data) > > If you also stored bytestr_data in the structure, then instead of doing > the subtraction in the hot loop, you could do the subtraction when > computing the location -- which happens more rarely, and which is not > performance-sensitive. > Andrea suggested this too. We will do this. > > > Also, as you said, backtrace_byte_offset actually references a field in > the thread state: > > + /* The offset of the current op of the byte-code function being > + executed. */ > + int m_backtrace_byte_offset; > +#define backtrace_byte_offset (current_thread->m_backtrace_byte_offset) > > You might try changing this to be a pointer, and have it point to a > local variable that is updated by UPDATE_OFFSET. This might be a bit > faster due to cache effects. > That's a great idea, We will try that out. > Note that you do *not* want to have it point to the actual "pc" variable > -- if you take the address of "pc", the compiler will probably not put > it in a register. I mean, you might try it, but I would expect a > performance loss in this case. > We saw this firsthand when we kept the pointers in the specbinding frame. I'm glad you clarified what was going on because we were puzzled. > > Tom > Rocky and Zach [-- Attachment #2: Type: text/html, Size: 8031 bytes --] ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Storing Bytecode Offset 2020-07-13 3:27 ` Rocky Bernstein @ 2020-07-13 8:01 ` Andrea Corallo via Emacs development discussions. 0 siblings, 0 replies; 4+ messages in thread From: Andrea Corallo via Emacs development discussions. @ 2020-07-13 8:01 UTC (permalink / raw) To: Rocky Bernstein; +Cc: Tom Tromey, Zach Shaftel, emacs-devel Rocky Bernstein <rocky@gnu.org> writes: > Andrea also suggested giving up on the most recent offset and keeping > offsets in the backtrace. To be precise this is somenthing I mentioned as a possibility if none of the other perf impact mitigations works out sufficiently. > We think that this is bad in that the most-recent position is the one > that is most desired. Anyway IMO would be probably more fruitful to focus on the byte compiler preserving location topic as first. This is has a value by its own allowing for emitting correct diagnostic *and* is a pre for having the bytecode offset thing useful. Andrea -- akrl@sdf.org ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-07-13 8:01 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-06-30 0:20 Storing Bytecode Offset Zach Shaftel 2020-07-11 15:29 ` Tom Tromey 2020-07-13 3:27 ` Rocky Bernstein 2020-07-13 8:01 ` Andrea Corallo via Emacs development discussions.
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).