GSoC project - Improving ELisp Traceback and Debugging Information

unofficial mirror of emacs-devel@gnu.org 

* GSoC project - Improving ELisp Traceback and Debugging Information
@ 2020-06-03 21:42 Zach Shaftel
  2020-06-05  7:00 ` Helmut Eller
  0 siblings, 1 reply; 5+ messages in thread
From: Zach Shaftel @ 2020-06-03 21:42 UTC
  To: emacs-devel

Hello all,

This summer I will be working on improving ELisp traceback 
information for Google Summer of Code.

My ultimate goal is to record the source location of calls so that
the backtrace can use it, e.g. for buttons which jump to the exact
function call that produced the error. The minimum goal, however, is
to record the bytecode offset while maintaining acceptable
performance, and to let the backtrace jump to the point in the
disassembly where the error occurred.

So far I've modified the byte-code interpreter to simply store the 
offset of each funcall in the backtrace specbinding frame, and 
modified backtrace.el so the sequence of offsets is printed 
alongside each respective call in the backtrace. It's available on 
the feature/soc-bytecode-in-traceback-specbinding branch on 
Savannah. Here's what the backtrace output looks like:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
    10 test-debugger()
     6 call-test-debugger((110 117 109 98 101 114 115 0 1 2 3 4 5))
     9 call-call-test-debugger()
       load("/home/zach/ELisp/bad-stuff.elc" nil t)
   513 command-line-1(("-l" "bad-stuff.elc"))
  1482 command-line()
   417 normal-top-level()

The current implementation entails a performance regression (a ~10%
slowdown on my machine, measured with elisp-benchmarks.el), so it's
not viable in its current state, but there are plenty of ways to
improve on that. Any ideas would be appreciated.

I've been looking at the scratch/accurate-warning-pos branch as 
well as prior discussions and am still evaluating different 
approaches to solving the task. It might be necessary to modify 
the way code is represented during compilation, be it simply with 
the annotated symbols as in that branch or with another more 
generalized form of object representation. The latter approach 
would be more versatile, but doing so while still preventing the 
compiler from hogging memory would be tough, and is broad enough 
that it's probably outside the scope of this project.

I'd love to hear others' thoughts, advice, and comments on the 
project, and on what sorts of changes would be most desired for 
inclusion in Emacs.

Thanks,
Zach Shaftel


* Re: GSoC project - Improving ELisp Traceback and Debugging Information
  2020-06-03 21:42 GSoC project - Improving ELisp Traceback and Debugging Information Zach Shaftel
@ 2020-06-05  7:00 ` Helmut Eller
  2020-06-06 18:20   ` Zach Shaftel
  0 siblings, 1 reply; 5+ messages in thread
From: Helmut Eller @ 2020-06-05  7:00 UTC
  To: emacs-devel

On Wed, Jun 03 2020, Zach Shaftel wrote:

> Hello all,
>
> This summer I will be working on improving ELisp traceback information
> for Google Summer of Code.

Sounds very useful.

> So far I've modified the byte-code interpreter to simply store the
> offset of each funcall in the backtrace specbinding frame, and 
> modified backtrace.el so the sequence of offsets is printed alongside
> each respective call in the backtrace. It's available on the
> feature/soc-bytecode-in-traceback-specbinding branch on
> Savannah.

I don't see this branch on Savannah; there's a
feature/soc-bytecode-in-traceback from 2020-04-27, but it apparently
doesn't contain all this.  Anyway, I just wanted to say that it would be
nice if bytecode-to-bytecode calls did not leave the exec_byte_code
function.  Those calls should push the necessary frames and continue the
interpreter loop.  That way the bytecode PC doesn't need to be saved
redundantly on the C stack and the specbinding stack.
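
To make the suggestion concrete, here is a minimal sketch of an
interpreter loop that handles bytecode-to-bytecode calls by pushing a
frame instead of recursing.  It is a toy stack machine in Python, not
the Emacs bytecode interpreter: the Frame class, the opcode names and
the run function are all invented for illustration, and calls take no
arguments to keep it short.  The point is only that each frame's pc is
stored in one place, so a backtrace with offsets can be read straight
off the frame stack.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str                 # function being executed (hypothetical toy VM)
    code: list                # its instruction list
    pc: int = 0               # offset of the instruction being executed
    stack: list = field(default_factory=list)


class VMError(Exception):
    def __init__(self, message, backtrace):
        super().__init__(message)
        # [(function name, offset), ...], innermost frame first
        self.backtrace = backtrace


def run(functions, entry):
    """Run `entry` using one explicit frame stack and no host recursion."""
    frames = [Frame(entry, functions[entry])]
    while True:
        frame = frames[-1]
        op, *args = frame.code[frame.pc]
        if op == "const":
            frame.stack.append(args[0])
        elif op == "add":
            b, a = frame.stack.pop(), frame.stack.pop()
            if not (isinstance(a, int) and isinstance(b, int)):
                # Every frame already holds its own pc, so the offsets
                # for the backtrace come directly from the frame stack.
                raise VMError("wrong-type-argument",
                              [(f.name, f.pc) for f in reversed(frames)])
            frame.stack.append(a + b)
        elif op == "call":
            # Push a frame and keep looping.  The caller's pc stays at
            # the call instruction until the callee returns.
            callee = args[0]
            frames.append(Frame(callee, functions[callee]))
            continue
        elif op == "ret":
            value = frame.stack.pop()
            frames.pop()
            if not frames:
                return value
            frame = frames[-1]        # back in the caller
            frame.stack.append(value)
        else:
            raise ValueError(f"unknown opcode {op!r}")
        frame.pc += 1                 # also steps the caller past its call


functions = {
    "ok":   [("const", 2), ("const", 3), ("add",), ("ret",)],
    "good": [("const", 1), ("call", "ok"), ("add",), ("ret",)],
    "bad":  [("const", 2), ("const", None), ("add",), ("ret",)],
    "main": [("const", 1), ("call", "bad"), ("add",), ("ret",)],
}
```

Running "main" here raises VMError whose backtrace lists the offset of
the failing add in "bad" followed by the offset of the call in "main",
which is the same kind of information as the numbers printed in the
backtrace earlier in this thread.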

> I've been looking at the scratch/accurate-warning-pos branch as well
> as prior discussions and am still evaluating different approaches to
> solving the task. It might be necessary to modify the way code is
> represented during compilation, be it simply with the annotated
> symbols as in that branch or with another more generalized form of
> object representation.

Instead of annotating symbols I would annotate cons cells.  The reader
could keep a hash table on the side and record the source position of
cons cells.
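
As a rough illustration of the side-table idea, the following Python
sketch reads a tiny S-expression syntax and records where each list
starts.  It is not the Emacs reader: Python lists stand in for cons
cells, atoms are left as plain strings, and the function name
read_with_positions is made up.  The table is keyed by object identity
(id), which is the assumption that plays the role of an `eq` hash
table.

```python
import re

# One token is a parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"\s*([()]|[^\s()]+)")


def read_with_positions(text):
    """Read one form from `text` and return (form, positions).

    `positions` maps id(list) to the character offset of that list's
    opening parenthesis.  Atoms are not annotated at all.
    """
    tokens = [(m.group(1), m.start(1)) for m in TOKEN.finditer(text)]
    positions = {}
    index = 0

    def read_form():
        nonlocal index
        if index >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok, start = tokens[index]
        index += 1
        if tok == ")":
            raise SyntaxError(f"unexpected ')' at offset {start}")
        if tok != "(":
            return tok
        form = []
        positions[id(form)] = start   # side table, the form is untouched
        while True:
            if index >= len(tokens):
                raise SyntaxError("unexpected end of input")
            if tokens[index][0] == ")":
                index += 1
                return form
            form.append(read_form())

    return read_form(), positions


form, positions = read_with_positions("(foo (bar 1) baz)")
```

The forms themselves stay ordinary data, so code that does not care
about positions never sees the difference.  The table is only valid
while the forms are alive, since id values can be reused afterwards.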

Helmut


* Re: GSoC project - Improving ELisp Traceback and Debugging Information
  2020-06-05  7:00 ` Helmut Eller
@ 2020-06-06 18:20   ` Zach Shaftel
  2020-06-06 19:56     ` Helmut Eller
  0 siblings, 1 reply; 5+ messages in thread
From: Zach Shaftel @ 2020-06-06 18:20 UTC
  To: Helmut Eller; +Cc: emacs-devel

> I don't see this branch on Savannah; there's a
> feature/soc-bytecode-in-traceback from 2020-04-27, but apparently
> doesn't contain all this.

Ah, my mistake: Rocky Bernstein had pushed that branch. I'm still
waiting to hear back from copyright-clerk@fsf.org, so I don't know if I
can push to Savannah just yet. The repo is available at
https://github.com/SwiftLawnGnome/emacs-gsoc/tree/feature/soc-bytecode-in-traceback-specbinding
if you'd like to take a look.

> Anyway, just wanted to say, that it would be nice if bytecode to
> bytecode calls would not leave the exec_byte_code function. Those
> calls should push the necessary frames and continue the interpreter
> loop. That way the bytecode PC doesn't need to be saved redundantly on
> the C stack and the specbinding stack.

That's an excellent idea. That would make the logic cleaner and should
speed up the interpreter to boot. I'll get to work on that right away.

> Instead of annotating symbols I would annotate cons cells.  The reader
> could keep a hash table on the side and record the source position of
> cons cells.

That was also mentioned in this thread discussing the project:
https://lists.gnu.org/archive/html/emacs-devel/2020-03/msg00444.html
I'll be looking into this option, but as Alan Mackenzie mentioned in
that thread, it might not be plausible, largely due to the sheer number
of cons cells created during compilation and macroexpansion. Keeping
that information across all those source transformations seems nigh
impossible without some very convoluted logic.

I'm also not so keen on the symbols approach because it splits symbols
into two different types, annotated and bare, which to me just seems
unnecessarily complicated. But this could be changed so that it isn't
transparent to the user like it is in that branch.

I'll be looking at how other Lisp compilers record source code
locations. SBCL is what I'm most familiar with but that compiler is
very complex, and uses an intermediate code representation during
compilation that makes recording this type of information easier.
Ideally I would teach the byte compiler to do something as advanced as
this as well, but that would probably entail a complete overhaul that
wouldn't fit into the span of my project. Perhaps, once the offset is
readily available, I could start this undertaking and continue work on
it after GSoC ends.

-Zach


* Re: GSoC project - Improving ELisp Traceback and Debugging Information
  2020-06-06 18:20   ` Zach Shaftel
@ 2020-06-06 19:56     ` Helmut Eller
  2020-06-06 23:18       ` Zach Shaftel
  0 siblings, 1 reply; 5+ messages in thread
From: Helmut Eller @ 2020-06-06 19:56 UTC
  To: Zach Shaftel; +Cc: emacs-devel

> I'll be looking at how other Lisp compilers record source code
> locations. SBCL is what I'm most familiar with but that compiler is
> very complex, and uses an intermediate code representation during
> compilation that makes recording this type of information easier.

[For now let's pretend that a "source-path" is the same as a
source-location.]

Before converting to IR1, SBCL records the "source-path" of each cons
cell in a top-level form in the hash table *SOURCE-PATHS*.  (See the
function SUB-FIND-SOURCE-PATHS defined in ir1tran.lisp).  Later, when
the compiler walks down the (macroexpanded) form, it searches the
recorded source-path for the current form.  If something is recorded,
good; if nothing is recorded, then the compiler uses the source-path of
the parent form instead (the top-level form always has a recorded
source-path).

I think this approach is quite simple and usually works pretty well.
Most macros return part of the input arguments somewhere in their output
and so the compiler will naturally find the recorded source-path in the
hash table.  Problematic are macros that completely copy/rewrite the
input.  Luckily, such macros are rare.
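
The lookup-with-fallback step described above can be sketched in a few
lines.  This is a Python toy, not SBCL's code: lists stand in for cons
cells, plain integer offsets stand in for source-paths, the table is
keyed by object identity, and resolve_positions and the hand-written
"when" expansion are made up for the example.

```python
def resolve_positions(form, recorded, inherited):
    """Return [(subform, position), ...] for every list inside `form`.

    A list found in `recorded` gets its own position.  Anything the
    macroexpander created fresh is not in the table and falls back to
    the position of its parent form.
    """
    position = recorded.get(id(form), inherited)
    result = [(form, position)]
    for sub in form:
        if isinstance(sub, list):
            result.extend(resolve_positions(sub, recorded, position))
    return result


# Source text: (when test (print x)), with the inner list at offset 11.
body = ["print", "x"]
top = ["when", "test", body]
recorded = {id(top): 0, id(body): 11}

# A hand-written expansion of `when`: the `if` and `progn` lists are
# new, but the original body object is reused unchanged.
expanded = ["if", "test", ["progn", body]]

resolved = resolve_positions(expanded, recorded, recorded[id(top)])
```

Here the new `if` and `progn` forms inherit offset 0 from the top-level
form, while the reused body keeps its recorded offset 11.  A macro that
copied the body instead of reusing it would lose that entry and fall
back to 0 as well, which is the problematic case mentioned above.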

I don't know if it would work well enough for macroexpand-all and
cconv.el.  But at least, it doesn't need new data types.  It could also
help to manually propagate the source locations in a few places, like
macroexp--cons.

Helmut


* Re: GSoC project - Improving ELisp Traceback and Debugging Information
  2020-06-06 19:56     ` Helmut Eller
@ 2020-06-06 23:18       ` Zach Shaftel
  0 siblings, 0 replies; 5+ messages in thread
From: Zach Shaftel @ 2020-06-06 23:18 UTC
  To: Helmut Eller; +Cc: emacs-devel

> Before converting to IR1, SBCL records the "source-path" of each cons
> cell in a top-level form in the hash table *SOURCE-PATHS*.  (See the
> function SUB-FIND-SOURCE-PATHS defined in ir1tran.lisp).

Ah okay, I guess I'm not as familiar with the SBCL compilation process
as I thought. I had some trouble finding where the source path recording
actually takes place so thanks for pointing that out, gonna be doing a
lot more digging through the source.

> Later, when the compiler walks down the (macroexpanded) form, it
> searches the recorded source-path for the current form. If something
> is recorded, good; if nothing is recorded, then the compiler uses the
> source-path of the parent form instead (the top-level form has always
> a recorded source-path).
>
> I think this approach is quite simple and usually works pretty well.
> Most macros return part of the input arguments somewhere in their output
> and so the compiler will naturally find the recorded source-path in the
> hash table.  Problematic are macros that completely copy/rewrite the
> input.  Luckily, such macros are rare.

Great, that certainly does sound like something the byte compiler can
manage to keep track of.

> I don't know if it would work well enough for macroexpand-all and
> cconv.el.  But at least, it doesn't need new data types.  It could also
> help to manually propagate the source locations in a few places, like
> macroexp--cons.

That seems like a good approach and should be easy enough, and luckily
macroexp--expand-all already tries hard to preserve `eq`uality of
expanded forms. cconv.el might be the bigger issue, but that's a task
for later.

-Zach



