Proposal: stack traces with line numbers


* Proposal: stack traces with line numbers
@ 2017-10-15  0:17 John Williams
  2017-10-15  1:54 ` Daniele Nicolodi
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: John Williams @ 2017-10-15  0:17 UTC (permalink / raw)
  To: emacs-devel

Elisp is a fun language to work in, for the most part, but one thing I
find very irritating compared to other languages is that there's no
way to get a stack trace with line numbers. I'm wondering if others
feel the same way and would be open to accepting a change to add
better support for line numbers. Here's my plan:

1. Revise the reader to attach source references (i.e. filename, line
number, and column number) to forms as they are read.
2. Update the byte compiler to preserve source references in compiled code.
3. Update the debugger to display source references in backtraces
whenever possible.
4. Add a simple API for users to retrieve a stack trace suitable for
writing to logs, etc. (There's already a stack trace API, but the
information you can get from it isn't all that useful.)
5. Possibly add some facilities for macro authors to control the
source refs in macro expansions. I'm not sure about that part because
I believe most macros will propagate source information in a
reasonable way simply by virtue of embedding their arguments in the
expansions they generate.

I already have a working proof of concept for the first part. What it
does is attach a vector of (file name, line number, column number) to
the head of each list as it is read. The information is "attached"
using cons cells as keys in a weak-key hash table. I also added a
little function to fetch data from the hash table so the
representation is abstracted a little bit.
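
The scheme can be sketched at the Lisp level roughly as follows. This is only an illustration, not the proof-of-concept code; every `my-` name is hypothetical.

```elisp
;; Hypothetical names throughout; a Lisp-level illustration only.
(defvar my-source-refs (make-hash-table :test 'eq :weakness 'key)
  "Map from cons cells to [FILE LINE COLUMN] vectors.
Keys are weak, so an entry goes away once its cons cell is garbage.")

(defun my-attach-source-ref (cell file line column)
  "Record that CELL was read from FILE at LINE and COLUMN; return CELL."
  (puthash cell (vector file line column) my-source-refs)
  cell)

(defun my-source-ref (cell)
  "Return the [FILE LINE COLUMN] vector attached to CELL, or nil."
  (gethash cell my-source-refs))

;; Two lists that are `equal' but not `eq' keep separate references,
;; because the table compares keys with `eq'.
(let ((a (list 'foo 1))
      (b (list 'foo 1)))
  (my-attach-source-ref a "demo.el" 10 0)
  (my-attach-source-ref b "demo.el" 42 2)
  (list (my-source-ref a) (my-source-ref b)))
```

The accessor function is what keeps callers independent of the table itself, so the representation could change later without touching them.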

Here's my rationale for the engineering decisions I've made so far:

- I'm using a hash table because the other alternatives I looked at
involved changing the representation of (some) cons cells, which
doesn't sound so bad until you start looking at all the
performance-critical code paths that would need to change, and all the
parts of Emacs (e.g. the garbage collector) where the low-level
representation of cons cells is handled as a special case.

- I'm storing the information in vectors because it seems like a
reasonably efficient use of memory. Certainly better than a list. It
would be easy enough to encode all the relevant information in a
string, but then the reader would be spending time building strings
that will need to be decoded later, and I'm not sure it would help
anyway, because each string would be unique, whereas with a vector,
the same string object can be used for every reference in a file.
Adding a new primitive type would also be an option, but it hardly
seems worth the complexity to save a couple of words per source ref
when 99% of them will probably only be retained long enough to
byte-compile the code.

- I'm saving line and column numbers rather than just byte/character
offsets, because that's what developers need, and if it wasn't saved
in that format, displaying a stack trace would involve opening the
original source code to compute that information from the file
contents. If I dropped the column numbers I could store a source ref
in a cons cell rather than a vector, but it seems like a shame to
throw away that kind of information when it's so easy to collect. (I
could even pack the line and column number into a single integer,
since I don't think it would be a big deal if there was an overflow
for an incredibly large file, or a file with very long lines, but
again, that seems like unnecessary complexity to me.)

- I'm only attaching information to lists because only lists can be
function calls, and attaching information to things like symbols would
be problematic because every occurrence of a given symbol is
represented by the same Lisp object. Of course some lists aren't
function calls, but attaching a source ref to every list is a lot
simpler and more reliable than trying to guess which lists are
ultimately going to become function calls.

- I'm only attaching information to the head of each list purely as a
memory-saving measure. I can't think of a scenario where you'd need a
source reference for a list without having its head available, except
maybe in the expansion of a macro that disassembles its arguments and
puts them back together in a new list. If it's an issue in practice, I
think a better solution would be for the macro expander to propagate
source refs to every cons cell in a macro argument at the point where
macro expansion takes place.
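
For reference, the single-integer packing mentioned in the third point could look roughly like this. The 12-bit column width and the function names are assumptions made for illustration; in this sketch an oversized column is clamped rather than allowed to spill into the line number.

```elisp
;; Hypothetical layout: low 12 bits hold the column, the rest the line.
(defconst my-column-bits 12
  "Number of low bits reserved for the column (an arbitrary choice).")

(defun my-pack-position (line column)
  "Pack LINE and COLUMN into one integer, clamping COLUMN to the field width."
  (logior (ash line my-column-bits)
          (min column (1- (ash 1 my-column-bits)))))

(defun my-unpack-position (packed)
  "Return (LINE . COLUMN) for a value made by `my-pack-position'."
  (cons (ash packed (- my-column-bits))
        (logand packed (1- (ash 1 my-column-bits)))))

(my-unpack-position (my-pack-position 120 7))
```

The packed value fits in a fixnum, so a source ref could then be a (FILE . POSITION) cons cell instead of a vector.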

Thoughts?


* Re: Proposal: stack traces with line numbers
  2017-10-15  0:17 Proposal: stack traces with line numbers John Williams
@ 2017-10-15  1:54 ` Daniele Nicolodi
  2017-10-15  2:42 ` raman
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Daniele Nicolodi @ 2017-10-15  1:54 UTC (permalink / raw)
  To: emacs-devel

On 14/10/17 18:17, John Williams wrote:
> Elisp is a fun language to work in, for the most part, but one thing I
> find very irritating compared to other languages is that there's no
> way to get a stack trace with line numbers. I'm wondering if others
> feel the same way and would be open to accepting a change to add
> better support for line numbers. Here's my plan:

[snip]

I think finishing the port of Emacs to Guile would be less effort.

Sorry, I could not resist :-)

Cheers,
Daniele




* Re: Proposal: stack traces with line numbers
  2017-10-15  0:17 Proposal: stack traces with line numbers John Williams
  2017-10-15  1:54 ` Daniele Nicolodi
@ 2017-10-15  2:42 ` raman
  2017-10-15  3:20   ` Noam Postavsky
  2017-10-15  3:40 ` Robert Weiner
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: raman @ 2017-10-15  2:42 UTC (permalink / raw)
  To: John Williams; +Cc: emacs-devel

Actually pressing enter on a line in the backtrace buffer jumps to the
appropriate source code -- what more would line numbers give us?
-- 




* Re: Proposal: stack traces with line numbers
  2017-10-15  2:42 ` raman
@ 2017-10-15  3:20   ` Noam Postavsky
  0 siblings, 0 replies; 12+ messages in thread
From: Noam Postavsky @ 2017-10-15  3:20 UTC (permalink / raw)
  To: raman; +Cc: John Williams, Emacs developers

On Sat, Oct 14, 2017 at 10:42 PM, raman <raman@google.com> wrote:
> Actually pressing enter on a line in the backtrace buffer jumps to the
> appropriate source code -- what more would line numbers give us?

Well, jumping to a particular line might be more useful than just
jumping to the beginning of a function. In practice, I find the
current behaviour of jumping to the function is generally good enough.

Apropos line numbers, there is also the long-standing issue of
inaccurate line numbers in byte compile errors & warnings (Bug#2681).
I think Alan was working on it, but perhaps it has stalled, since I
don't recall any mention of it since July [1]. At any rate, since both these
things involve changes to the reader, probably some coordination is in
order. In particular, the shortcut of only taking line numbers for the
head of list might not be sensible if we want to solve the
error/warning line numbering problem too. Stefan had some ideas about
a "real" solution in [2].

[Bug#2681]: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=2681
[1]: https://lists.gnu.org/archive/html/emacs-devel/2017-07/msg00653.html
[2]: https://lists.gnu.org/archive/html/emacs-devel/2017-07/msg00729.html


* Re: Proposal: stack traces with line numbers
  2017-10-15  0:17 Proposal: stack traces with line numbers John Williams
  2017-10-15  1:54 ` Daniele Nicolodi
  2017-10-15  2:42 ` raman
@ 2017-10-15  3:40 ` Robert Weiner
  2017-10-15 10:01 ` Helmut Eller
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Robert Weiner @ 2017-10-15  3:40 UTC (permalink / raw)
  To: John Williams; +Cc: emacs-devel


On Sat, Oct 14, 2017 at 8:17 PM, John Williams <jrw@pobox.com> wrote:

> Elisp is a fun language to work in, for the most part, but one thing I
> find very irritating compared to other languages is that there's no
> way to get a stack trace with line numbers. I'm wondering if others
> feel the same way and would be open to accepting a change to add
> better support for line numbers.


Sounds like some nice work that will pinpoint Lisp errors
even more closely than stack traces do now.  But as noted
in an earlier message, things are not bad in the Lisp space.
Emacs C code is where we often lack any reasonable pointer
to the source of an error, so improving upon the core
mechanisms there would be most valuable.

Bob



* Re: Proposal: stack traces with line numbers
  2017-10-15  0:17 Proposal: stack traces with line numbers John Williams
                   ` (2 preceding siblings ...)
  2017-10-15  3:40 ` Robert Weiner
@ 2017-10-15 10:01 ` Helmut Eller
  2017-10-15 16:20   ` Stefan Monnier
  2017-10-16 22:43   ` John Williams
  2017-10-16  1:53 ` Richard Stallman
  2017-10-16 21:51 ` Wilfred Hughes
  5 siblings, 2 replies; 12+ messages in thread
From: Helmut Eller @ 2017-10-15 10:01 UTC (permalink / raw)
  To: emacs-devel; +Cc: John Williams

On Sat, Oct 14 2017, John Williams wrote:

> 1. Revise the reader to attach source references (i.e. filename, line
> number, and column number) to forms as they are read.  [...]

This part would be useful not just for backtraces, but also for improved
source positions in compiler messages.  And in general for users who use
the reader without the compiler.

> The information is "attached"
> using cons cells as keys in a weak-key hash table.  [...]

Unless you care about interpreted code, a non-weak hash-table should be
enough.  I think this hash table should work similar to
read-symbol-positions-list.

> - I'm storing the information in vectors because it seems like a
> reasonably efficient use of memory. [...]

It's debatable whether a [file line column] vector is an efficient
representation.  E.g. all lists in a source form come from the same file
(or buffer or string) so storing the same filename many times seems
redundant.  It might also be reasonable to use different representations
in the debug info than for the data-structures used by the reader or
compiler.

> - I'm saving line and column numbers rather than just byte/character
> offsets [...]

Line/column pairs have the (minor) advantage that line numbers have a
higher probability to stay the same after small edits to the source.
But other than that, it seems to me that character offsets encode the
same information more compactly.

> - I'm only attaching information to the head of each list purely as a
> memory-saving measure. I can't think of a scenario where you'd need a
> source reference for a list without having its head available, except
> maybe in the expansion of a macro that disassembles its arguments and
> puts them back together in a new list.  If it's an issue in practice,

In Lisp almost everything is a macro, so I bet that this is an issue.

> I think a better solution would be for the macro expander to propagate
> source refs to every cons cell in a macro argument at the point where
> macro expansion takes place.

It's clearly desirable that source positions are propagated
automatically as often as possible.  That job will be easier if the
reader records more information.

So, I think the reader should, at least optionally, also record
positions of every cons cell not just the first in a list.  Also, in
addition to the start position the reader could/should also record the
end position.

Helmut


* Re: Proposal: stack traces with line numbers
  2017-10-15 10:01 ` Helmut Eller
@ 2017-10-15 16:20   ` Stefan Monnier
  2017-10-16 22:43   ` John Williams
  1 sibling, 0 replies; 12+ messages in thread
From: Stefan Monnier @ 2017-10-15 16:20 UTC (permalink / raw)
  To: emacs-devel

> Unless you care about interpreted code, a non-weak hash-table should be
> enough.

Some of the position info might need to be preserved "indefinitely", so
only a weak hash-table would handle that right (of course, an
alternative would be to store the position directly in the return value
of `read`, e.g. as is done in edebug-read-storing-offsets).

>> - I'm storing the information in vectors because it seems like a
>> reasonably efficient use of memory. [...]
> It's debatable whether a [file line column] vector is an efficient
> representation.  E.g. all lists in a source form come from the same file
> (or buffer or string) so storing the same filename many times seems
> redundant.

After macro-expansion, the source code can be made of pieces coming from
different files.

> So, I think the reader should, at least optionally, also record
> positions of every cons cell not just the first in a list.

The macro expansion code will need to be changed to propagate the source
info from the call to the expansion, and I think that should be
sufficient to make it unnecessary to preserve info about cons cells in
cdr position.
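
One way to picture that propagation, as a rough sketch only: expand a form once and copy the call's source ref onto the head of the expansion. The function name and the explicit TABLE argument (an `eq` hash table from cons cells to source refs) are hypothetical, and a real change would live inside the macro expander itself.

```elisp
;; Hypothetical helper; TABLE maps cons cells to source refs by `eq'.
(defun my-macroexpand-1-keeping-ref (form table)
  "Expand FORM once and return the expansion.
If FORM has a source ref in TABLE and the expansion is a new list
without one, attach FORM's ref to the expansion's head cell."
  (let ((expansion (macroexpand-1 form))
        (ref (gethash form table)))
    (when (and ref
               (consp expansion)
               (not (eq expansion form))
               (not (gethash expansion table)))
      (puthash expansion ref table))
    expansion))
```

Sub-forms that the macro copies into its expansion unchanged keep their own refs, since they are still the same cons cells.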

        Stefan


* Re: Proposal: stack traces with line numbers
  2017-10-15  0:17 Proposal: stack traces with line numbers John Williams
                   ` (3 preceding siblings ...)
  2017-10-15 10:01 ` Helmut Eller
@ 2017-10-16  1:53 ` Richard Stallman
  2017-10-16 21:51 ` Wilfred Hughes
  5 siblings, 0 replies; 12+ messages in thread
From: Richard Stallman @ 2017-10-16  1:53 UTC (permalink / raw)
  To: John Williams; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

This feature will be very nice.  Thanks for working on it.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: Proposal: stack traces with line numbers
  2017-10-15  0:17 Proposal: stack traces with line numbers John Williams
                   ` (4 preceding siblings ...)
  2017-10-16  1:53 ` Richard Stallman
@ 2017-10-16 21:51 ` Wilfred Hughes
  5 siblings, 0 replies; 12+ messages in thread
From: Wilfred Hughes @ 2017-10-16 21:51 UTC (permalink / raw)
  To: John Williams; +Cc: emacs-devel


Thanks, this is an extremely worthwhile project.

When I built elisp-refs, I really missed having a reader that could report
the positions of sexps. I ended up using scan-sexps:
https://github.com/Wilfred/elisp-refs/blob/master/elisp-refs.el#L60-L98 and
taking advantage of the fact that read moves point.
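
For readers unfamiliar with that trick, here is a minimal sketch of the general idea. It is not the elisp-refs code; the function name is hypothetical, and it only records positions of top-level forms.

```elisp
;; Hypothetical helper: positions of top-level forms only.
(defun my-read-with-positions (buffer)
  "Return a list of (FORM START END) for each top-level form in BUFFER."
  (with-current-buffer buffer
    (save-excursion
      (goto-char (point-min))
      (let (result)
        (condition-case nil
            (while t
              ;; `read' leaves point just after the form it consumed.
              (let* ((form (read buffer))
                     (end (point))
                     ;; Scan back over one sexp to find where it began.
                     (start (with-syntax-table emacs-lisp-mode-syntax-table
                              (scan-sexps end -1))))
                (push (list form start end) result)))
          ;; `read' signals `end-of-file' when no forms remain.
          (end-of-file nil))
        (nreverse result)))))
```

Recovering positions for nested forms this way means re-scanning inside each list, which is the work a position-aware reader would make unnecessary.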

Solving this properly would open up lots of opportunities for better elisp
tools.



* Re: Proposal: stack traces with line numbers
  2017-10-15 10:01 ` Helmut Eller
  2017-10-15 16:20   ` Stefan Monnier
@ 2017-10-16 22:43   ` John Williams
  2017-10-17  0:00     ` Helmut Eller
  2017-10-18 15:00     ` John Williams
  1 sibling, 2 replies; 12+ messages in thread
From: John Williams @ 2017-10-16 22:43 UTC (permalink / raw)
  To: Helmut Eller; +Cc: emacs-devel

On Sun, Oct 15, 2017 at 3:01 AM, Helmut Eller <eller.helmut@gmail.com> wrote:
> On Sat, Oct 14 2017, John Williams wrote:
>> The information is "attached"
>> using cons cells as keys in a weak-key hash table.  [...]
>
> Unless you care about interpreted code, a non-weak hash-table should be
> enough.  I think this hash table should work similar to
> read-symbol-positions-list.

What I had in mind was a single global hashtable, because that way
it's easy to make it look as if the source refs are physically part of
the annotated cons cells, and users of the API don't need to be aware
that a supplementary data structure even exists. But of course using a
global hashtable with strong keys would create a huge space leak in
the reader.

Is there any particular disadvantage to using weak keys?

>> - I'm storing the information in vectors because it seems like a
>> reasonably efficient use of memory. [...]
>
> It's debatable whether a [file line column] vector is an efficient
> representation.  E.g. all lists in a source form come from the same file
> (or buffer or string) so storing the same filename many times seems
> redundant.  It might also be reasonable to use different representations
> in the debug info than for the data-structures used by the reader or
> compiler.

The file name would be a single string object shared by every ref in a
given file (or nil when there is no file), so we'd only be saving a
few words per source ref (one for the string itself, plus one or two
saved by using a cons cell instead of a two-element vector.)

>> - I'm saving line and column numbers rather than just byte/character
>> offsets [...]
>
> Line/column pairs have the (minor) advantage that line numbers have a
> higher probability to stay the same after small edits to the source.
> But other than that, it seems to me that character offsets encode the
> same information more compactly.

It seems like we may be talking about different things. I'm speaking
strictly about the in-memory representation produced by the reader,
which will be quickly garbage-collected in most cases (assuming most
elisp code is compiled, either as part of the Emacs build process,
or by the package manager). I haven't even thought about how to
represent the same information in bytecode, but I assume it will be
quite different, and more focused on compactness.

>> - I'm only attaching information to the head of each list purely as a
>> memory-saving measure. I can't think of a scenario where you'd need a
>> source reference for a list without having its head available, except
>> maybe in the expansion of a macro that disassembles its arguments and
>> puts them back together in a new list.  If it's an issue in practice,
>
> In Lisp almost everything is a macro, so I bet that this is an issue.

Maybe. From what I can tell, most function calls in macro arguments
are copied directly into the expansion, so no important information
would be lost in the expansion process, except for a few outliers like
iter-defun, which appears to completely re-assemble the code it's
given. Attaching the same information to every cons cell wouldn't be
difficult, though. Every cell in a given list could share the same
source ref, so the main overhead would be the extra hash table
entries. My guess is that doing so would roughly double or even triple
the average memory footprint of a cons cell produced by the reader,
but I don't think that would be a problem unless you're trying to run
Emacs on an embedded platform, and it's a feature that could be easily
compiled out or disabled at runtime if necessary.
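
To make the per-cell variant concrete, a sketch under the same assumptions as before (an `eq` hash table keyed by cons cells; the function name is hypothetical):

```elisp
;; Hypothetical helper; assumes LIST is not circular.
(defun my-annotate-all-cells (list ref table)
  "Attach the single object REF to every cons cell of LIST in TABLE.
Return LIST.  Only hash table entries are added; REF itself is shared."
  (let ((cell list))
    (while (consp cell)
      (puthash cell ref table)
      (setq cell (cdr cell))))
  list)
```

With this in place, a macro that takes `(cdr form)` or `(nthcdr 2 form)` and splices it into a new list would still carry a source ref on the tail it reused.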

>> I think a better solution would be for the macro expander to propagate
>> source refs to every cons cell in a macro argument at the point where
>> macro expansion takes place.
>
> It's clearly desirable that source positions are propagated
> automatically as often as possible.  That job will be easier if the
> reader records more information.

It would definitely be simpler. I'll defer to others' opinions
regarding the relative merits of each approach.

> So, I think the reader should, at least optionally, also record
> positions of every cons cell not just the first in a list.  Also, in
> addition to the start position the reader could/should also record the
> end position.

That would not be difficult to implement.


* Re: Proposal: stack traces with line numbers
  2017-10-16 22:43   ` John Williams
@ 2017-10-17  0:00     ` Helmut Eller
  2017-10-18 15:00     ` John Williams
  1 sibling, 0 replies; 12+ messages in thread
From: Helmut Eller @ 2017-10-17  0:00 UTC (permalink / raw)
  To: emacs-devel

On Mon, Oct 16 2017, John Williams wrote:

> What I had in mind was a single global hashtable, because that way
> it's easy to make it look as if the source refs are physically part of
> the annotated cons cells, and users of the API don't need to be aware
> that a supplementary data structure even exists. But of course using a
> global hashtable with strong keys would create a huge space leak in
> the reader.
>
> Is there any particular disadvantage to using weak keys?

The question is probably more if a global hashtable is a good idea.

I think an interface to read, like

(let* ((read-with-symbol-positions t)
       (read-symbol-positions-list '())
       (read-cons-position-table (make-hash-table :test 'eq))
       (form (read ...)))
  ... do stuff with form ...)

would be fairly clean. Actually, it's quite hard to imagine a different
solution :-).  So users of read will probably have the choice anyway
whether to bind read-cons-position-table (or whatever the name will be)
to a fresh hashtable or reuse a global table.

> The file name would be a single string object shared by every ref in a
> given file (or nil when there is no file), so we'd only be saving a
> few words per source ref (one for the string itself, plus one or two
> saved by using a cons cell instead of a two-element vector.)

For the interface to read (the macro expander/compiler is a different
story) I would only record character positions.  Certainly easier to
handle for the garbage collector than a vector.  But it's not my call to
make.

Helmut


* Re: Proposal: stack traces with line numbers
  2017-10-16 22:43   ` John Williams
  2017-10-17  0:00     ` Helmut Eller
@ 2017-10-18 15:00     ` John Williams
  1 sibling, 0 replies; 12+ messages in thread
From: John Williams @ 2017-10-18 15:00 UTC (permalink / raw)
  To: Helmut Eller; +Cc: emacs-devel

I just remembered something else about weak hash maps that I forgot to
mention earlier. Most people who try to use weak references eventually
get burned by surprising nondeterministic behavior, but the design I'm
proposing has an interesting property I first learned about in the
context of the JavaScript WeakMap type: as long as the map's :test is
'eq and :weakness is 'key, and the map is only ever accessed using
gethash and puthash, nondeterministic behavior is impossible to
observe, because a given entry becomes eligible for garbage collection
precisely when it is no longer accessible using gethash. This property
allows you to effectively add a new field to certain instances of an
existing type without altering the type itself.





Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
