Thoughts on getting correct line numbers in the byte compiler's warning messages

unofficial mirror of emacs-devel@gnu.org 

* Thoughts on getting correct line numbers in the byte compiler's warning messages
@ 2018-11-01 17:59 Alan Mackenzie
  2018-11-01 22:45 ` Stefan Monnier
  2018-11-08  4:47 ` Michael Heerdegen
  0 siblings, 2 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-01 17:59 UTC
  To: emacs-devel

Hello, Emacs.

Most of the time, the byte compiler identifies the correct place of error
in its warning messages.  This is remarkable, given the crude hack which
it uses.

However, it sometimes fails, and this has given rise to a number of bug
reports, e.g., 22288, and several others which have been merged with it.
In bug #22288:

    (defun test ()
      (let (a))
      a)

, the byte compiler correctly reports "reference to free variable 'a'",
but wrongly gives the source position as L2 C9 rather than L3 C3.

The problem is that the Emacs Lisp source code being compiled is first
read, and this discards line/column numbers of the constructs created.  I
believe that, somehow, accurate source position information must be
preserved.  But how?  It is not easy.

The forms created by the reader go through several (?many) transformative
phases where they get replaced by successor forms.  This makes things
more difficult.

My first idea to track position information was for the reader to create
a hash table of conses (the key) and positions (the value), so that the
position could be found simply by accessing the entry corresponding with
the current form.  This doesn't work so easily, because of the previous
paragraph.

Then I tried duplicating a hash table entry when a transformation was
effected.  This was just too tedious and error prone, and was also slow.
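A minimal sketch of why the plain hash table falls short, assuming an
`eq' table keyed by conses.  `demo-positions', `demo-read' and
`demo-lookup-after-transform' are hypothetical names, not byte compiler
functions, and only the top-level cons is recorded to keep it short:

```elisp
;; Hypothetical illustration only; none of these names exist in Emacs.
(defvar demo-positions (make-hash-table :test #'eq)
  "Maps conses produced by `demo-read' to their source offsets.")

(defun demo-read (string)
  "Read one form from STRING and record offset 0 for its top-level cons."
  (let ((form (car (read-from-string string))))
    (when (consp form)
      (puthash form 0 demo-positions))
    form))

(defun demo-lookup-after-transform ()
  "Return the recorded offsets of a form and of its successor form."
  (let* ((form (demo-read "(not a)"))
         ;; A transformation that builds the successor from fresh conses.
         (newform (list 'if (cadr form) nil t)))
    (list (gethash form demo-positions)       ; the original cons is known
          (gethash newform demo-positions)))) ; the successor is not

(demo-lookup-after-transform)   ; ⇒ (0 nil)
```

The successor form is made of new cons cells, so nothing in the table
refers to it unless every transformation copies the entry across.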

My second idea was still to maintain this hash table, but on each
transformation to write the result back to the same cons cell as the
original.  I actually put quite a lot of work into this approach, but in
the end didn't get very far.  It was just too much detailed work, too
fiddly.

The third idea is to amend the reader so that whereas it now produces a
form, in a byte compiler special mode, it would produce the cons (form .
offset).  So, for example, the text "(not a)" currently gets read into
the form (not . (a . nil)).  The amended reader would produce (((not . 1)
. ((a . 5) . (nil . 6))) . 0) (where 0, 1, 5, and 6 are the textual
offsets of the elements coded).  Such forms would require special
versions of `cons', `car', `cdr', `cond', ...., `mapcar', .... to be
easily manipulable.  These versions would be macros to begin with, but
probably primitives ultimately.  Assuming appropriate design, it should
be possible to substitute these new macros/primitives for the existing
cons/car/cdr/...s in the byte compiler without too much related change.
I'm still exploring this scheme.
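To show what handling such a form involves, here is a rough sketch of
stripping the offsets again.  It assumes the layout of the example
above: every element and the terminating nil are wrapped as (OBJECT .
OFFSET), a whole list is wrapped the same way, and the inner cdr cells
carry no offset of their own.  Both function names are hypothetical:

```elisp
;; Hypothetical sketch; the annotated layout is inferred from the
;; "(not a)" example and is an assumption, not a settled design.
(defun demo-annotated-to-plain (annotated)
  "Return the ordinary form for ANNOTATED, a cons (OBJECT . OFFSET)."
  (let ((object (car annotated)))
    (if (consp object)
        (demo--plain-body object)
      object)))

(defun demo--plain-body (body)
  "Strip offsets from BODY, a chain of conses with annotated cars.
The chain ends in an annotated atom such as (nil . 6)."
  (cons (demo-annotated-to-plain (car body))
        (let ((tail (cdr body)))
          (if (consp (car tail))
              (demo--plain-body tail)   ; more annotated elements follow
            (car tail)))))              ; annotated terminator: keep atom

(demo-annotated-to-plain '(((not . 1) . ((a . 5) . (nil . 6))) . 0))
;; ⇒ (not a)
```

Every list primitive the compiler uses would need a counterpart that
looks through the wrappers in this way.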

I feel that this bug is not intractable, though it will take quite a lot
of work to fix.

Comments?

-- 
Alan Mackenzie (Nuremberg, Germany).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-01 17:59 Thoughts on getting correct line numbers in the byte compiler's warning messages Alan Mackenzie
@ 2018-11-01 22:45 ` Stefan Monnier
  2018-11-05 10:53   ` Alan Mackenzie
  2018-11-08  4:47 ` Michael Heerdegen
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-01 22:45 UTC
  To: emacs-devel

> The third idea is to amend the reader so that whereas it now produces a
> form, in a byte compiler special mode, it would produce the cons (form .
> offset).  So, for example, the text "(not a)" currently gets read into

Sounds good.  I have the vague feeling that I mentioned it already, but
in case I haven't: please make sure the positions are character-precise
rather than line-precise, so that we can (eventually) ditch Edebug's
Elisp-reimplementation-of-the-reader which returns the same kind of info
(and needs character-precise location info).

        Stefan


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-01 22:45 ` Stefan Monnier
@ 2018-11-05 10:53   ` Alan Mackenzie
  2018-11-05 15:57     ` Eli Zaretskii
  2018-11-06 13:56     ` Stefan Monnier
  0 siblings, 2 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-05 10:53 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Thu, Nov 01, 2018 at 18:45:00 -0400, Stefan Monnier wrote:
> > The third idea is to amend the reader so that whereas it now produces a
> > form, in a byte compiler special mode, it would produce the cons (form .
> > offset).  So, for example, the text "(not a)" currently gets read into

> Sounds good.  I have the vague feeling that I mentioned it already, but
> in case I haven't: please make sure the positions are character-precise
> rather than line-precise, so that we can (eventually) ditch Edebug's
> Elisp-reimplementation-of-the-reader which returns the same kind of info
> (and needs character-precise location info).

Actually this idea was not good; macros could not handle such a form
without severe changes in the way macros work.  (A research project,
perhaps).

I have come up with an improved scheme, which may well work.

The reader would produce, in place of the Lisp_Objects it currently
does, an object with Lisp_Type 1 (which is currently unused).  The rest
of the object would be an address pointing at two Lisp_Objects, one
being the "real" read object, the other being a source position.

The low level routines, like CONSP, and a million others in lisp.h would
need amendment.  But the Lisp system would continue with 8-byte objects,
and the higher level bits (nearly all of it) would not need changes.
The beauty of this scheme is that, outside of byte compilation, nothing
else would change.

One or two extra functions would be needed, such as `big-object' which
would create a new-type object out of a source offset and "ordinary"
object, `big-object-p', `big-offset' to get the source offset from a big
object, and possibly one or two others.

These would naturally be available to byte-compile-warn and friends,
supplying the source position.  To cope with the times when no source
position would be available (e.g. in forms expanded from macros), the
new variable `byte-compile-containing-form' would be bound at strategic
places in the byte compiler.  This would provide a fallback source
position.
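The lookup with its fallback might look roughly like this.  This is only
a sketch at the Lisp level: a `cl-defstruct' stands in for the proposed
low-level object (so it is not transparent to `car' and friends), and
every name beginning with `demo-' is hypothetical:

```elisp
(require 'cl-lib)

;; Stand-in for the proposed "big object": an ordinary object plus
;; its source offset.  The real proposal is a new low-level type.
(cl-defstruct (demo-big (:constructor demo-make-big (object offset)))
  object offset)

(defvar demo-containing-position nil
  "Fallback source offset, bound on entry to each major form.")

(defun demo-warning-position (form)
  "Return the source offset to report for FORM.
Use FORM's own offset if it has one, else the fallback."
  (if (demo-big-p form)
      (demo-big-offset form)
    demo-containing-position))

(defun demo-example-positions ()
  "Return positions for an annotated form and a macro-made form."
  (let ((demo-containing-position 10))
    (list (demo-warning-position (demo-make-big 'a 17))
          (demo-warning-position '(if a b)))))

(demo-example-positions)   ; ⇒ (17 10)
```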

The extra indirection involved in these "big objects" would naturally
slow down byte compilation somewhat.  I've no idea how much, but it
might not be much at all.

And yes, the source positions used would be character-precise.

What do you think?

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-05 10:53   ` Alan Mackenzie
@ 2018-11-05 15:57     ` Eli Zaretskii
  2018-11-05 16:51       ` Alan Mackenzie
  2018-11-06 13:56     ` Stefan Monnier
  1 sibling, 1 reply; 44+ messages in thread
From: Eli Zaretskii @ 2018-11-05 15:57 UTC
  To: Alan Mackenzie; +Cc: monnier, emacs-devel

> Date: Mon, 5 Nov 2018 10:53:02 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: emacs-devel@gnu.org
> 
> The reader would produce, in place of the Lisp_Objects it currently
> does, an object with Lisp_Type 1 (which is currently unused).  The rest
> of the object would be an address pointing at two Lisp_Objects, one
> being the "real" read object, the other being a source position.

Sounds gross to me.

Did you consider using mint_ptr objects instead?  That'd still be
gross, but at least we won't introduce another type of Lisp_Object.

Also, what about keeping the source position in some other way, like a
property of some symbol?




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-05 15:57     ` Eli Zaretskii
@ 2018-11-05 16:51       ` Alan Mackenzie
  2018-11-06  4:34         ` Herring, Davis
  0 siblings, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-05 16:51 UTC
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

Hello, Eli.

On Mon, Nov 05, 2018 at 17:57:35 +0200, Eli Zaretskii wrote:
> > Date: Mon, 5 Nov 2018 10:53:02 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: emacs-devel@gnu.org

> > The reader would produce, in place of the Lisp_Objects it currently
> > does, an object with Lisp_Type 1 (which is currently unused).  The rest
> > of the object would be an address pointing at two Lisp_Objects, one
> > being the "real" read object, the other being a source position.

> Sounds gross to me.

What is done at the moment is no less gross.  Just to clarify, the above
action of read would only be done when in byte compilation, a bit like
how the current list of source symbols is also only for when in
compilation.

I've spent many hours at my PC, trying to figure out a neat way of
solving this problem.  The above is the best I've been able to come up
with, so far.

Why do you think the idea is gross, given the difficulty of the
underlying problem?  The idea should work with only moderate amendment
of the byte-compiler/macro routines, and virtually no change outside of
that, bar amending the reader and the lowest level functions like `cons'
and `car'.

> Did you consider using mint_ptr objects instead?  That'd still be
> gross, but at least we won't introduce another type of Lisp_Object.

The using up of the last available object type is a severe disadvantage,
yes.  I wasn't aware of mint_ptrs until you just mentioned them.  I'll
need to read up on them to get the hang of what they're about.

> Also, what about keeping the source position in some other way, like a
> property of some symbol?

Difficult.  Essentially, these source positions are properties of
Lisp_Objects, such as conses, not of symbols.  A typical symbol is used
several or many times in a compilation unit.  Some means has to be found
of attaching properties (in this case, source positions), to arbitrary
Lisp_Objects.

It's gradually become clear to me that what I proposed this morning is a
special case of attaching a property list to an arbitrary object.  Maybe
an actual property list, being more general, would be a better idea.

Alternatively, it may be possible to use a vector or pseudovector type
rather than using Lisp_Type 1 to implement basically the same idea.
This would be slower at run time, however, possibly not significantly.

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-05 16:51       ` Alan Mackenzie
@ 2018-11-06  4:34         ` Herring, Davis
  2018-11-06  8:53           ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Herring, Davis @ 2018-11-06  4:34 UTC
  To: Alan Mackenzie
  Cc: Eli Zaretskii, monnier@iro.umontreal.ca, emacs-devel@gnu.org

> I've spent many hours at my PC, trying to figure out a neat way of
> solving this problem.  The above is the best I've been able to come up
> with, so far.

Considering patterns like AoS vs. SoA, could the reader produce (on demand) a pair: the expression read and a parallel structure of position information?  For example,

'(foo
bar [baz])

=>

((quote (foo bar [baz])) .
 (0 (2 6 [11])))

where the numbers are character offsets from the beginning of the read?  This loses information on the opening delimiter for each list/cons/vector; it could be added with certain obvious alterations to the location structure if that's a problem.

Davis


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-06  4:34         ` Herring, Davis
@ 2018-11-06  8:53           ` Alan Mackenzie
  0 siblings, 0 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-06  8:53 UTC
  To: Herring, Davis
  Cc: Eli Zaretskii, monnier@iro.umontreal.ca, emacs-devel@gnu.org

Hello, Davis.

On Tue, Nov 06, 2018 at 04:34:30 +0000, Herring, Davis wrote:
> > I've spent many hours at my PC, trying to figure out a neat way of
> > solving this problem.  The above is the best I've been able to come up
> > with, so far.

> Considering patterns like AoS vs. SoA, could the reader produce (on
> demand) a pair: the expression read and a parallel structure of
> position information?  For example,

> '(foo
> bar [baz])

> =>

> ((quote (foo bar [baz])) .
>  (0 (2 6 [11])))

> where the numbers are character offsets from the beginning of the
> read?  This loses information on the opening delimiter for each
> list/cons/vector; it could be added with certain obvious alterations
> to the location structure if that's a problem.

Such a structure could be generated easily.  But how are we going to use
it?  The problem is how do we associate a particular piece of the main
structure with the pertinent bit of the auxiliary structure?

For example, at the time we're compiling baz, the byte compiler has just
baz itself.  How do we get to the 11 in the offsets structure?
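To make the difficulty concrete: the offset can be recovered when the
path from the top-level form down to the element is known, since the
same list of indices addresses both structures.  A small sketch, in
which `demo-follow-path' is a hypothetical helper and the form is taken
to be the complete (quote (foo bar [baz])):

```elisp
;; Hypothetical helper; works on nested lists and vectors alike.
(defun demo-follow-path (structure path)
  "Follow PATH, a list of element indices, down through STRUCTURE."
  (dolist (index path structure)
    (setq structure (elt structure index))))

;; The same path picks out matching parts of both structures.
(demo-follow-path '(quote (foo bar [baz])) '(1 2 0))   ; ⇒ baz
(demo-follow-path '(0 (2 6 [11]))          '(1 2 0))   ; ⇒ 11
```

The byte compiler, however, is handed baz alone, without the path
(1 2 0) that led to it.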

This is the essence of the problem - associating data with the elements
of an arbitrary structure of lisp objects.  My proposal from yesterday
does this rigorously.

> Davis

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-05 10:53   ` Alan Mackenzie
  2018-11-05 15:57     ` Eli Zaretskii
@ 2018-11-06 13:56     ` Stefan Monnier
  2018-11-06 15:11       ` Alan Mackenzie
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-06 13:56 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> Actually this idea was not good;

[ I'll assume you're not talking about the idea of using such a reader in
  edebug, but about using such a reader for your use case.  ]

> macros could not handle such a form without severe changes in the way
> macros work.  (A research project, perhaps).

Right.  The way I was thinking about it was that when calling
macros we'd do something like:

    (plain-to-annotated
     (macroexpand (annotated-to-plain sexp)))

not a research project by any stretch, but its impact on performance
could be a problem, indeed.

> The reader would produce, in place of the Lisp_Objects it currently
> does, an object with Lisp_Type 1 (which is currently unused).  The rest
> of the object would be an address pointing at two Lisp_Objects, one
> being the "real" read object, the other being a source position.

More generally, you're suggesting here to add a new object type (could
just as well be a new pseudo-vector or any such thing: these are just
low-level concerns that don't really affect the overall design).

> The low level routines, like CONSP, and a million others in lisp.h would
> need amendment.

So you're suggesting to change the low-level routines accessing
virtually all object types to also accept those "annotated objects"?

That means all processing of all objects would be slowed down.
I think that's a serious problem (I'd rather pay a significant slowdown
in byte-compilation than a smaller slowdown on everything else).

> But the Lisp system would continue with 8-byte objects,
> and the higher level bits (nearly all of it) would not need changes.
> The beauty of this scheme is that, outside of byte compilation, nothing
> else would change.

Also, I wonder how this (or any other of the techniques discussed) solves
the original problem you describe:

    The forms created by the reader go through several (?many)
    transformative phases where they get replaced by successor forms.
    This makes things more difficult.

E.g. we could implement big-object as

    (defun big-object (object location)
      (cons object location))
or
    (defun big-object (object location)
      (puthash object location location-hash-table)
      object)
or
    (defun big-object (object location)
      (make-new-special-object object location))

but the problem remains of how to put it at all the places where we need it.

> The extra indirection involved in these "big objects" would naturally
> slow down byte compilation somewhat.  I've no idea how much, but it
> might not be much at all.

Indeed, I don't think that's a significant issue.

        Stefan


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-06 13:56     ` Stefan Monnier
@ 2018-11-06 15:11       ` Alan Mackenzie
  2018-11-06 16:29         ` Stefan Monnier
  0 siblings, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-06 15:11 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Tue, Nov 06, 2018 at 08:56:48 -0500, Stefan Monnier wrote:
> > Actually this idea was not good;

> [ I'll assume you're not talking about the idea of using such a reader in
>   edebug, but about using such a reader for your use case.  ]

In particular, in the byte compiler.

> > macros could not handle such a form without severe changes in the way
> > macros work.  (A research project, perhaps).

> Right.  The way I was thinking about it was that when calling
> macros we'd do something like:

>     (plain-to-annotated
>      (macroexpand (annotated-to-plain sexp)))

That would lose too much of the wanted source position data.

> not a research project by any stretch, but its impact on performance
> could be a problem, indeed.

> > The reader would produce, in place of the Lisp_Objects it currently
> > does, an object with Lisp_Type 1 (which is currently unused).  The rest
> > of the object would be an address pointing at two Lisp_Objects, one
> > being the "real" read object, the other being a source position.

> More generally, you're suggesting here to add a new object type (could
> just as well be a new pseudo-vector or any such thing: these are just
> low-level concerns that don't really affect the overall design).

There's nothing just about hurting performance.

> > The low level routines, like CONSP, and a million others in lisp.h would
> > need amendment.

> So you're suggesting to change the low-level routines accessing
> virtually all object types to also accept those "annotated objects"?

Yes.

> That means all processing of all objects would be slowed down.
> I think that's a serious problem (I'd rather pay a significant slowdown
> in byte-compilation than a smaller slowdown on everything else).

The slowdown would not be great.  For example, XCONS first checks the
3-bit tag, and if all's OK, removes it, otherwise it handles the error.  I'm
proposing enhancing the "otherwise" to check for a tag of 1 together
with a proper cons at the far end of a pointer.  With care, there should
be no loss in the usual case, here.

I timed a bootstrap, unoptimised GCC, with an extra tag check and
storage to a global variable inserted into XFIXNUM.  (Currently there is
no such check there).  The slowdown was around 1.3%.

> > But the Lisp system would continue with 8-byte objects,
> > and the higher level bits (nearly all of it) would not need changes.
> > The beauty of this scheme is that, outside of byte compilation, nothing
> > else would change.

> Also, I wonder how this (or any other of the techniques discussed) solves
> the original problem you describe:

>     The forms created by the reader go through several (?many)
>     transformative phases where they get replaced by successor forms.
>     This makes things more difficult.

Many of the original forms produced by the reader survive these
transformations.  For those that do not, we could bind
byte-compile-containing-position (or whatever) to a sensible position
each time the compiler enters a "major" form (whatever that might mean).

> E.g. we could implement big-object as

> 1.  (defun big-object (object location)
>       (cons object location))
> or
> 2.  (defun big-object (object location)
>       (puthash object location location-hash-table)
>       object)
> or
> 3.  (defun big-object (object location)
>       (make-new-special-object object location))

1. wouldn't work, as such.  E.g. evaluating `car' must get the car of
the original OBJECT, not the car of (cons OBJECT LOCATION).

I've tried 2., and given up on it: everywhere in the compiler where FORM
is transformed to NEWFORM, a copy of a hash has to be created for
NEWFORM.  Also, there's no convenient key for recording the hash of an
occurrence of a symbol (such as `if').

3. is what I'm proposing, I think.  The motivating thing here is that
the rest of the system can handle NEW-SPECIAL-OBJECT and get the same
result it would have from OBJECT.  Hence the use of Lisp_Type 1, or
possibly a new pseudovector type.

> but the problem remains of how to put it at all the places where we
> need it.

Every object produced by the reader during byte compilation would have
its source position attached to the object, in essence.  Objects
produced by macro expansion would not have this, but we could arrange to
copy the info much of the time.  (E.g. the result of a `mapcar'
operating on a list of FORMs would be given the position information of
the list.)  Other non-reader forms would have to depend on the variable
byte-compile-containing-position mentioned above.

Incidentally, I'm coming round to the idea of calling the new object an
_extended_ object.  In place of the fixnum source position proposed, we
could use, for example, a property list.  There are surely many
applications for having a property list on a cons form.  :-)

> > The extra indirection involved in these "big objects" would naturally
> > slow down byte compilation somewhat.  I've no idea how much, but it
> > might not be much at all.

> Indeed, I don't think that's a significant issue.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-06 15:11       ` Alan Mackenzie
@ 2018-11-06 16:29         ` Stefan Monnier
  2018-11-06 19:15           ` Alan Mackenzie
  2018-11-07 17:00           ` Alan Mackenzie
  0 siblings, 2 replies; 44+ messages in thread
From: Stefan Monnier @ 2018-11-06 16:29 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> I timed a bootstrap, unoptimised GCC, with an extra tag check and
> storage to a global variable inserted into XFIXNUM.  (Currently there is
> no such check there).  The slowdown was around 1.3%.

That accumulates for every data type, and it increases code size,
reduces cache hit rate...

You may find it acceptable, but I don't, mostly because I know
fundamentally it's not needed: it's only introduced for short/medium
term convenience (to avoid having to rewrite a lot of code).
And I can't see how we'll be able to get rid of it in the long run
(gradually or not).

So in the long run it's a bad option.

> Many of the original forms produced by the reader survive these
> transformations.

Yeah, that's why I thought of using a hash-table.

> I've tried 2., and given up on it: everywhere in the compiler where FORM
> is transformed to NEWFORM, a copy of a hash has to be created for
> NEWFORM.

Same with your new scheme: everywhere a "big cons-cell" is
transformed by a macro, you'll get a "small cons-cell".
That's a constant of all options, AFAICT.

> Also, there's no convenient key for recording the hash of an
> occurrence of a symbol (such as `if').

Ah, right, I keep forgetting this detail.  Yes, that's a major downer.

> 3. is what I'm proposing, I think.

Yes [ sorry, you had to guess; I thought it was clear enough].

> The motivating thing here is that the rest of the system can handle
> NEW-SPECIAL-OBJECT and get the same result it would have from OBJECT.
> Hence the use of Lisp_Type 1, or possibly a new pseudovector type.

How 'bout we don't try to add location to all objects, but only to some
specific objects?  E.g. only cons-cells?

We could add a new "big cons-cell" type which shares the same tag, and
just adds additional info after the end of the normal cons-cell
(cons-cell would either be allocated from small_cons_blocks or
big_cons_blocks, so you'd have to look at the enclosing cons_block to
determine which kind of cons-cell you have).

So normal code is not slowed down at all (except I guess for the GC
which will be marginally slower).

        Stefan


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-06 16:29         ` Stefan Monnier
@ 2018-11-06 19:15           ` Alan Mackenzie
  2018-11-06 20:04             ` Stefan Monnier
  2018-11-07 17:00           ` Alan Mackenzie
  1 sibling, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-06 19:15 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello again, Stefan.

Now for something completely different.

On Tue, Nov 06, 2018 at 11:29:41 -0500, Stefan Monnier wrote:

[ .... ]

> So in the long run it [Alan's idea for extended Lisp Objects] is a bad
> option.

I feel that intuitively, hence agree with you.  It would be nice to have
robust warning line numbers, though.

In the rest of this post, I will no longer be discussing this scheme.

> > Many of the original forms produced by the reader survive these
> > transformations.

> Yeah, that's why I thought of using a hash-table.

What I tried before (about two years ago) was having each
reader-produced form as a key, and the source position as a value.  Each
time the source was transformed, the new form became a new key, and the
value stayed the same.

I vaguely remember this being slow.

Maybe it would be better the other way around.  The source position
would be the key, and the value would be a list of (equivalent) forms.
Building this table would be faster.  Finding a form in that table for a
warning message would be much slower, but that shouldn't matter.

[ .... ]

> > Also, there's no convenient key for recording the hash of an
> > occurrence of a symbol (such as `if').

> Ah, right, I keep forgetting this detail.  Yes, that's a major downer.

Here's my latest idea: we maintain byte-compile-containing-forms as a
stack of containing forms.  Each time we're manipulating a list of
forms, we increment a counter N with each form.  That form is often a
symbol.

In byte-compile-warn, if we can't find the current form in the above
table, we search for the containing form, get its source offset, put
point there and read the next N forms, moving forward in the source text
to the position we need.  That this might be slow (I don't really think
it would be) is again unimportant.
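The re-reading step can be sketched as follows, assuming the containing
form is a list whose opening parenthesis sits at a known 0-based offset
in the source text.  `demo-subform-position' is a hypothetical helper,
and it makes no attempt to step over a comment that sits directly in
front of the wanted sub-form:

```elisp
;; Hypothetical helper, not part of the byte compiler.
(defun demo-subform-position (source container-offset n)
  "Return the 0-based offset in SOURCE of sub-form number N.
N counts from 0 within the list that starts at CONTAINER-OFFSET."
  (with-temp-buffer
    (insert source)
    ;; Move just inside the containing form's opening parenthesis.
    (goto-char (+ (point-min) container-offset 1))
    ;; Skip the N preceding sub-forms; `read' advances point.
    (dotimes (_ n)
      (read (current-buffer)))
    (skip-chars-forward " \t\n")
    (- (point) (point-min))))

;; The form from bug #22288: sub-form 4 of the defun is the final `a'.
(demo-subform-position "(defun test ()\n  (let (a))\n  a)" 0 4)
;; ⇒ 29, i.e. the `a' on the third line
```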

[ .... ]

> How 'bout we don't try to add location to all objects, but only to some
> specific objects?  E.g. only cons-cells?

Yes, and vectors too.  Integers, symbols, strings, and floats, no.

[ .... ]

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-06 19:15           ` Alan Mackenzie
@ 2018-11-06 20:04             ` Stefan Monnier
  2018-11-07 12:35               ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-06 20:04 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

>> > Many of the original forms produced by the reader survive these
>> > transformations.
>> Yeah, that's why I thought of using a hash-table.
> What I tried before (about two years ago) was having each
> reader-produced form as a key, and the source position as a value.  Each
> time the source was transformed, the new form became a new key, and the
> value stayed the same.
>
> I vaguely remember this being slow.

Which part do you remember being slow (e.g. just performing a `read`
that returns a sexp and fills that table along the way)?

> Maybe it would be better the other way around.  The source position
> would be the key, and the value would be a list of (equivalent) forms.
> Building this table would be faster.

I don't follow you: why would this be faster?

> Finding a form in that table for a warning message would be much
> slower, but that shouldn't matter.

It could matter, but yeah, let's not worry about that for now.

> In byte-compile-warn, if we can't find the current form in the above
> table, we search for the containing form, get its source offset, put
> point there and read the next N forms, moving forward in the source text
> to the position we need.  That this might be slow (I don't really think
> it would be) is again unimportant.

I lost you here as well: how is the location data propagated from the
reader to the byte-compiler's phase that ends up running
byte-compile-warn?  I mean, how is the location info
preserved while going through macro-expansion, closure-conversion,
and byte-optimize-form?  Or are most objects left untouched in practice?

I guess we could limit the info (e.g. stored in a hash-table) to map
"first cons-cell in a list" to its location info, and then change
macroexp.el, cconv.el, and friends to preserve this info as much as
possible (we may even come up with a `with-location-data` macro that
encapsulates most of the work so the changes are easy to apply).

Is that what you're thinking of?
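Such a macro might look roughly like the sketch below.  It is only an
illustration under assumptions: the table `demo-locations', the name
`demo-with-location-data' and its calling convention are all
hypothetical, and nothing like them exists in macroexp.el or cconv.el:

```elisp
;; Hypothetical sketch of a location-preserving wrapper.
(defvar demo-locations (make-hash-table :test #'eq)
  "Maps the first cons cell of a list to its source location.")

(defmacro demo-with-location-data (form &rest body)
  "Evaluate BODY and give its result the location recorded for FORM.
A result that is not a cons, or already has a location, is left alone."
  (declare (indent 1))
  (let ((old (make-symbol "old"))
        (new (make-symbol "new"))
        (loc (make-symbol "loc")))
    `(let* ((,old ,form)
            (,new (progn ,@body))
            (,loc (gethash ,old demo-locations)))
       (when (and ,loc (consp ,new)
                  (not (gethash ,new demo-locations)))
         (puthash ,new ,loc demo-locations))
       ,new)))

(defun demo-transform-example ()
  "Transform a form and return the location of its successor."
  (let ((form (list 'not 'a)))
    (puthash form 42 demo-locations)
    (let ((newform (demo-with-location-data form
                     (list 'if (cadr form) nil t))))
      (gethash newform demo-locations))))

(demo-transform-example)   ; ⇒ 42
```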

        Stefan


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-06 20:04             ` Stefan Monnier
@ 2018-11-07 12:35               ` Alan Mackenzie
  2018-11-07 17:11                 ` Stefan Monnier
  0 siblings, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-07 12:35 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Tue, Nov 06, 2018 at 15:04:51 -0500, Stefan Monnier wrote:
> >> > Many of the original forms produced by the reader survive these
> >> > transformations.
> >> Yeah, that's why I thought of using a hash-table.
> > What I tried before (about two years ago) was having each
> > reader-produced form as a key, and the source position as a value.  Each
> > time the source was transformed, the new form became a new key, and the
> > value stayed the same.
> >
> > I vaguely remember this being slow.

> Which part do you remember being slow (e.g. just performing a `read`
> that returns a sexp and fills that table along the way)?

Looking at notes I made at the time, I amended a small portion of e.g.
byte-optimize-body to make a new hash entry with the same value when a
form was transformed.  The slowdown on just the byte optimiser was
around a factor of three.  I think the comparison was with the
byte-optimiser in the released version (without any hash tables).

> > Maybe it would be better the other way around.  The source position
> > would be the key, and the value would be a list of (equivalent) forms.
> > Building this table would be faster.

> I don't follow you: why would this be faster?

I don't think I follow myself here.  I was thinking that accessing a
hash table element was slow, therefore keeping a table value current and
pushing transformed forms onto it would be faster than creating a new
hash table entry for these new forms.  Looking at the code for hash
tables, the access time can not be all that long.

> > Finding a form in that table for a warning message would be much
> > slower, but that shouldn't matter.

> It could matter, but yeah, let's not worry about that for now.

> > In byte-compile-warn, if we can't find the current form in the above
> > table, we search for the containing form, get its source offset, put
> > point there and read the next N forms, moving forward in the source text
> > to the position we need.  That this might be slow (I don't really think
> > it would be) is again unimportant.

> I lost you here as well: how is the location data propagated from the
> reader to the byte-compiler's phase that ends up running
> byte-compile-warn?

For objects created by the reader, they can be looked up in the hash
table.  But your real question ....

> I mean, how is the location info preserved while going through
> macro-expansion, closure-conversion, and byte-optimize-form?  Or are
> most objects left untouched in practice?

Either by making new entries in the table for transformed forms, or by
noting byte-compile-containing-form and "sub-form number 2" and using
read (or forward-sexp, even) on the source text to move forward to
sub-form 2.
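
The second alternative can be sketched as follows.  The helper name
my-subform-position is made up, and the sketch assumes that the source
text is in the current buffer and that no comments sit between the
sub-forms:

```elisp
;; Hypothetical helper, not part of bytecomp.el.  FORM-START is assumed
;; to be the position of the containing form's opening parenthesis.
(defun my-subform-position (form-start n)
  "Return position of sub-form N (from 0) of the list at FORM-START."
  (save-excursion
    (goto-char form-start)
    (down-list 1)                  ; move just inside the opening paren
    (dotimes (_ n)
      (forward-sexp 1))            ; skip over the first N sub-forms
    (skip-chars-forward " \t\n")   ; land on the start of sub-form N
    (point)))

;; The form from bug #22288; sub-form 4 is the free variable `a'.
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(defun test ()\n  (let (a))\n  a)")
  (goto-char (my-subform-position (point-min) 4))
  (list (line-number-at-pos) (current-column)))   ; => (3 2)
```

The column here counts from 0, so this is the third line and the third
character on it.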

> I guess we could limit the info (e.g. stored in a hash-table) to map
> "first cons-cell in a list" to its location info, and then change
> macroexp.el, cconv.el, and friends to preserve this info as much as
> possible (we may even come up with a `with-location-data` macro that
> encapsulates most of the work so the changes are easy to apply).

> Is that what you're thinking of?

That's the sort of thing, yes.
>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-06 16:29         ` Stefan Monnier
  2018-11-06 19:15           ` Alan Mackenzie
@ 2018-11-07 17:00           ` Alan Mackenzie
  2018-11-07 17:25             ` Stefan Monnier
  1 sibling, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-07 17:00 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Tue, Nov 06, 2018 at 11:29:41 -0500, Stefan Monnier wrote:
> > I timed a bootstrap, unoptimised GCC, with an extra tag check and
> > storage to a global variable inserted into XFIXNUM.  (Currently there is
> > no such check there).  The slowdown was around 1.3%

> That accumulates for every data type, and it increases code size,
> reduces cache hit rate...

No, it applies mainly to FIXNUM, because XFIXNUM doesn't already check
the Lisp_Type.  Other object types already perform this check, so while
it would increase the code size (by how much?) it would have a lesser
run time penalty.  There would be a slow down in predicates like
symbolp, when the result is false.  This probably wouldn't amount to
much in practice.

Part of that 1.3% (I don't know how big a part) was GCC outputting
warning messages.

Anyhow, do we really need to worry about code size anymore?  temacs is
only 7.3 MB, and the machines people will be running it on will have
several, or more usually many, GB of RAM.  So what if it became 7.5 MB,
or even 8.0 MB?

> You may find it acceptable, but I don't, mostly because I know
> fundamentally it's not needed: it's only introduced for short/medium
> term convenience (to avoid having to rewrite a lot of code).
> And I can't see how we'll be able to get rid of it in the long run
> (gradually or not).

> So in the long run it's a bad option.

Yes, it may be a bad option, but possibly less bad than the other bad
options we have.

> > Many of the original forms produced by the reader survive these
> > transformations.

This, as it happens, is not true.  Many of the symbols produced by the
reader survive; none of the cons forms do.  cconv, we love you. ;-(

> Yeah, that's why I thought of using a hash-table.

> > I've tried 2., and given up on it: everywhere in the compiler where FORM
> > is transformed to NEWFORM, a copy of a hash has to be created for
> > NEWFORM.

I've rediscovered why I gave up on the hash table approach 2.  That's
because cconv-convert chews up EVERY list it is presented with and
spits out one which is not EQ to the original, though it is usually
EQUAL.  I'm not saying it was written with the object of frustrating
the current exercise (I'm sure it wasn't), but I will say that if that
had been the objective, the end result wouldn't be different from what
we now have.
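
The effect is easy to reproduce outside the compiler.  In this sketch,
my-rebuild is a made-up stand-in for what a clause of cconv-convert
does (take the list apart with pcase and build a new one with
backquote); it is not code from cconv.el:

```elisp
;; Hypothetical stand-in for a cconv-convert clause; not real cconv.el code.
(defun my-rebuild (form)
  "Return a freshly consed copy of the list FORM, built with backquote."
  (pcase form
    (`(,func . ,args) `(,func . ,(mapcar #'identity args)))
    (_ form)))

(let* ((positions (make-hash-table :test #'eq))
       (orig (list 'if 'a 'b))
       (new (my-rebuild orig)))
  (puthash orig 42 positions)
  (list (eq orig new)               ; nil: a different cons cell
        (equal orig new)            ; t: same structure
        (gethash new positions)))   ; nil: the position is lost
```

So an eq hash table keyed on the reader's conses finds nothing once the
form has been through such a clause.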

cconv.el would need to be entirely rewritten if we stick to the hash
table approach.  It wouldn't survive anything like unscathed even in an
"extended Lisp Object" solution.

Maybe it would be possible to defer cconv.el processing till after macro
expansion and byte-opt.el stuff.  Would this do any good?

The only vague idea I have for saving this, and I don't like it one bit,
is somehow to redefine \` (and possibly \,) in such a way that it would
somehow copy the source position from the original list to the result.

> Same with your new scheme: everywhere where a "big cons-cell" is
> transformed by a macro, you'll get a "small cons-cell".
> That's a constant of all options, AFAICT.

The "extended" symbols would survive.  That is a big plus.

> > Also, there's no convenient key for recording the hash of an
> > occurrence of a symbol (such as `if').

> Ah, right, I keep forgetting this detail.  Yes, that's a major downer.

> > 3. is what I'm proposing, I think.

> Yes [ sorry, you had to guess; I thought it was clear enough].

> > The motivating thing here is that the rest of the system can handle
> > NEW-SPECIAL-OBJECT and get the same result it would have from OBJECT.
> > Hence the use of Lisp_Type 1, or possibly a new pseudovector type.

> How 'bout we don't try to add location to all objects, but only to some
> specific objects?  E.g. only cons-cells?

This could work, together with byte-compile-enclosing-form and a subform
number N to get at the non-cons objects (symbols, strings, ..) in a cons
or vector form.

> We could add a new "big cons-cell" type which shares the same tag, and
> just adds additional info after the end of the normal cons-cell
> (cons-cell would either be allocated from small_cons_blocks or
> big_cons_blocks, so you'd have to look at the enclosing cons_block to
> determine which kind of cons-cell you have).

I've been through these sort of thoughts.  That idea would be less
effective than the "extended object", since it would only work with
conses, but might be less disruptive.  But why should it only work with
conses?  Why not with symbols, too?

> So normal code is not slowed down at all (except I guess for the GC
> which will be marginally slower).

Hmmm.  Maybe there's something in this idea.  :-)  Somehow we'd need to
determine the enclosing cons block, given the address of a cons, and
that could be slow.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-07 12:35               ` Alan Mackenzie
@ 2018-11-07 17:11                 ` Stefan Monnier
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Monnier @ 2018-11-07 17:11 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

> Looking at notes I made at the time, I amended a small portion of e.g.
> byte-optimize-body to make a new hash entry with the same value when a
> form was transformed.  The slowdown on just the byte optimiser was
> around a factor of three.

Ouch!

> I don't think I follow myself here.  I was thinking that accessing a
> hash table element was slow, therefore keeping a table value current and
> pushing transformed forms onto it would be faster than creating a new
> hash table entry for these new forms.

Ah, so you'd keep a pointer to the list somehow and add to it by
side-effects.  Yes, I guess it would indeed be noticeably faster for the
case of copying the location info from the source code to the
transformed code.

> Looking at the code for hash tables, the access time can not be all
> that long.

Hash-table accesses are pretty costly, in my experience.


        Stefan




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-07 17:00           ` Alan Mackenzie
@ 2018-11-07 17:25             ` Stefan Monnier
  2018-11-07 18:47               ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-07 17:25 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

>> > I timed a bootstrap, unoptimised GCC, with an extra tag check and
>> > storage to a global variable inserted into XFIXNUM.  (Currently there is
>> > no such check there).  The slowdown was around 1.3%
>
>> That accumulates for every data type, and it increases code size,
>> reduces cache hit rate...
>
> No, it applies mainly to FIXNUM, because XFIXNUM doesn't already check
> the Lisp_Type.  Other object types already perform this check, so while

I'm not sure why you say that.  XCONS/XSYMBOL don't perform the check
either (unless you compile with debug-checks, of course, but that's not
the important case).

> Yes, it may be a bad option, but possibly less bad than the other bad
> options we have.

There's indeed a pretty good set of bad options at hand.  Not sure which
one will suck less.

> cconv.el would need to be entirely rewritten if we stick to the hash
> table approach.  It wouldn't survive anything like unscathed even in an
> "extended Lisp Object" solution.

It's "only" the cconv-convert part of cconv.el that will need changes,
but yes, one way or another it will need to be changed to preserve the
location info.

> Maybe it would be possible to defer cconv.el processing till after macro
> expansion and byte-opt.el stuff.  Would this do any good?

It's already done after macro expansion (but before byte-opt).
I don't think moving it would help.

> The only vague idea I have for saving this, and I don't like it one bit,
> is somehow to redefine \` (and possibly \,) in such a way that it would
> somehow copy the source position from the original list to the result.

Define "original list" ;-)

>> Same with your new scheme: everywhere where a "big cons-cell" is
>> transformed by a macro, you'll get a "small cons-cell".
>> That's a constant of all options, AFAICT.
> The "extended" symbols would survive.  That is a big plus.

Indeed symbols are usually preserved un-touched.

> I've been through these sort of thoughts.  That idea would be less
> effective than the "extended object", since it would only work with
> conses, but might be less disruptive.  But why should it only work with
> conses?

No particular reason at first.

> Why not with symbols, too?

Reproducing this idea for other types is not always that easy or useful:
- for pseudo-vectors the variable size aspect makes it harder to handle
  (tho not impossible).  OTOH we could probably use a bit in the header
  and thus avoid the need to place those extended objects in their
  own blocks.
- for symbols the extra info is "per symbol occurrence" rather than "per
  symbol", so we can't add this info directly to the symbol (i.e. the
  same reason the hash-table approach doesn't work for symbols).
  So we'd really want a completely separate object which then points to
  the underlying symbol object.  But yes, we could introduce a new
  symbol-occurrence object, along the lines you originally suggested but
  only for symbols (thus reducing the performance cost).
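
The "per symbol occurrence" point can be checked directly.  This is only
an illustration with an ordinary eq hash table; the offsets are those of
the two occurrences of `if' in the string being read:

```elisp
(let* ((form (read "(if a (if b c))"))
       (outer (car form))              ; the first `if'
       (inner (car (nth 2 form)))      ; the second `if'
       (positions (make-hash-table :test #'eq)))
  (puthash outer 1 positions)          ; offset of the first `if'
  (puthash inner 7 positions)          ; offset of the second `if'
  (list (eq outer inner)               ; t: one and the same symbol object
        (hash-table-count positions)   ; 1: the second entry replaced the first
        (gethash 'if positions)))      ; 7
```

Both occurrences are the same interned symbol, so a table keyed on the
symbol can hold only one position for it.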


-- Stefan




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-07 17:25             ` Stefan Monnier
@ 2018-11-07 18:47               ` Alan Mackenzie
  2018-11-07 19:12                 ` Stefan Monnier
  0 siblings, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-07 18:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Hello again, Stefan.

On Wed, Nov 07, 2018 at 12:25:15 -0500, Stefan Monnier wrote:

> >> That accumulates for every data type, and it increases code size,
> >> reduces cache hit rate...

> > No, it applies mainly to FIXNUM, because XFIXNUM doesn't already check
> > the Lisp_Type.  Other object types already perform this check, so while

> I'm not sure why you say that.  XCONS/XSYMBOL don't perform the check
> either (unless you compile with debug-checks, of course, but that's not
> the important case).

Ah, really?  OK, I'd need to repeat the exercise with the checks in
XCONS and XSYMBOL, too.  I suspect the slowdown would be significant,
though perhaps not critical (say, around 5%).  For these #defines, there
must be a check on Lisp_Type somewhere, so we should be able to
incorporate that "somewhere" into the check for Lisp_Type 1.  Maybe.

[ .... ]

> There's indeed a pretty good set of bad options at hand.  Not sure which
> one will suck less.

Yes.  Things aren't looking good.

[ .... ]

> It's "only" the cconv-convert part of cconv.el that will need changes,
> but yes, one way or another it will need to be changed to preserve the
> location info.

OK.  But it's still a challenging job.

> > Maybe it would be possible to defer cconv.el processing till after macro
> > expansion and byte-opt.el stuff.  Would this do any good?

> It's already done after macro expansion (but before byte-opt).
> I don't think moving it would help.

Maybe not.  I was thinking that if it was deferred until after byte-opt,
"all" the warning messages would have the right position info.  But
cconv.el calls byte-compile-warn, too.

> > The only vague idea I have for saving this, and I don't like it one bit,
> > is somehow to redefine \` (and possibly \,) in such a way that it would
> > somehow copy the source position from the original list to the result.

> Define "original list" ;-)

The one that has been transformed into the result.  For example, in this
fragment from the end of cconv-convert:

    (`(,func . ,forms)
     ;; First element is function or whatever function-like forms are: or, and,
     ;; if, catch, progn, prog1, prog2, while, until
     `(,func . ,(mapcar (lambda (form)
                          (cconv-convert form env extend))
                        forms)))

, the original list would be the whole FORM.  My idea would be to
rewrite the resulting form as something like:

    `(form ,func . ,(bc-mapcar (lambda (form)
                                 (cconv-convert form env extend))
                               forms))

, where the first argument in the modified \` supplies the position
information for the result list, but isn't included in the list itself.
bc-mapcar would be a version of mapcar which preserves the internal
position info in the resulting form, copying it from the original list
parameter.

As I say, I don't like the idea, but it might be the best we can come up
with, and still have a readable and maintainable cconv.el.
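
To make the bc-mapcar part concrete, here is a rough sketch.  It assumes
the position info lives in a side table keyed on cons cells
(my-cons-positions, an invented name) rather than inside the cells
themselves, so it shows only the copying step, not a settled design:

```elisp
;; Sketch only: `my-cons-positions' and this definition of `bc-mapcar'
;; are assumptions made for illustration.
(defvar my-cons-positions (make-hash-table :test #'eq)
  "Map from a cons cell (compared with `eq') to its source position.")

(defun bc-mapcar (function list)
  "Like `mapcar', but copy recorded positions from LIST to the result.
The position of each cons cell of LIST is given to the corresponding
cons cell of the returned list."
  (let ((result (mapcar function list))
        (old list))
    (let ((new result))
      (while (and (consp old) (consp new))
        (let ((pos (gethash old my-cons-positions)))
          (when pos
            (puthash new pos my-cons-positions)))
        (setq old (cdr old)
              new (cdr new))))
    result))

(let ((forms (list 'a 'b)))
  (puthash (cdr forms) 9 my-cons-positions)
  (gethash (cdr (bc-mapcar #'identity forms)) my-cons-positions))   ; => 9
```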

[ .... ]

> > I've been through these sort of thoughts.  That idea would be less
> > effective than the "extended object", since it would only work with
> > conses, but might be less disruptive.  But why should it only work
> > with conses?

> No particular reason at first.

> > Why not with symbols, too?

> Reproducing this idea for other types is not always that easy or useful:
> - for pseudo-vectors the variable size aspect makes it harder to handle
>   (tho not impossible).  OTOH we could probably use a bit in the header
>   and thus avoid the need to place those extended objects in their
>   own blocks.

Yes.

> - for symbols the extra info is "per symbol occurrence" rather than "per
>   symbol", so we can't add this info directly to the symbol (i.e. the
>   same reason the hash-table approach doesn't work for symbols).

D'oh!  Of course!

>   So we'd really want a completely separate object which then points to
>   the underlying symbol object.  But yes, we could introduce a new
>   symbol-occurrence object, along the lines you originally suggested but
>   only for symbols (thus reducing the performance cost).

:-)  This could be a pseudovector, leaving Lisp_Type 1 free for more
worthy uses.  You're suggesting a mix of approaches.  This might be more
complicated, but possibly the least pessimal.

> -- Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-07 18:47               ` Alan Mackenzie
@ 2018-11-07 19:12                 ` Stefan Monnier
  2018-11-08 14:08                   ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-07 19:12 UTC (permalink / raw)
  To: emacs-devel

>> It's "only" the cconv-convert part of cconv.el that will need changes,
>> but yes, one way or another it will need to be changed to preserve the
>> location info.
> OK.  But it's still a challenging job.

I wouldn't call it challenging: the changes are orthogonal to the actual
working of cconv, so it will likely make the code messier but
conceptually there's no significant difficulty.  I'm familiar with the
code and will be happy to help.

> Maybe not.  I was thinking that if it was deferred until after byte-opt,
> "all" the warning messages would have the right position info.  But
> cconv.el calls byte-compile-warn, too.

Some/many (most?) of the warnings come from bytecomp itself, which
inevitably happens after all of the above anyway.

> As I say, I don't like the idea, but it might be the best we can come up
> with, and still have a readable and maintainable cconv.el.

Yes, we'd probably use a hack along these lines to try and limit the
impact of the change.

>>   So we'd really want a completely separate object which then points to
>>   the underlying symbol object.  But yes, we could introduce a new
>>   symbol-occurrence object, along the lines you originally suggested but
>>   only for symbols (thus reducing the performance cost).
> :-)  This could be a pseudovector, leaving Lisp_Type 1 free for more
> worthy uses.  You're suggesting a mix of approaches.  This might be more
> complicated, but possibly the least pessimal.

One possible approach is to introduce such a symbol-occurrence hack
[if this word sounds like a criticism, it's because it is] and nothing
else (i.e. not a "mix" of approaches).

To the extent that symbols aren't touched during the various phases, the
corresponding info should trivially be preserved.  The current hack we
use is also limited to tracking symbol locations, so it should never be
worse than what we already have.

        Stefan


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-01 17:59 Thoughts on getting correct line numbers in the byte compiler's warning messages Alan Mackenzie
  2018-11-01 22:45 ` Stefan Monnier
@ 2018-11-08  4:47 ` Michael Heerdegen
  2018-11-08 11:07   ` Alan Mackenzie
  2018-11-08 13:45   ` Stefan Monnier
  1 sibling, 2 replies; 44+ messages in thread
From: Michael Heerdegen @ 2018-11-08  4:47 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> The third idea is to amend the reader so that whereas it now produces a
> form, in a byte compiler special mode, it would produce the cons (form .
> offset).  So, for example, the text "(not a)" currently gets read into
> the form (not . (a . nil)).  The amended reader would produce (((not . 1)
> . ((a . 5) . (nil . 6))) . 0) (where 0, 1, 5, and 6 are the textual
> offsets of the elements coded).

BTW, an amended version of `read' might be beneficial for other stuff,
too.  When I designed el-search, I wanted something like that.

I'm not sure which kind of position info data I would like to have.  I
think it would be good to have additionally starting positions of
conses, for example.

Michael.




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-08  4:47 ` Michael Heerdegen
@ 2018-11-08 11:07   ` Alan Mackenzie
  2018-11-09  2:06     ` Michael Heerdegen
  2018-11-08 13:45   ` Stefan Monnier
  1 sibling, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-08 11:07 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: emacs-devel

Hello, Michael.

On Thu, Nov 08, 2018 at 05:47:15 +0100, Michael Heerdegen wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > The third idea is to amend the reader so that whereas it now produces a
> > form, in a byte compiler special mode, it would produce the cons (form .
> > offset).  So, for example, the text "(not a)" currently gets read into
> > the form (not . (a . nil)).  The amended reader would produce (((not . 1)
> > . ((a . 5) . (nil . 6))) . 0) (where 0, 1, 5, and 6 are the textual
> > offsets of the elements coded).

> BTW, an amended version of `read' might be beneficial for other stuff,
> too.  When I designed el-search, I wanted something like that.

As it turned out, the above scheme would not be useful, because a macro
could not manipulate such a form.

The ideas are currently in flux, in a discussion between Stefan and me,
and we've come up with several ideas, all bad.  ;-)  We're currently
trying to select the least bad idea.

> I'm not sure which kind of position info data I would like to have.  I
> think it would be good to have additionally starting positions of
> conses, for example.

I came up with a way of doing this, using the spare value of Lisp_Type
in a Lisp_Object to indicate an indirection to a structure of two
Lisp_Objects.  The first would be the actual object, the second would be
position information.

The trouble with this is it would slow down Emacs performance
significantly (possibly as much as ~10%).  It would also be difficult to
implement, since at each transformation of the form being compiled,
position information would need to be copied to the new version of form.

Stefan's latest suggestion is to use the above approach just on symbol
occurrences.  (Sorry!).  These are preserved through transformations
much more than cons cells are.  Also, the existing approach in the
compiler only tracks symbol occurrences, so we will not lose anything by
tracking only symbols, but more accurately.

Even so, this will be a lot of work.

If some code wants to get the starting position of a cons, the source
code will surely be in a buffer somewhere.  As long as there is a symbol
in the cons (i.e., we don't have ()), surely the cons position can be
found from the contained symbol, together with backward-up-list in the
source buffer.  Or something like that.

> Michael.

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-08  4:47 ` Michael Heerdegen
  2018-11-08 11:07   ` Alan Mackenzie
@ 2018-11-08 13:45   ` Stefan Monnier
  2018-11-09  3:06     ` Michael Heerdegen
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-08 13:45 UTC (permalink / raw)
  To: emacs-devel

> BTW, an amended version of `read' might be beneficial for other stuff,
> too.  When I designed el-search, I wanted something like that.

Have you looked at edebug-read-*?


        Stefan





* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-07 19:12                 ` Stefan Monnier
@ 2018-11-08 14:08                   ` Alan Mackenzie
  2018-11-08 17:02                     ` Stefan Monnier
  2018-11-12 15:44                     ` Alan Mackenzie
  0 siblings, 2 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-08 14:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Wed, Nov 07, 2018 at 14:12:41 -0500, Stefan Monnier wrote:
> >> It's "only" the cconv-convert part of cconv.el that will need changes,
> >> but yes, one way or another it will need to be changed to preserve the
> >> location info.
> > OK.  But it's still a challenging job.

> I wouldn't call it challenging: the changes are orthogonal to the actual
> working of cconv, so it will likely make the code messier but
> conceptually there's no significant difficulty.  I'm familiar with the
> code and will be happy to help.

Thanks!  By the way, am I right in thinking that pcase does its
comparisons using equal?

[ .... ]

> >>   So we'd really want a completely separate object which then points to
> >>   the underlying symbol object.  But yes, we could introduce a new
> >>   symbol-occurrence object, along the lines you originally suggested but
> >>   only for symbols (thus reducing the performance cost).
> > :-)  This could be a pseudovector, leaving Lisp_Type 1 free for more
> > worthy uses.  You're suggesting a mix of approaches.  This might be more
> > complicated, but possibly the least pessimal.

> One possible approach is to introduce such a symbol-occurrence hack
> [if this word sounds like a criticism, it's because it is] and nothing
> else (i.e. not a "mix" of approaches).

This sounds like a good idea.

> To the extent that symbols aren't touched during the various phases, the
> corresponding info should trivially be preserved.  The current hack we
> use is also limited to tracking symbol locations, so it should never be
> worse than what we already have.

One thing we'd need to watch out for is using equal, not eq, when we
compare symbols.  (eq 'foo #<symbol foo with position 73>) will surely
be nil, but (equal ....) would be t.  Same with member and memq.
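
The extended symbol doesn't exist yet, so the following only emulates
it: a record stands in for the new object, and my-located-symbol,
my-bare-symbol and my-sym-equal are invented names.  my-sym-equal plays
the role that equal would have to play for the real thing:

```elisp
;; Emulation only; the real object would be a new Lisp type.
(defun my-located-symbol (sym pos)
  "Return a stand-in for symbol SYM read at source position POS."
  (record 'my-located-symbol sym pos))

(defun my-bare-symbol (obj)
  "Return the plain symbol behind OBJ, or OBJ itself if it is not located."
  (if (and (recordp obj) (eq (aref obj 0) 'my-located-symbol))
      (aref obj 1)
    obj))

(defun my-sym-equal (a b)
  "Return non-nil if A and B are the same symbol, ignoring positions."
  (eq (my-bare-symbol a) (my-bare-symbol b)))

(let ((located (my-located-symbol 'foo 73)))
  (list (eq 'foo located)                  ; nil
        (my-sym-equal 'foo located)        ; t
        (memq 'foo (list 'bar located))))  ; nil: memq compares with eq
```

Any eq or memq on symbols in the compiler would have the same problem
as the first and third results here.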

We'd also need to make sure that the reader's enabling flag for creating
these extended symbols is bound to nil whenever we suspend the byte
compiler to do something else (edebug, for example).

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-08 14:08                   ` Alan Mackenzie
@ 2018-11-08 17:02                     ` Stefan Monnier
  2018-11-08 22:13                       ` Alan Mackenzie
  2018-11-12 15:44                     ` Alan Mackenzie
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-08 17:02 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

>> >> It's "only" the cconv-convert part of cconv.el that will need changes,
>> >> but yes, one way or another it will need to be changed to preserve the
>> >> location info.
>> > OK.  But it's still a challenging job.
>> I wouldn't call it challenging: the changes are orthogonal to the actual
>> working of cconv, so it will likely make the code messier but
>> conceptually there's no significant difficulty.  I'm familiar with the
>> code and will be happy to help.
> Thanks!  By the way, am I right in thinking that pcase does its
> comparisons using equal?

"as if by `equal`", so when comparing against symbols we actually use `eq`.

> One thing we'd need to watch out for is using equal, not eq, when we
> compare symbols.  (eq 'foo #<symbol foo with position 73>) will surely
> be nil, but (equal ....) would be t.  Same with member and memq.

Indeed.

> We'd also need to make sure that the reader's enabling flag for creating
> these extended symbols is bound to nil whenever we suspend the byte
> compiler to do something else (edebug, for example).

Rather than a dynamically-scoped var, it might be a better option to
either use a new function `read-with-positions`, or else use an
additional argument to `read`.


        Stefan




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-08 17:02                     ` Stefan Monnier
@ 2018-11-08 22:13                       ` Alan Mackenzie
  2018-11-11 12:59                         ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-08 22:13 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Thu, Nov 08, 2018 at 12:02:01 -0500, Stefan Monnier wrote:
> >> >> It's "only" the cconv-convert part of cconv.el that will need changes,
> >> >> but yes, one way or another it will need to be changed to preserve the
> >> >> location info.
> >> > OK.  But it's still a challenging job.
> >> I wouldn't call it challenging: the changes are orthogonal to the actual
> >> working of cconv, so it will likely make the code messier but
> >> conceptually there's no significant difficulty.  I'm familiar with the
> >> code and will be happy to help.
> > Thanks!  By the way, am I right in thinking that pcase does its
> > comparisons using equal?

> "as if by `equal`", so when comparing against symbols we actually use `eq`.

... at the moment ... ;-)

equal actually tests EQ right near its start anyway, so it shouldn't be
a big deal for pcase actually to use equal.  Or am I missing something?

> > One thing we'd need to watch out for is using equal, not eq, when we
> > compare symbols.  (eq 'foo #<symbol foo with position 73>) will surely
> > be nil, but (equal ....) would be t.  Same with member and memq.

> Indeed.

> > We'd also need to make sure that the reader's enabling flag for creating
> > these extended symbols is bound to nil whenever we suspend the byte
> > compiler to do something else (edebug, for example).

> Rather than a dynamically-scoped var, it might be a better option to
> either use a new function `read-with-positions`, or else use an
> additional argument to `read`.

OK.  I've hacked together some basic infrastructure in alloc.c, lread.c,
print.c, and lisp.h.  I can now read a small test file and get back the
form with "located symbols".  I've called the new function which does
this read-locating-symbols, but that might want to change.

As soon as I've sorted out SYMBOLP and XSYMBOL, I'll create a new branch
under /scratch, commit what I've got, and then we can play with it.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-08 11:07   ` Alan Mackenzie
@ 2018-11-09  2:06     ` Michael Heerdegen
  2018-11-10 10:59       ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Michael Heerdegen @ 2018-11-09  2:06 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> Stefan's latest suggestion is to use the above approach just on symbol
> occurrences.  (Sorry!).  These are preserved through transformations
> much more than cons cells are.

BTW, just to be sure, you know about the already existing variable
`read-with-symbol-positions', right?  Only a detail for what you need to
do, though.

> If some code wants to get the starting position of a cons, the source
> code will surely be in a buffer somewhere.  As long as there is a symbol
> in the cons (i.e., we don't have ()), surely the cons position can be
> found from the contained symbol, together with backward-up-list in the
> source buffer.  Or something like that.

Sure.  The problem is how to find the right cons when several such
places exist.  Likewise for strings etc.

My requirement is quite similar to yours, btw.  Say, in a buffer at some
position there is a list (X1 X2 X3) and you want to match that with
(i.e. el-search for) pattern `(,P1 ,P2 ,P3) with certain PATTERNS Pi.
In an ideal world, when the Pi are (tried to be) matched against the Xi,
the Pi would know the buffer location of Xi, so that Pi could e.g. use a
`guard' checking the "current" value of point.

Since patterns can do destructuring like above, similar to your case I
would want the position info to somehow survive transformations (mostly
list accessing functions in my case).

Thanks,

Michael.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-08 13:45   ` Stefan Monnier
@ 2018-11-09  3:06     ` Michael Heerdegen
  2018-11-09 16:15       ` Stefan Monnier
  0 siblings, 1 reply; 44+ messages in thread
From: Michael Heerdegen @ 2018-11-09  3:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > BTW, an amended version of `read' might be beneficial for other stuff,
> > too.  When I designed el-search, I wanted something like that.
>
> Have you looked at edebug-read-*?

No, thanks for the idea.  I hope it would be fast and reliable enough (I
already have enough bugs from the standard reader...)

Michael.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-09  3:06     ` Michael Heerdegen
@ 2018-11-09 16:15       ` Stefan Monnier
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Monnier @ 2018-11-09 16:15 UTC (permalink / raw)
  To: emacs-devel

> No, thanks for the idea.  I hope it would be fast and reliable enough (I
> already have enough bugs from the standard reader...)

It's fast and reliable enough for Edebug, but being an Elisp emulation
of the C reader, it's obviously significantly slower than the C reader
and less reliable.

I think the reliability aspect should be good enough (or easy to fix)
for el-search, but w.r.t. speed that might be a problem.

        Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-09  2:06     ` Michael Heerdegen
@ 2018-11-10 10:59       ` Alan Mackenzie
  2018-11-10 13:20         ` Stefan Monnier
  2018-11-11  7:56         ` Michael Heerdegen
  0 siblings, 2 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-10 10:59 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: emacs-devel

Hello, Michael.

On Fri, Nov 09, 2018 at 03:06:27 +0100, Michael Heerdegen wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > Stefan's latest suggestion is to use the above approach just on symbol
> > occurrences.  (Sorry!).  These are preserved through transformations
> > much more than cons cells are.

> BTW, just to be sure, you know about the already existing variable
> `read-with-symbol-positions', right?  Only a detail for what you need to
> do, though.

Oh, yes, we know about this, all right!  It's because
read-with-symbol-positions doesn't work reliably (there are several bugs
open about warning messages reporting wrong positions) that we're trying
to develop something better.

> > If some code wants to get the starting position of a cons, the source
> > code will surely be in a buffer somewhere.  As long as there is a symbol
> > in the cons (i.e., we don't have ()), surely the cons position can be
> > found from the contained symbol, together with backward-up-list in the
> > source buffer.  Or something like that.

> Sure.  The problem is how to find the right cons when several such
> places exist.  Likewise for strings etc.

I'm not sure I follow you, here.  Surely the "right" cons is the one
containing the symbol occurrence whose position is known?  Or the one
containing that, and so on.

Literal strings could also be located, in much the same way as for
symbol occurrences.  Right at the moment, I don't know how much these
things will slow Emacs down by.

> My requirement is quite similar to yours, btw.  Say, in a buffer at some
> position there is a list (X1 X2 X3) and you want to match that with
> (i.e. el-search for) pattern `(,P1 ,P2 ,P3) with certain PATTERNS Pi.
> In an ideal world, when the Pi are (tried to be) matched against the Xi,
> the Pi would know the buffer location of Xi, so that Pi could e.g. use a
> `guard' checking the "current" value of point.

> Since patterns can do destructuring like above, similar to your case I
> would want the position info to somehow survive transformations (mostly
> list accessing functions in my case).

Yes, it sounds like this could use the "located symbols" feature, too.

Right at the moment I'm trying to get SYMBOLP to recognise both normal
symbols and "located symbols".  This is causing a segfault on building
such a test Emacs.  :-(

> Thanks,

> Michael.

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-10 10:59       ` Alan Mackenzie
@ 2018-11-10 13:20         ` Stefan Monnier
  2018-11-11  7:56         ` Michael Heerdegen
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Monnier @ 2018-11-10 13:20 UTC (permalink / raw)
  To: emacs-devel

> Literal strings could also be located, in much the same way as for
> symbol occurrences.  Right at the moment, I don't know how much these
> things will slow Emacs down by.

For literal strings, since they aren't "uniquified" like symbols, we can
simply put the location on the object, e.g. as a text-property.


        Stefan




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-10 10:59       ` Alan Mackenzie
  2018-11-10 13:20         ` Stefan Monnier
@ 2018-11-11  7:56         ` Michael Heerdegen
  1 sibling, 0 replies; 44+ messages in thread
From: Michael Heerdegen @ 2018-11-11  7:56 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> > Sure.  The problem is how to find the right cons when several such
> > places exist.  Likewise for strings etc.
>
> I'm not sure I follow you, here.  Surely the "right" cons is the one
> containing the symbol occurrence whose position is known?  Or the one
> containing that, and so on.

Yes, if destructuring patterns also were able to destructure the
position info structure.  Otherwise, matching `(,P1 ,P2 ,P3) against
(a a a) has the problem that the Pi don't know which of the a's they are
matched against.

> Right at the moment I'm trying to get SYMBOLP to recognise both normal
> symbols and "located symbols".  This is causing a segfault on building
> such a test Emacs.  :-(

Then good luck!


Michael.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-08 22:13                       ` Alan Mackenzie
@ 2018-11-11 12:59                         ` Alan Mackenzie
  2018-11-11 15:53                           ` Eli Zaretskii
  0 siblings, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-11 12:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Michael Heerdegen, emacs-devel

Hello, Stefan.

On Thu, Nov 08, 2018 at 22:13:11 +0000, Alan Mackenzie wrote:
> On Thu, Nov 08, 2018 at 12:02:01 -0500, Stefan Monnier wrote:

[ .... ]

> OK.  I've hacked together some basic infrastructure in alloc.c, lread.c,
> print.c, and lisp.h.  I can now read a small test file and get back the
> form with "located symbols".  I've called the new function which does
> this read-locating-symbols, but that might want to change.

> As soon as I've sorted out SYMBOLP and XSYMBOL, I'll create a new branch
> under /scratch, commit what I've got, and then we can play with it.

I've now got this working, and created the new, optimistically named,
branch /scratch/accurate-warning-pos.

To use this, do something like:

    M-: (setq bar (read-locating-symbols (current-buffer)))

with point at the beginning of a (smallish) buffer.  The following form,
from Roland Winkler's bug #9109, works well:

    (unwind-protect
        (let ((foo "foo"))
          (insert foo))
      (setq foo "bar"))

.  (car bar), for example, is now a "located symbol".

Direct symbol functions are "protected" by an enabling flag
located-symbols-enabled.  This is needed, partly to minimise the run time
taken when the facility is not being used, but more pertinently to enable
Emacs to build without a segfault.  Currently this flag guards only
SYMBOLP and XSYMBOL.

So, try M-: (symbolp (car bar)).  This is nil.  But

    M-: (let ((located-symbols-enabled t)) (symbolp (car bar)))

is t.  Similarly, set, symbol-value, symbol-function, symbol-plist need
that flag to be non-nil.

> > > One thing we'd need to watch out for is using equal, not eq, when we
> > > compare symbols.  (eq 'foo #<symbol foo with position 73>) will surely
> > > be nil, but (equal ....) would be t.  Same with member and memq.

> > Indeed.

`equal' has been enhanced so that M-: (equal (car bar) 'unwind-protect)
is t.

Additionally, there are defuns only-symbol-p, located-symbol-p,
located-symbol-sym, located-symbol-loc, which do the obvious.

> > > We'd also need to make sure that the reader's enabling flag for creating
> > > these extended symbols is bound to nil whenever we suspend the byte
> > > compiler to do something else (edebug, for example).

> > Rather than a dynamically-scoped var, it might be a better option to
> > either use a new function `read-with-positions`, or else use an
> > additional argument to `read`.

As noted above I've currently got a rather untidy mixture of these two
approaches.

There's a lot left to do, but this is a start.

Incidentally, I timed a make bootstrap in this branch, comparing it with
master.  The branch was ~0.5% slower.  This might be real, it might just
be random noise.

Comments and criticism welcome!

> >         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-11 12:59                         ` Alan Mackenzie
@ 2018-11-11 15:53                           ` Eli Zaretskii
  2018-11-11 20:12                             ` Alan Mackenzie
  2018-11-12 14:16                             ` Alan Mackenzie
  0 siblings, 2 replies; 44+ messages in thread
From: Eli Zaretskii @ 2018-11-11 15:53 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: michael_heerdegen, monnier, emacs-devel

> Date: Sun, 11 Nov 2018 12:59:45 +0000
> From: Alan Mackenzie <acm@muc.de>
> Cc: Michael Heerdegen <michael_heerdegen@web.de>, emacs-devel@gnu.org
> 
> I've now got this working, and created the new, optimistically named,
> branch /scratch/accurate-warning-pos.

Thanks.

  +/* Return a new located symbol with the specified SYMBOL and LOCATION. */
  +Lisp_Object
  +build_located_symbol (Lisp_Object symbol, Lisp_Object location)
  +{

I'd prefer something like symbol_with_pos instead, and accordingly in
other related symbol names.

  +DEFUN ("only-symbol-p", Fonly_symbol_p, Sonly_symbol_p, 1, 1, 0,
  +       doc: /* Return t if OBJECT is a symbol, but not a located symbol.  */
  +       attributes: const)
  +  (Lisp_Object object)

symbol-bare-p?

  +  DEFVAR_LISP ("located-symbols-enabled", Vlocated_symbols_enabled,
  +               doc: /* Non-nil when "located symbols" can be used in place of symbols.

What is the rationale for this variable?

  diff --git a/src/lisp.h b/src/lisp.h
  index eb67626..b4fc6f2 100644
  --- a/src/lisp.h
  +++ b/src/lisp.h
  @@ -323,6 +323,64 @@ typedef union Lisp_X *Lisp_Word;
   typedef EMACS_INT Lisp_Word;
   #endif

  +/* A Lisp_Object is a tagged pointer or integer.  Ordinarily it is a
  +   Lisp_Word.  However, if CHECK_LISP_OBJECT_TYPE, it is a wrapper
  +   around Lisp_Word, to help catch thinkos like 'Lisp_Object x = 0;'.
  +
  +   LISP_INITIALLY (W) initializes a Lisp object with a tagged value
  +   that is a Lisp_Word W.  It can be used in a static initializer.  */

Looks like you moved a large chunk of lisp.h to a different place in
the file.  Any reasons for that?

  +/* FIXME!!! 2018-11-09.  Consider using lisp_h_PSEUDOVECTOR here. */

What is this FIXME about?

This needs support in src/.gdbinit and documentation.

Thanks again for working on this.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-11 15:53                           ` Eli Zaretskii
@ 2018-11-11 20:12                             ` Alan Mackenzie
  2018-11-11 20:47                               ` Stefan Monnier
  2018-11-12 16:19                               ` Eli Zaretskii
  2018-11-12 14:16                             ` Alan Mackenzie
  1 sibling, 2 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-11 20:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael_heerdegen, monnier, emacs-devel

Hello, Eli.

Thanks for the reply and comments.

On Sun, Nov 11, 2018 at 17:53:13 +0200, Eli Zaretskii wrote:
> > Date: Sun, 11 Nov 2018 12:59:45 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: Michael Heerdegen <michael_heerdegen@web.de>, emacs-devel@gnu.org
> > 
> > I've now got this working, and created the new, optimistically named,
> > branch /scratch/accurate-warning-pos.

> Thanks.

>   +/* Return a new located symbol with the specified SYMBOL and LOCATION. */
>   +Lisp_Object
>   +build_located_symbol (Lisp_Object symbol, Lisp_Object location)
>   +{

> I'd prefer something like symbol_with_pos instead, and accordingly in
> other related symbol names.

Yes, I'll do that.  "Located Symbol" is too much of a mouthful.
Thinking up names for new things isn't my strong point.

>   +DEFUN ("only-symbol-p", Fonly_symbol_p, Sonly_symbol_p, 1, 1, 0,
>   +       doc: /* Return t if OBJECT is a symbol, but not a located symbol.  */
>   +       attributes: const)
>   +  (Lisp_Object object)

> symbol-bare-p?

How about bare-symbol-p?  symbol-bare-p has the connotations "we have a
symbol; is it bare?" rather than "have we a bare symbol?".

>   +  DEFVAR_LISP ("located-symbols-enabled", Vlocated_symbols_enabled,
>   +               doc: /* Non-nil when "located symbols" can be used in place of symbols.

> What is the rationale for this variable?

In the new lisp_h_SYMBOLP, we have

#define lisp_h_SYMBOLP(x) ((lisp_h_ONLY_SYMBOL_P (x) || \
                            (Vlocated_symbols_enabled && (lisp_h_LOCATED_SYMBOL_P (x)))))

The Vlocated_symbols_enabled should efficiently prevent a potentially
slow lisp_h_LOCATED_SYMBOL_P from being executed in the overwhelmingly
normal case that we don't have "symbols with pos".  It is a simple test
against binary zero, and the word should be permanently in cache.
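
As a rough illustration of the guard described above, here is a
self-contained toy model in C.  It is only a sketch: the struct, the enum
and every toy_ name are invented for this example and are not the real
lisp.h definitions, which work on tagged Lisp_Object words.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only.  Real Emacs objects are tagged words, not structs
   like this; every name below is hypothetical.  */
enum toy_type { TOY_SYMBOL, TOY_LOCATED_SYMBOL, TOY_OTHER };

struct toy_object
{
  enum toy_type type;
  const struct toy_object *sym; /* Bare symbol, for a located symbol.  */
  long pos;                     /* Source position, for a located symbol.  */
};

/* Stands in for Vlocated_symbols_enabled.  */
static bool toy_located_symbols_enabled = false;

static bool
toy_only_symbol_p (const struct toy_object *x)
{
  return x->type == TOY_SYMBOL;
}

static bool
toy_located_symbol_p (const struct toy_object *x)
{
  return x->type == TOY_LOCATED_SYMBOL;
}

/* Same shape as the lisp_h_SYMBOLP quoted above: the flag is tested
   first, so the located-symbol check is skipped when it is off.  */
static bool
toy_symbolp (const struct toy_object *x)
{
  return toy_only_symbol_p (x)
         || (toy_located_symbols_enabled && toy_located_symbol_p (x));
}
```

With the flag off, toy_symbolp accepts only bare symbols; once
toy_located_symbols_enabled is set, it also accepts a located symbol,
which mirrors the symbolp results reported earlier in the thread.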

Another, slightly more honest, answer is that when it wasn't there, my
Emacs build crashed with a segfault whilst loading .el files.  I didn't
get a core dump for this segfault.  Could you please tell me (or point
me in the right direction of documentation) how I configure my GNU/Linux
to generate core dumps.  I think my kernel's set up correctly, but I
don't see the dumps.

>   diff --git a/src/lisp.h b/src/lisp.h
>   index eb67626..b4fc6f2 100644
>   --- a/src/lisp.h
>   +++ b/src/lisp.h
>   @@ -323,6 +323,64 @@ typedef union Lisp_X *Lisp_Word;
>    typedef EMACS_INT Lisp_Word;
>    #endif

>   +/* A Lisp_Object is a tagged pointer or integer.  Ordinarily it is a
>   +   Lisp_Word.  However, if CHECK_LISP_OBJECT_TYPE, it is a wrapper
>   +   around Lisp_Word, to help catch thinkos like 'Lisp_Object x = 0;'.
>   +
>   +   LISP_INITIALLY (W) initializes a Lisp object with a tagged value
>   +   that is a Lisp_Word W.  It can be used in a static initializer.  */

> Looks like you moved a large chunk of lisp.h to a different place in
> the file.  Any reasons for that?

I did this to get things to compile.  lisp.h is intricate and
complicated.  But it turned out I'd moved far more than I needed.  With
the benefit of a night's sleep, I've restored most of the damage.  All
that's been moved now is some inline functions (SYMBOLP, XSYMBOL, ....,
CHECK_SYMBOL) from before More_Lisp_Bits to after it, since they now
depend on More_Lisp_Bits.

>   +/* FIXME!!! 2018-11-09.  Consider using lisp_h_PSEUDOVECTOR here. */

> What is this FIXME about?

It was a note to self about whether just to invoke the (new) macro
lisp_h_PSEUDOVECTOR, rather than repeating the logic in the inline
function.  Sorry it escaped into the wild.  The answer is, I MUST invoke
the macro, to avoid duplication of functionality.

> This needs support in src/.gdbinit and documentation.

Yes!  I think .gdbinit will be relatively straightforward.  How much to
put into the docs (the elisp manual?) is more difficult to decide.
Although primarily for the byte compiler, Michael Heerdegen has already
said he's got other uses for it.

> Thanks again for working on this.

-- 
Alan Mackenzie (Nuremberg, Germany).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-11 20:12                             ` Alan Mackenzie
@ 2018-11-11 20:47                               ` Stefan Monnier
  2018-11-12  3:30                                 ` Eli Zaretskii
  2018-11-12 16:19                               ` Eli Zaretskii
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-11 20:47 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: michael_heerdegen, Eli Zaretskii, emacs-devel

> Another, slightly more honest, answer is that when it wasn't there, my
> Emacs build crashed with a segfault whilst loading .el files.  I didn't
> get a core dump for this segfault.  Could you please tell me (or point
> me in the right direction of documentation) how I configure my GNU/Linux
> to generate core dumps.  I think my kernel's set up correctly, but I
> don't see the dumps.

I can't remember how to do that, but I recommend you just run emacs (or
temacs as the case may be) within GDB directly rather than go through
a core dump.


        Stefan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-11 20:47                               ` Stefan Monnier
@ 2018-11-12  3:30                                 ` Eli Zaretskii
  0 siblings, 0 replies; 44+ messages in thread
From: Eli Zaretskii @ 2018-11-12  3:30 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: michael_heerdegen, acm, emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: Eli Zaretskii <eliz@gnu.org>, michael_heerdegen@web.de,
>         emacs-devel@gnu.org
> Date: Sun, 11 Nov 2018 15:47:19 -0500
> 
> > Another, slightly more honest, answer is that when it wasn't there, my
> > Emacs build crashed with a segfault whilst loading .el files.  I didn't
> > get a core dump for this segfault.  Could you please tell me (or point
> > me in the right direction of documentation) how I configure my GNU/Linux
> > to generate core dumps.  I think my kernel's set up correctly, but I
> > don't see the dumps.
> 
> I can't remember how to do that

"ulimit -H -c unlimited", I think.

> but I recommend you just run emacs (or temacs as the case may be)
> within GDB directly rather than go through a core dump.

Right.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-11 15:53                           ` Eli Zaretskii
  2018-11-11 20:12                             ` Alan Mackenzie
@ 2018-11-12 14:16                             ` Alan Mackenzie
  1 sibling, 0 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-12 14:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: michael_heerdegen, monnier, emacs-devel

Hello, Eli.

On Sun, Nov 11, 2018 at 17:53:13 +0200, Eli Zaretskii wrote:
> > Date: Sun, 11 Nov 2018 12:59:45 +0000
> > From: Alan Mackenzie <acm@muc.de>
> > Cc: Michael Heerdegen <michael_heerdegen@web.de>, emacs-devel@gnu.org
> > 
> > I've now got this working, and created the new, optimistically named,
> > branch /scratch/accurate-warning-pos.

> Thanks.

>   +/* Return a new located symbol with the specified SYMBOL and LOCATION. */
>   +Lisp_Object
>   +build_located_symbol (Lisp_Object symbol, Lisp_Object location)
>   +{

> I'd prefer something like symbol_with_pos instead, and accordingly in
> other related symbol names.

DONE.

>   +DEFUN ("only-symbol-p", Fonly_symbol_p, Sonly_symbol_p, 1, 1, 0,
>   +       doc: /* Return t if OBJECT is a symbol, but not a located symbol.  */
>   +       attributes: const)
>   +  (Lisp_Object object)

> symbol-bare-p?

DONE.  (bare-symbol-p)

[ .... ]

>   diff --git a/src/lisp.h b/src/lisp.h
>   index eb67626..b4fc6f2 100644
>   --- a/src/lisp.h
>   +++ b/src/lisp.h
>   @@ -323,6 +323,64 @@ typedef union Lisp_X *Lisp_Word;
>    typedef EMACS_INT Lisp_Word;
>    #endif

>   +/* A Lisp_Object is a tagged pointer or integer.  Ordinarily it is a
>   +   Lisp_Word.  However, if CHECK_LISP_OBJECT_TYPE, it is a wrapper
>   +   around Lisp_Word, to help catch thinkos like 'Lisp_Object x = 0;'.
>   +
>   +   LISP_INITIALLY (W) initializes a Lisp object with a tagged value
>   +   that is a Lisp_Word W.  It can be used in a static initializer.  */

> Looks like you moved a large chunk of lisp.h to a different place in
> the file.  Any reasons for that?

I've now moved all but a few inline functions back again.

>   +/* FIXME!!! 2018-11-09.  Consider using lisp_h_PSEUDOVECTOR here. */

> What is this FIXME about?

It's gone, the issue having been resolved.

> This needs support in src/.gdbinit and documentation.

Not yet done.

> Thanks again for working on this.

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-08 14:08                   ` Alan Mackenzie
  2018-11-08 17:02                     ` Stefan Monnier
@ 2018-11-12 15:44                     ` Alan Mackenzie
  2018-11-12 20:36                       ` Stefan Monnier
  1 sibling, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-12 15:44 UTC (permalink / raw)
  To: Stefan Monnier, Eli Zaretskii; +Cc: Michael Heerdegen, emacs-devel

Hello, Stefan and Eli.

A snag.....

On Thu, Nov 08, 2018 at 14:08:43 +0000, Alan Mackenzie wrote:

[ .... ]

> One thing we'd need to watch out for is using equal, not eq, when we
> compare symbols.  (eq 'foo #<symbol foo with position 73>) will surely
> be nil, but (equal ....) would be t.  Same with member and memq.

Unfortunately, this isn't going to work.  There will be macros which do
things like:

    (cond ((eq (car form) 'bar) ....) .....)

Here, (car form) is going to be #<symbol bar at 42>, so the eq is going
to return nil.

The only way out of this I can see at the moment is to amend eq (and
memq, assq, delq, ....) so that it recognises a symbol with position as
being eq to the bare symbol.

At least when the flag variable symbols-with-pos-enabled is currently
non-nil.  At the implementation level, when that variable is nil (i.e.
for normal running), there would be a cost of one comparison of an
in-cache variable with zero on each eq operation which returns nil.
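
To make the cost argument concrete, here is a self-contained toy model in
C of such an amended eq.  It is a sketch under stated assumptions, not the
code in the branch: the swp_ names and the struct layout are invented for
illustration, and treating two wrappers around the same symbol as eq is a
choice made by this sketch only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical.  */
enum swp_type { SWP_BARE_SYMBOL, SWP_SYMBOL_WITH_POS, SWP_OTHER };

struct swp_object
{
  enum swp_type type;
  const struct swp_object *sym; /* Wrapped bare symbol, if any.  */
  long pos;                     /* Source position, if any.  */
};

/* Stands in for symbols-with-pos-enabled.  */
static bool swp_enabled = false;

/* Strip the position wrapper, if there is one.  */
static const struct swp_object *
swp_bare (const struct swp_object *x)
{
  return x->type == SWP_SYMBOL_WITH_POS ? x->sym : x;
}

static bool
swp_eq (const struct swp_object *a, const struct swp_object *b)
{
  if (a == b)
    return true;          /* Fast path: unchanged from a plain eq.  */
  if (!swp_enabled)
    return false;         /* The one extra test when the flag is nil.  */
  /* Sketch's choice: compare the underlying bare symbols, so two
     wrappers around the same symbol also compare eq.  */
  return swp_bare (a) == swp_bare (b);
}
```

In this model a successful comparison costs nothing extra, and a failing
one pays for a single test of swp_enabled unless the flag is set.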

This isn't pretty.  If this modification of eq, memq, .... is too much
to take, then I think the current approach is doomed to failure.

What do you think?

[ .... ]

-- 
Alan Mackenzie (Nuremberg, Germany).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-11 20:12                             ` Alan Mackenzie
  2018-11-11 20:47                               ` Stefan Monnier
@ 2018-11-12 16:19                               ` Eli Zaretskii
  1 sibling, 0 replies; 44+ messages in thread
From: Eli Zaretskii @ 2018-11-12 16:19 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: michael_heerdegen, monnier, emacs-devel

> Date: Sun, 11 Nov 2018 20:12:14 +0000
> Cc: michael_heerdegen@web.de, monnier@IRO.UMontreal.CA, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > This needs support in src/.gdbinit and documentation.
> 
> Yes!  I think .gdbinit will be relatively straightforward.  How much to
> put into the docs (the elisp manual?) is more difficult to decide.
> Although primarily for the byte compiler, Michael Heerdegen has already
> said he's got other uses for it.

The object and its predicate(s) should be documented, as well as the
new primitive which uses it, read-positioning-symbols.  The printed
representation of the new Lisp object should also be documented (we do
that for every other Lisp object).  And there should be a short
announcement in NEWS.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-12 15:44                     ` Alan Mackenzie
@ 2018-11-12 20:36                       ` Stefan Monnier
  2018-11-12 21:35                         ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-12 20:36 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Michael Heerdegen, Eli Zaretskii, emacs-devel

> Unfortunately, this isn't going to work.  There will be macros which do
> things like:
>
>     (cond ((eq (car form) 'bar) ....) .....)
>
> Here, (car form) is going to be #<symbol bar at 42>, so the eq is going
> to return nil.
[...]
> This isn't pretty.  If this modification of eq, memq, .... is too much
> to take, then I think the current approach is doomed to failure.

It's indeed a serious concern.  Maybe we can circumvent by changing
those pieces of code to use `eql` (and make sure `eql` consider
a symbol and its symbol-with-pos as equal, obviously).
Changing `eq` would better be avoided,


        Stefan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-12 20:36                       ` Stefan Monnier
@ 2018-11-12 21:35                         ` Alan Mackenzie
  2018-11-14 13:34                           ` Stefan Monnier
  0 siblings, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-12 21:35 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Michael Heerdegen, Eli Zaretskii, emacs-devel

Hello, Stefan.

On Mon, Nov 12, 2018 at 15:36:14 -0500, Stefan Monnier wrote:
> > Unfortunately, this isn't going to work.  There will be macros which do
> > things like:
> >
> >     (cond ((eq (car form) 'bar) ....) .....)
> >
> > Here, (car form) is going to be #<symbol bar at 42>, so the eq is going
> > to return nil.
> [...]
> > This isn't pretty.  If this modification of eq, memq, .... is too much
> > to take, then I think the current approach is doomed to failure.

> It's indeed a serious concern.  Maybe we can circumvent by changing
> those pieces of code to use `eql` (and make sure `eql` consider
> a symbol and its symbol-with-pos as equal, obviously).

We can't change those bits of code - they're in macros that we don't
necessarily control.  Or are you suggesting that we somehow compile
macros such that `eq' gets replaced by `eql' in the critical places?

> Changing `eq` would better be avoided,

I agree, but don't see how we can avoid it.

Apologies for my earlier insistence that the approach would have little
impact outside the byte compiler.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-12 21:35                         ` Alan Mackenzie
@ 2018-11-14 13:34                           ` Stefan Monnier
  2018-11-15 16:32                             ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-14 13:34 UTC (permalink / raw)
  To: emacs-devel

>> Changing `eq` would better be avoided,
> I agree, but don't see how we can avoid it.

Oh... you mean when someone else's macro does for example

   (defmacro ...
     (if (eq x 'foo)
         `(...)
       `(...)))

...hmm... yes, this is getting really ugly.

Maybe the "big cons-cells" approach is not that bad after all, since it
doesn't try to introduce new objects which are "equal but not": it just
introduces a subtype of cons-cells and that's that, so it's semantically
much simpler/cleaner.

It will require special code in alloc.c to keep the special
representation of normal cons-cells, and special extra code to propagate
the location information in macroexp.el, cconv.el, byte-opt.el,
bytecomp.el but the impact should be much more localized (and at places
where normal compilers also have to do this kind of work).

        Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-14 13:34                           ` Stefan Monnier
@ 2018-11-15 16:32                             ` Alan Mackenzie
  2018-11-15 18:01                               ` Stefan Monnier
  0 siblings, 1 reply; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-15 16:32 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Wed, Nov 14, 2018 at 08:34:28 -0500, Stefan Monnier wrote:
> >> Changing `eq` would better be avoided,
> > I agree, but don't see how we can avoid it.

> Oh... you mean when someone else's macro does for example

>    (defmacro ...
>      (if (eq x 'foo)
>          `(...)
>        `(...)))

Yes.

> ...hmm... yes, this is getting really ugly.

> Maybe the "big cons-cells" approach is not that bad after all, since it
> doesn't try to introduce new objects which are "equal but not": it just
> introduces a subtype of cons-cells and that's that, so it's semantically
> much simpler/cleaner.

I'm not sure about that.  We'd still have to modify EQ to cope with the
new structure no matter how we do it.

> It will require special code in alloc.c to keep the special
> representation of normal cons-cells, and special extra code to propagate
> the location information in macroexp.el, cconv.el, byte-opt.el,
> bytecomp.el but the impact should be much more localized (and at places
> where normal compilers also have to do this kind of work).

In branch scratch/accurate-warning-pos I have hacked up (but not
committed) an EQ which works with the (new as of a few days ago) PVEC
structure for symbols with position.  I am now able to byte-compile a
.el file with symbols-with-pos-enabled bound to non-nil, having sorted
out the problem that was earlier causing segfaults (probably).

This version of Emacs is slower by ~8%, but this is tempered by the EQ
implementation being extremely naive, without any optimisation.  Also, some
existing optimisation (e.g. #define EQ) has been commented out to enable the
files to compile.  I don't understand the relationship between "#define
EQ" and the inline function EQ at all well.  Optimisation will surely be
possible.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-15 16:32                             ` Alan Mackenzie
@ 2018-11-15 18:01                               ` Stefan Monnier
  2018-11-16 14:14                                 ` Alan Mackenzie
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Monnier @ 2018-11-15 18:01 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

>> Maybe the "big cons-cells" approach is not that bad after all, since it
>> doesn't try to introduce new objects which are "equal but not": it just
>> introduces a subtype of cons-cells and that's that, so it's semantically
>> much simpler/cleaner.
>
> I'm not sure about that.  We'd still have to modify EQ to cope with the
> new structure no matter how we do it.

No need to modify EQ for the big-cons cells: a big-cons-cell would be
a normal cons-cell just with more fields added at its end.  It's not
a "location + pointer to the real object" like we need to do for
symbols, so EQ will do the expected thing on it.
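
A rough sketch of that layout in C, under the assumption that the position
is stored as a line and a column; the names cons, big_cons, eq and as_cons
are hypothetical and do not correspond to anything in alloc.c or lisp.h.
Because the ordinary cell is the first member, a pointer to the big cell
and a pointer to the cell inside it are the same address, so an unmodified
pointer comparison still works.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layouts, for illustration only. */
struct cons {
  void *car;
  void *cdr;
};

struct big_cons {
  struct cons cell;    /* must come first: same address as the big cell */
  ptrdiff_t line;      /* extra fields appended at the end */
  ptrdiff_t column;
};

/* Unmodified EQ: plain identity comparison. */
static bool eq(const void *a, const void *b)
{
  return a == b;
}

/* View a big cons as an ordinary cons; no wrapper object is created. */
static struct cons *as_cons(struct big_cons *b)
{
  return &b->cell;
}
```

Code that only knows about struct cons can keep using car and cdr, while
the compiler can read line and column from the same object.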


        Stefan



* Re: Thoughts on getting correct line numbers in the byte compiler's warning messages
  2018-11-15 18:01                               ` Stefan Monnier
@ 2018-11-16 14:14                                 ` Alan Mackenzie
  0 siblings, 0 replies; 44+ messages in thread
From: Alan Mackenzie @ 2018-11-16 14:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Thu, Nov 15, 2018 at 13:01:49 -0500, Stefan Monnier wrote:
> >> Maybe the "big cons-cells" approach is not that bad after all, since it
> >> doesn't try to introduce new objects which are "equal but not": it just
> >> introduces a subtype of cons-cells and that's that, so it's semantically
> >> much simpler/cleaner.

> > I'm not sure about that.  We'd still have to modify EQ to cope with the
> > new structure no matter how we do it.

> No need to modify EQ for the big-cons cells: a big-cons-cell would be
> a normal cons-cell just with more fields added at its end.  It's not
> a "location + pointer to the real object" like we need to do for
> symbols, so EQ will do the expected thing on it.

Sorry, yes.  We'd need some way of distinguishing between the two types
of cons cell (which I think you already dealt with some while ago) and
we'd need to do an awful lot of transfer of old->new source information
in the transformation of forms.

In the meantime, I've got the symbols approach "working".  In
particular, I can byte compile the file from Roland Winkler's bug #9109,
and get the "free variable" warning message indicating the correct
source line (and, with a little more work to be done, the correct
column).  It is not quite ready to demonstrate, but quite near it.

Incidentally, why do we not print line and column numbers for warnings
in compile-defun?  It wouldn't be difficult.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

Thread overview: 44+ messages
2018-11-01 17:59 Thoughts on getting correct line numbers in the byte compiler's warning messages Alan Mackenzie
2018-11-01 22:45 ` Stefan Monnier
2018-11-05 10:53   ` Alan Mackenzie
2018-11-05 15:57     ` Eli Zaretskii
2018-11-05 16:51       ` Alan Mackenzie
2018-11-06  4:34         ` Herring, Davis
2018-11-06  8:53           ` Alan Mackenzie
2018-11-06 13:56     ` Stefan Monnier
2018-11-06 15:11       ` Alan Mackenzie
2018-11-06 16:29         ` Stefan Monnier
2018-11-06 19:15           ` Alan Mackenzie
2018-11-06 20:04             ` Stefan Monnier
2018-11-07 12:35               ` Alan Mackenzie
2018-11-07 17:11                 ` Stefan Monnier
2018-11-07 17:00           ` Alan Mackenzie
2018-11-07 17:25             ` Stefan Monnier
2018-11-07 18:47               ` Alan Mackenzie
2018-11-07 19:12                 ` Stefan Monnier
2018-11-08 14:08                   ` Alan Mackenzie
2018-11-08 17:02                     ` Stefan Monnier
2018-11-08 22:13                       ` Alan Mackenzie
2018-11-11 12:59                         ` Alan Mackenzie
2018-11-11 15:53                           ` Eli Zaretskii
2018-11-11 20:12                             ` Alan Mackenzie
2018-11-11 20:47                               ` Stefan Monnier
2018-11-12  3:30                                 ` Eli Zaretskii
2018-11-12 16:19                               ` Eli Zaretskii
2018-11-12 14:16                             ` Alan Mackenzie
2018-11-12 15:44                     ` Alan Mackenzie
2018-11-12 20:36                       ` Stefan Monnier
2018-11-12 21:35                         ` Alan Mackenzie
2018-11-14 13:34                           ` Stefan Monnier
2018-11-15 16:32                             ` Alan Mackenzie
2018-11-15 18:01                               ` Stefan Monnier
2018-11-16 14:14                                 ` Alan Mackenzie
2018-11-08  4:47 ` Michael Heerdegen
2018-11-08 11:07   ` Alan Mackenzie
2018-11-09  2:06     ` Michael Heerdegen
2018-11-10 10:59       ` Alan Mackenzie
2018-11-10 13:20         ` Stefan Monnier
2018-11-11  7:56         ` Michael Heerdegen
2018-11-08 13:45   ` Stefan Monnier
2018-11-09  3:06     ` Michael Heerdegen
2018-11-09 16:15       ` Stefan Monnier