scratch/accurate-warning-pos: next steps.

unofficial mirror of emacs-devel@gnu.org 

* scratch/accurate-warning-pos: next steps.
@ 2018-12-10 18:00 Alan Mackenzie
  2018-12-10 18:15 ` Eli Zaretskii
  2018-12-10 23:54 ` Paul Eggert
  0 siblings, 2 replies; 20+ messages in thread
From: Alan Mackenzie @ 2018-12-10 18:00 UTC
  To: emacs-devel

Hello, Emacs.

Here's my scheme for making further progress on the
scratch/accurate-warning-pos branch.

At the moment, it appears to display the requisite accurate source
positions in warning messages, but it slows Emacs down a little.  Hence
it has not been accepted in its current form.

Following an idea from Paul, I propose to build an alternative byte-code
interpreter alongside the primary one.  This second interpreter would
regard symbols with position as being EQ to the corresponding bare
symbols, just as the branch currently does when symbols-with-pos-enabled
is bound to non-nil.

C symbols in components of the second interpreter would be those of the
main one, prefixed by "BC_".

lisp.h would be modified to define alternative versions of EQ, NILP,
SYMBOLP, and XSYMBOL, and alternative versions of the INLINE functions
which call them.  These would be called BC_EQ, BC_NILP, BC_SYMBOLP, and
BC_XSYMBOL.
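
A minimal sketch of what such paired definitions might look like, on a
toy object model.  Only the names EQ, NILP, BC_EQ and BC_NILP come from
the paragraph above; the struct, the helper toy_strip_pos and the object
toy_nil are hypothetical and far simpler than anything in lisp.h:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object model (an assumption for illustration, not lisp.h).  */
enum toy_type { TOY_SYMBOL, TOY_SYMBOL_WITH_POS };

struct toy_obj
{
  enum toy_type type;
  const struct toy_obj *bare;  /* TOY_SYMBOL_WITH_POS: the bare symbol.  */
  long pos;                    /* Source position; unused for bare symbols.  */
};

static const struct toy_obj toy_nil = { TOY_SYMBOL, NULL, 0 };

/* Primary interpreter: plain identity, no extra test on the fast path.  */
static inline bool
EQ (const struct toy_obj *x, const struct toy_obj *y)
{
  return x == y;
}

static inline bool
NILP (const struct toy_obj *x)
{
  return EQ (x, &toy_nil);
}

/* Drop the position wrapper, if there is one.  */
static inline const struct toy_obj *
toy_strip_pos (const struct toy_obj *x)
{
  return x->type == TOY_SYMBOL_WITH_POS ? x->bare : x;
}

/* Alternative interpreter: a symbol with position compares equal to
   its bare symbol.  */
static inline bool
BC_EQ (const struct toy_obj *x, const struct toy_obj *y)
{
  return toy_strip_pos (x) == toy_strip_pos (y);
}

static inline bool
BC_NILP (const struct toy_obj *x)
{
  return BC_EQ (x, &toy_nil);
}
```

In this toy model a positioned nil fails NILP but passes BC_NILP, which
is the distinction the two interpreters would rely on.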

Most of the C sources would, at build time, be fed to a preprocessor
which would analyse (almost every) C function, and write a temporary file
containing the functions foo and BC_foo next to each other.  foo would be
unchanged from the C source, BC_foo would have calls to bar modified to
BC_bar, and invocations of EQ etc., modified to BC_EQ, etc.  These
preprocessor outputs would be compiled into temacs in place of the
primary C sources.  The resulting temacs would, of course, be bigger than
the current temacs.

In particular, the byte code interpreter exec_byte_code in bytecode.c
would have its alternative BC_exec_byte_code.

The struct Lisp_Subr would be amended to hold three Lisp_Functions - the
currently live one, the normal one, and the BC_... one.  Also a next
pointer would be introduced, chaining all the subrs together.

The .el and .elc files would not require amendment (apart from
bytecomp.el, and so on, of course).

When a byte compilation is initiated, the compiler would replace the
current live function field with the corresponding BC_ function in every
Lisp_Subr, thus switching over to the BC_... interpreter.  At termination
of the compiler, an unwind-protect would restore the Lisp_Subrs to their
standard settings.
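
A rough sketch of the switching mechanism, assuming a much reduced
stand-in for struct Lisp_Subr.  The struct toy_subr, the function
toy_defsubr, the two stand-in primitives and the C-level names
switch_to_bc_subrs and switch_to_normal_subrs are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_fn) (int);

/* Reduced stand-in for struct Lisp_Subr (hypothetical layout).  */
struct toy_subr
{
  toy_fn live;            /* The function currently called.  */
  toy_fn normal;          /* The normal version.  */
  toy_fn bc;              /* The BC_... version.  */
  struct toy_subr *next;  /* Chains all subrs together.  */
};

static struct toy_subr *all_subrs;

/* Register S on the chain, starting out with the normal function.  */
void
toy_defsubr (struct toy_subr *s)
{
  assert (s->normal && s->bc);
  s->live = s->normal;
  s->next = all_subrs;
  all_subrs = s;
}

/* Walk the chain and make every subr use its BC_... function.  */
void
switch_to_bc_subrs (void)
{
  for (struct toy_subr *s = all_subrs; s; s = s->next)
    s->live = s->bc;
}

/* Walk the chain and restore the standard settings.  */
void
switch_to_normal_subrs (void)
{
  for (struct toy_subr *s = all_subrs; s; s = s->next)
    s->live = s->normal;
}

/* Two stand-in primitives, so the switch is observable.  */
int toy_prim (int x) { return x + 1; }
int BC_toy_prim (int x) { return x + 100; }
```

The sketch leaves out the hard part: guaranteeing that the restore runs
on every non-local exit, which is what the unwind-protect would be for.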

There remains the question, which C functions would get a BC_... version?
To begin with, I propose almost every C function.  Only those for which a
second version would be damaging (for example, the command loop) would
remain unique.  Once the mechanism is working, we could steadily reduce
the number of BC_... functions from "as many as possible" to "what is
needed".  For example, surely xdisp.c and xterm.c would not need
duplication.

This scheme would allow accurate warning line numbers to be output,
whilst not slowing down the normal operation of Emacs.  It would likely
slow down the operation of the byte compiler by several per cent, as has
been measured in the current scratch/accurate-warning-pos branch.

Comments?

-- 
Alan Mackenzie (Nuremberg, Germany).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 18:00 scratch/accurate-warning-pos: next steps Alan Mackenzie
@ 2018-12-10 18:15 ` Eli Zaretskii
  2018-12-10 18:28   ` Alan Mackenzie
  2018-12-10 23:54 ` Paul Eggert
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2018-12-10 18:15 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> Date: Mon, 10 Dec 2018 18:00:33 +0000
> From: Alan Mackenzie <acm@muc.de>
> 
> 
> Following an idea from Paul, I propose to build an alternative byte-code
> interpreter alongside the primary one.  This second interpreter would
> regard symbols with position as being EQ to the corresponding bare
> symbols, just as the branch currently does when symbols-with-pos-enabled
> is bound to non-nil.

I don't think I understood when this alternative interpreter will be
used, and when the "primary" one will be used.  Can you elaborate on
that?

Thanks.

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 18:15 ` Eli Zaretskii
@ 2018-12-10 18:28   ` Alan Mackenzie
  2018-12-10 18:39     ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Mackenzie @ 2018-12-10 18:28 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Hello, Eli.

On Mon, Dec 10, 2018 at 20:15:18 +0200, Eli Zaretskii wrote:
> > Date: Mon, 10 Dec 2018 18:00:33 +0000
> > From: Alan Mackenzie <acm@muc.de>

> > Following an idea from Paul, I propose to build an alternative byte-code
> > interpreter alongside the primary one.  This second interpreter would
> > regard symbols with position as being EQ to the corresponding bare
> > symbols, just as the branch currently does when symbols-with-pos-enabled
> > is bound to non-nil.

> I don't think I understood when this alternative interpreter will be
> used, and when the "primary" one will be used.  Can you elaborate on
> that?

Yes.  The alternative interpreter would be used only for byte
compilation (and possibly other programs which want to use the symbols
with position mechanism); the primary one would be used at all other
times.

There would be a function switch-to-BC-subrs accessible from Lisp which
would switch to the alternative interpreter, and switch-to-normal-subrs
for the reverse.  Or something like that.  byte-compile-file and friends
would use these functions.

Any recursive-edit would "bind" to the normal interpreter.  C-g, and any
other quit actions would restore the normal interpreter.

> Thanks.

-- 
Alan Mackenzie (Nuremberg, Germany).

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 18:28   ` Alan Mackenzie
@ 2018-12-10 18:39     ` Eli Zaretskii
  2018-12-10 19:35       ` Alan Mackenzie
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2018-12-10 18:39 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> Date: Mon, 10 Dec 2018 18:28:30 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > I don't think I understood when this alternative interpreter will be
> > used, and when the "primary" one will be used.  Can you elaborate on
> > that?
> 
> Yes.  The alternative interpreter would be used only for byte
> compilation (and possibly other programs which want to use the symbols
> with position mechanism); the primary one would be used at all other
> times.

Then how about invoking this alternative interpreter only if the prime
interpreter detected a warning or error while byte-compiling?  You
could invoke the alternative interpreter only on the form where the
problem was detected, with the goal of "drilling down" to find the
exact position of the problematic symbol(s).

This would have the advantage of not only avoiding the slow-down in
the "prime" interpreter, but also avoiding slowing down byte
compilation of error-free sources.

Does this make sense?

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 18:39     ` Eli Zaretskii
@ 2018-12-10 19:35       ` Alan Mackenzie
  2018-12-10 20:06         ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Mackenzie @ 2018-12-10 19:35 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Hello, Eli.

On Mon, Dec 10, 2018 at 20:39:56 +0200, Eli Zaretskii wrote:
> > Date: Mon, 10 Dec 2018 18:28:30 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > > I don't think I understood when this alternative interpreter will be
> > > used, and when the "primary" one will be used.  Can you elaborate on
> > > that?

> > Yes.  The alternative interpreter would be used only for byte
> > compilation (and possibly other programs which want to use the symbols
> > with position mechanism); the primary one would be used at all other
> > times.

> Then how about invoking this alternative interpreter only if the prime
> interpreter detected a warning or error while byte-compiling?  You
> could invoke the alternative interpreter only on the form where the
> problem was detected, with the goal of "drilling down" to find the
> exact position of the problematic symbol(s).

That would mean starting the byte compilation with no position
information being gathered, and then when a warning occurs, aborting
the compilation and starting again from scratch with the position
information being gathered and the alternative interpreter being used.

The problem is, that we cannot use #<symbol nil at 666> in the normal
interpreter, since it is not EQ nil there.

> This would have the advantage of not only avoiding the slow-down in
> the "prime" interpreter, but also avoiding slowing down byte
> compilation of error-free sources.

This is an optimisation.

> Does this make sense?

I understand the idea, yes.  But given the timings I measured in the
existing scratch/accurate-warning-pos (IIRC, around 11% - 12% for an
actual compilation) and the fact that in the alternative interpreter,
the slowdown will be somewhat less (one fewer flag comparison per EQ,
NILP, ...., and we can drop the traditional alist of symbols and
positions which is running alongside the new symbols with position) it
may not be worth the extra complexity.

-- 
Alan Mackenzie (Nuremberg, Germany).

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 19:35       ` Alan Mackenzie
@ 2018-12-10 20:06         ` Eli Zaretskii
  2018-12-10 21:03           ` Alan Mackenzie
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2018-12-10 20:06 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> Date: Mon, 10 Dec 2018 19:35:57 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > Then how about invoking this alternative interpreter only if the prime
> > interpreter detected a warning or error while byte-compiling?  You
> > could invoke the alternative interpreter only on the form where the
> > problem was detected, with the goal of "drilling down" to find the
> > exact position of the problematic symbol(s).
> 
> That would mean starting the byte compilation with no position
> information being gathered, and then when a warning occurs, aborting
> the compilation and starting again from scratch with the position
> information being gathered and the alternative interpreter being used.

Not necessarily.  It could mean invocation of special code whose
goal is to find the position of an error in a given form.  The
position of the beginning of this form will have been known, as AFAIU
the existing byte compiler does collect that, or has means to
determine that.

> The problem is, that we cannot use #<symbol nil at 666> in the normal
> interpreter, since it is not EQ nil there.

I'm not sure you must use symbols with positions in the above
arrangement, you could simply invoke special-purpose code that
analyzed the problematic form.  But if you do need to use symbols with
positions, you could do this only when looking for error position, so
other symbol comparisons will not be affected.

> I understand the idea, yes.  But given the timings I measured in the
> existing scratch/accurate-warning-pos (IIRC, around 11% - 12% for an
> actual compilation) and the fact that in the alternative interpreter,
> the slowdown will be somewhat less (one fewer flag comparison per EQ,
> NILP, ...., and we can drop the traditional alist of symbols and
> positions which is running alongside the new symbols with position) it
> may not be worth the extra complexity.

Yes, but what you suggested as the implementation of the alternative
interpreter includes a heck of a lot of complexity of its own, IMO.  The
idea I proposed doesn't even require changes in basic types, it could
hopefully be implemented with "normal" Lisp.

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 20:06         ` Eli Zaretskii
@ 2018-12-10 21:03           ` Alan Mackenzie
  2018-12-11  6:41             ` Eli Zaretskii
  2018-12-11 19:07             ` Stefan Monnier
  0 siblings, 2 replies; 20+ messages in thread
From: Alan Mackenzie @ 2018-12-10 21:03 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

Hello, Eli.

On Mon, Dec 10, 2018 at 22:06:33 +0200, Eli Zaretskii wrote:
> > Date: Mon, 10 Dec 2018 19:35:57 +0000
> > Cc: emacs-devel@gnu.org
> > From: Alan Mackenzie <acm@muc.de>

> > > Then how about invoking this alternative interpreter only if the prime
> > > interpreter detected a warning or error while byte-compiling?  You
> > > could invoke the alternative interpreter only on the form where the
> > > problem was detected, with the goal of "drilling down" to find the
> > > exact position of the problematic symbol(s).

> > That would mean starting the byte compilation with no position
> > information being gathered, and then when a warning occurs, aborting
> > the compilation and starting again from scratch with the position
> > information being gathered and the alternative interpreter being used.

> Not necessarily.  It could mean invocation of special code whose
> goal is to find the position of an error in a given form.  The
> position of the beginning of this form will have been known, as AFAIU
> the existing byte compiler does collect that, or has means to
> determine that.

We know the position of the beginning of the form, yes.  We need some way
of determining the source position of a symbol, cons, or vector on the
inside of this form.

The traditional alist of symbols and positions is one way, and it no
longer works well (if it ever did).  Symbols with position is another
way, which appears to work well, in spite of the complexity.

You seem to be proposing a third way, but without giving away any
details.

> > The problem is, that we cannot use #<symbol nil at 666> in the normal
> > interpreter, since it is not EQ nil there.

> I'm not sure you must use symbols with positions in the above
> arrangement, you could simply invoke special-purpose code that
> analyzed the problematic form.  But if you do need to use symbols with
> positions, you could do this only when looking for error position, so
> other symbol comparisons will not be affected.

I'm not sure how this special-purpose code would work.  Say we find an
error or warning involving symbol foo as the car of some form, I can't
see any way of determining its source position that doesn't involve going
back to the position of the beginning of the form, and slogging through
the form, somehow.

Maybe, rather than reading the form, we could scan it a token at a time,
storing it in, say vectors, rather like a traditional non-lisp compiler
does.  But this is hardly attractive, and would be a LOT of work.

> > I understand the idea, yes.  But given the timings I measured in the
> > existing scratch/accurate-warning-pos (IIRC, around 11% - 12% for an
> > actual compilation) and the fact that in the alternative interpreter,
> > the slowdown will be somewhat less (one fewer flag comparison per EQ,
> > NILP, ...., and we can drop the traditional alist of symbols and
> > positions which is running alongside the new symbols with position) it
> > may not be worth the extra complexity.

> Yes, but what you suggested as the implementation of the alternative
> interpreter includes a heck of a lot of complexity of its own, IMO.

It does, yes.  Partly imposed by external circumstances.  ;-)

> The idea I proposed doesn't even require changes in basic types, it
> could hopefully be implemented with "normal" Lisp.

The symbols with positions idea doesn't conceptually change basic types
either - it just adds annotations to an existing type.  Anything you
could do with symbols before, you can still do with
symbols-with-positions and get the same answers now.

If you have come up with a new way of getting a source position for a
symbol or a cons, please detail it here.  I recognise the complexity in
what I'm proposing and what I've already implemented, and I don't think
it's good; it's just less bad than anything else that's come up, so far.

-- 
Alan Mackenzie (Nuremberg, Germany).

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 18:00 scratch/accurate-warning-pos: next steps Alan Mackenzie
  2018-12-10 18:15 ` Eli Zaretskii
@ 2018-12-10 23:54 ` Paul Eggert
  2018-12-11 11:34   ` Alan Mackenzie
  1 sibling, 1 reply; 20+ messages in thread
From: Paul Eggert @ 2018-12-10 23:54 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

On 12/10/18 10:00 AM, Alan Mackenzie wrote:
> lisp.h would be modified to define alternative versions of EQ, NILP,
> SYMBOLP, and XSYMBOL, and alternative versions of the INLINE functions
> which call them.  These would be called BC_EQ, BC_NILP, BC_SYMBOLP, and
> BC_XSYMBOL.
>
> Most of the C sources would, at build time, be fed to a preprocessor
> which would analyse (almost every) C function, and write a temporary file
> containing the functions foo and BC_foo next to each other.

This preprocessor would be a separate program that we'd write? If so, 
that sounds error-prone. C is notoriously tricky to preprocess, and 
Emacs already uses the C preprocessor aggressively. Instead, why not use 
the C preprocessor itself, rather than writing another preprocessor for 
C? In other words, compile each file twice, once with one -D option and 
once with another.
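
A minimal sketch of the compile-twice idea, on a toy object model.  The
macro names BC_VARIANT and FN, the struct and the function toy_memq are
hypothetical; the file would be compiled once as is and once with
-DBC_VARIANT, the second pass producing BC_toy_memq:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object: BARE is NULL for a bare symbol, otherwise it points to
   the bare symbol that this positioned symbol wraps.  */
struct toy_obj { const struct toy_obj *bare; };

static inline const struct toy_obj *
toy_strip_pos (const struct toy_obj *x)
{
  return x->bare ? x->bare : x;
}

#ifdef BC_VARIANT
/* Second compilation: BC_ names, position-insensitive EQ.  */
# define FN(name) BC_##name
# define EQ(x, y) (toy_strip_pos (x) == toy_strip_pos (y))
#else
/* First compilation: plain names, plain identity.  */
# define FN(name) name
# define EQ(x, y) ((x) == (y))
#endif

/* One source text, two object-code versions.  */
bool
FN (toy_memq) (const struct toy_obj *elt,
               const struct toy_obj *const *list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (EQ (elt, list[i]))
      return true;
  return false;
}
```

The function body is written once; only the two macros differ between
the passes.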

Even with this suggestion, though, I'm leery of multiple interpreters. 
Although it'd be better to have multiple interpreters (one faster, one 
slower) than to have just a single, slower interpreter, it'd be better 
yet to have just a single, faster interpreter.

Instead, I suggest looking into Stefan's suggestion to use edebug info 
<https://lists.gnu.org/archive/html/emacs-devel/2018-11/msg00526.html>, 
which should be a much less-drastic way to address the problem; for more 
info, see Gemini's followup 
<https://lists.gnu.org/r/emacs-devel/2018-12/msg00043.html>. 
Alternatively, Yuri's suggestion of an opt-in property for macros 
<https://lists.gnu.org/r/emacs-devel/2018-12/msg00023.html> also seems 
like a much-simpler approach that should work just as well in the long 
run as multiple interpreters would.

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 21:03           ` Alan Mackenzie
@ 2018-12-11  6:41             ` Eli Zaretskii
  2018-12-11 19:21               ` Stefan Monnier
  2018-12-11 19:07             ` Stefan Monnier
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2018-12-11  6:41 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> Date: Mon, 10 Dec 2018 21:03:10 +0000
> Cc: emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> > > That would mean starting the byte compilation with no position
> > > information being gathered, and then when a warning occurs, aborting
> > > the compilation and starting again from scratch with the position
> > > information being gathered and the alternative interpreter being used.
> 
> > Not necessarily.  It could mean invocation of special code whose
> > goal is to find the position of an error in a given form.  The
> > position of the beginning of this form will have been known, as AFAIU
> > the existing byte compiler does collect that, or has means to
> > determine that.
> 
> We know the position of the beginning of the form, yes.  We need some way
> of determining the source position of a symbol, cons, or vector on the
> inside of this form.
> 
> The traditional alist of symbols and positions is one way, and it no
> longer works well (if it ever did).  Symbols with position is another
> way, which appears to work well, in spite of the complexity.
> 
> You seem to be proposing a third way, but without giving away any
> details.

I don't have any details, just an idea.  I hope it could be helpful,
because implementing it would side-step all the problems you
discovered with the other approaches:

  . it doesn't slow down the Lisp interpreter
  . it doesn't slow down byte compilation when there are no
    errors/warnings to report
  . it probably doesn't require introduction of new low-level
    facilities, like annotating symbols with positions or redirecting
    Lisp subroutines to alternative versions

> I'm not sure how this special-purpose code would work.  Say we find an
> error or warning involving symbol foo as the car of some form, I can't
> see any way of determining its source position that doesn't involve going
> back to the position of the beginning of the form, and slogging through
> the form, somehow.

Yes, I was proposing something like that.  Why is that a problem?

> Maybe, rather than reading the form, we could scan it a token at a time,
> storing it in, say vectors, rather like a traditional non-lisp compiler
> does.  But this is hardly attractive, and would be a LOT of work.

I'm not sure why it is less attractive than the other alternatives.
But if my idea doesn't sound helpful, feel free to disregard it.

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 23:54 ` Paul Eggert
@ 2018-12-11 11:34   ` Alan Mackenzie
  2018-12-11 18:05     ` Paul Eggert
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Mackenzie @ 2018-12-11 11:34 UTC
  To: Paul Eggert; +Cc: emacs-devel

Hello, Paul.

On Mon, Dec 10, 2018 at 15:54:02 -0800, Paul Eggert wrote:
> On 12/10/18 10:00 AM, Alan Mackenzie wrote:
> > lisp.h would be modified to define alternative versions of EQ, NILP,
> > SYMBOLP, and XSYMBOL, and alternative versions of the INLINE functions
> > which call them.  These would be called BC_EQ, BC_NILP, BC_SYMBOLP, and
> > BC_XSYMBOL.

> > Most of the C sources would, at build time, be fed to a preprocessor
> > which would analyse (almost every) C function, and write a temporary file
> > containing the functions foo and BC_foo next to each other.

> This preprocessor would be a separate program that we'd write?

Yes.

> If so, that sounds error-prone. C is notoriously tricky to
> preprocess, ...

You don't need to tell me that.  ;-)  However, all this preprocessor
would do would be to recognise starts and ends of functions from a list
of known functions, and textually substitute BC_foo for foo, again from
that list of known substitutions.  It would need to parse comments and
strings.  The list of known functions can be reliably generated by
objdump (from binutils).
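
A minimal sketch of the textual substitution step only, assuming a
hard-coded rename list in place of the objdump-generated one.  The
function bc_rename and its helpers are hypothetical.  The sketch skips
comments, strings and character literals, but does not recognise
function boundaries or emit foo alongside BC_foo:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rename list, standing in for the generated one.  */
static const char *const known[] = { "foo", "bar", "EQ", "NILP" };

static bool
is_known (const char *id, size_t len)
{
  for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
    if (strlen (known[i]) == len && memcmp (known[i], id, len) == 0)
      return true;
  return false;
}

static bool
ident_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Copy SRC to DST (capacity CAP), prefixing known identifiers with
   "BC_".  Comments and literals are copied verbatim.  Return false
   if DST is too small.  */
bool
bc_rename (const char *src, char *dst, size_t cap)
{
  size_t o = 0;
  const char *p = src;
  if (cap == 0)
    return false;
  while (*p)
    {
      const char *start = p, *prefix = "";
      if (p[0] == '/' && p[1] == '*')
        {                       /* Block comment.  */
          const char *end = strstr (p + 2, "*/");
          p = end ? end + 2 : p + strlen (p);
        }
      else if (p[0] == '/' && p[1] == '/')
        p += strcspn (p, "\n"); /* Line comment.  */
      else if (*p == '"' || *p == '\'')
        {                       /* String or character literal.  */
          char quote = *p++;
          while (*p && *p != quote)
            p += (*p == '\\' && p[1]) ? 2 : 1;
          if (*p)
            p++;
        }
      else if (isalpha ((unsigned char) *p) || *p == '_')
        {                       /* Identifier: maybe rename it.  */
          while (ident_char (*p))
            p++;
          if (is_known (start, (size_t) (p - start)))
            prefix = "BC_";
        }
      else if (isdigit ((unsigned char) *p))
        while (ident_char (*p) || *p == '.')
          p++;                  /* Number, e.g. 0x1f or 1e5.  */
      else
        p++;                    /* Any other single character.  */

      size_t plen = strlen (prefix), len = (size_t) (p - start);
      if (o + plen + len + 1 > cap)
        return false;
      memcpy (dst + o, prefix, plen);
      memcpy (dst + o + plen, start, len);
      o += plen + len;
    }
  dst[o] = '\0';
  return true;
}
```

Even this much has to tokenise comments and literals correctly, and it
would still rename a local variable or struct member that happens to
share a name with a listed function.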

This preprocessor would be tedious rather than difficult to write.

> .... and Emacs already uses the C preprocessor aggressively. Instead,
> why not use the C preprocessor itself, rather than writing another
> preprocessor for C? In other words, compile each file twice, once with
> one -D option and once with another.

Because the two interpreters will need to share file static data, of
which there must be only one copy.  So the two versions of each function
need to be in the same "source" file.

The approach has the advantage that only minimal amendment of the C
source, if that, will be needed.

> Even with this suggestion, though, I'm leery of multiple interpreters. 
> Although it'd be better to have multiple interpreters (one faster, one 
> slower) than to have just a single, slower interpreter, it'd be better 
> yet to have just a single, faster interpreter.

Yes, we'd all like that, but several weeks of exploring alternatives has
failed to produce any workable solutions on these lines.

> Instead, I suggest looking into Stefan's suggestion to use edebug info 
> <https://lists.gnu.org/archive/html/emacs-devel/2018-11/msg00526.html>, 
> which should be a much less-drastic way to address the problem;

Not really, no.  To recap, that would involve the reader adding
annotations to every Lisp element, turning it into a list looking like:

    (locinfo FILE POS (foo (locinfo FILE POS a) (locinfo FILE POS 4)))

in place of

    (foo a 4)

The form Stefan suggested is MUCH bigger than the plain form, having
perhaps four times the number of conses (I haven't counted them).  A
large part of the compiler would need to be amended to cope with the new
format, even supposing it could work with macros (which I don't think it
could).  This amendment would be uninspiring and tedious in the extreme.
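
The cons counts for the two example forms can be checked with a small
sketch.  It assumes that FILE, POS, foo, a and 4 are all atoms and that
foo itself is left unwrapped, exactly as in the forms shown above; the
toy cell type and every function name are hypothetical.  Under those
assumptions the annotated form has 15 conses against 3 for the plain
one:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Toy cons cell; every atom is represented by the single ATOM object.  */
struct toy { const struct toy *car, *cdr; };
static const struct toy atom_cell = { NULL, NULL };
#define ATOM (&atom_cell)

static struct toy pool[64];
static size_t used;

/* Build a proper list of N elements (each a const struct toy *).  */
const struct toy *
toy_list (int n, ...)
{
  va_list ap;
  assert (n > 0 && used + (size_t) n <= sizeof pool / sizeof pool[0]);
  struct toy *first = &pool[used];
  va_start (ap, n);
  for (int i = 0; i < n; i++)
    {
      pool[used].car = va_arg (ap, const struct toy *);
      pool[used].cdr = i + 1 < n ? &pool[used + 1] : NULL;
      used++;
    }
  va_end (ap);
  return first;
}

/* Count the cons cells reachable from X.  */
size_t
count_conses (const struct toy *x)
{
  if (x == NULL || x == ATOM)
    return 0;
  return 1 + count_conses (x->car) + count_conses (x->cdr);
}

/* (foo a 4)  */
const struct toy *
plain_form (void)
{
  return toy_list (3, ATOM, ATOM, ATOM);
}

/* (locinfo FILE POS (foo (locinfo FILE POS a) (locinfo FILE POS 4)))  */
const struct toy *
locinfo_form (void)
{
  const struct toy *a = toy_list (4, ATOM, ATOM, ATOM, ATOM);
  const struct toy *four = toy_list (4, ATOM, ATOM, ATOM, ATOM);
  const struct toy *inner = toy_list (3, ATOM, a, four);
  return toy_list (4, ATOM, ATOM, ATOM, inner);
}
```

The ratio for real code would depend on how many elements get wrapped,
so this only covers the one example.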

I seriously doubt this would run faster than the symbols-with-position
approach (which has already been implemented), even if it could be made
to work.

> for more info, see Gemini's followup
> <https://lists.gnu.org/r/emacs-devel/2018-12/msg00043.html>.

I've read this several times.  It suffers the same drawbacks as Stefan's
idea.  In particular it doesn't give any idea how the compiler would
operate on the proposed forms.

> Alternatively, Yuri's suggestion of an opt-in property for macros
> <https://lists.gnu.org/r/emacs-devel/2018-12/msg00023.html> also seems
> like a much-simpler approach that should work just as well in the long
> run as multiple interpreters would.

I don't think it would work, either.  That idea is for macros' uses of
eq to be replaced by BC-eq inside the macro.  The trouble is, many uses
of eq are actually expansions of EQ in the C code (e.g. in Fequal,
Fassq, ....) and they would all need modifying too, and we're back in
the same situation of having an alternative interpreter.

-- 
Alan Mackenzie (Nuremberg, Germany).

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11 11:34   ` Alan Mackenzie
@ 2018-12-11 18:05     ` Paul Eggert
  2018-12-11 19:20       ` Alan Mackenzie
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Eggert @ 2018-12-11 18:05 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

On 12/11/18 3:34 AM, Alan Mackenzie wrote:

> The list of known functions can be reliably generated by objdump
> (from binutils).

Would objdump be run on every build that compiles a .c file that goes 
into the Emacs executable? If so, aren't we limiting builds to platforms 
that have binutils, which would be a new restriction? And if not, 
wouldn't the objdump use be a hand-done process that would need to be 
redone on a binutils platform (with output committed to Git) whenever a 
significant-enough change is made to src/*? Either way, this sounds like 
a hassle.

>> .... and Emacs already uses the C preprocessor aggressively.
>> Instead, why not use the C preprocessor itself, rather than writing
>> another preprocessor for C? In other words, compile each file twice,
>> once with one -D option and once with another.
>
> Because the two interpreters will need to share file static data, of
> which there must be only one copy. So the two versions of each function
> need to be in the same "source" file.

It should be easy enough to move shared file-static data into another 
file, that would be compiled only once. We don't have so much shared 
file-static data that this would be a major obstacle.

> The form Stefan suggested is MUCH bigger than the plain form, having,
> perhaps four times the number of conses (I haven't counted them).

This overhead would occur only when byte-compiling the form, which 
shouldn't be much of a problem in practice.

> This preprocessor would be tedious rather than difficult to write.
>
> ...
>
> A large part of the compiler would need to be amended to cope with
> the new format, even supposing it could work with macros (which I don't
> think it could). This amendment would be uninspiring and tedious in the
> extreme.

I agree that either approach would be tedious. :-) However, a tedious 
approach that is limited to reading and byte-compilation is better than 
a tedious approach that affects all of execution.

>> for more info, see Gemini's followup
>> <https://lists.gnu.org/r/emacs-devel/2018-12/msg00043.html>.
>
> I've read this several times. It suffers the same drawbacks as
> Stefan's idea. In particular it doesn't give any idea how the compiler
> would operate on the proposed forms.

As I understand it, Gemini and Stefan are thinking of essentially the 
same thing: have the reader optionally generate symbols-with-positions, 
have the compiler deal with symbols-with-positions, and have the 
compiler strip positions before passing forms to macro arguments that 
are not annotated to accept positions. Although (as you mention) this 
would require amending a large part of the compiler in a tedious way, it 
should solve the problem for macro arguments that accept positions.

>> <https://lists.gnu.org/r/emacs-devel/2018-12/msg00023.html>
>
> I don't think it would work, either. That idea is for macros' uses of
> eq to be replaced by BC-eq inside the macro. The trouble is, many uses
> of eq are actually expansions of EQ in the C code (e.g. in Fequal,
> Fassq, ....) and they would all need modifying too, and we're back in
> the same situation of having an alternative interpreter.

No, the idea is that the onus of doing comparisons correctly is on the 
writer of any macro annotated to understand symbols-with-positions. That 
is, it's the macro's responsibility to use appropriate comparison 
operations, and this responsibility extends to comparison operations 
like EQ that are executed in C code.

For example, I suggested that 'equal' should ignore symbol positions, as 
this would let these macros use 'equal' instead of 'eq', 'assoc' instead 
of 'assq', etc. 
<https://lists.gnu.org/r/emacs-devel/2018-12/msg00033.html>. Although 
'assoc' is written in C and its source code uses EQ, the source code 
would not need to be changed nor would it need to be compiled twice, as 
'assoc' defers to 'equal' (i.e., Fequal in C) to do the tricky work and 
'equal' would do the right thing.
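
A minimal sketch of that arrangement, on a toy object model.  The names
toy_equal, toy_assq and toy_assoc and the struct are hypothetical and
are not the code of Fequal or Fassoc; the point is only that an
'assoc'-like function inherits position-insensitivity from an
'equal'-like one, while an 'assq'-like function does not:

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_type { TOY_SYMBOL, TOY_SYMBOL_WITH_POS, TOY_CONS };

struct toy_obj
{
  enum toy_type type;
  const struct toy_obj *car;  /* TOY_CONS: car; TOY_SYMBOL_WITH_POS: bare symbol.  */
  const struct toy_obj *cdr;  /* TOY_CONS only; NULL ends a list.  */
};

static const struct toy_obj *
toy_strip_pos (const struct toy_obj *x)
{
  return x && x->type == TOY_SYMBOL_WITH_POS ? x->car : x;
}

/* 'equal'-like comparison that ignores symbol positions.  */
bool
toy_equal (const struct toy_obj *a, const struct toy_obj *b)
{
  a = toy_strip_pos (a);
  b = toy_strip_pos (b);
  if (a == b)
    return true;
  if (!a || !b || a->type != TOY_CONS || b->type != TOY_CONS)
    return false;
  return toy_equal (a->car, b->car) && toy_equal (a->cdr, b->cdr);
}

/* 'assq'-like lookup: plain identity on the key.  */
const struct toy_obj *
toy_assq (const struct toy_obj *key, const struct toy_obj *alist)
{
  for (; alist; alist = alist->cdr)
    if (alist->car && alist->car->type == TOY_CONS
        && alist->car->car == key)
      return alist->car;
  return NULL;
}

/* 'assoc'-like lookup: defers the comparison to toy_equal.  */
const struct toy_obj *
toy_assoc (const struct toy_obj *key, const struct toy_obj *alist)
{
  for (; alist; alist = alist->cdr)
    if (alist->car && alist->car->type == TOY_CONS
        && toy_equal (alist->car->car, key))
      return alist->car;
  return NULL;
}
```

Looking up a positioned symbol in an alist keyed by the bare symbol
fails with toy_assq and succeeds with toy_assoc.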

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-10 21:03           ` Alan Mackenzie
  2018-12-11  6:41             ` Eli Zaretskii
@ 2018-12-11 19:07             ` Stefan Monnier
  1 sibling, 0 replies; 20+ messages in thread
From: Stefan Monnier @ 2018-12-11 19:07 UTC
  To: emacs-devel

> The traditional alist of symbols and positions is one way, and it no
> longer works well (if it ever did).

It works as well as ever, AFAICT.  It's still a significant improvement
over the previous arrangement.  But as things get better, we get used to
the improvement and start being annoyed by other details.
IOW we get greedy ;-)

        Stefan

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11 18:05     ` Paul Eggert
@ 2018-12-11 19:20       ` Alan Mackenzie
  2018-12-11 19:59         ` Paul Eggert
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Mackenzie @ 2018-12-11 19:20 UTC
  To: Paul Eggert; +Cc: emacs-devel

Hello, Paul.

On Tue, Dec 11, 2018 at 10:05:49 -0800, Paul Eggert wrote:
> On 12/11/18 3:34 AM, Alan Mackenzie wrote:

>  > The list of known functions can be reliably generated by objdump 
>  > (from binutils).

> Would objdump be run on every build that compiles a .c file that goes 
> into the Emacs executable?

I don't know, at this stage.  Probably not.

> If so, aren't we limiting builds to platforms that have binutils,
> which would be a new restriction?

Well, we use ld, which also belongs to binutils, and that doesn't seem
to restrict the platforms.  Other platforms surely have equivalents to
both objdump and ld, and they are/would be used appropriately.

> And if not, wouldn't the objdump use be a hand-done process that would
> need to be redone on a binutils platform (with output committed to
> Git) whenever a significant-enough change is made to src/*? Either
> way, this sounds like a hassle.

globals.h seems to manage.  The objdump output could be generated
analogously.  Somehow.  Probably.

>  >> .... and Emacs already uses the C preprocessor aggressively.
>  >> Instead, why not use the C preprocessor itself, rather than
>  >> writing another preprocessor for C? In other words, compile each
>  >> file twice, once with one -D option and once with another.

>  > Because the two interpreters will need to share file static data,
>  > of which there must be only one copy. So the two versions of each
>  > function need to be in the same "source" file.

> It should be easy enough to move shared file-static data into another 
> file, that would be compiled only once. We don't have so much shared 
> file-static data that this would be a major obstacle.

Possibly not.  The same would have to be done with file global data,
too.  But doing it that way would involve a great deal of change to the
source code (testing for the -D option) which would not be popular.
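
A minimal sketch of the "foo next to BC_foo in one file" layout being
discussed, so that both variants share a single copy of file-static
data.  It is an illustration only: it uses a C macro to stamp out the
two variants instead of an external preprocessor or two -D compilations,
and struct sym, strip, find, BC_find, lookups and lookup_count are
hypothetical names, not taken from the Emacs sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy symbol: bare == NULL means a bare symbol; otherwise
   this is a symbol with position wrapping the bare symbol.  */
struct sym { const struct sym *bare; long pos; };

/* Plain identity test, standing in for the ordinary EQ.  */
static inline bool EQ(const struct sym *a, const struct sym *b)
{
  return a == b;
}

static inline const struct sym *strip(const struct sym *s)
{
  return s->bare ? s->bare : s;
}

/* Variant that treats a symbol with position as its bare symbol.  */
static inline bool BC_EQ(const struct sym *a, const struct sym *b)
{
  return strip(a) == strip(b);
}

/* File-static data: exactly one copy, shared by both variants below.  */
static unsigned long lookups;

/* Generate one search function per comparison primitive.  */
#define DEFINE_FIND(NAME, EQ_FN)                                   \
  ptrdiff_t NAME(const struct sym *key,                            \
                 const struct sym *const *v, size_t n)             \
  {                                                                \
    lookups++;                                                     \
    for (size_t i = 0; i < n; i++)                                 \
      if (EQ_FN(key, v[i]))                                        \
        return (ptrdiff_t) i;                                      \
    return -1;                                                     \
  }

DEFINE_FIND(find, EQ)        /* primary variant */
DEFINE_FIND(BC_find, BC_EQ)  /* variant for the second interpreter */

unsigned long lookup_count(void)
{
  return lookups;
}
```

Both generated functions increment the same counter, which is the
property that separate compilation of two copies of the file would lose
unless the static data were moved elsewhere.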

>  > The form Stefan suggested is MUCH bigger than the plain form, having,
>  > perhaps four times the number of conses (I haven't counted them).

> This overhead would occur only when byte-compiling the form, which 
> shouldn't be much of a problem in practice.

It would likely slow down the compilation by a very great deal.

>  > This preprocessor would be tedious rather than difficult to write.

> ...

>  > A large part of the compiler would need to be amended to cope with
>  > the new format, even supposing it could work with macros (which I
>  > don't think it could). This amendment would be uninspiring and
>  > tedious in the extreme.

> I agree that either approach would be tedious. :-)

No.  You're conflating "tedious" with "tedious in the extreme".  They're
different.  The first is several days of work.  The second is many weeks
of work.  I tried an approach two years ago which involved amending most
of the compiler.  Although I spent a fair amount of time on it, I
didn't get very far, and gave up.  Were the reader to produce "Stefan's
form", the work to amend the compiler would be even more than what I
gave up on.

> However, a tedious approach that is limited to reading and
> byte-compilation is better than a tedious approach that affects all of
> execution.

My proposed approach would affect only byte compilation (and the build
process, of course).  That's the whole point.  Besides, my approach will
work.  The competing half-baked ideas are not even fully formulated, and
likely wouldn't work.

>  >> for more info, see Gemini's followup 
>  >> <https://lists.gnu.org/r/emacs-devel/2018-12/msg00043.html>.

>  > I've read this several times. It suffers the same drawbacks as
>  > Stefan's idea. In particular it doesn't give any idea how the
>  > compiler would operate on the proposed forms.

> As I understand it, Gemini and Stefan are thinking of essentially the 
> same thing: have the reader optionally generate symbols-with-positions, 
> have the compiler deal with symbols-with-positions, and have the 
> compiler strip positions before passing forms to macro arguments that 
> are not annotated to accept positions. Although (as you mention) this 
> would require amending a large part of the compiler in a tedious way, it 
> should solve the problem for macro arguments that accept positions.

This would place onerous restrictions on what macros were allowed to do,
and likely be incompatible with a vast proportion of existing macros.

>  >> <https://lists.gnu.org/r/emacs-devel/2018-12/msg00023.html>

>  > I don't think it would work, either. That idea is for macros' uses
>  > of eq to be replaced by BC-eq inside the macro. The trouble is,
>  > many uses of eq are actually expansions of EQ in the C code (e.g.
>  > in Fequal, Fassq, ....) and they would all need modifying too, and
>  > we're back in the same situation of having an alternative
>  > interpreter.

> No, the idea is that the onus of doing comparisons correctly is on the 
> writer of any macro annotated to understand symbols-with-positions. That 
> is, it's the macro's responsibility to use appropriate comparison 
> operations, and this responsibility extends to comparison operations 
> like EQ that are executed in C code.

I think that is an unacceptable change in Emacs.  Macros are already
difficult enough to write.  Such restrictions could make macros
impossible, except for "experts".  Besides, we want to maintain
compatibility with the vast body of existing macros.

> For example, I suggested that 'equal' should ignore symbol positions, as 
> this would let these macros use 'equal' instead of 'eq', 'assoc' instead 
> of 'assq', etc. 
> <https://lists.gnu.org/r/emacs-devel/2018-12/msg00033.html>. Although 
> 'assoc' is written in C and its source code uses EQ, the source code 
> would not need to be changed nor would it need to be compiled twice, as 
> 'assoc' defers to 'equal' (i.e., Fequal in C) to do the tricky work and 
> 'equal' would do the right thing.

The trouble is, macros DO use eq.  And why not?

The contortions you're envisaging contrast horribly with the simplicity
of scratch/accurate-warning-pos, which simply works.  It works because
it has not changed any of the basic types or interactions between them.

-- 
Alan Mackenzie (Nuremberg, Germany).



* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11  6:41             ` Eli Zaretskii
@ 2018-12-11 19:21               ` Stefan Monnier
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Monnier @ 2018-12-11 19:21 UTC (permalink / raw)
  To: emacs-devel

>> I'm not sure how this special-purpose code would work.  Say we find an
>> error or warning involving symbol foo as the car of some form, I can't
>> see any way of determining its source position that doesn't involve going
>> back to the position of the beginning of the form, and slogging through
>> the form, somehow.
> Yes, I was proposing something like that.  Why is that a problem?

IIUC what you're suggesting here is to add a heuristic which takes an
arbitrary chunk of code (can be a single symbol but not necessarily), an
approximate source location, and then tries to compute a better source
location from it.

I think making this 100% reliable (either in the sense of "return the
*right* location" or just "return a location that's sometimes/often
better and never worse") is somewhere between very hard and impossible.

But maybe a few well chosen heuristics could indeed give a significant
improvement (i.e. return a location that's sometimes/often better and
rarely worse).  To help the heuristic, we could pass it some indication
of the error (e.g. so it knows whether the symbol (or chunk of code)
we're looking for is expected to be in the position of a normal
expression, a let-binding, a var definition, a function definition,
a function call, ...).

Oh wait: I think if we return a range rather than a single position, we
could make it reliable in the sense that the actual position is within
the range (but sometimes the range will degenerate to cover a whole
top-level definition).
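
A rough sketch of the range idea, under the assumption that the compiler
can supply the source ranges of forms known to enclose the offending
code.  struct range and narrowest_range are hypothetical names, and this
is not a worked-out design: it only shows that falling back to the
top-level range keeps the answer reliable when nothing narrower is known.

```c
#include <stdbool.h>
#include <stddef.h>

/* Half-open source range [start, end), in buffer positions.  */
struct range { long start; long end; };

/* True if INNER is non-empty and lies entirely inside OUTER.  */
static bool range_within(struct range inner, struct range outer)
{
  return inner.start < inner.end
         && outer.start <= inner.start && inner.end <= outer.end;
}

/* Return the narrowest of the ENCLOSING ranges that lies within
   TOPLEVEL.  Candidates outside the current best range are ignored, so
   the result is never wider than TOPLEVEL and degenerates to TOPLEVEL
   itself when no usable candidate exists.  */
struct range narrowest_range(struct range toplevel,
                             const struct range *enclosing, size_t n)
{
  struct range best = toplevel;
  for (size_t i = 0; i < n; i++)
    if (range_within(enclosing[i], best))
      best = enclosing[i];
  return best;
}
```

With nested candidates the order in which they are supplied does not
matter; a candidate that falls outside the top-level form is simply
skipped.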

        Stefan

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11 19:20       ` Alan Mackenzie
@ 2018-12-11 19:59         ` Paul Eggert
  2018-12-11 20:51           ` Alan Mackenzie
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Eggert @ 2018-12-11 19:59 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

On 12/11/18 11:20 AM, Alan Mackenzie wrote:
>> If so, aren't we limiting builds to platforms that have binutils,
>> which would be a new restriction?
> Well, we use ld, which also belongs to binutils, and that doesn't seem
> to restrict the platforms.  Other platforms surely have equivalents to
> both objdump and ld, and they are/would be used appropriately.

We don't use ld, at least not directly. The makefile uses $(CC), just as 
it uses $(CC) to compile. On GNU/Linux hosts this eventually uses ld (or 
gold or whatever), but those details are largely immaterial.

If we required objdump or similar utilities, that would be yet another 
porting hassle. And we might run into platforms where there is no 
objdump-like utility and we have to write one ourselves. This doesn't 
sound good at all.

> globals.h seems to manage.

globals.h manages because we decorate every symbol it needs to find. If 
we have to decorate every C function that might call EQ (either directly 
or indirectly), that would also work but it would be a lot more 
intrusive than globals.h is. And the proposal to use objdump seems to 
acknowledge this, by proposing a method that wouldn't require such 
decoration but would have significant portability problems.

>
>> It should be easy enough to move shared file-static data into another
>> file, that would be compiled only once.
> Possibly not.  The same would have to be done with file global data,
> too.  But doing it that way would involve a great deal of change to the
> source code (testing for the -D option) which would not be popular.

It'd be less change than having to decorate every function that might 
call EQ.

> It would likely slow down the compilation by a very great deal.

That's OK, if the cost is borne only by people who want accurate 
diagnostics. People who want compilation speed can simply turn off the 
accurate-diagnostics flag.

> You're conflating "tedious" with "tedious in the extreme".

We're estimating how much work would be needed. Even if there would be 
more work in changing the byte compiler, it shouldn't be so much more 
work that we need to contort all the rest of Emacs. It's better to 
localize such changes when possible.

> This would place onerous restrictions on what macros were allowed to do,
> and likely be incompatible with a vast proportion of existing macros.

But under both proposals I mentioned, existing macros would work just 
fine with no new restrictions. So what I think you're saying is that if 
people want to write macros that allow for more-accurate diagnostics, 
they'll find that they can't easily do it for some reason. What reason 
would that be? Can you give an example based on some macro already 
defined in Emacs?

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11 19:59         ` Paul Eggert
@ 2018-12-11 20:51           ` Alan Mackenzie
  2018-12-11 21:11             ` Stefan Monnier
  2018-12-11 21:43             ` Paul Eggert
  0 siblings, 2 replies; 20+ messages in thread
From: Alan Mackenzie @ 2018-12-11 20:51 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

Hello, Paul.

On Tue, Dec 11, 2018 at 11:59:27 -0800, Paul Eggert wrote:
> On 12/11/18 11:20 AM, Alan Mackenzie wrote:

> If we required objdump or similar utilities, that would be yet another 
> porting hassle. And we might run into platforms where there is no 
> objdump-like utility and we have to write one ourselves. This doesn't 
> sound good at all.

Let's just assume we can get a list of functions from somewhere.
Exactly how is a minor implementation detail.  I only suggested objdump
to demonstrate it was possible and easy.  I think the C compiler, any C
compiler, can generate cross references.  That would be another source
of the info.

[ .... ]

> >> It should be easy enough to move shared file-static data into another
> >> file, that would be compiled only once.
> > Possibly not.  The same would have to be done with file global data,
> > too.  But doing it that way would involve a great deal of change to the
> > source code (testing for the -D option) which would not be popular.

> It'd be less change than having to decorate every function that might 
> call EQ.

True, but an irrelevant diversion.  Nobody but you is suggesting
decorating every function.

> > It would likely slow down the compilation by a very great deal.

> That's OK, if the cost is borne only by people who want accurate 
> diagnostics. People who want compilation speed can simply turn off the 
> accurate-diagnostics flag.

WHAT????  There is no such flag, will be no such flag, MUST be no such
flag.  We give accurate diagnostics to EVERYBODY, and we do this FAST.

> > You're conflating "tedious" with "tedious in the extreme".

> We're estimating how much work would be needed. Even if there would be 
> more work in changing the byte compiler, it shouldn't be so much more 
> work that we need to contort all the rest of Emacs.

Nobody but you is talking about "contorting all the rest of Emacs".  The
byte compiler is well over 8000 lines of code, much, possibly most, of
which would need to be rewritten.

Writing the aforementioned preprocessor is MUCH less work.  It is
something I can do and intend to do.

As for amending the reader and byte compiler to work with "Stefan's
format", I know that that is beyond my hacking capacity, even if it
could work.  I suggest you take on the task yourself, or organise a team
to do it.  If at the end you have a working solution to the bug, we can
compare approaches and merge the better one into the master branch.

[ .... ]

> > This would place onerous restrictions on what macros were allowed to do,
> > and likely be incompatible with a vast proportion of existing macros.

> But under both proposals I mentioned, existing macros would work just 
> fine with no new restrictions. So what I think you're saying is that if 
> people want to write macros that allow for more-accurate diagnostics, 
> they'll find that they can't easily do it for some reason.

No, I'm saying that people writing macros don't have to and mustn't have
to care about diagnostic mechanisms.  Lisp hackers deserve to get the
best diagnostics without any such ugly compromises being made.

-- 
Alan Mackenzie (Nuremberg, Germany).

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11 20:51           ` Alan Mackenzie
@ 2018-12-11 21:11             ` Stefan Monnier
  2018-12-11 21:35               ` Alan Mackenzie
  2018-12-11 21:43             ` Paul Eggert
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Monnier @ 2018-12-11 21:11 UTC (permalink / raw)
  To: emacs-devel

> As for amending the reader and byte compiler to work with "Stefan's
> format",

BTW, I have suggested various approaches, not just one.
And all of them have been just rough sketches.
Whether and how they'd work is an open question.


        Stefan




* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11 21:11             ` Stefan Monnier
@ 2018-12-11 21:35               ` Alan Mackenzie
  2018-12-11 22:58                 ` Stefan Monnier
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Mackenzie @ 2018-12-11 21:35 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Tue, Dec 11, 2018 at 16:11:21 -0500, Stefan Monnier wrote:
> > As for amending the reader and byte compiler to work with "Stefan's
> > format",

> BTW, I have suggested various approaches, not just one.

Apologies, this is true.  And you also expressed a desire not to work on
the problem.  This is accepted and respected.

> And all of them have been just rough sketches.

One of them was what I've implemented as scratch/accurate-warning-pos.  :-)

> Whether and how they'd work is an open question.

The one Paul and I have been referring to was the one where the reader
would return extended lists containing location info alongside the
actual Lisp Object.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11 20:51           ` Alan Mackenzie
  2018-12-11 21:11             ` Stefan Monnier
@ 2018-12-11 21:43             ` Paul Eggert
  1 sibling, 0 replies; 20+ messages in thread
From: Paul Eggert @ 2018-12-11 21:43 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

On 12/11/18 12:51 PM, Alan Mackenzie wrote:
> Let's just assume we can get a list of functions from somewhere.
> Exactly how is a minor implementation detail.  I only suggested objdump
> to demonstrate it was possible and easy.  I think the C compiler, any C
> compiler, can generate cross references.  That would be another source
> of the info.

I'm afraid that many C compilers don't generate cross references, and 
that this is not a minor implementation detail that we can assume away.

>> That's OK, if the cost is borne only by people who want accurate
>> diagnostics. People who want compilation speed can simply turn off the
>> accurate-diagnostics flag.
> WHAT????  There is no such flag, will be no such flag, MUST be no such
> flag.  We give accurate diagnostics to EVERYBODY, and we do this FAST.

That would be nice, but if we can't do it quickly (without significant 
slowdowns elsewhere, or major contortions to the code) then perhaps 
we'll have to settle for accurate diagnostics as an option.

> The byte compiler is well over 8000 lines of code, much, possibly
> most, of which would need to be rewritten.

Although it would not be trivial to modify 8000 lines of code in a 
tedious but mostly-systematic way, that is not what I would call an 
enormous project. For perspective, the Emacs patch I'm currently hacking 
on (in a different area) is currently about 3000 lines and I wouldn't be 
surprised if it doubled before it's done. Sometimes even 
reasonably-minor conceptual changes require many tedious changes to the 
source code; that's just life when hacking.

> I suggest you take on the task yourself, or organise a team

Thanks, but this issue is not that high on my priority list.

> people writing macros don't have to and mustn't have
> to care about diagnostic mechanisms.  Lisp hackers deserve to get the
> best diagnostics without any such ugly compromises being made.

As I understand it these annotations are not simply concessions to 
limitations of our location implementation; they also provide 
information that is useful for other reasons. I'm still not seeing 
examples of why it would be hard for users to provide the optional 
annotations, if they want the corresponding advantages.

* Re: scratch/accurate-warning-pos: next steps.
  2018-12-11 21:35               ` Alan Mackenzie
@ 2018-12-11 22:58                 ` Stefan Monnier
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Monnier @ 2018-12-11 22:58 UTC (permalink / raw)
  To: emacs-devel

> The one Paul and I have been referring to was the one where the reader
> would return extended lists containing location info alongside the
> actual Lisp Object.

This is probably a good option in the long run.  But let there be no
doubt: while it should not impact performance of compiled code, it will
likely slow down compilation significantly, just because of the
increased size of the representation of the code that the compiler needs
to manipulate.

        Stefan

