GNU is looking for Google Summer of Code Projects

* GNU is looking for Google Summer of Code Projects
@ 2020-03-19 15:10 Rocky Bernstein
  2020-03-19 17:35 ` Stefan Monnier
  0 siblings, 1 reply; 29+ messages in thread
From: Rocky Bernstein @ 2020-03-19 15:10 UTC (permalink / raw)
  To: emacs-devel

In another list I see that GNU has been accepted for Summer of Code and is
looking for projects.

My own favorite ones regarding GNU Emacs have to do with beefing up the
Emacs Lisp runtime and bytecode system. In particular giving proper
callback information from bytecode (bytecode offset, mapping information
from bytecode to line numbers). The bytecode decompiler I started works on
simple examples, and I think I could get it going in a much more solid
and reliable way.

And of course on the elisp package side, realgud has always been hurting
for help, for example with multi-display windows in the debugger.

But enough about me. What is most in need of help in GNU Emacs that a
summer student might reasonably make progress on?

Discuss the ideas here (please cc me since I don't regularly follow) and
contact me offline and I'll forward the GNU contacts. (Or you probably can
look them up for yourself if so inclined. I am not a coordinator, I am just
a backup mentor).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: GNU is looking for Google Summer of Code Projects
  2020-03-19 15:10 GNU is looking for Google Summer of Code Projects Rocky Bernstein
@ 2020-03-19 17:35 ` Stefan Monnier
  2020-03-19 17:56   ` Andrea Corallo
  2020-03-19 20:34   ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Alan Mackenzie
  0 siblings, 2 replies; 29+ messages in thread
From: Stefan Monnier @ 2020-03-19 17:35 UTC (permalink / raw)
  To: Rocky Bernstein; +Cc: emacs-devel

> My own favorite ones regarding GNU Emacs have to do with beefing up the
> Emacs Lisp runtime and bytecode system. In particular giving proper
> callback information from bytecode (bytecode offset, mapping information
> from bytecode to line numbers). The bytecode decompiler I started, while it
> works on simple examples, I think I could get going in a much more solid
> and reliable way.

It should be easy (much smaller than a summer project) to change the
C code so that a bytecode offset can be extracted from the backtrace.

The harder and more interesting part is how to propagate source
information (line numbers and/or lexical variable names and location) to
byte-code.  There are many parts to this, so it's definitely possible to
get some summer project(s) out of it.  E.g. one such project is to change
the reader so it outputs "fat cons cells" (i.e. cons-cells with line-num
info), then arrange for that info to survive `macroexpand-all` and
`cconv.el`.  That could already be used to give more precise line
numbers in bytecompiler warnings.
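
To make the "fat cons cell" idea concrete, here is a toy model in Python.
It is only a sketch and not Emacs code: `Cons`, `FatCons`, `read_forms`
and `warn` are hypothetical names, and the "reader" handles nothing more
than one flat form per line.  It shows the head cell of each form carrying
the line it was read from, and a warning routine using that line when it
is present.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Cons:
    """An ordinary cons cell: no source information."""
    car: Any
    cdr: Any = None


@dataclass
class FatCons(Cons):
    """A cons cell that also remembers the line it was read from."""
    line: Optional[int] = None


def read_forms(text: str) -> list:
    """Toy reader: one flat form such as "(a b c)" per line.

    The head cell of each form is a FatCons carrying its 1-based line number.
    """
    forms = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        symbols = raw.strip().strip("()").split()
        if not symbols:
            continue  # skip blank lines and empty forms
        cell = None
        # Build the tail back to front; only the head cell is "fat".
        for sym in reversed(symbols[1:]):
            cell = Cons(sym, cell)
        forms.append(FatCons(symbols[0], cell, line=lineno))
    return forms


def warn(form: Cons, message: str) -> str:
    """Format a compiler-style warning, using the line if the form has one."""
    line = getattr(form, "line", None)
    where = f"line {line}" if line is not None else "unknown line"
    return f"{where}: {message}"
```

For instance, `warn(read_forms("(setq x 1)\n\n(foo bar)")[1], "unknown function foo")`
reports line 3, while a plain `Cons` falls back to "unknown line".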

Another is to devise a way to annotate bytecode objects with a map from
byte-offsets to information about the lexical vars in-scope at that point
and their location (i.e. position in the stack or in the closure).
And then teach Emacs's debugger to use that info.
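
As a rough sketch of what such a map could look like (in Python, with a
made-up table; nothing here reflects an actual Emacs data structure), each
entry records the byte offset from which a set of lexical variables is in
scope, together with a stack slot for each variable.  Variables living in
a closure are left out of this toy model.

```python
import bisect

# Hypothetical debug table for one bytecode function.  Each entry says:
# "from this byte offset onward, these lexical variables are in scope,
# and each lives at this stack slot".  Entries are sorted by offset.
SCOPE_TABLE = [
    (0,  {}),
    (4,  {"x": 0}),
    (11, {"x": 0, "y": 1}),
    (20, {"x": 0}),
]
OFFSETS = [offset for offset, _ in SCOPE_TABLE]


def vars_in_scope(pc):
    """Return the variable-to-slot mapping that applies at byte offset PC."""
    index = bisect.bisect_right(OFFSETS, pc) - 1
    if index < 0:
        raise ValueError("offset before start of function")
    return SCOPE_TABLE[index][1]


def inspect_frame(pc, stack):
    """Resolve variable names to values, as a debugger might."""
    return {name: stack[slot] for name, slot in vars_in_scope(pc).items()}
```

A debugger stopped at offset 12 with the stack `[10, 20]` would then call
`inspect_frame(12, [10, 20])` to show `x` and `y` by name.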

> But enough about me. What is most in need of help in GNU Emacs that a
> summer student might reasonably make progress on?

I'm sure there are lots of desires.  One I'd suggest is to introduce an
"object description" that can be used both by the GC and pdump code (and
maybe also by `equal` and `print--preprocess`?), so that when changing
the representation of objects or introducing new types we don't have to
make corresponding changes in so many different places.  XEmacs had such
a thing, so there's previous experience on which we can build.
It could also be a step towards replacing our GC with one that's
incremental, such as the one in XEmacs (or even better: concurrent, unlike
that of XEmacs).
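
A minimal sketch of the "object description" idea, in Python and under
heavy simplification: objects are plain dicts, and `DESCRIPTIONS`, `mark`
and `dump` are hypothetical names rather than anything in Emacs or XEmacs.
The point is only that a GC-style traversal and a dump-style serializer can
both be driven by one table listing which fields of each type hold
references.

```python
# Hypothetical descriptions: for each object type, the fields that hold
# references to other objects.  Both traversals below consult this table
# and know nothing else about the types.
DESCRIPTIONS = {
    "cons":   ["car", "cdr"],
    "symbol": ["value", "function"],
    "string": [],
}


def children(obj):
    """Yield the objects OBJ refers to, according to its description."""
    for field in DESCRIPTIONS[obj["type"]]:
        child = obj.get(field)
        if child is not None:
            yield child


def mark(root):
    """GC-style mark phase: return the ids of all reachable objects."""
    marked, stack = set(), [root]
    while stack:
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        stack.extend(children(obj))
    return marked


def dump(root):
    """Dump-style serialization: list the reachable objects, replacing
    each reference by an index into that list."""
    order, index, stack = [], {}, [root]
    while stack:
        obj = stack.pop()
        if id(obj) in index:
            continue
        index[id(obj)] = len(order)
        order.append(obj)
        stack.extend(children(obj))
    out = []
    for obj in order:
        record = {"type": obj["type"]}
        for field in DESCRIPTIONS[obj["type"]]:
            child = obj.get(field)
            record[field] = None if child is None else index[id(child)]
        out.append(record)
    return out
```

In this model, adding a new object type means adding one entry to
`DESCRIPTIONS`; neither `mark` nor `dump` needs to change.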

        Stefan

* Re: GNU is looking for Google Summer of Code Projects
  2020-03-19 17:35 ` Stefan Monnier
@ 2020-03-19 17:56   ` Andrea Corallo
  2020-03-19 18:05     ` Andrea Corallo
                       ` (2 more replies)
  2020-03-19 20:34   ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Alan Mackenzie
  1 sibling, 3 replies; 29+ messages in thread
From: Andrea Corallo @ 2020-03-19 17:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Rocky Bernstein, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> It should be easy (much smaller than a summer project) to change the
> C code so that a bytecode offset can be extracted from the backtrace.
>
> The harder and more interesting part is how to propagate source
> information (line numbers and/or lexical variable names and location) to
> byte-code.  There are many parts to this, so it's definitely possible to
> get some summer project(s) out of it.  E.g. one such project is to change
> the reader so it outputs "fat cons cells" (i.e. cons-cells with line-num
> info), then arrange for that info to survive `macroexpand-all` and
> `cconv.el`.  That could already be used to give more precise line
> numbers in bytecompiler warnings.
>
> Another is to devise a way to annotate bytecode objects with a map from
> byte-offsets to information about the lexical vars in-scope at that point
> and their location (i.e. position in the stack or in the closure).
> And then teach Emacs's debugger to use that info.
>
>> But enough about me. What is most in need of help in GNU Emacs that a
>> summer student might reasonably make progress on?
>
> I'm sure there are lots of desires.  One I'd suggest is to introduce an
> "object description" that can be used both by the GC and pdump code (and
> maybe also by `equal` and `print--preprocess`?), so that when changing
> the representation of objects or introducing new types we don't have to
> make corresponding changes in so many different places.  XEmacs had such
> a thing, so there's previous experience on which we can build.
> It could also be a step towards replacing our GC with one that's
> incremental, such as the one in XEmacs (or even better: concurrent, unlike
> that of XEmacs).
>
>
>         Stefan

It's probably too early to discuss this, but I can't resist.

Do we really need some dedicated low level object?  This should be all
overhead that disappears with compilation anyway.

Also wanted to ask, am I wrong or something has been attempted in this
field?

I'm quite curious on this because the day we get source locations
crossing byte-code we could use the native compiler also as a diagnostic
tool.

  Andrea

-- 
akrl@sdf.org



* Re: GNU is looking for Google Summer of Code Projects
  2020-03-19 17:56   ` Andrea Corallo
@ 2020-03-19 18:05     ` Andrea Corallo
  2020-03-19 18:19     ` Rocky Bernstein
  2020-03-19 21:26     ` Stefan Monnier
  2 siblings, 0 replies; 29+ messages in thread
From: Andrea Corallo @ 2020-03-19 18:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Rocky Bernstein, emacs-devel

Andrea Corallo <akrl@sdf.org> writes:


> Also wanted to ask, am I wrong or something has been attempted on this
                                                     ^^^
                                                   already

-- 
akrl@sdf.org



* Re: GNU is looking for Google Summer of Code Projects
  2020-03-19 17:56   ` Andrea Corallo
  2020-03-19 18:05     ` Andrea Corallo
@ 2020-03-19 18:19     ` Rocky Bernstein
  2020-03-19 21:26     ` Stefan Monnier
  2 siblings, 0 replies; 29+ messages in thread
From: Rocky Bernstein @ 2020-03-19 18:19 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Stefan Monnier, emacs-devel

On Thu, Mar 19, 2020 at 1:56 PM Andrea Corallo <akrl@sdf.org> wrote:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
> > It should be easy (much smaller than a summer project) to change the
> > C code so that a bytecode offset can be extracted from the backtrace.
> >
> > The harder and more interesting part is how to propagate source
> > information (line numbers and/or lexical variable names and location) to
> > byte-code.  There are many parts to this, so it's definitely possible to
> > get some summer project(s) out of it.  E.g. one such project is to change
> > the reader so it outputs "fat cons cells" (i.e. cons-cells with line-num
> > info), then arrange for that info to survive `macroexpand-all` and
> > `cconv.el`.  That could already be used to give more precise line
> > numbers in bytecompiler warnings.
> >
> > Another is to devise a way to annotate bytecode objects with a map from
> > byte-offsets to information about the lexical vars in-scope at that point
> > and their location (i.e. position in the stack or in the closure).
> > And then teach Emacs's debugger to use that info.
> >
> >> But enough about me. What is most in need of help in GNU Emacs that a
> >> summer student might reasonably make progress on?
> >
> > I'm sure there are lots of desires.  One I'd suggest is to introduce an
> > "object description" that can be used both by the GC and pdump code (and
> > maybe also by `equal` and `print--preprocess`?), so that when changing
> > the representation of objects or introducing new types we don't have to
> > make corresponding changes in so many different places.  XEmacs had such
> > a thing, so there's previous experience on which we can build.
> > It could also be a step towards replacing our GC with one that's
> > incremental, such as the one in XEmacs (or even better: concurrent, unlike
> > that of XEmacs).
> >
> >
> >         Stefan
>
> It's probably too early to discuss this, but I can't resist.
>
> Do we really need some dedicated low level object?  This should be all
> overhead that disappears with compilation anyway.
>
> Also wanted to ask, am I wrong or something has been attempted in this
> field?
>

In the bit that I have come across looking over bytecode work and history
(e.g. see http://rocky.github.io/elisp-bytecode.pdf), it has become extremely
clear that there are precious few who understand how the bytecode and
runtime system work. And the people who wrote this initially, e.g. rms, and
later jwz, no longer do so.

No slight to Stefan, Jim Blandy, Paul Eggert or Tom Tromey, but if nothing
else we need a new generation of people to pick up the torch and carry on.


>
> I'm quite curious on this because the day we get source locations
> crossing byte-code we could use the native compiler also as a diagnostic
> tool.
>
>   Andrea
>
> --
> akrl@sdf.org
>

* Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 17:35 ` Stefan Monnier
  2020-03-19 17:56   ` Andrea Corallo
@ 2020-03-19 20:34   ` Alan Mackenzie
  2020-03-19 20:43     ` Andrea Corallo
                       ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Alan Mackenzie @ 2020-03-19 20:34 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Rocky Bernstein, emacs-devel

Hello, Stefan.

On Thu, Mar 19, 2020 at 13:35:08 -0400, Stefan Monnier wrote:

[ .... ]

> It should be easy (much smaller than a summer project) to change the C
> code so that a bytecode offset can be extracted from the backtrace.

> The harder and more interesting part is how to propagate source
> information (line numbers and/or lexical variable names and location)
> to byte-code.  There are many parts to this, so it's definitely
> possible to get some summer project(s) out of it.  E.g. one such
> project is to change the reader so it outputs "fat cons cells" (i.e.
> cons-cells with line-num info), then arrange for that info to survive
> `macroexpand-all` and `cconv.el`.  That could already be used to give
> more precise line numbers in bytecompiler warnings.

"More precise line numbers" is a misconstruction, even though I've used
such language myself in the past.  Line numbers don't come from a
physical instrument which measures with, say +-1% accuracy.  CORRECT
line (and column) numbers are what we need.

You will recall that the output of correct line/column numbers for byte
compiler messages is a solved problem.  I solved it and presented the
fix in December 2018.  This fix was rejected because it made Emacs
slightly slower.

In the 3½ years I've been grappling with this problem, I've tried all
sorts of things like "fat cons cells".  They don't work, and can't work.
They can't work because large chunks of our software chew up and spit
out cons cells with gay abandon (I'm talking about the byte compiler and
things like cconv.el here).  More to the point, users' macros chew up and
spit out cons cells, and we have no control over them.  So whilst we
could, with a lot of tedious effort, clean up our own software to
preserve cons cells (believe me, I've tried), this would fail in users'
macros.
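
The failure mode described above can be shown with a toy model in Python.
This is only an illustration, not Emacs code: `Cons`, `FatCons` and
`user_macro_unless` are hypothetical names, and the line annotation is
assumed to sit on the head cell of the form as read.  The macro builds its
expansion from fresh, ordinary cells, so the annotation does not survive.

```python
class Cons:
    """Ordinary cons cell."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


class FatCons(Cons):
    """Cons cell annotated with a source line by the (imagined) reader."""

    def __init__(self, car, cdr=None, line=None):
        super().__init__(car, cdr)
        self.line = line


def user_macro_unless(form):
    """Expand (unless COND BODY) into (if COND nil BODY).

    Like most macros, it builds its result from fresh, ordinary cons cells,
    so nothing copies the annotation over from the input form.
    """
    cond = form.cdr.car
    body = form.cdr.cdr.car
    return Cons("if", Cons(cond, Cons("nil", Cons(body))))


# Pretend the reader produced (unless ready (start)) on line 42.
source = FatCons("unless", Cons("ready", Cons("(start)")), line=42)
expanded = user_macro_unless(source)

line_before = getattr(source, "line", None)    # annotation present
line_after = getattr(expanded, "line", None)   # annotation gone
```

Here `line_before` is 42 and `line_after` is `None`: the expansion is a
correct form, but a warning about it can no longer be tied to line 42
unless the macro (or the expander around it) copies the annotation itself.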

Since then I've worked a fair bit on creating a "double" Emacs core, one
core being for normal use, the other for byte compiling.  There's a fair
amount of work still to do on this, but I know how to do it.  The problem
is that I have been discouraged by the prospect of having this solution
vetoed too, since it will make Emacs quite a bit bigger.

I don't think it is fair to give this problem to a group of summer
coders.  It is too hard a problem, both technically and politically.

[ .... ]

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 20:34   ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Alan Mackenzie
@ 2020-03-19 20:43     ` Andrea Corallo
  2020-03-20 19:18       ` Alan Mackenzie
  2020-03-19 20:56     ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Rocky Bernstein
  2020-03-19 21:41     ` Stefan Monnier
  2 siblings, 1 reply; 29+ messages in thread
From: Andrea Corallo @ 2020-03-19 20:43 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> "More precise line numbers" is a misconstruction, even though I've used
> such language myself in the past.  Line numbers don't come from a
> physical instrument which measures with, say +-1% accuracy.  CORRECT
> line (and column) numbers are what we need.
>
> You will recall that the output of correct line/column numbers for byte
> compiler messages is a solved problem.  I solved it and presented the
> fix in December 2018.  This fix was rejected because it made Emacs
> slightly slower.
>
> In the 3½ years I've been grappling with this problem, I've tried all
> sorts of things like "fat cons cells".  They don't work, and can't work.
> They can't work because large chunks of our software chew up and spit
> out cons cells with gay abandon (I'm talking about the byte compiler and
> things like cconv.el here).  More to the point, users' macros chew up and
> spit out cons cells, and we have no control over them.  So whilst we
> could, with a lot of tedious effort, clean up our own software to
> preserve cons cells (believe me, I've tried), this would fail in users'
> macros.
>
> Since then I've worked a fair bit on creating a "double" Emacs core, one
> core being for normal use, the other for byte compiling.  There's a fair
> amount of work still to do on this, but I know how to do it.  The problem
> is that I have been discouraged by the prospect of having this solution
> vetoed too, since it will make Emacs quite a bit bigger.
>
> I don't think it is fair to give this problem to a group of summer
> coders.  It is too hard a problem, both technically and politically.
>

Hi Alan,

Sorry, I'm new to Emacs development. Where can the code of your
attempt be found?  Is it in a feature branch?

Thanks

  Andrea

-- 
akrl@sdf.org



* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 20:34   ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Alan Mackenzie
  2020-03-19 20:43     ` Andrea Corallo
@ 2020-03-19 20:56     ` Rocky Bernstein
  2020-03-19 22:05       ` Stefan Monnier
  2020-03-20 19:25       ` Alan Mackenzie
  2020-03-19 21:41     ` Stefan Monnier
  2 siblings, 2 replies; 29+ messages in thread
From: Rocky Bernstein @ 2020-03-19 20:56 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Stefan Monnier, emacs-devel

On Thu, Mar 19, 2020 at 4:35 PM Alan Mackenzie <acm@muc.de> wrote:

> Hello, Stefan.
>
> On Thu, Mar 19, 2020 at 13:35:08 -0400, Stefan Monnier wrote:
>
> [ .... ]
>
> > It should be easy (much smaller than a summer project) to change the C
> > code so that a bytecode offset can be extracted from the backtrace.
>
> > The harder and more interesting part is how to propagate source
> > information (line numbers and/or lexical variable names and location)
> > to byte-code.  There are many parts to this, so it's definitely
> > possible to get some summer project(s) out of it.  E.g. one such
> > project is to change the reader so it outputs "fat cons cells" (i.e.
> > cons-cells with line-num info), then arrange for that info to survive
> > `macroexpand-all` and `cconv.el`.  That could already be used to give
> > more precise line numbers in bytecompiler warnings.
>
> "More precise line numbers" is a misconstruction, even though I've used
> such language myself in the past.  Line numbers don't come from a
> physical instrument which measures with, say +-1% accuracy.  CORRECT
> line (and column) numbers are what we need.
>

A bytecode offset is exact and accurate.  Right now this information is
unavailable. I think the interpreter uses C pointers stored in a register.
So just recording the bytecode offset is a little bit of a slowdown, but
not that much. I doubt it would even register as 1% slower.

But just that would open the way for improvements. This is doable by a
summer student - Stefan thinks it trivial. But as you point out, there is
overhead in getting it accepted and into GNU Emacs.

Once the bytecode offset is available in a traceback, there are several
options for what to do next. At the lowest level there is just showing that
offset along with a disassembly of the bytecode.
And that, I believe, is also doable by a summer student.
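
As an illustration of that lowest level, here is a toy stack machine in
Python.  It is a sketch under stated assumptions, not the Emacs bytecode
interpreter: the opcodes, `FUNCTIONS`, `run` and `disassemble` are all
hypothetical, and "offset" here means an instruction index rather than a
byte offset.  Each frame stores its current offset before executing an
instruction, so an error can report where every active function was.

```python
class BytecodeError(Exception):
    """Error raised by the toy interpreter, carrying a backtrace."""

    def __init__(self, message, backtrace):
        super().__init__(message)
        self.backtrace = backtrace


# Hypothetical functions: name -> list of (opcode, argument) pairs.
FUNCTIONS = {
    "main":   [("push", 6), ("push", 0), ("call", "divide"), ("return", None)],
    "divide": [("div", None), ("return", None)],
}


def run(name, stack, frames=None):
    """Execute function NAME; each frame records its current offset."""
    frames = [] if frames is None else frames
    frame = {"function": name, "offset": 0}
    frames.append(frame)
    code = FUNCTIONS[name]
    pc = 0
    while True:
        frame["offset"] = pc  # the extra bookkeeping under discussion
        op, arg = code[pc]
        if op == "push":
            stack.append(arg)
        elif op == "div":
            divisor, dividend = stack.pop(), stack.pop()
            if divisor == 0:
                trace = [(f["function"], f["offset"]) for f in reversed(frames)]
                raise BytecodeError("division by zero", trace)
            stack.append(dividend // divisor)
        elif op == "call":
            run(arg, stack, frames)
        elif op == "return":
            frames.pop()
            return stack
        pc += 1


def disassemble(name, current):
    """List NAME's instructions, marking the one at offset CURRENT."""
    lines = []
    for offset, (op, arg) in enumerate(FUNCTIONS[name]):
        marker = "=>" if offset == current else "  "
        operand = "" if arg is None else f" {arg}"
        lines.append(f"{marker} {offset:3d} {op}{operand}")
    return lines
```

Running `run("main", [])` raises `BytecodeError` whose `backtrace` lists
`divide` at offset 0 and `main` at offset 2, and `disassemble("main", 2)`
marks the `call divide` instruction.  The single store per instruction is
the kind of cost that could be made conditional if it turned out to matter.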

Going further are a number of options that folks have mentioned so I won't
expand on that.


> You will recall that the output of correct line/column numbers for byte
> compiler messages is a solved problem.  I solved it and presented the
> fix in December 2018.  This fix was rejected because it made Emacs
> slightly slower.
>
> In the 3½ years I've been grappling with this problem, I've tried all
> sorts of things like "fat cons cells".  They don't work, and can't work.
> They can't work because large chunks of our software chew up and spit
> out cons cells with gay abandon (I'm talking about the byte compiler and
> things like cconv.el here).  More to the point, users' macros chew up and
> spit out cons cells, and we have no control over them.  So whilst we
> could, with a lot of tedious effort, clean up our own software to
> preserve cons cells (believe me, I've tried), this would fail in users'
> macros.
>
> Since then I've worked a fair bit on creating a "double" Emacs core, one
> core being for normal use, the other for byte compiling.  There's a fair
> amount of work still to do on this, but I know how to do it.  The problem
> is that I have been discouraged by the prospect of having this solution
> vetoed too, since it will make Emacs quite a bit bigger.
>
> I don't think it is fair to give this problem to a group of summer
> coders.  It is too hard a problem, both technically and politically.
>


Ok. So do you have a suggestion for what a summer student might do?


>
> [ .... ]
>
> >         Stefan
>
> --
> Alan Mackenzie (Nuremberg, Germany).
>

* Re: GNU is looking for Google Summer of Code Projects
  2020-03-19 17:56   ` Andrea Corallo
  2020-03-19 18:05     ` Andrea Corallo
  2020-03-19 18:19     ` Rocky Bernstein
@ 2020-03-19 21:26     ` Stefan Monnier
  2020-03-19 21:45       ` Andrea Corallo
  2 siblings, 1 reply; 29+ messages in thread
From: Stefan Monnier @ 2020-03-19 21:26 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Rocky Bernstein, emacs-devel

> Do we really need some dedicated low level object?

I don't know what you mean, sorry.

> This should be all overhead that disappears with compilation anyway.

I get the impression that you were referring to the part where I talked
about the "object description" for the runtime system.  Compilation is
of no help here.  It's already all happening in C code.

Maybe rewriting in a language with a bit more introspection might make
an "object description" more-or-less readily available (maybe the
Remacs work might qualify), but we'd still need to connect that with
a GC and with pdump etc...

        Stefan

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 20:34   ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Alan Mackenzie
  2020-03-19 20:43     ` Andrea Corallo
  2020-03-19 20:56     ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Rocky Bernstein
@ 2020-03-19 21:41     ` Stefan Monnier
  2020-03-19 22:09       ` Stefan Monnier
  2020-03-20 20:10       ` Alan Mackenzie
  2 siblings, 2 replies; 29+ messages in thread
From: Stefan Monnier @ 2020-03-19 21:41 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, emacs-devel

> things like cconv.el here).  More to the point, users' macros chew up and
> spit out cons cells, and we have no control over them.  So whilst we
> could, with a lot of tedious effort, clean up our own software to
> preserve cons cells (believe me, I've tried), this would fail in users'
> macros.

I think fat-cons cells are cheap to implement (with (hopefully) no
performance impact when not used or weird semantic artifacts like the
fat-symbol approach you tried) and can work 99.9% right in the long term
with an incremental way to get there.

Furthermore it matches the "usual" way to deal with this problem, so
there's very little doubt about whether it can work or not.

> Since then I've worked a fair bit on creating a "double" Emacs core, one
> core being for normal use, the other for byte compiling.  There's a fair
> amount of work still to do on this, but I know how to do it.  The problem
> is that I have been discouraged by the prospect of having this solution
> vetoed too, since it will make Emacs quite a bit bigger.

I'd probably try to veto it, indeed.  It might be a good solution in the
short-term but it'd just slow down our progress in the long term.


        Stefan




* Re: GNU is looking for Google Summer of Code Projects
  2020-03-19 21:26     ` Stefan Monnier
@ 2020-03-19 21:45       ` Andrea Corallo
  2020-03-19 23:07         ` Rocky Bernstein
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Corallo @ 2020-03-19 21:45 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Rocky Bernstein, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Do we really need some dedicated low level object?
>
> I don't know what you mean, sorry.
>
>> This should be all overhead that disappears with compilation anyway.
>
> I get the impression that you were referring to the part where I talked
> about the "object description" for the runtime system.  Compilation is
> of no help here.  It's already all happening in C code.
>
> Maybe rewriting in a language with a bit more introspection might make
> an "object description" more-or-less readily available (maybe the
> Remacs work might qualify), but we'd still need to connect that with
> a GC and with pdump etc...

Oops, I now understand, we are talking about 4 different problems:

1 source location going through the compilation pipeline
2 debug information into bytecode to debug
3 autogenerate GC and pdumper code from obj description
4 GC

Clear to me thanks.

  Andrea

--
akrl@sdf.org



* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 20:56     ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Rocky Bernstein
@ 2020-03-19 22:05       ` Stefan Monnier
  2020-03-20 19:25       ` Alan Mackenzie
  1 sibling, 0 replies; 29+ messages in thread
From: Stefan Monnier @ 2020-03-19 22:05 UTC (permalink / raw)
  To: Rocky Bernstein; +Cc: Alan Mackenzie, emacs-devel

>> "More precise line numbers" is a misconstruction, even though I've used
>> such language myself in the past.  Line numbers don't come from a
>> physical instrument which measures with, say +-1% accuracy.  CORRECT
>> line (and column) numbers are what we need.
> A bytecode offset is exact and accurate.

I think he was talking about line-number info in byte-compiler warnings
(where the info comes from the source code, not from the backtrace).
Currently we use a hack that gives us approximate locations which can be
wildly off-the-mark.

> Right now this information is unavailable. I think the interpreter uses
> C pointers stored in a register.  So just recording the bytecode
> offset is a little bit of a slowdown, but not that much.

Indeed.  If it's too high we could make it conditional on a boolean
variable.

> I doubt it would even register as 1% slower.

Reminds me that another project could be to try and speed up function
calls.  The difficulty here is that we don't really know what's the main
source of the cost, so there's a good chance that any specific attempt
will give disappointing results.  It'd still be useful in helping us
get a better idea of what it is that takes time.

> But just that would open the way for improvements. This is doable by a
> Summer student - Stefan thinks it trivial.

Just recording this info in the backtrace (at a minor performance cost)
is indeed very easy.

> But as you point out, there is overhead in getting it accepted and
> into GNU Emacs.

Right.  Until this info is actually usable by tools like the debugger,
the code would inevitably be #ifdef'd out unless it has zero-cost which
seems unlikely.

> Once the bytecode offset is available in a traceback, there are
> several options for what to do next.  At the lowest level there is just
> showing that offset along with a disassembly of the bytecode.
> And that, I believe, is also doable by a summer student.

Agreed.

        Stefan

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 21:41     ` Stefan Monnier
@ 2020-03-19 22:09       ` Stefan Monnier
  2020-03-20 20:10       ` Alan Mackenzie
  1 sibling, 0 replies; 29+ messages in thread
From: Stefan Monnier @ 2020-03-19 22:09 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, emacs-devel

> I think fat-cons cells are cheap to implement (with (hopefully) no
> performance impact when not used or weird semantic artifacts like the
> fat-symbol approach you tried) and can work 99.9% right in the long term
> with an incremental way to get there.

Reminds me that another project could be to provide something like
Scheme's `syntax-rules` or `syntax-case`.  These could be attractive on
their own while also making it easier to correctly propagate source-level
line-number information.


        Stefan




* Re: GNU is looking for Google Summer of Code Projects
  2020-03-19 21:45       ` Andrea Corallo
@ 2020-03-19 23:07         ` Rocky Bernstein
  0 siblings, 0 replies; 29+ messages in thread
From: Rocky Bernstein @ 2020-03-19 23:07 UTC (permalink / raw)
  Cc: Stefan Monnier, emacs-devel

On Thu, Mar 19, 2020 at 5:45 PM Andrea Corallo <akrl@sdf.org> wrote:

> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
> >> Do we really need some dedicated low level object?
> >
> > I don't know what you mean, sorry.
> >
> >> This should be all overhead that disappears with compilation anyway.
> >
> > I get the impression that you were referring to the part where I talked
> > about the "object description" for the runtime system.  Compilation is
> > of no help here.  It's already all happening in C code.
> >
> > Maybe rewriting in a language with a bit more introspection might make
> > an "object description" more-or-less readily available (maybe the
> > Remacs work might qualify), but we'd still need to connect that with
> > a GC and with pdump etc...
>
> Oops, I now understand, we are talking about 4 different problems:
>
> 1 source location going through the compilation pipeline
> 2 debug information into bytecode to debug
>

The above two I think a summer student could do.
A clarification on item 2: there is also *reporting* location information,
especially in the traceback on an error, which I suppose could be
considered "to debug".

> 3 autogenerate GC and pdumper code from obj description
> 4 GC
>
> Clear to me thanks.
>
>   Andrea
>
> --
> akrl@sdf.org
>

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 20:43     ` Andrea Corallo
@ 2020-03-20 19:18       ` Alan Mackenzie
  2020-03-21 11:22         ` Andrea Corallo
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Mackenzie @ 2020-03-20 19:18 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

Hello, Andrea.

On Thu, Mar 19, 2020 at 20:43:15 +0000, Andrea Corallo wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > "More precise line numbers" is a misconstruction, even though I've used
> > such language myself in the past.  Line numbers don't come from a
> > physical instrument which measures with, say +-1% accuracy.  CORRECT
> > line (and column) numbers are what we need.

> > You will recall that the output of correct line/column numbers for byte
> > compiler messages is a solved problem.  I solved it and presented the
> > fix in December 2018.  This fix was rejected because it made Emacs
> > slightly slower.

> > In the 3½ years I've been grappling with this problem, I've tried all
> > sorts of things like "fat cons cells".  They don't work, and can't work.
> > They can't work because large chunks of our software chew up and spit
> > out cons cells with gay abandon (I'm talking about the byte compiler and
> > things like cconv.el here).  More to the point, users' macros chew up and
> > spit out cons cells, and we have no control over them.  So whilst we
> > could, with a lot of tedious effort, clean up our own software to
> > preserve cons cells (believe me, I've tried), this would fail in users'
> > macros.

> > Since then I've worked a fair bit on creating a "double" Emacs core, one
> > core being for normal use, the other for byte compiling.  There's a fair
> > amount of work still to do on this, but I know how to do it.  The problem
> > is that I have been discouraged by the prospect of having this solution
> > vetoed too, since it will make Emacs quite a bit bigger.

> > I don't think it is fair to give this problem to a group of summer
> > coders.  It is too hard a problem, both technically and politically.


> Hi Alan,

> Sorry I'm new to Emacs development, where can be found the code of your
> attempt?  Is it in a feature branch?

It's in the branch scratch/accurate-warning-pos.  The commit which
converted the unfinished work to a bug fix was:

    commit 2e04ddadab266d245a3bd0f6c19223ea515bdb90
    Author: Alan Mackenzie <acm@muc.de>
    Date:   Fri Nov 30 14:55:48 2018 +0000

        Sundry amendments to branch scratch/accurate-warning-pos.

(except, I think it still outputs two positions for each warning
message: the traditional one, and the new correct one).

> Thanks

>   Andrea

> -- 
> akrl@sdf.org

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 20:56     ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Rocky Bernstein
  2020-03-19 22:05       ` Stefan Monnier
@ 2020-03-20 19:25       ` Alan Mackenzie
  1 sibling, 0 replies; 29+ messages in thread
From: Alan Mackenzie @ 2020-03-20 19:25 UTC (permalink / raw)
  To: Rocky Bernstein; +Cc: Stefan Monnier, emacs-devel

Hello, Rocky.

On Thu, Mar 19, 2020 at 16:56:45 -0400, Rocky Bernstein wrote:
> On Thu, Mar 19, 2020 at 4:35 PM Alan Mackenzie <acm@muc.de> wrote:

[ .... ]

> > I don't think it is fair to give this problem to a group of summer
> > coders.  It is too hard a problem, both technically and politically.

> Ok. So do you have a suggestion for what a summer student might do?

Sorry, no I don't.  It would need to be something in that sweet spot
between being dull and tedious and being too challenging and difficult.

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-19 21:41     ` Stefan Monnier
  2020-03-19 22:09       ` Stefan Monnier
@ 2020-03-20 20:10       ` Alan Mackenzie
  2020-03-20 21:23         ` Rocky Bernstein
                           ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Alan Mackenzie @ 2020-03-20 20:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Rocky Bernstein, emacs-devel

Hello, Stefan.

On Thu, Mar 19, 2020 at 17:41:30 -0400, Stefan Monnier wrote:
> > things like cconv.el here).  More to the point, users' macros chew up and
> > spit out cons cells, and we have no control over them.  So whilst we
> > could, with a lot of tedious effort, clean up our own software to
> > preserve cons cells (believe me, I've tried), this would fail in users'
> > macros.

> I think fat-cons cells are cheap to implement (with (hopefully) no
> performance impact when not used .....

They may be cheap to implement in themselves, but adapting the entire
byte compiler and all our macros to the heavily restricted semantics
they would impose would be an enormous job.  I've tried something
similar, and gave up in exhaustion.

> or weird semantic artifacts like the fat-symbol approach you tried),

Er, not "tried" but "implemented", please.  The implementation was
complete, and was capable of bootstrapping Emacs with correct positions
for all the (then plentiful) warning messages.

> and can work 99.9% right in the long term with an incremental way to
> get there.

Where does this 99.9% come from?  How is this cons tracking you're
proposing supposed to work, when there are an infinite number of
occurrences of the likes of

    (cons (car form) (cdr form))

in our code?

> Furthermore it matches the "usual" way to deal with this problem, so
> there's very little doubt about whether it can work or not.

Are you saying that this is how other Lisp compilers deal with source
code positions?  How do they deal with the difficult problem of user
macros?  Could you give me an example of a free Lisp system which works
this way?  I'd be interested in having a look at it.

I think there's quite a bit of doubt as to whether this could work
effectively in Emacs.  The way to dispel this doubt is for Somebody (tm)
to implement it.

> > Since then I've worked a fair bit on creating a "double" Emacs core,
> > one core being for normal use, the other for byte compiling.
> > There's a fair amount of work still to do on this, but I know how to
> > do it.  The problem is that I have been discouraged by the prospect
> > of having this solution vetoed too, since it will make Emacs quite a
> > bit bigger.

> I'd probably try to veto it, indeed.  It might be a good solution in
> the short-term but it'd just slow down our progress in the long term.

Fixing bugs slows down our progress?

To which the answer is to install the working solution pending the
implementation of something better, after which it can be superseded.
Somehow, even that strategy tends to get vetoed.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-20 20:10       ` Alan Mackenzie
@ 2020-03-20 21:23         ` Rocky Bernstein
  2020-03-20 21:27         ` Clément Pit-Claudel
  2020-03-20 21:30         ` Stefan Monnier
  2 siblings, 0 replies; 29+ messages in thread
From: Rocky Bernstein @ 2020-03-20 21:23 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 6080 bytes --]

Before I begin, as has been pointed out, let us be clear that the
discussion has changed. Originally I was interested in better call stack
and traceback information, a *run-time* thing, which I was proposing as a
Summer of Code project. The discussion now is about compiler locations at
*compile* time.

So be it.

The problems however do have one thing in common: how to represent a
location.

Let me also correct one earlier correction: that the "better" location was
construed to be a single line and column number. A better way, I believe,
to think of locations is as

   1. a *container*, where *container* is defined to be something,
   2. an offset into that container, in some kind of "units" that are
   defined to be something, and
   3. an optional length in those units. When the length isn't given it is
   assumed to be one.

For example, if you are interested only in representing a line and column
number, one value, an offset, would do it.

Note that this abstraction works equally well for other kinds of things
like bytecode, where the offset would be the bytecode offset. Many times a
contiguous sequence of bytecode maps to a contiguous sequence in
the source code. Of course that's not necessarily *always* the case, but
describing how to deal with that would wander astray of the proposal
that follows. But let me say again: if you just care about a single
bytecode instruction, set the length to 1 or leave out the length field.
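
The container/offset/length triple above could be sketched in C roughly
as follows. This is only an illustration: the type, enum and function
names are all hypothetical and do not exist in Emacs, and treating a
stored length of 0 as "not given" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is existing Emacs code.  */
enum loc_unit { LOC_UNIT_CHAR, LOC_UNIT_BYTECODE, LOC_UNIT_SEXP_NODE };

struct location {
  const char *container;  /* e.g. a file name or a function name */
  enum loc_unit unit;     /* what `offset' and `length' are counted in */
  size_t offset;          /* start, counted from the container's beginning */
  size_t length;          /* 0 means "not given" and is read as 1 */
};

/* Effective length: an omitted length is taken to be one unit.  */
static size_t
location_length (const struct location *loc)
{
  assert (loc != NULL);
  return loc->length == 0 ? 1 : loc->length;
}

/* One past the last unit covered by LOC.  */
static size_t
location_end (const struct location *loc)
{
  return loc->offset + location_length (loc);
}
```

The same struct then describes a character offset in a file, a single
bytecode instruction, or a run of bytecode instructions, depending only
on the container and the unit.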

I know this might not be satisfying to some, but here is an extremely simple
but accurate proposal that doesn't incur a lot of overhead and can deal
with a lot of generality.

A unit of compilation, I think, is a *function*. That is the container part.
Attach to the function its location information in some other way (e.g.
its container might be a file name if that is appropriate, or it might be
defined inside a macro...)

A function, before the byte compiler compiles it, is a kind of lambda, which
is a kind of S-expression. A location inside that could simply be a tree
node's preorder number, or the preorder number and a number of successor
nodes in preorder traversal. As with the simple-minded run-time error
location proposal (when we have a bytecode offset, mark that position in a
disassembly), the same thing can be done here: show the position or range of
nodes in the S-expression that you've got.
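
To make the preorder-number idea concrete, here is a minimal sketch in C
over a toy S-expression type. Everything in it is hypothetical (the
struct, the function name, and the convention that nil is a NULL pointer
and is not counted as a node); it only shows that a single integer is
enough to find a node again inside a function body.

```c
#include <assert.h>
#include <stddef.h>

/* A toy S-expression node: an atom when ATOM is non-NULL, else a cons.
   nil is represented by a NULL pointer and is not numbered.  */
struct sexp {
  const char *atom;
  struct sexp *car, *cdr;
};

/* Return the node whose preorder number is *N (0 is the root), visiting
   a cons before its car, and its car before its cdr.  *N is decremented
   as nodes are passed; NULL means there is no such node.  */
static struct sexp *
sexp_nth_preorder (struct sexp *s, size_t *n)
{
  assert (n != NULL);
  if (s == NULL)
    return NULL;
  if (*n == 0)
    return s;
  --*n;
  if (s->atom != NULL)
    return NULL;
  struct sexp *hit = sexp_nth_preorder (s->car, n);
  return hit ? hit : sexp_nth_preorder (s->cdr, n);
}
```

With this numbering the form (if x y) has six nodes: the outer cons is
0, `if' is 1, the second cons is 2, `x' is 3, the third cons is 4 and
`y' is 5. A range is then just a start number plus a count of successor
nodes.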

What if the bytecode compiler has done some wild and weird optimization
changes?  Just show what S-exp you were working on and mark where you were.

I know for some or many it may not be satisfying, but it is the honest
truth and I'd rather have that than nothing or the wrong guess.

Having done this first step, the problem is divided a little bit, so carry
on: discuss and conquer. A separate tool outside of the compiler proper can
be written to take this and, given pointers to where the source might be
located, figure out where in the source code that might be. Maybe pattern
matching would work, dunno, but let me not try to speculate too much.

Finally, in this proposal I am not suggesting changing the current
default behavior: the additional precise geeky information might be
shown only in some sort of "super hacker" verbose compilation mode.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-20 20:10       ` Alan Mackenzie
  2020-03-20 21:23         ` Rocky Bernstein
@ 2020-03-20 21:27         ` Clément Pit-Claudel
  2020-03-20 23:46           ` Stefan Monnier
  2020-03-20 21:30         ` Stefan Monnier
  2 siblings, 1 reply; 29+ messages in thread
From: Clément Pit-Claudel @ 2020-03-20 21:27 UTC (permalink / raw)
  To: emacs-devel

On 20/03/2020 16.10, Alan Mackenzie wrote:
> Are you saying that this is how other Lisp compilers deal with
> source code positions?  How do they deal with the difficult problem
> of user macros?  Could you give me an example of a free Lisp system
> which works this way?  I'd be interested in having a look at it.

Not sure if it counts as a Lisp compiler, but Racket does this; the "fat cons cells" are called syntax objects.  See https://blog.racket-lang.org/2011/04/writing-syntax-case-macros.html for a good explanation, including this intro:

> The main idea with Racket’s macro system (and with other syntax-case
> systems) is that macros are syntax-to-syntax functions, just like the
> case of defmacro, except that instead of raw S-expressions you’re
> dealing with syntax objects. This becomes very noticeable when
> identifiers are handled: instead of dealing with plain symbols,
> you’re dealing with these syntax values (called “identifiers” in this
> case) that are essentially a symbol and some opaque information that
> represents the lexical scope for its source. In several syntax-case
> systems this is the only difference from defmacro macros, but in the
> Racket case this applies to everything — identifiers, numbers, other
> immediate constants, and even function applications, etc — they are
> all the same S-expression values that you’re used to, except wrapped
> with additional information. Another thing that is unique to Racket
> is the extra information: in addition to the opaque lexical context,
> there is also source information and arbitrary properties (there are
> also certificates, but that’s ignorable for this text).

It would be worth checking more closely what Guile does.  Its syntax-manipulating functions automatically propagate "source properties", but from reading https://www.gnu.org/software/guile/manual/html_node/Source-Properties.html it seems that it might use something similar to your approach?

Clément.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-20 20:10       ` Alan Mackenzie
  2020-03-20 21:23         ` Rocky Bernstein
  2020-03-20 21:27         ` Clément Pit-Claudel
@ 2020-03-20 21:30         ` Stefan Monnier
  2 siblings, 0 replies; 29+ messages in thread
From: Stefan Monnier @ 2020-03-20 21:30 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, emacs-devel

>> I think fat-cons cells are cheap to implement (with (hopefully) no
>> performance impact when not used .....
> They may be cheap to implement in themselves, but adapting the entire
> byte compiler and all our macros to the heavily restricted semantics
> they would impose would be an enormous job.

The idea is that you want to make it work acceptably even if only some
of the cons-cells are fat.  This way, as you adapt the existing code to
pay attention/preserve fat-cons-cells, your location information gets
more and more precise, but even before you've done this enormous job,
you already get some of the benefit.

> I've tried something similar, and gave up in exhaustion.

If you want "exact" results, then you'll get tired long before getting
there, yes.  But it's not needed.

> Where does this 99.9% come from?  How is this cons tracking you're
> proposing supposed to work, when there are an infinite number of
> occurrences of the likes of
>
>     (cons (car form) (cdr form))
>
> in our code?

This still preserves info inside the fat-cons-cells contained in (car
form) and (cdr form), so it's not as bad as it looks.

Of course, when such code is applied recursively on all sub-expressions
(i.e. in a code-walker such as macroexpand-all, cconv, and byte-opt)
then we lose all the info, so we do need to change those before we can
benefit, but AFAICT those 3 are the only crucial ones (there are a few
other code-walkers around, such as generator.el) and hopefully some of
that rewrite can be made fairly mechanically.
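
A minimal sketch of the point above, in C with a toy "fat" cons type.
The struct layout and all names are hypothetical (this is not how Emacs
conses are laid out); it only contrasts rebuilding the top cell, which
keeps the positions stored deeper in the form, with a recursive
code walker that rebuilds every cell and so drops them all.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "fat" cons cell: an ordinary cons plus a source position.
   POS < 0 means "no position recorded".  */
struct fat_cons {
  struct fat_cons *car;  /* NULL stands for an atom or nil in this toy */
  struct fat_cons *cdr;
  long pos;
};

static struct fat_cons *
fat_cons_make (struct fat_cons *car, struct fat_cons *cdr, long pos)
{
  struct fat_cons *c = malloc (sizeof *c);
  assert (c != NULL);
  c->car = car;
  c->cdr = cdr;
  c->pos = pos;
  return c;
}

/* The analogue of (cons (car form) (cdr form)): a fresh top cell with
   no position, sharing FORM's substructure.  */
static struct fat_cons *
rebuild_top (const struct fat_cons *form)
{
  return fat_cons_make (form->car, form->cdr, -1);
}

/* What a naive code walker does: rebuild every cell recursively.  */
static struct fat_cons *
rebuild_all (const struct fat_cons *form)
{
  if (form == NULL)
    return NULL;
  return fat_cons_make (rebuild_all (form->car),
                        rebuild_all (form->cdr), -1);
}
```

After rebuild_top only the outermost position is gone; after
rebuild_all nothing is left, which is why the walkers are the ones
that would have to be adapted first.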

> Are you saying that this is how other Lisp compilers deal with source
> code positions?  How do they deal with the difficult problem of user
> macros?

Not sure about Common-Lisp, but Scheme systems deal with it by
distinguishing "sexp" from "syntax objects" where syntax objects are
basically sexps wrapped (recursively) within location wrappers.

> I think there's quite a bit of doubt as to whether this could work
> effectively in Emacs.

I have no doubt that it can work.

I am not sure it'll be acceptable, OTOH, because it will depend on the
overhead it will impose on the execution of the byte-compiler.

> The way to dispel this doubt is for Somebody (tm) to implement it.

Exactly.

> To which the answer is to install the working solution pending the
> implementation of something better, after which it can be superseded.

Ever heard of temporary hacks that end up permanent?

Take for example the issue of .... oh, I don't know ... line numbers in
error messages?  ;-)

To a large extent the reason we don't have better line-numbers right now
is because of the hack we accepted some years ago, so now instead of
working on "giving line-numbers in error messages", we're reduced to
"improve the precision of line-numbers in error messages" which is not
nearly as pressing an issue.

        Stefan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-20 21:27         ` Clément Pit-Claudel
@ 2020-03-20 23:46           ` Stefan Monnier
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Monnier @ 2020-03-20 23:46 UTC (permalink / raw)
  To: Clément Pit-Claudel; +Cc: emacs-devel

> properties", but from reading
> https://www.gnu.org/software/guile/manual/html_node/Source-Properties.html
> it seems that it might use something similar to your approach?

It seems that Guile does it along the lines of "fat cons cells"
according to their example:

    scheme@(guile-user)> (xxx)
    <unnamed port>:4:1: In procedure module-lookup:
    <unnamed port>:4:1: Unbound variable: xxx
    
    scheme@(guile-user)> xxx
    ERROR: In procedure module-lookup:
    ERROR: Unbound variable: xxx

where only the code with a cons-cell gets location information.
That's also what the earlier text says:

    The way that source properties are stored means that Guile cannot
    associate source properties with individual symbols, keywords,
    characters, booleans, or small integers.

Tho, IIUC it seems that rather than "fat cons cells" they may be using
a hash-table indexed with the object (cons-cells or otherwise).
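
For comparison with fat cons cells, here is a rough sketch in C of the
side-table alternative: positions are kept in a table keyed by the
object's address, so the cells themselves stay the same size. This is an
assumption-laden toy, not Guile's implementation; the names, the fixed
capacity and the hash function are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64  /* power of two; toy capacity, no resizing */

struct src_entry { const void *obj; int line, column; };
static struct src_entry src_table[SLOTS];

static size_t
src_slot (const void *obj)
{
  return ((uintptr_t) obj >> 4) & (SLOTS - 1);
}

/* Record LINE and COLUMN for OBJ.  Return 0 on success, -1 if full.  */
static int
src_put (const void *obj, int line, int column)
{
  assert (obj != NULL);
  size_t i = src_slot (obj);
  for (size_t probes = 0; probes < SLOTS; probes++, i = (i + 1) & (SLOTS - 1))
    if (src_table[i].obj == NULL || src_table[i].obj == obj)
      {
        src_table[i] = (struct src_entry) { obj, line, column };
        return 0;
      }
  return -1;
}

/* Return the entry recorded for OBJ, or NULL if there is none.  */
static const struct src_entry *
src_get (const void *obj)
{
  assert (obj != NULL);
  size_t i = src_slot (obj);
  for (size_t probes = 0; probes < SLOTS; probes++, i = (i + 1) & (SLOTS - 1))
    {
      if (src_table[i].obj == obj)
        return &src_table[i];
      if (src_table[i].obj == NULL)
        return NULL;
    }
  return NULL;
}
```

Only objects with their own address can be keys, which fits the quoted
restriction on symbols, characters and small integers. A real table
would also need weak keys so that it does not keep dead forms alive.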


        Stefan




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects]
  2020-03-20 19:18       ` Alan Mackenzie
@ 2020-03-21 11:22         ` Andrea Corallo
  2020-03-21 15:30           ` Correct line/column numbers in byte compiler messages Alan Mackenzie
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Corallo @ 2020-03-21 11:22 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> It's in the branch scratch/accurate-warning-pos.  The commit which
> converted the unfinished work to a bug fix was:
>
>     commit 2e04ddadab266d245a3bd0f6c19223ea515bdb90
>     Author: Alan Mackenzie <acm@muc.de>
>     Date:   Fri Nov 30 14:55:48 2018 +0000
>
>         Sundry amendments to branch scratch/accurate-warning-pos.
>
> (except, I think it still outputs two positions for each warning
> message: the traditional one, and the new correct one).
>

Hi all,

I've taken a very quick look at accurate-warning-pos and made some
measurements.

I've measured the bootstrap time and run elisp-benchmarks (dhrystone
taken out because it is broken on both branches), comparing
accurate-warning-pos against the last in-tree commit it's based on.
Here is what I see on my dev machine:

* b071398ba3 @ scratch/accurate-warning-pos

** bootstrap

   real 2m31.076s
   user 15m8.049s
   sys  0m38.087s

** elisp-benchmarks

   | test           | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
   |----------------+----------------+------------+---------+-------------+-----------------|
   | bubble-no-cons |          11.53 |       0.04 |       4 |       11.57 |            0.01 |
   | bubble         |           4.74 |       3.81 |     484 |        8.55 |            0.00 |
   | fibn-rec       |           6.35 |       0.00 |       0 |        6.35 |            0.00 |
   | fibn-tc        |           5.59 |       0.00 |       0 |        5.59 |            0.02 |
   | fibn           |          11.90 |       0.00 |       0 |       11.90 |            0.01 |
   | inclist        |          17.86 |       0.01 |       1 |       17.87 |            0.01 |
   | listlen-tc     |           6.48 |       0.00 |       0 |        6.48 |            0.01 |
   | nbody          |           3.58 |       6.70 |     839 |       10.28 |            0.01 |
   | pidigits       |           5.60 |       5.68 |     457 |       11.28 |            0.03 |
   |----------------+----------------+------------+---------+-------------+-----------------|
   | total          |          73.62 |      16.24 |    1785 |       89.86 |            0.04 |


* b619777dd6 (baseline)

** bootstrap

   real 2m20.762s
   user 13m35.418s
   sys  0m37.349s

** elisp-benchmarks

   | test           | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
   |----------------+----------------+------------+---------+-------------+-----------------|
   | bubble-no-cons |          11.43 |       0.04 |       4 |       11.47 |            0.00 |
   | bubble         |           4.67 |       3.58 |     487 |        8.25 |            0.01 |
   | fibn-rec       |           6.21 |       0.00 |       0 |        6.21 |            0.00 |
   | fibn-tc        |           5.68 |       0.00 |       0 |        5.68 |            0.00 |
   | fibn           |          11.47 |       0.00 |       0 |       11.47 |            0.00 |
   | inclist        |          17.37 |       0.01 |       1 |       17.38 |            0.00 |
   | listlen-tc     |           6.46 |       0.00 |       0 |        6.46 |            0.00 |
   | nbody          |           3.36 |       6.24 |     839 |        9.60 |            0.01 |
   | pidigits       |           5.66 |       5.53 |     457 |       11.19 |            0.03 |
   |----------------+----------------+------------+---------+-------------+-----------------|
   | total          |          72.32 |      15.39 |    1788 |       87.71 |            0.03 |

The outcome as I see it is that total bootstrap time gets about 1.1x bigger
while normal runtime appears not to be affected.

From my quick understanding of how it works this is expected.  The
additional branch and compare against symbols_with_pos_enabled in `eq'
is a kind of branch that is very easily predicted by any modern CPU,
therefore when the feature is off (not compiling) it becomes transparent
(I'd see a compiler branch hint there too).

elisp-benchmarks is not completely representative for now but
again... better than nothing.

Am I missing something else here, or are we trading away the exact
solution for something like ~15% of the byte-compile time?  I think this
feature would be a big step forward for our toolchain, opening many
possibilities.  I suspect fat conses will require more modifications
across the whole compilation pipeline (including macros?), bringing a
less accurate result, and they still have to prove the smaller overhead.

At this point I start suspecting I'm missing something very big here, am
I?

Anyway thanks Alan for this.

  Andrea

--
akrl@sdf.org



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages
  2020-03-21 11:22         ` Andrea Corallo
@ 2020-03-21 15:30           ` Alan Mackenzie
  2020-03-21 16:28             ` Andrea Corallo
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Mackenzie @ 2020-03-21 15:30 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

Hello, Andrea.

On Sat, Mar 21, 2020 at 11:22:03 +0000, Andrea Corallo wrote:
> Alan Mackenzie <acm@muc.de> writes:

> > It's in the branch scratch/accurate-warning-pos.  The commit which
> > converted the unfinished work to a bug fix was:

> >     commit 2e04ddadab266d245a3bd0f6c19223ea515bdb90
> >     Author: Alan Mackenzie <acm@muc.de>
> >     Date:   Fri Nov 30 14:55:48 2018 +0000

> >         Sundry amendments to branch scratch/accurate-warning-pos.

> > (except, I think it still outputs two positions for each warning
> > message: the traditional one, and the new correct one).


> Hi all,

> I've taken a very quick look at accurate-warning-pos and made some
> measurements.

Thanks, that's appreciated.

> I've measured the bootstrap time and run elisp-benchmarks (dhrystone
> taken out because it is broken on both branches), comparing
> accurate-warning-pos against the last in-tree commit it's based on.
> Here is what I see on my dev machine:

> * b071398ba3 @ scratch/accurate-warning-pos

> ** bootstrap

>    real 2m31.076s
>    user 15m8.049s
>    sys  0m38.087s

> ** elisp-benchmarks

>    | test           | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
>    |----------------+----------------+------------+---------+-------------+-----------------|
>    | bubble-no-cons |          11.53 |       0.04 |       4 |       11.57 |            0.01 |
>    | bubble         |           4.74 |       3.81 |     484 |        8.55 |            0.00 |
>    | fibn-rec       |           6.35 |       0.00 |       0 |        6.35 |            0.00 |
>    | fibn-tc        |           5.59 |       0.00 |       0 |        5.59 |            0.02 |
>    | fibn           |          11.90 |       0.00 |       0 |       11.90 |            0.01 |
>    | inclist        |          17.86 |       0.01 |       1 |       17.87 |            0.01 |
>    | listlen-tc     |           6.48 |       0.00 |       0 |        6.48 |            0.01 |
>    | nbody          |           3.58 |       6.70 |     839 |       10.28 |            0.01 |
>    | pidigits       |           5.60 |       5.68 |     457 |       11.28 |            0.03 |
>    |----------------+----------------+------------+---------+-------------+-----------------|
>    | total          |          73.62 |      16.24 |    1785 |       89.86 |            0.04 |


> * b619777dd6 (baseline)

> ** bootstrap

>    real 2m20.762s
>    user 13m35.418s
>    sys  0m37.349s

> ** elisp-benchmarks

>    | test           | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
>    |----------------+----------------+------------+---------+-------------+-----------------|
>    | bubble-no-cons |          11.43 |       0.04 |       4 |       11.47 |            0.00 |
>    | bubble         |           4.67 |       3.58 |     487 |        8.25 |            0.01 |
>    | fibn-rec       |           6.21 |       0.00 |       0 |        6.21 |            0.00 |
>    | fibn-tc        |           5.68 |       0.00 |       0 |        5.68 |            0.00 |
>    | fibn           |          11.47 |       0.00 |       0 |       11.47 |            0.00 |
>    | inclist        |          17.37 |       0.01 |       1 |       17.38 |            0.00 |
>    | listlen-tc     |           6.46 |       0.00 |       0 |        6.46 |            0.00 |
>    | nbody          |           3.36 |       6.24 |     839 |        9.60 |            0.01 |
>    | pidigits       |           5.66 |       5.53 |     457 |       11.19 |            0.03 |
>    |----------------+----------------+------------+---------+-------------+-----------------|
>    | total          |          72.32 |      15.39 |    1788 |       87.71 |            0.03 |

> The outcome as I see it is that total bootstrap time gets about 1.1x bigger
> while normal runtime appears not to be affected.

Well, it looks like the normal runtime is around 2.x% slower for
scratch/accurate-warning-pos.
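
For reference, the percentages can be recomputed from the figures
quoted above with a small helper like the following C sketch (the
function name is made up for illustration):

```c
#include <assert.h>

/* Relative slowdown of NEW_SECS versus OLD_SECS, in percent.  */
static double
slowdown_percent (double new_secs, double old_secs)
{
  assert (old_secs > 0);
  return (new_secs / old_secs - 1.0) * 100.0;
}
```

Applied to the tables: the elisp-benchmarks totals (89.86 s against
87.71 s) give roughly 2.5%, the bootstrap wall-clock times (2m31.076s
against 2m20.762s) roughly 7%, and the bootstrap user times (15m8.049s
against 13m35.418s) roughly 11%.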

> From my quick understanding of how it works this is expected.  The
> additional branch and compare against symbols_with_pos_enabled in `eq'
> is a kind of branch that is very easily predicted by any modern CPU,
> therefore when the feature is off (not compiling) it becomes transparent
> (I'd see a compiler branch hint there too).

In other words, the processor will test symbols_with_pos_enabled
simultaneously with starting the continuation for the "not" case.

This extra test in the EQ code was always the main thing in the slowdown
occurring in this git branch.

When I timed things back in 2018, I got a slowdown of somewhat more than
2.x%.  May I ask what sort of processor you're using?  Mine (unchanged
since then) is an AMD Ryzen.

> elisp-benchmarks is not completely representative for now but
> again... better than nothing.

> Am I missing something else here, or are we trading away the exact
> solution for something like ~15% of the byte-compile time?  I think this
> feature would be a big step forward for our toolchain, opening many
> possibilities.  I suspect fat conses will require more modifications
> across the whole compilation pipeline (including macros?), bringing a
> less accurate result, and they still have to prove the smaller overhead.

> At this point I start suspecting I'm missing something very big here, am
> I?

> Anyway thanks Alan for this.

Thanks!

>   Andrea

> --
> akrl@sdf.org

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Correct line/column numbers in byte compiler messages
  2020-03-21 15:30           ` Correct line/column numbers in byte compiler messages Alan Mackenzie
@ 2020-03-21 16:28             ` Andrea Corallo
  2020-03-21 18:37               ` Andrea Corallo
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Corallo @ 2020-03-21 16:28 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

Alan Mackenzie <acm@muc.de> writes:

> Hello, Andrea.
>
> On Sat, Mar 21, 2020 at 11:22:03 +0000, Andrea Corallo wrote:

>> The outcome as I see it is that total bootstrap time gets about 1.1x bigger
>> while normal runtime appears not to be affected.
>
> Well, it looks like the normal runtime is around 2.x% slower for
> scratch/accurate-warning-pos.

Well, I studied physics so for me 2% is pretty much zero :) :)  Joking
apart, I'm not sure this is really sufficient to conclude whether it is
noise or not.

>> From my quick understanding of how it works this is expected.  The
>> additional branch and compare against symbols_with_pos_enabled in `eq'
>> is a kind of branch that is very easily predicted by any modern CPU,
>> therefore when the feature is off (not compiling) it becomes transparent
>> (I'd see a compiler branch hint there too).
>
> In other words, the processor will test symbols_with_pos_enabled
> simultaneously with starting the continuation for the "not" case.

The processor will just speculate, guessing the target branch without
having to wait for the value of symbols_with_pos_enabled to be loaded.
Given that this changes rarely, speculation there should be pretty much
always correct.

I'd wrap symbols_with_pos_enabled into something like:

#define SYMBOLS_WITH_POS_ENABLED \
   __builtin_expect(symbols_with_pos_enabled, 0) 

To make sure we minimize instruction cache overhead too.
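
As a minimal standalone sketch of the same idea (not the Emacs code
itself): the names UNLIKELY, slow_path_enabled, slow_equal and toy_eq
below are all hypothetical stand-ins, and the "slow" comparison is an
arbitrary placeholder.  It only shows a rarely-true flag being wrapped in
__builtin_expect so that the compiler can lay out the common path first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fall back to a plain test on compilers without the builtin.  */
#if defined (__GNUC__) || defined (__clang__)
# define UNLIKELY(cond) __builtin_expect (!!(cond), 0)
#else
# define UNLIKELY(cond) (!!(cond))
#endif

/* Hypothetical stand-in for symbols_with_pos_enabled: false almost
   all of the time.  */
bool slow_path_enabled = false;

/* Hypothetical stand-in for the more expensive comparison; here it
   simply ignores the lowest bit of each word.  */
bool
slow_equal (uintptr_t x, uintptr_t y)
{
  return (x >> 1) == (y >> 1);
}

/* Fast identity test first; the slow path is only considered when the
   flag is set, and the flag is hinted as "expected to be false".  */
bool
toy_eq (uintptr_t x, uintptr_t y)
{
  return x == y || (UNLIKELY (slow_path_enabled) && slow_equal (x, y));
}
```

The hint does not change the result of toy_eq, only how the compiler is
likely to arrange the generated code.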

> This extra test in the EQ code was always the main thing in the slowdown
> occurring in this git branch.

Is the EQ overhead the main/only one?  Also GC seems marginally affected.

I think it would be interesting to write a nano benchmark focused on EQ
to test this accurately.

> When I timed things back in 2018, I got a slowdown of somewhat more than
> 2.x%.  May I ask what sort of processor you're using?  Mine (unchanged
> since then) is an AMD Ryzen.

I did the test on a "Xeon E5-1660 v3".  I think we can classify it as a
good system from a few (6?) years ago.  Not very fast by today's
standards but still quite beefy in terms of caches.

Bests

  Andrea

--
akrl@sdf.org


* Re: Correct line/column numbers in byte compiler messages
  2020-03-21 16:28             ` Andrea Corallo
@ 2020-03-21 18:37               ` Andrea Corallo
  2020-03-21 20:19                 ` Alan Mackenzie
  0 siblings, 1 reply; 29+ messages in thread
From: Andrea Corallo @ 2020-03-21 18:37 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

Have to apologize this is probably the quarantine effect but I couldn't
resist testing this:

#+BEGIN_SRC lisp
;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(defvar elb-list (cl-loop for i from 0 to 1500000
                          if (cl-oddp i)
                          collect 'a
                          else
                          collect 'b))

(defun elb-eq ()
  (let ((n 0))
    (dolist (l elb-list n)
      (when (eq 'b l)
        (cl-incf n)))))

(defun elb-eq-entry ()
  (dotimes (_ 1000)
    (elb-eq)))
#+END_SRC

Results:

b619777dd6 (baseline) 50.09s
accurate-warning-pos  51.28s

This is about 2% perf penalty.

Interestingly with the __builtin_expect trick applied exec time gets
back to 50.65s.
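
For reference, the percentages here are just the relative difference
against the baseline time.  A tiny helper (percent_slowdown is a
hypothetical name, not something from the Emacs tree) makes the
arithmetic explicit:

```c
/* Relative slowdown of CANDIDATE over BASELINE, in percent.
   Both arguments are wall-clock times in the same unit.  */
double
percent_slowdown (double baseline, double candidate)
{
  return (candidate - baseline) / baseline * 100.0;
}
```

With the timings above, percent_slowdown (50.09, 51.28) comes to roughly
2.4, and percent_slowdown (50.09, 50.65) to roughly 1.1.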

We could probably find a benchmark that better highlights the difference
(this is potentially dominated by cache misses while pointer chasing the
list) but is it worth?

Regards

  Andrea

--
akrl@sdf.org




* Re: Correct line/column numbers in byte compiler messages
  2020-03-21 18:37               ` Andrea Corallo
@ 2020-03-21 20:19                 ` Alan Mackenzie
  2020-03-21 21:08                   ` Andrea Corallo
  2020-03-22 11:26                   ` Alan Mackenzie
  0 siblings, 2 replies; 29+ messages in thread
From: Alan Mackenzie @ 2020-03-21 20:19 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

Hello, Andrea.

On Sat, Mar 21, 2020 at 18:37:13 +0000, Andrea Corallo wrote:
> Have to apologize this is probably the quarantine effect ....

As of today, we're under quarantine, too.  :-(

> .... but I couldn't resist testing this:

> #+BEGIN_SRC lisp
> ;; -*- lexical-binding: t; -*-
> (require 'cl-lib)
> (defvar elb-list (cl-loop for i from 0 to 1500000
>                           if (cl-oddp i)
>                           collect 'a
>                           else
>                           collect 'b))

> (defun elb-eq ()
>   (let ((n 0))
>     (dolist (l elb-list n)
>       (when (eq 'b l)
>         (cl-incf n)))))

> (defun elb-eq-entry ()
>   (dotimes (_ 1000)
>     (elb-eq)))
> #+END_SRC

> Results:

> b619777dd6 (baseline) 50.09s
> accurate-warning-pos  51.28s

> This is about 2% perf penalty.

On my Ryzen, I'm seeing a 50% penalty.  :-(  (Admittedly that's
comparing the year old branch to current master.  I suppose I should
build the correct comparable revision and try again.)  This suggests
that the branch prediction logic isn't present (or isn't active) on the
Ryzen.

> Interestingly with the __builtin_expect trick applied exec time gets
> back to 50.65s.

How do you do this?  I couldn't make much sense of the documentation of
__builtin_expect.  :-(

> We could probably find a benchmark that better highlights the difference
> (this is potentially dominated by cache misses while pointer chasing the
> list) but is it worth?

Could I ask you to do the following timing.

Evaluate the following (e.g. in *scratch*):

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defmacro time-it (&rest forms)
  "Time the running of a sequence of forms using `float-time'.
Call like this: \"M-: (time-it (foo ...) (bar ...) ...)\"."
  `(let ((start (float-time)))
    ,@forms
    (- (float-time) start)))

(defun time-scroll (&optional arg)
  (interactive "P")
  (message "%s"
           (time-it
            (condition-case nil
                (while t
                  (if arg (scroll-down) (scroll-up))
                  (sit-for 0))
              (error nil)))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

, visit .../emacs/src/xdisp.c, and do M-: (time-scroll).  This scrolls
through the buffer and prints a timing in the minibuffer.  (N.B. to run
this again, type something at BOB and undo it, thus marking the
fontification as stale.)

I'm seeing 19.4s vs. 22.2s, which is around 15% difference.  :-(

> Regards

>   Andrea

> --
> akrl@sdf.org

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Correct line/column numbers in byte compiler messages
  2020-03-21 20:19                 ` Alan Mackenzie
@ 2020-03-21 21:08                   ` Andrea Corallo
  2020-03-21 23:39                     ` Andrea Corallo
  2020-03-22 11:26                   ` Alan Mackenzie
  1 sibling, 1 reply; 29+ messages in thread
From: Andrea Corallo @ 2020-03-21 21:08 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2600 bytes --]

Alan Mackenzie <acm@muc.de> writes:

> On my Ryzen, I'm seeing a 50% penalty.  :-(  (Admittedly that's
> comparing the year old branch to current master.  I suppose I should
> build the correct comparable revision and try again.)  This suggests
> that the branch prediction logic isn't present (or isn't active) on the
> Ryzen.

This is very strange.  You certainly have to compare branches from the
same epoch.  I'm pretty sure that in the last year Paul pushed changes to
the inline policy with some measurable effect on performance.

>> Interestingly with the __builtin_expect trick applied exec time gets
>> back to 50.65s.
>
> How do you do this?  I couldn't make much sense of the documentation of
> __builtin_expect.  :-(

I attach the very simple patch I tried.  Basically the compiler has a
heuristic branch predictor (in GCC, predict.c) that is used to order the
final basic block output.  The wanted outcome is to have the most likely
execution path laid out sequentially; on modern CPUs this maximizes the
front-end bandwidth.  "__builtin_expect" is just a strong hint to this
predictor.

>> We could probably find a benchmark that better highlights the difference
>> (this is potentially dominated by cache misses while pointer chasing the
>> list) but is it worth?
>
> Could I ask you to do the following timing.
>
> Evaluate the following (e.g. in *scratch*):
>
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (defmacro time-it (&rest forms)
>   "Time the running of a sequence of forms using `float-time'.
> Call like this: \"M-: (time-it (foo ...) (bar ...) ...)\"."
>   `(let ((start (float-time)))
>     ,@forms
>     (- (float-time) start)))
>
> (defun time-scroll (&optional arg)
>   (interactive "P")
>   (message "%s"
>            (time-it
>             (condition-case nil
>                 (while t
>                   (if arg (scroll-down) (scroll-up))
>                   (sit-for 0))
>               (error nil)))))
> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
> , visit .../emacs/src/xdisp.c, and do M-: (time-scroll).  This scrolls
> through the buffer and prints a timing in the minibuffer.  (N.B. to run
> this again, type something at BOB and undo it, thus marking the
> fontification as stale.)
>
> I'm seeing 19.4s vs. 22.2s, which is around 15% difference.  :-(

I get 19.30 sec against 16.65 that is 15% difference here too.  This is
extremely interesting and would be worth profiling.

I bet on the GC for this! (Note I'm notoriously wrong when speculating
on benchmarks :)

Regards

  Andrea
  
-- 
akrl@sdf.org


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: comp-hint.patch --]
[-- Type: text/x-diff, Size: 1572 bytes --]

diff --git a/src/lisp.h b/src/lisp.h
index a22043026a..6e3cca1bbc 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -394,8 +394,12 @@ typedef EMACS_INT Lisp_Word;
 /* #define lisp_h_EQ(x, y) (XLI (x) == XLI (y)) */
 
 /* verify (NIL_IS_ZERO) */
+
+#define SYMBOLS_WITH_POS_ENABLED			\
+  __builtin_expect(symbols_with_pos_enabled, 0)
+
 #define lisp_h_EQ(x, y) ((XLI ((x)) == XLI ((y)))       \
-  || (symbols_with_pos_enabled    \
+  || (SYMBOLS_WITH_POS_ENABLED    \
   && (SYMBOL_WITH_POS_P ((x))                        \
       ? BARE_SYMBOL_P ((y))                               \
         ? (XSYMBOL_WITH_POS((x)))->sym == (y)          \
@@ -424,7 +428,7 @@ typedef EMACS_INT Lisp_Word;
 #define lisp_h_BARE_SYMBOL_P(x) TAGGEDP ((x), Lisp_Symbol)
 /* verify (NIL_IS_ZERO) */
 #define lisp_h_SYMBOLP(x) ((BARE_SYMBOL_P ((x)) ||               \
-                            (symbols_with_pos_enabled && (SYMBOL_WITH_POS_P ((x))))))
+                            (SYMBOLS_WITH_POS_ENABLED && (SYMBOL_WITH_POS_P ((x))))))
 #define lisp_h_TAGGEDP(a, tag) \
    (! (((unsigned) (XLI (a) >> (USE_LSB_TAG ? 0 : VALBITS)) \
 	- (unsigned) (tag)) \
@@ -463,7 +467,7 @@ typedef EMACS_INT Lisp_Word;
 /* verify (NIL_IS_ZERO) */
 # define lisp_h_XSYMBOL(a)                      \
      (eassert (SYMBOLP ((a))),                      \
-      (!symbols_with_pos_enabled             \
+      (!SYMBOLS_WITH_POS_ENABLED             \
       ? (XBARE_SYMBOL ((a)))             \
        : (BARE_SYMBOL_P ((a)))           \
       ? (XBARE_SYMBOL ((a)))                                    \


* Re: Correct line/column numbers in byte compiler messages
  2020-03-21 21:08                   ` Andrea Corallo
@ 2020-03-21 23:39                     ` Andrea Corallo
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Corallo @ 2020-03-21 23:39 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1155 bytes --]

Andrea Corallo <akrl@sdf.org> writes:

> Alan Mackenzie <acm@muc.de> writes:
>
>>
>> I'm seeing 19.4s vs. 22.2s, which is around 15% difference.  :-(
>
> I get 19.30 sec against 16.65 that is 15% difference here too.  This is
> extremely interesting and would be worth profiling.
>
> I bet on the GC for this! (Note I'm notoriously wrong when speculating
> on benchmarks :)

At this point the evening has been dedicated to this.  Apparently part
of the issue is that GCC is quite conservative in its inline policy and,
because of the more complex condition in EQ, decides not to inline it (at
least I see this is not always done).  This is true for EQ and some of
its friends defined in lisp.h.  Part of the cost is not the branch
itself but the additional procedure activations.

Pushing GCC a little more towards inlining with the raw attached patch, I
got it down on the mackenzie-test to 18.17s, which is still/just 9% out.
The remaining part is probably a little harder to investigate but I
still suspect a 'side reason' more than a fundamental one.
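
To illustrate the kind of change in isolation, here is a small sketch
assuming a GCC-compatible compiler.  ALWAYS_INLINE, toy_tagged_p and
count_tagged are hypothetical names, and the 3-bit tag test is only a
placeholder for a small predicate one would want inlined in a hot loop:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Force inlining where the attribute is available; otherwise leave
   the decision to the compiler.  */
#if defined (__GNUC__) || defined (__clang__)
# define ALWAYS_INLINE inline __attribute__ ((always_inline))
#else
# define ALWAYS_INLINE inline
#endif

/* Hypothetical predicate: does WORD carry TAG in its low 3 bits?  */
static ALWAYS_INLINE bool
toy_tagged_p (uintptr_t word, unsigned tag)
{
  return (word & 7u) == tag;
}

/* Hot loop calling the predicate; with the attribute no call is
   emitted for toy_tagged_p, whatever the inline heuristics say.  */
size_t
count_tagged (const uintptr_t *words, size_t n, unsigned tag)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (toy_tagged_p (words[i], tag))
      count++;
  return count;
}
```

Forcing inlining like this trades code size for fewer calls, so whether
it helps in a given spot still has to be measured.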

I suggest we try the fine tuning when rebased on 28 if not too hard.

Regards

  Andrea

--
akrl@sdf.org

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: tmp.patch --]
[-- Type: text/x-diff, Size: 3857 bytes --]

diff --git a/src/lisp.h b/src/lisp.h
index a22043026a..d0c56d7bbb 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -394,8 +394,12 @@ typedef EMACS_INT Lisp_Word;
 /* #define lisp_h_EQ(x, y) (XLI (x) == XLI (y)) */
 
 /* verify (NIL_IS_ZERO) */
+
+#define SYMBOLS_WITH_POS_ENABLED			\
+  __builtin_expect(symbols_with_pos_enabled, 0)
+
 #define lisp_h_EQ(x, y) ((XLI ((x)) == XLI ((y)))       \
-  || (symbols_with_pos_enabled    \
+  || (SYMBOLS_WITH_POS_ENABLED    \
   && (SYMBOL_WITH_POS_P ((x))                        \
       ? BARE_SYMBOL_P ((y))                               \
         ? (XSYMBOL_WITH_POS((x)))->sym == (y)          \
@@ -424,7 +428,7 @@ typedef EMACS_INT Lisp_Word;
 #define lisp_h_BARE_SYMBOL_P(x) TAGGEDP ((x), Lisp_Symbol)
 /* verify (NIL_IS_ZERO) */
 #define lisp_h_SYMBOLP(x) ((BARE_SYMBOL_P ((x)) ||               \
-                            (symbols_with_pos_enabled && (SYMBOL_WITH_POS_P ((x))))))
+                            (SYMBOLS_WITH_POS_ENABLED && (SYMBOL_WITH_POS_P ((x))))))
 #define lisp_h_TAGGEDP(a, tag) \
    (! (((unsigned) (XLI (a) >> (USE_LSB_TAG ? 0 : VALBITS)) \
 	- (unsigned) (tag)) \
@@ -463,7 +467,7 @@ typedef EMACS_INT Lisp_Word;
 /* verify (NIL_IS_ZERO) */
 # define lisp_h_XSYMBOL(a)                      \
      (eassert (SYMBOLP ((a))),                      \
-      (!symbols_with_pos_enabled             \
+      (!SYMBOLS_WITH_POS_ENABLED             \
       ? (XBARE_SYMBOL ((a)))             \
        : (BARE_SYMBOL_P ((a)))           \
       ? (XBARE_SYMBOL ((a)))                                    \
@@ -1137,38 +1141,38 @@ enum More_Lisp_Bits
 #define MOST_POSITIVE_FIXNUM (EMACS_INT_MAX >> INTTYPEBITS)
 #define MOST_NEGATIVE_FIXNUM (-1 - MOST_POSITIVE_FIXNUM)
 \f
-INLINE bool
+INLINE bool  __attribute__ ((always_inline))
 PSEUDOVECTORP (Lisp_Object a, int code)
 {
   return lisp_h_PSEUDOVECTORP (a, code);
 }
 
-INLINE bool
+INLINE bool  __attribute__ ((always_inline))
 (BARE_SYMBOL_P) (Lisp_Object x)
 {
   return lisp_h_BARE_SYMBOL_P (x);
 }
 
-INLINE bool
+INLINE bool  __attribute__ ((always_inline))
 (SYMBOL_WITH_POS_P) (Lisp_Object x)
 {
   return lisp_h_SYMBOL_WITH_POS_P (x);
 }
 
-INLINE bool
+INLINE bool  __attribute__ ((always_inline))
 (SYMBOLP) (Lisp_Object x)
 {
   return lisp_h_SYMBOLP (x);
 }
 
-INLINE struct Lisp_Symbol_With_Pos *
+INLINE struct Lisp_Symbol_With_Pos *  __attribute__ ((always_inline))
 XSYMBOL_WITH_POS (Lisp_Object a)
 {
     eassert (SYMBOL_WITH_POS_P (a));
     return XUNTAG (a, Lisp_Vectorlike, struct Lisp_Symbol_With_Pos);
 }
 
-INLINE struct Lisp_Symbol * ATTRIBUTE_NO_SANITIZE_UNDEFINED
+INLINE struct Lisp_Symbol *  __attribute__ ((always_inline)) ATTRIBUTE_NO_SANITIZE_UNDEFINED
 (XBARE_SYMBOL) (Lisp_Object a)
 {
 #if USE_LSB_TAG
@@ -1186,7 +1190,7 @@ INLINE struct Lisp_Symbol * ATTRIBUTE_NO_SANITIZE_UNDEFINED
 #endif
 }
 
-INLINE struct Lisp_Symbol * ATTRIBUTE_NO_SANITIZE_UNDEFINED
+INLINE struct Lisp_Symbol *  __attribute__ ((always_inline)) ATTRIBUTE_NO_SANITIZE_UNDEFINED
 (XSYMBOL) (Lisp_Object a)
 {
   return lisp_h_XSYMBOL (a);
@@ -1336,7 +1340,7 @@ INLINE bool
 
 /* Return true if X and Y are the same object, reckoning a symbol with
    position as being the same as the bare symbol.  */
-INLINE bool
+inline bool __attribute__ ((always_inline))
 (EQ) (Lisp_Object x, Lisp_Object y)
 {
   return lisp_h_EQ (x, y);
@@ -2690,7 +2694,7 @@ XOVERLAY (Lisp_Object a)
   return XUNTAG (a, Lisp_Vectorlike, struct Lisp_Overlay);
 }
 
-INLINE Lisp_Object
+INLINE Lisp_Object  __attribute__ ((always_inline))
 SYMBOL_WITH_POS_SYM (Lisp_Object a)
 {
   if (!SYMBOL_WITH_POS_P (a))
@@ -2698,7 +2702,7 @@ SYMBOL_WITH_POS_SYM (Lisp_Object a)
   return XSYMBOL_WITH_POS (a)->sym;
 }
 
-INLINE Lisp_Object
+INLINE Lisp_Object  __attribute__ ((always_inline))
 SYMBOL_WITH_POS_POS (Lisp_Object a)
 {
   if (!SYMBOL_WITH_POS_P (a))


* Re: Correct line/column numbers in byte compiler messages
  2020-03-21 20:19                 ` Alan Mackenzie
  2020-03-21 21:08                   ` Andrea Corallo
@ 2020-03-22 11:26                   ` Alan Mackenzie
  1 sibling, 0 replies; 29+ messages in thread
From: Alan Mackenzie @ 2020-03-22 11:26 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Rocky Bernstein, Stefan Monnier, emacs-devel

Hello, Andrea.

On Sat, Mar 21, 2020 at 20:19:54 +0000, Alan Mackenzie wrote:
> On Sat, Mar 21, 2020 at 18:37:13 +0000, Andrea Corallo wrote:
> > Have to apologize this is probably the quarantine effect ....

> As of today, we're under quarantine, too.  :-(

> > .... but I couldn't resist testing this:

> > #+BEGIN_SRC lisp
> > ;; -*- lexical-binding: t; -*-
> > (require 'cl-lib)
> > (defvar elb-list (cl-loop for i from 0 to 1500000
> >                           if (cl-oddp i)
> >                           collect 'a
> >                           else
> >                           collect 'b))

> > (defun elb-eq ()
> >   (let ((n 0))
> >     (dolist (l elb-list n)
> >       (when (eq 'b l)
> >         (cl-incf n)))))

> > (defun elb-eq-entry ()
> >   (dotimes (_ 1000)
> >     (elb-eq)))
> > #+END_SRC

> > Results:

> > b619777dd6 (baseline) 50.09s
> > accurate-warning-pos  51.28s

> > This is about 2% perf penalty.

> On my Ryzen, I'm seeing a 50% penalty.  :-(  (Admittedly that's
> comparing the year old branch to current master.  I suppose I should
> build the correct comparable revision and try again.)  This suggests
> that the branch prediction logic isn't present (or isn't active) on the
> Ryzen.

OK, I've done just that (with revision
b619777dd67e271d639c6fb1d031650af8fd79e6 from 2019-03-30) and I now see
what you see:
b619777:                      76.067s
scratch/accurate-warning-pos: 77.656s.
master:                       52.423s

So, clearly, optimisations to Emacs in the last year have borne fruit.
Maybe that optimisation would be useful in s/a-w-p.

> > Interestingly with the __builtin_expect trick applied exec time gets
> > back to 50.65s.

> How do you do this?  I couldn't make much sense of the documentation of
> __builtin_expect.  :-(

I've read your patch in your other mail, and I will apply it and try it
out.

[ .... ]

> > Regards

> >   Andrea

> > --
> > akrl@sdf.org

-- 
Alan Mackenzie (Nuremberg, Germany).




end of thread, other threads:[~2020-03-22 11:26 UTC | newest]

Thread overview: 29+ messages
2020-03-19 15:10 GNU is looking for Google Summer of Code Projects Rocky Bernstein
2020-03-19 17:35 ` Stefan Monnier
2020-03-19 17:56   ` Andrea Corallo
2020-03-19 18:05     ` Andrea Corallo
2020-03-19 18:19     ` Rocky Bernstein
2020-03-19 21:26     ` Stefan Monnier
2020-03-19 21:45       ` Andrea Corallo
2020-03-19 23:07         ` Rocky Bernstein
2020-03-19 20:34   ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Alan Mackenzie
2020-03-19 20:43     ` Andrea Corallo
2020-03-20 19:18       ` Alan Mackenzie
2020-03-21 11:22         ` Andrea Corallo
2020-03-21 15:30           ` Correct line/column numbers in byte compiler messages Alan Mackenzie
2020-03-21 16:28             ` Andrea Corallo
2020-03-21 18:37               ` Andrea Corallo
2020-03-21 20:19                 ` Alan Mackenzie
2020-03-21 21:08                   ` Andrea Corallo
2020-03-21 23:39                     ` Andrea Corallo
2020-03-22 11:26                   ` Alan Mackenzie
2020-03-19 20:56     ` Correct line/column numbers in byte compiler messages [Was: GNU is looking for Google Summer of Code Projects] Rocky Bernstein
2020-03-19 22:05       ` Stefan Monnier
2020-03-20 19:25       ` Alan Mackenzie
2020-03-19 21:41     ` Stefan Monnier
2020-03-19 22:09       ` Stefan Monnier
2020-03-20 20:10       ` Alan Mackenzie
2020-03-20 21:23         ` Rocky Bernstein
2020-03-20 21:27         ` Clément Pit-Claudel
2020-03-20 23:46           ` Stefan Monnier
2020-03-20 21:30         ` Stefan Monnier
