Circular records: how do I best handle them? (The new correct warning position branch now bootstraps in native compilation!)

unofficial mirror of emacs-devel@gnu.org 

* Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Alan Mackenzie @ 2021-12-23 21:15 UTC
  To: emacs-devel

Hello, Emacs.

Firstly, the scratch/correct-warning-pos git branch (which is to deliver
the correct source code position in byte compilation warning messages)
bootstrapped with native compilation for the first time yesterday
evening.  Basically, it is now working.  :-)

However, several .el files failed to compile due to infinite recursion.
A bit of gdb'ing in Ffuncall showed that whilst compiling ffap.el (at
least), there were circular record structures (where @dfn{records} are a
pseudovector type very like normal vectors, but with their zeroth
element supposedly representing their type with a symbol).

The recursion was in the (new) function `byte-compile-strip-s-p-1'
(where "strip-s-p" stands for "strip-symbol-positions").  This function
recursively descends list and vector structures, stripping positions
from symbols-with-position it finds.

However, I seem to have circularity in a record structure, where two
records seem to point to each other.  I suspect that it is the zeroth
elements of these records (which are meant to be symbols) that are
pointing at each other.
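
For illustration only, here is a minimal sketch of how two records can
end up referring to each other.  The type names `my-a' and `my-b' and
the variable names are made up for the example, and the cycle is placed
in slot 1 rather than in the type slot.

```elisp
;; Two records whose first data slot refers to the other record.
(defvar my/rec-a (record 'my-a nil))
(defvar my/rec-b (record 'my-b my/rec-a))
(aset my/rec-a 1 my/rec-b)

;; Following slot 1 twice leads back to the starting record, so a walker
;; that blindly recurses into every slot would never terminate.
(eq (aref (aref my/rec-a 1) 1) my/rec-a)
```

Printing such a structure needs `print-circle' bound to non-nil for the
same reason.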

Would somebody (?Stefan M, perhaps) please suggest to me how I might
efficiently cope with these circular structures.  Do I need to maintain
a list of already encountered Lisp Objects, somehow, and check this list
before recursing into the function?  It would be nice if the new
mechanism were reasonably efficient (but I don't know how important that
is).

Thanks!

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Stefan Monnier @ 2021-12-24 13:56 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

> The recursion was in the (new) function `byte-compile-strip-s-p-1'
> (where "strip-s-p" stands for "strip-symbol-positions").  This function
> recursively descends list and vector structures, stripping positions
> from symbols-with-position it finds.
>
> However, I seem to have circularity in a record structure, where two
> records seem to point to each other.  I suspect that it is the zeroth
> elements of these records (which are meant to be symbols) that are
> pointing at each other.

Hmm... circularity is quite normal in data structures, yes.
But presumably this is only applied to source code where circularity is
very rare.  Could it be that you end up recursing in elements which
actually aren't part of the source code (and hence can't have
symbols-with-positions)?

Also, I see in `byte-compile-strip-s-p-1` that you only look inside conses
and vectors.  So I'm not sure what makes you say the recursion was in
records since records are similar to vectors but aren't `vectorp` so
AFAICT your code won't recurse into them.

Also that means your code won't handle the case where the source code
includes literal hash-tables, literal records, literal char-tables,
literal strings-with-properties, ...

These are quite rare and maybe it's OK to disallow them in source code,
but maybe a more robust approach would be to make sure your lread.c code
doesn't generate symbols-with-positions within anything else than conses
and vectors?
[ Tho it wouldn't prevent a macro from expanding into a literal hash-table
  from source that only has conses&vectors :-(  ]

> Would somebody (?Stefan M, perhaps) please suggest to me how I might
> efficiently cope with these circular structures.  Do I need to maintain
> a list of already encountered Lisp Objects, somehow, and check this list
> before recursing into the function?

That's what we do elsewhere, yes, except that history taught us that
a hash-table is a better choice to avoid scalability problems.
Tho in your case you'd only need to keep the stack of objects inside of
which you're currently recursing, so maybe a list is good enough.
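
A minimal sketch of the hash-table idea, not the code on the branch.
It assumes `symbol-with-pos-p' and `bare-symbol' are available (the
`fboundp' guard turns it into a plain traversal where they are not);
the name `my/strip-positions' is made up for the example.

```elisp
(defun my/strip-positions (form &optional seen)
  "Replace symbols-with-position inside FORM by bare symbols, in place.
SEEN is an `eq' hash table of the containers already visited."
  (let ((seen (or seen (make-hash-table :test #'eq))))
    (cond
     ((and (fboundp 'symbol-with-pos-p) (symbol-with-pos-p form))
      (bare-symbol form))
     ;; Already visited: this is where a cycle gets cut.
     ((gethash form seen) form)
     ((consp form)
      ;; Iterate along the cdr chain so long lists do not nest deeply.
      (let ((cell form))
        (while (and (consp cell) (not (gethash cell seen)))
          (puthash cell t seen)
          (setcar cell (my/strip-positions (car cell) seen))
          (let ((next (cdr cell)))
            (if (consp next)
                (setq cell next)
              (setcdr cell (my/strip-positions next seen))
              (setq cell nil)))))
      form)
     ((or (vectorp form) (recordp form))
      (puthash form t seen)
      (dotimes (i (length form))
        (aset form i (my/strip-positions (aref form i) seen)))
      form)
     (t form))))
```

Every cons is entered into the table, not only the head of each list,
which is what lets the walk stop on a list whose tail loops back on
itself.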

BTW, instead of doing it by hand, you can call `print--preprocess` which
will do the recurse-and-fill-hash-table for you.  You can see how it's
used in `cl-print.el`.

        Stefan


* Re: Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Alan Mackenzie @ 2021-12-24 20:37 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Fri, Dec 24, 2021 at 08:56:56 -0500, Stefan Monnier wrote:
> > The recursion was in the (new) function `byte-compile-strip-s-p-1'
> > (where "strip-s-p" stands for "strip-symbol-positions").  This function
> > recursively descends list and vector structures, stripping positions
> > from symbols-with-position it finds.

> > However, I seem to have circularity in a record structure, where two
> > records seem to point to each other.  I suspect that it is the zeroth
> > elements of these records (which are meant to be symbols) that are
> > pointing at each other.

> Hmm... circularity is quite normal in data structures, yes.
> But presumably this is only applied to source code where circularity is
> very rare.  Could it be that you end up recursing in elements which
> actually aren't part of the source code (and hence can't have
> symbols-with-positions)?

I honestly don't know at the moment.

> Also, I see in `byte-compile-strip-s-p-1` that you only look inside conses
> and vectors.  So I'm not sure what makes you say the recursion was in
> records since records are similar to vectors but aren't `vectorp` so
> AFAICT your code won't recurse into them.

byte-compile-strip-s-p-1 has been enhanced to handle records, too,
though I haven't committed that bit yet (along with quite a lot of other
amendments, too).

> Also that means your code won't handle the case where the source code
> includes literal hash-tables, literal records, literal char-tables,
> literal strings-with-properties, ...

The positions get stripped off hash-table keys in puthash.  I'm unsure
about the others, just offhand.

Put it this way, make bootstrap is currently working, although a bit
delicate.  My preliminary timings on a benchmark are as fast as
expected, so it's looking good.

> These are quite rare and maybe it's OK to disallow them in source code,
> but maybe a more robust approach would be to make sure your lread.c code
> doesn't generate symbols-with-positions within anything else than conses
> and vectors?
> [ Tho it wouldn't prevent a macro from expanding into a literal hash-table
>   from source that only has conses&vectors :-(  ]

> > Would somebody (?Stefan M, perhaps) please suggest to me how I might
> > efficiently cope with these circular structures.  Do I need to maintain
> > a list of already encountered Lisp Objects, somehow, and check this list
> > before recursing into the function?

> That's what we do elsewhere, yes, except that history taught us that
> a hash-table is a better choice to avoid scalability problems.
> Tho in your case you'd only need to keep the stack of objects inside of
> which you're currently recursing, so maybe a list is good enough.

I've tried the list approach (using memq to check for an already
processed cons/vector/record).  It fell flat on its face with
lisp/leim/ja-dic/ja-dic.el, which has a list with over 60,000 strings in
it.  I Ctrl-C'd out of this after five minutes, and it took me a while
to establish I didn't have an infinite loop; not quite.

So I disabled the checking for circularity in conses, leaving it in for
vectors and records.  That's what I meant when I said "a bit delicate"
above.  There's nothing to stop somebody building a circular list in a
source file.  Maybe the way to handle this would be to allow it to hit
max-lisp-eval-depth, catch the error, then turn on the circularity
detector and try again.  Circular lists in source code are surely rare.
Large circular lists must be rarer still.
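
A rough sketch of that retry idea, under stated assumptions: the names
`my/strip-1' and `my/strip' are hypothetical, `symbol-with-pos-p' and
`bare-symbol' are only used where they exist, and the handler catches
plain `error' because the exact condition signalled on exceeding
max-lisp-eval-depth is not assumed here.

```elisp
(defun my/strip-1 (form seen)
  "Strip positions from FORM in place.
SEEN is nil (no circularity check) or an `eq' hash table of the
containers already visited."
  (cond
   ((and (fboundp 'symbol-with-pos-p) (symbol-with-pos-p form))
    (bare-symbol form))
   ((not (or (consp form) (vectorp form) (recordp form))) form)
   ((and seen (gethash form seen)) form)
   (t
    (when seen (puthash form t seen))
    (if (consp form)
        (progn
          (setcar form (my/strip-1 (car form) seen))
          (setcdr form (my/strip-1 (cdr form) seen)))
      (dotimes (i (length form))
        (aset form i (my/strip-1 (aref form i) seen))))
    form)))

(defun my/strip (form)
  "Strip positions from FORM, checking for cycles only when needed."
  (condition-case nil
      ;; Fast path: no bookkeeping at all.
      (my/strip-1 form nil)
    ;; The fast path nested too deeply, so start again with a table.
    ;; Stripping is idempotent, so redoing the finished part is harmless.
    (error (my/strip-1 form (make-hash-table :test #'eq)))))
```

Note that this simple version recurses on the cdr as well, so a very
long proper list also ends up on the slow path; a real implementation
would presumably want to iterate along lists instead.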

> BTW, instead of doing it by hand, you can call `print--preprocess` which
> will do the recurse-and-fill-hash-table for you.  You can see how it's
> used in `cl-print.el`.

Thanks, I'll have a closer look at this, probably in a few days' time.

Have a good tomorrow!

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Stefan Monnier @ 2021-12-26 20:35 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

>> Hmm... circularity is quite normal in data structures, yes.
>> But presumably this is only applied to source code where circularity is
>> very rare.  Could it be that you end up recursing in elements which
>> actually aren't part of the source code (and hence can't have
>> symbols-with-positions)?
> I honestly don't know at the moment.

I think it's worth the effort to try and track this down.  Maybe we can
completely circumvent the problem.

>> Also, I see in `byte-compile-strip-s-p-1` that you only look inside conses
>> and vectors.  So I'm not sure what makes you say the recursion was in
>> records since records are similar to vectors but aren't `vectorp` so
>> AFAICT your code won't recurse into them.
>
> byte-compile-strip-s-p-1 has been enhanced to handle records, too,
> though I haven't committed that bit yet (along with quite a lot of other
> amendments, too).

Hmm... now that I think about it, you only generate
symbols-with-positions (symposes) when byte-compiling, right?
And you can restrict this to the case where we byte-compile into a file
(as opposed to the rare case where we just call `byte-compile`).
So the symposes can end up in 2 places:
- in the .elc file: no need to strip the pos here, just make sure the
  symbols get printed without their position.
- elsewhere: that's the problematic part because this only occurs where
  the source code gets stealthily passed elsewhere, e.g. when a macro
  calls (put ARG1 'foo ARG2) during the macro expansion (rather than
  returning that chunk of code in the expansion).  Here we don't have
  much control over where the symposes end up and I don't think
  `byte-compile-strip-s-p` can help us (unless we call it before passing
  the result to the macro, but I don't think that's what we want to do).

So where/why do we need `byte-compile-strip-s-p`?

>> That's what we do elsewhere, yes, except that history taught us that
>> a hash-table is a better choice to avoid scalability problems.
>> Tho in your case you'd only need to keep the stack of objects inside of
>> which you're currently recursing, so maybe a list is good enough.
> I've tried the list approach (using memq to check for an already
> processed cons/vector/record).  It fell flat on its face with
> lisp/leim/ja-dic/ja-dic.el, which has a list with over 60,000 strings
> in it.

Oh, right, we have to add to the list all the conses rather than only
the head conses, so you definitely want to use a hash-table.
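
To make the scaling difference concrete, here is a small sketch that
walks the cdr chain of a list twice, once remembering visited conses in
a list searched with `memq' and once in an `eq' hash table.  Both
function names are made up for the example.  With n conses the first
does on the order of n*n/2 comparisons, the second about n lookups.

```elisp
(defun my/walk-with-list (list)
  "Walk LIST, remembering each cons in a list.  Return the cons count."
  (let ((seen nil))
    (while (and (consp list) (not (memq list seen)))
      (push list seen)                  ; memq above scans all of SEEN
      (setq list (cdr list)))
    (length seen)))

(defun my/walk-with-table (list)
  "Walk LIST, remembering each cons in a hash table.  Return the cons count."
  (let ((seen (make-hash-table :test #'eq)))
    (while (and (consp list) (not (gethash list seen)))
      (puthash list t seen)             ; constant-time lookup on average
      (setq list (cdr list)))
    (hash-table-count seen)))
```

Something like (benchmark-run 1 (my/walk-with-list (make-list 60000 "x")))
against the same call with `my/walk-with-table' should show the gap on
a list of the size mentioned above.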


        Stefan

* Re: Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Alan Mackenzie @ 2021-12-30 16:49 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Sun, Dec 26, 2021 at 15:35:35 -0500, Stefan Monnier wrote:
> >> Hmm... circularity is quite normal in data structures, yes.
> >> But presumably this is only applied to source code where circularity is
> >> very rare.  Could it be that you end up recursing in elements which
> >> actually aren't part of the source code (and hence can't have
> >> symbols-with-positions)?
> > I honestly don't know at the moment.

> I think it's worth the effort to try and track this down.  Maybe we can
> completely circumvent the problem.

I don't think there are any such cases.  I'll think it through fully,
some time.

> >> Also, I see in `byte-compile-strip-s-p-1` that you only look inside conses
> >> and vectors.  So I'm not sure what makes you say the recursion was in
> >> records since records are similar to vectors but aren't `vectorp` so
> >> AFAICT your code won't recurse into them.

> > byte-compile-strip-s-p-1 has been enhanced to handle records, too,
> > though I haven't committed that bit yet (along with quite a lot of other
> > amendments, too).

> Hmm... now that I think about it, you only generate
> symbols-with-positions (symposes) when byte-compiling, right?

Correct.

> And you can restrict this to the case where we byte-compile into a file
> (as opposed to the rare case where we just call `byte-compile`).

I suppose this could be done, but there's no need.  compile-defun isn't
that rare a function, and we want the correct warning messages from it.

> So the symposes can end up in 2 places:
> - in the .elc file: no need to strip the pos here, just make sure the
>   symbols get printed without their position.

The positions get stripped before the code is dumped to the .elc.

> - elsewhere: that's the problematic part because this only occurs where
>   the source code gets stealthily passed elsewhere, e.g. when a macro
>   calls (put ARG1 'foo ARG2) during the macro expansion (rather than
>   returning that chunk of code in the expansion).

This isn't a problem.  If it is a compiled macro doing this, the
positions will already be gone from the symbols.  If it is from an
uncompiled macro, XSYMBOL in Feval's subroutines does the Right Thing.

>   Here we don't have much control over where the symposes end up and I
>   don't think `byte-compile-strip-s-p` can help us (unless we call it
>   before passing the result to the macro, but I don't think that's
>   what we want to do).

> So where/why do we need `byte-compile-strip-s-p`?

It's now become macroexp-strip-symbol-position, so that it is always
loaded early, and there is no need for a duplicate function in
cl-macs.el any more.  There didn't seem to be a better place to put it.

It's used all over the place.  In eval-when/and-compile, it is used
before the evaluation.  It is used before dumping the byte compiled code
to the file.elc, and before passing this code to the native compiler.
Several (?most) of the byte-compile-file-form-... functions use it.
It's used in the newish keymap functions near the end of bytecomp.el, in
byte-compile-annotate-call-tree, etc.  Also in cl-define-compiler-macro,
and internal-macro-expand-for-load.  Additionally, also from Fput, to
prevent symbols with positions getting into symbol property lists.

> >> That's what we do elsewhere, yes, except that history taught us that
> >> a hash-table is a better choice to avoid scalability problems.
> >> Tho in your case you'd only need to keep the stack of objects inside of
> >> which you're currently recursing, so maybe a list is good enough.
> > I've tried the list approach (using memq to check for an already
> > processed cons/vector/record).  It fell flat on its face with
> > lisp/leim/ja-dic/ja-dic.el, which has a list with over 60,000 strings
> > in it.

> Oh, right, we have to add to the list all the conses rather than only
> the head conses, so you definitely want to use a hash-table.

Yes, I have to do this.  I am still debating whether just to do it
(which might slow things down quite a bit), or to do it in a
condition-case handler after the recursion has exceeded the 1,600
max-lisp-eval-depth.  I'm inclined towards the latter at the moment.

For other Lisp objects with a read syntax, such as char tables and
decorated strings, I intend to amend the reader just to output plain
symbols for them.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).


* Re: Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Stefan Monnier @ 2021-12-30 18:37 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

>> >> Hmm... circularity is quite normal in data structures, yes.
>> >> But presumably this is only applied to source code where circularity is
>> >> very rare.  Could it be that you end up recursing in elements which
>> >> actually aren't part of the source code (and hence can't have
>> >> symbols-with-positions)?
>> > I honestly don't know at the moment.
>> I think it's worth the effort to try and track this down.  Maybe we can
>> completely circumvent the problem.
> I don't think there are any such cases.

Hmm... I thought this whole circular records thread started because you
bumped into such a case.  I feel like I'm misunderstanding something.

>> So the symposes can end up in 2 places:
>> - in the .elc file: no need to strip the pos here, just make sure the
>>   symbols get printed without their position.
> The positions get stripped before the code is dumped to the .elc.

Why bother?  You can just have a `print-symbols-without-position` which
you let-bind around the printing code.

>> - elsewhere: that's the problematic part because this only occurs where
>>   the source code gets stealthily passed elsewhere, e.g. when a macro
>>   calls (put ARG1 'foo ARG2) during the macro expansion (rather than
>>   returning that chunk of code in the expansion).
> This isn't a problem.  If it is a compiled macro doing this, the
> positions will already be gone from the symbols.  If it is from an
> uncompiled macro, XSYMBOL in Feval's subroutines does the Right Thing.

I didn't mean sympos coming from the macro but sympos coming from the
args passed to the macro.  Something like:

    (defmacro foobar-really (arg1 arg2)
      (puthash arg1 arg2 foobar-remember)
      `(progn (do-something ,arg1) (do-something-else ,arg2)))

The `foobar-remember` hash table will end up containing symbols-with-pos
if `arg2` contains symbols.

> It's used all over the place.  In eval-when/and-compile, it is used
> before the evaluation.  It is used before dumping the byte compiled code
> to the file.elc, and before passing this code to the native compiler.
> Several (?most) of the byte-compile-file-form-... functions use it.
> It's used in the newish keymap functions near the end of bytecomp.el, in
> byte-compile-annotate-call-tree, etc.  Also in cl-define-compiler-macro,
> and internal-macro-expand-for-load.

Interesting.  Why do you need it at so many places?
What is it usually used for?

> Additionally, also from Fput, to prevent symbols with positions
> getting into symbol property lists.

IIUC this is for the kind of example I showed above (tho I used
`puthash` instead of `put`)?

> Yes, I have to do this.  I am still debating whether just to do it
> (which might slow things down quite a bit), or to do it in a
> condition-case handler after the recursion has exceeded the 1,600
> max-lisp-eval-depth.  I'm inclined towards the latter at the moment.

Using a (weak) hash-table may actually speed things up if you call it
from lots and lots of places and it thus ends up being applied several
times (redundantly) to the same data.

> For other Lisp objects with a read syntax, such as char tables and
> decorated strings, I intend to amend the reader just to output plain
> symbols for them.

Sounds reasonable.


        Stefan

* Re: Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Alan Mackenzie @ 2021-12-31 21:53 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Thu, Dec 30, 2021 at 13:37:47 -0500, Stefan Monnier wrote:
> >> >> Hmm... circularity is quite normal in data structures, yes.
> >> >> But presumably this is only applied to source code where circularity is
> >> >> very rare.  Could it be that you end up recursing in elements which
> >> >> actually aren't part of the source code (and hence can't have
> >> >> symbols-with-positions)?
> >> > I honestly don't know at the moment.
> >> I think it's worth the effort to try and track this down.  Maybe we can
> >> completely circumvent the problem.
> > I don't think there are any such cases.

> Hmm... I thought this whole circular records thread started because you
> bumped into such a case.  I feel like I'm misunderstanding something.

What I bumped into was circularly linked vectors in the source code
being compiled.

I've amended the reader so that it doesn't put positions on symbols
which are read as components of other structures such as byte compiled
functions, text property lists in strings, and so on.  (Actually, there
was very little to amend.)

I've amended macroexp-strip-symbol-positions so that it ignores
circularity unless it hits an infinite recursion, in which case it
starts again, recording all components found in hash tables.

I committed these changes a short time ago.

> >> So the symposes can end up in 2 places:
> >> - in the .elc file: no need to strip the pos here, just make sure the
> >>   symbols get printed without their position.
> > The positions get stripped before the code is dumped to the .elc.

> Why bother?  You can just have a `print-symbols-without-position` which
> you let-bind around the printing code.

I think I've got that already, though it's a long time since I looked at
it.

> >> - elsewhere: that's the problematic part because this only occurs where
> >>   the source code gets stealthily passed elsewhere, e.g. when a macro
> >>   calls (put ARG1 'foo ARG2) during the macro expansion (rather than
> >>   returning that chunk of code in the expansion).
> > This isn't a problem.  If it is a compiled macro doing this, the
> > positions will already be gone from the symbols.  If it is from an
> > uncompiled macro, XSYMBOL in Feval's subroutines does the Right Thing.

> I didn't mean sympos coming from the macro but sympos coming from the
> args passed to the macro.  Something like:

>     (defmacro foobar-really (arg1 arg2)
>       (puthash arg1 arg2 foobar-remember)
>       `(progn (do-something ,arg1) (do-something-else ,arg2)))

Args which are symbols with positions are first and foremost symbols.
They behave like symbols when used.  The test of this is that Emacs
bootstraps, despite having many macro expressions like ,@body which
expand to expressions with positions.

> The `foobar-remember` hash table will end up containing symbols-with-pos
> if `arg2` contains symbols.

In that (puthash arg1 arg2 foobar-remember), if the key is a symbol with
position, it is stripped.  I think the value will keep its positions.
This might still be a problem.

> > It's used all over the place.  In eval-when/and-compile, it is used
> > before the evaluation.  It is used before dumping the byte compiled code
> > to the file.elc, and before passing this code to the native compiler.
> > Several (?most) of the byte-compile-file-form-... functions use it.
> > It's used in the newish keymap functions near the end of bytecomp.el, in
> > byte-compile-annotate-call-tree, etc.  Also in cl-define-compiler-macro,
> > and internal-macro-expand-for-load.

> Interesting.  Why do you need it at so many places?
> What is it usually used for?

Stripping positions from compiled code before dumping it to an .elc file,
and also before passing the compiled code to the native compiler.

The fact is these positions on the symbols are unwanted for most uses of
symbols around compilation, being needed only in the analysis phase of
the source code.

> > Additionally, also from Fput, to prevent symbols with positions
> > getting into symbol property lists.

> IIUC this is for the kind of example I showed above (tho I used
> `puthash` instead of `put`)?

> > Yes, I have to do this.  I am still debating whether just to do it
> > (which might slow things down quite a bit), or to do it in a
> > condition-case handler after the recursion has exceeded the 1,600
> > max-lisp-eval-depth.  I'm inclined towards the latter at the moment.

> Using a (weak) hash-table may actually speed things up if you call it
> from lots and lots of places and it thus ends up being applied several
> times (redundantly) to the same data.

In the end I went with the condition-case approach, based on a gut
feeling that hash tables aren't very fast, and they're needed only
rarely, certainly whilst bootstrapping Emacs.

> > For other Lisp objects with a read syntax, such as char tables and
> > decorated strings, I intend to amend the reader just to output plain
> > symbols for them.

> Sounds reasonable.

I've done this now.  Feel free to look at the new version of
scratch/correct-warning-pos.

And a Happy New Year!

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

* Re: Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Stefan Monnier @ 2022-01-01 17:31 UTC
  To: Alan Mackenzie; +Cc: emacs-devel

>> >> >> Hmm... circularity is quite normal in data structures, yes.
>> >> >> But presumably this is only applied to source code where circularity is
>> >> >> very rare.  Could it be that you end up recursing in elements which
>> >> >> actually aren't part of the source code (and hence can't have
>> >> >> symbols-with-positions)?
>> >> > I honestly don't know at the moment.
>> >> I think it's worth the effort to try and track this down.  Maybe we can
>> >> completely circumvent the problem.
>> > I don't think there are any such cases.
>> Hmm... I thought this whole circular records thread started because you
>> bumped into such a case.  I feel like I'm misunderstanding something.
> What I bumped into was circularly linked vectors in the source code
> being compiled.

Then my question above turns into: what is this source code?

> I've amended the reader so that it doesn't put positions on symbols
> which are read as components of other structures such as byte compiled
> functions, text property lists in strings, and so on.  (Actually, there
> was very little to amend.)

OK.

>> > The positions get stripped before the code is dumped to the .elc.
>> Why bother?  You can just have a `print-symbols-without-position` which
>> you let-bind around the printing code.
> I think I've got that already, though it's a long time since I looked at
> it.

So why do you need to strip the positions before dumping the code into
the `.elc`?

>>     (defmacro foobar-really (arg1 arg2)
>>       (puthash arg1 arg2 foobar-remember)
>>       `(progn (do-something ,arg1) (do-something-else ,arg2)))
[...]
> In that (puthash arg1 arg2 foobar-remember), if the key is a symbol with
> position, it is stripped.  I think the value will keep its positions.
> This might still be a problem.

`put` and `puthash` are just some of the ways a macro's arg can
"escape".  A macro may also do something like

    (push arg my-list-of-stuff)

Having to strip symbol positions in `put` and `puthash` (i.e. having
this implementation detail leak to those places which aren't directly
related to compilation) is pretty ugly.  Do we really want to extend
that to `setq`, `aset`, and whatnot?

Maybe we should "bite the bullet" and expect macros to announce whether
they support sympos or not and if they don't we strip the positions
before calling them (we can try to be a bit more clever by using the
Edebug spec to find args where sympos will probably be harmless).

>> > It's used all over the place.  In eval-when/and-compile, it is used
>> > before the evaluation.  It is used before dumping the byte compiled code
>> > to the file.elc, and before passing this code to the native compiler.
>> > Several (?most) of the byte-compile-file-form-... functions use it.
>> > It's used in the newish keymap functions near the end of bytecomp.el, in
>> > byte-compile-annotate-call-tree, etc.  Also in cl-define-compiler-macro,
>> > and internal-macro-expand-for-load.
>> Interesting.  Why do you need it at so many places?
>> What is it usually used for?
> Stripping positions from compiled code before dumping it to an .elc file,

That sounds like "one place"

> and also before passing the compiled code to the native compiler.

And that sounds like "a second place".
In contrast above you list all kinds of *other* places.
Why do we need to strip positions in those other places?
Could we instead change the code (e.g. byte-compile-annotate-call-tree)
so it works with sympos?

> The fact is these positions on the symbols are unwanted for most uses of
> symbols around compilation, being needed only in the analysis phase of
> the source code.

So you're saying the problem is that your compiler doesn't separate the
front end from the backend?  That's indeed an inconvenience of the
current byte-compiler code.



        Stefan

* Re: Circular records: how do I best handle them?  (The new correct warning position branch now bootstraps in native compilation!)
From: Alan Mackenzie @ 2022-01-07 16:44 UTC
  To: Stefan Monnier; +Cc: emacs-devel

Hello, Stefan.

On Sat, Jan 01, 2022 at 12:31:51 -0500, Stefan Monnier wrote:

[ .... ]

> > What I bumped into was circularly linked vectors in the source code
> > being compiled.

> Then my question above turns into: what is this source code?

> > I've amended the reader so that it doesn't put positions on symbols
> > which are read as components of other structures such as byte compiled
> > functions, text property lists in strings, and so on.  (Actually, there
> > was very little to amend.)

> OK.

> >> > The positions get stripped before the code is dumped to the .elc.
> >> Why bother?  You can just have a `print-symbols-without-position` which
> >> you let-bind around the printing code.
> > I think I've got that already, though it's a long time since I looked at
> > it.

> So why do you need to strip the positions before dumping the code into
> the `.elc`?

Thank you very much indeed for this tip.  I don't need to strip the
positions.  eval already handles symbols with position (provided
symbols-with-pos-enabled is non-nil), as does pretty much everything
else, including the native-compiler.

Binding that variable and print-symbols-bare to non-nil rather than
stripping positions was actually quite simple, compared with the mess I
was in trying to deal with the circularity in some of the
lists/vectors/records.  I profiled some of the compilation runs with the
stripping strategy, and garbage collection was consuming around 70% of
the run time.  :-(
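
A minimal sketch of that binding approach, assuming the two variables
named above behave as described in this thread; the function name
`my/print-form-bare' is made up, and the `defvar' forms only declare
the variables special so the `let' binds them dynamically.

```elisp
;; Declare the variables special without giving them a value here.
(defvar print-symbols-bare)
(defvar symbols-with-pos-enabled)

(defun my/print-form-bare (form)
  "Return the printed representation of FORM without symbol positions.
FORM itself is left untouched; nothing is stripped or copied."
  (let ((print-symbols-bare t)
        (symbols-with-pos-enabled t))
    (prin1-to-string form)))
```

Since the form is never walked or rebuilt, circular structures pose no
extra problem here beyond what the printer already has to deal with.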

I've now got the thing working modulo tidying up.  A make bootstrap now
takes 7min 45sec on my machine, compared with 7min 18sec for the same on
the master branch.  That's about a 6% difference.  However, I've still got to
strip out the old warning position mechanism, which should shave
something off of that 6% difference.

[ .... ]

> `put` and `puthash` are just some of the ways a macro's arg can
> "escape".  A macro may also do something like

>     (push arg my-list-of-stuff)

> Having to strip symbol positions in `put` and `puthash` (i.e. having
> this implementation detail leak to those places which aren't directly
> related to compilation) is pretty ugly.  Do we really want to extend
> that to `setq`, `aset`, and whatnot?

No.  What we have to do is NOT to strip positions off of these objects,
instead warning users to be careful about saving bits of code in a way
that survives the byte compilation.  Possibly we should give them the
position stripping function to use at their discretion.  What do you
think?
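
As a sketch of what "at their discretion" might look like for a macro
author: the macro below saves its argument for later and strips the
positions itself before doing so.  All the `my/' names are
hypothetical, and `macroexp-strip-symbol-positions' is only called
where it exists.

```elisp
(defvar my/remembered-forms nil
  "Forms saved at macro-expansion time by `my/remember-and-run'.")

(defun my/strip-if-available (form)
  "Return FORM without symbol positions where stripping is available."
  (if (fboundp 'macroexp-strip-symbol-positions)
      (macroexp-strip-symbol-positions form)
    form))

(defmacro my/remember-and-run (form)
  "Save a position-free copy of FORM, then expand to FORM itself."
  ;; This runs during macro expansion, so FORM may carry positions when
  ;; the byte compiler is the caller.  Strip them before FORM escapes.
  (push (my/strip-if-available form) my/remembered-forms)
  ;; The expansion keeps the original FORM, positions and all, so the
  ;; compiler can still report accurate warning locations.
  `(progn ,form))
```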

[ .... ]

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).
