* native compilation units
@ 2022-05-31 1:02 Lynn Winebarger
2022-06-01 13:50 ` Andrea Corallo
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-05-31 1:02 UTC (permalink / raw)
To: emacs-devel
Hi,
Since the native compiler does not support linking eln files, I'm curious
if anyone has tried combining elisp files as source code files and
compiling the result as a unit?
Has there been any testing to determine if larger compilation units would
be more efficient either in terms of loading or increased optimization
opportunities visible to the compiler?
Just as a thought experiment, there are about 1500 .el files in the lisp
directory. Running from the root of the source tree, let's say I make 2
new directories, ct-lisp and lib-lisp, and then do
cp -Rf lisp/* ct-lisp
echo "(provide 'lib-emacs)" >lib-lisp/lib-emacs.el
find lisp -name '*.el' | while read src; do cat "$src" >>lib-lisp/lib-emacs.el; done
EMACS_LOAD_PATH='' ./src/emacs -batch -nsl --no-site-file --eval "(progn (setq load-path '(\"ct-lisp\" \"lib-lisp\")) (batch-native-compile 't))" lib-lisp/lib-emacs.el
find lisp -name '*.el' | while read src; do
cat >"lib-lisp/$(basename "$src")" <<EOF
;; -*-no-byte-compile: t; -*-
(require 'lib-emacs)
EOF
done
./src/emacs --eval "(setq load-path '(\"lib-lisp\"))" &
This is just a thought experiment, so assume the machine running this
compilation has infinite memory and completes the compilation within a
reasonable amount of time, and assume this sloppy approach doesn't yield a
divergent metarecursion.
If you actually loaded all 1500 modules at once, what would be the
difference between having 1500+ files versus the one large .so (assuming all
1500+ were compiled AOT, to be fair)?
I'm assuming in practice you would want to choose units with a bit more
care, of course. It just seems like there would be some more optimal
approach for using the native compiler than having all these tiny
compilation units, especially once you get into any significant number of
packages.
Lynn
* Re: native compilation units
2022-05-31 1:02 native compilation units Lynn Winebarger
@ 2022-06-01 13:50 ` Andrea Corallo
2022-06-03 14:17 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Andrea Corallo @ 2022-06-01 13:50 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: emacs-devel
Lynn Winebarger <owinebar@gmail.com> writes:
> Hi,
> Since the native compiler does not support linking eln files, I'm curious if anyone has tried combining elisp files as
> source code files and compiling the result as a unit?
> Has there been any testing to determine if larger compilation units would be more efficient either in terms of loading or
> increased optimization opportunities visible to the compiler?
Hi,
the compiler can't take advantage of interprocedural optimizations (such
as inlining, etc.) because every function in Lisp can be redefined at any
moment.
You can trigger those optimizations anyway using native-comp-speed 3 but
each time one of the functions in the compilation unit is redefined
you'll have to recompile the whole CU to make sure all changes take
effect.
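As a rough sketch of the trade-off just described (an illustration, not code from this thread): `native-comp-speed` can be let-bound around one compilation so that level 3 applies to a single unit only. The file name `my-unit.el` is hypothetical, and the guard assumes an Emacs built with native compilation.

```elisp
;; Sketch only: "my-unit.el" is a hypothetical file name.
(when (and (fboundp 'native-comp-available-p)
           (native-comp-available-p))
  ;; Bind the speed for this one compilation instead of changing it globally.
  (let ((native-comp-speed 3))
    (native-compile "my-unit.el")))
```

If any function defined in `my-unit.el` is later redefined, the whole file would need to be compiled again for callers inside the unit to see the change.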
This strategy might be useful, but I guess limited to some specific
application.
Best Regards
Andrea
* Re: native compilation units
2022-06-01 13:50 ` Andrea Corallo
@ 2022-06-03 14:17 ` Lynn Winebarger
2022-06-03 16:05 ` Eli Zaretskii
2022-06-03 18:15 ` Stefan Monnier
0 siblings, 2 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-03 14:17 UTC (permalink / raw)
To: Andrea Corallo; +Cc: emacs-devel
Thanks.
There was a thread in January starting at
https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html that
gets at one scenario. At least in pre-10 versions in my experience,
Windows has not dealt well with large numbers of files in a single
directory, at least if it's on a network drive. There's some super-linear
behavior just listing the contents of a directory that makes having more
than, say, a thousand files in a directory impractical. That makes
packaging emacs with all files on the system load path precompiled
inadvisable. If you add any significant number of pre-compiled site-lisp
libraries (eg a local elpa mirror), it will get worse.
Aside from explicit interprocedural optimization, is it possible libgccjit
would lay out the code in a more optimal way in terms of memory locality?
If the only concern for semantic safety with -O3 is the redefinability of
all symbols, that's already the case for emacs lisp primitives implemented
in C. It should be similar to putting the code into a let block with all
defined functions bound in the block, then setting the global definitions
to the locally defined versions, except for any variations in forms with
semantics that depend on whether they appear at top-level or in a lexical
scope. It might be interesting to extend the language with a form that
makes the unsafe optimizations safe with respect to the compilation unit.
On Wed, Jun 1, 2022 at 9:50 AM Andrea Corallo <akrl@sdf.org> wrote:
> Lynn Winebarger <owinebar@gmail.com> writes:
>
> > Hi,
> > Since the native compiler does not support linking eln files, I'm
> curious if anyone has tried combining elisp files as
> > source code files and compiling the result as a unit?
> > Has there been any testing to determine if larger compilation units
> would be more efficient either in terms of loading or
> > increased optimization opportunities visible to the compiler?
>
> Hi,
>
> the compiler can't take advantage of interprocedural optimizations (such
> as inlining, etc.) because every function in Lisp can be redefined at any
> moment.
>
> You can trigger those optimizations anyway using native-comp-speed 3 but
> each time one of the functions in the compilation unit is redefined
> you'll have to recompile the whole CU to make sure all changes take
> effect.
>
> This strategy might be useful, but I guess limited to some specific
> application.
>
> Best Regards
>
> Andrea
>
* Re: native compilation units
2022-06-03 14:17 ` Lynn Winebarger
@ 2022-06-03 16:05 ` Eli Zaretskii
[not found] ` <CAM=F=bDxxyHurxM_xdbb7XJtP8rdK16Cwp30ti52Ox4nv19J_w@mail.gmail.com>
2022-06-03 18:15 ` Stefan Monnier
1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-06-03 16:05 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: akrl, emacs-devel
> From: Lynn Winebarger <owinebar@gmail.com>
> Date: Fri, 3 Jun 2022 10:17:25 -0400
> Cc: emacs-devel@gnu.org
>
> There was a thread in January starting at
> https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html that gets at one scenario. At least in
> pre-10 versions in my experience, Windows has not dealt well with large numbers of files in a single
> directory, at least if it's on a network drive. There's some super-linear behavior just listing the contents of a
> directory that makes having more than, say, a thousand files in a directory impractical.
Is this only on networked drives? I have a directory with almost 5000
files, and I see no issues there. Could you show a recipe for
observing the slow-down you are describing?
> That makes
> packaging emacs with all files on the system load path precompiled inadvisable. If you add any significant
> number of pre-compiled site-lisp libraries (eg a local elpa mirror), it will get worse.
ELPA files are supposed to be compiled into the user's eln-cache
directory, not into the native-lisp subdirectory of lib/emacs/, so we
are okay there. And users can split their eln-cache directory into
several ones (and update native-comp-eln-load-path accordingly) if
needed.
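A minimal sketch of what such a split might look like, assuming a hypothetical extra directory named `eln-cache-extra/` under `user-emacs-directory`. It keeps the first entry (where new .eln files are written) and the last entry (the system directory) in place, and adds the extra directory between them.

```elisp
;; Sketch only: "eln-cache-extra/" is a hypothetical directory name.
(let ((extra (expand-file-name "eln-cache-extra/" user-emacs-directory)))
  (when (and (boundp 'native-comp-eln-load-path)
             (consp native-comp-eln-load-path)
             (not (member extra native-comp-eln-load-path)))
    (setq native-comp-eln-load-path
          ;; first entry, then the extra directory, then the remaining entries
          (append (list (car native-comp-eln-load-path) extra)
                  (cdr native-comp-eln-load-path)))))
```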
But I admit that I never saw anything like what you describe, so I'm
curious what is going on in these cases and why, and how bad the
slow-down is.
> Aside from explicit interprocedural optimization, is it possible libgccjit would lay out the code in a more
> optimal way in terms of memory locality?
>
> If the only concern for semantic safety with -O3 is the redefinability of all symbols, that's already the case for
> emacs lisp primitives implemented in C. It should be similar to putting the code into a let block with all
> defined functions bound in the block, then setting the global definitions to the locally defined versions, except
> for any variations in forms with semantics that depend on whether they appear at top-level or in a lexical
> scope. It might be interesting to extend the language with a form that makes the unsafe optimizations safe
> with respect to the compilation unit.
I believe this is an entirely different subject?
* Re: native compilation units
2022-06-03 14:17 ` Lynn Winebarger
2022-06-03 16:05 ` Eli Zaretskii
@ 2022-06-03 18:15 ` Stefan Monnier
2022-06-04 2:43 ` Lynn Winebarger
1 sibling, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-03 18:15 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> There was a thread in January starting at
> https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html that
> gets at one scenario. At least in pre-10 versions in my experience,
> Windows has not dealt well with large numbers of files in a single
> directory, at least if it's on a network drive.
Hmm... I count a bit over 6K ELisp files in Emacs + (Non)GNU ELPA, so
the ELN cache should presumably not go much past 10K files.
Performance issues with read access to directories containing fewer than
10K files seem like something that was solved last century, so
I wouldn't worry very much about it.
[ But that doesn't mean we shouldn't try to compile several ELisp files
into a single ELN file, especially since the size of ELN files seems
to be proportionally larger for small ELisp files than for large
ones. ]
> Aside from explicit interprocedural optimization, is it possible libgccjit
> would lay out the code in a more optimal way in terms of memory locality?
Could be, but I doubt it because I don't think GCC gets enough info to
make such a decision. For lazily-compiled ELN files I could imagine
collecting some amount of profiling info to generate better code, but
our code generation is definitely not that sophisticated.
> If the only concern for semantic safety with -O3 is the redefinability of
> all symbols, that's already the case for emacs lisp primitives implemented
> in C.
Not really:
- Most ELisp primitives implemented in C can be redefined just fine.
The problem is about *calls* to those primitives, where the
redefinition may fail to apply to those calls that are made from C.
- While the problem is similar the scope is very different.
> It should be similar to putting the code into a let block with all
> defined functions bound in the block, then setting the global
> definitions to the locally defined versions, except for any variations
> in forms with semantics that depend on whether they appear at
> top-level or in a lexical scope.
IIUC the current native-compiler will actually leave those
locally-defined functions in their byte-code form :-(
IOW, there are lower-hanging fruits to pick first.
> It might be interesting to extend the language with a form that
> makes the unsafe optimizations safe with respect to the compilation unit.
Yes, in the context of Scheme I think this is called "sealing".
Stefan
* Re: native compilation units
2022-06-03 18:15 ` Stefan Monnier
@ 2022-06-04 2:43 ` Lynn Winebarger
2022-06-04 14:32 ` Stefan Monnier
2022-06-08 6:46 ` Andrea Corallo
0 siblings, 2 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-04 2:43 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Fri, Jun 3, 2022 at 2:15 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> > There was a thread in January starting at
> > https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html
> that
> > gets at one scenario. At least in pre-10 versions in my experience,
> > Windows has not dealt well with large numbers of files in a single
> > directory, at least if it's on a network drive.
>
> Hmm... I count a bit over 6K ELisp files in Emacs + (Non)GNU ELPA, so
> the ELN cache should presumably not go much past 10K files.
>
> Performance issues with read access to directories containing less than
> 10K files seems like something that was solved last century, so
> I wouldn't worry very much about it.
>
Per my response to Eli, I see (network) directories become almost unusable
somewhere around 1000 files, but it seems that's a consequence of the
network and/or security configuration.
> [ But that doesn't mean we shouldn't try to compile several ELisp files
> into a single ELN file, especially since the size of ELN files seems
> to be proportionally larger for small ELisp files than for large
> ones. ]
>
Since I learned of the native compiler in 28.1, I decided to try it out and
also "throw the spaghetti at the wall" with a bunch of packages that
provide features similar to those found in more "modern" IDEs. In terms of
startup time, the normal package system does not deal well with hundreds of
directories on the load path, regardless of AOT native compilation, so I'm
transforming the packages to install in the version-specific load path, and
compiling that ahead of time. At least for the ones amenable to such
treatment.
Given I'm compiling all the files AOT for use in a common installation
(this is on Linux, not Windows), the natural question for me is whether
larger compilation units would be more efficient, particularly at startup.
Would there be advantages comparable to including packages in the dump
file, for example?
I posed the question to the list mostly to see if the approach (or similar)
had already been tested for viability or effectiveness, so I can avoid
unnecessary experimentation if the answer is already well-understood.
> > Aside from explicit interprocedural optimization, is it possible
> libgccjit
> > would lay out the code in a more optimal way in terms of memory locality?
>
> Could be, but I doubt it because I don't think GCC gets enough info to
> make such a decision. For lazily-compiled ELN files I could imagine
> collecting some amount of profiling info to generate better code, but
> our code generation is definitely not that sophisticated.
I don't know enough about modern library loading to know whether you'd
expect N distinct but interdependent dynamic libraries to be loaded in as
compact a memory region as a single dynamic library formed from the same
underlying object code.
> > If the only concern for semantic safety with -O3 is the redefinability of
> > all symbols, that's already the case for emacs lisp primitives
> implemented
> > in C.
>
> Not really:
> - Most ELisp primitives implemented in C can be redefined just fine.
> The problem is about *calls* to those primitives, where the
> redefinition may fail to apply to those calls that are made from C.
> - While the problem is similar the scope is very different.
>
From Andrea's description, this would be the primary "unsafe" aspect of
interprocedural optimizations applied to one of these aggregated
compilation units. That is, that the semantics of redefining function
symbols would not apply to points in the code at which the compiler had
made optimizations based on assuming the function definitions were
constants. It's not clear to me whether those points are limited to call
sites or not.
> > It should be similar to putting the code into a let block with all
> > defined functions bound in the block, then setting the global
> > definitions to the locally defined versions, except for any variations
> > in forms with semantics that depend on whether they appear at
> > top-level or in a lexical scope.
>
> IIUC the current native-compiler will actually leave those
> locally-defined functions in their byte-code form :-(
>
That's not what I understood from
https://akrl.sdf.org/gccemacs.html#org0f21a5b
As you deduce below, I come from a Scheme background - cl-flet is the form
I should have referenced, not let.
>
> IOW, there are lower-hanging fruits to pick first.
>
This is mainly of interest if a simple transformation of the sort I
originally suggested can provide benefits in either reducing startup time
for large sets of preloaded packages, or by enabling additional
optimizations. Primarily the former for me, but the latter would be
interesting. It seems more straightforward than trying to link the eln
files into larger units after compilation.
> > It might be interesting to extend the language with a form that
> > makes the unsafe optimizations safe with respect to the compilation unit.
>
> Yes, in the context of Scheme I think this is called "sealing".
>
>
> Stefan
> No
* Re: native compilation units
[not found] ` <CAM=F=bDxxyHurxM_xdbb7XJtP8rdK16Cwp30ti52Ox4nv19J_w@mail.gmail.com>
@ 2022-06-04 5:57 ` Eli Zaretskii
2022-06-05 13:53 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-06-04 5:57 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
[Please use Reply All, to keep the mailing list and other interested
people part of this discussion.]
> From: Lynn Winebarger <owinebar@gmail.com>
> Date: Fri, 3 Jun 2022 15:17:51 -0400
>
> Unfortunately most of my "productive" experience in a Windows environment has been in a corporate
> environment where the configuration is opaque to end users. For all I know, it's not just a network issue but
> could also involve the security/antivirus infrastructure.
> I can tell you that at approximately 1000 files in a directory, any process I've designed that uses said
> directory slows down dramatically. Just displaying the contents in file explorer exhibits quadratic behavior as
> the process appears to start refreshing the listing before completing one pass.
You can try setting the w32-get-true-file-attributes variable to the
value 'local.
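In an init file that setting might look like the following sketch; the `system-type` guard is an addition for illustration so the form is harmless on other platforms.

```elisp
;; Only meaningful on MS-Windows builds of Emacs.
(when (eq system-type 'windows-nt)
  ;; Fetch the full (slower) file attributes only for files on local drives.
  (setq w32-get-true-file-attributes 'local))
```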
Or maybe the following entry from etc/PROBLEMS will help:
** A few seconds delay is seen at startup and for many file operations
This happens when the Net Logon service is enabled. During Emacs
startup, this service issues many DNS requests looking up for the
Windows Domain Controller. When Emacs accesses files on networked
drives, it automatically logs on the user into those drives, which
again causes delays when Net Logon is running.
The solution seems to be to disable Net Logon with this command typed
at the Windows shell prompt:
net stop netlogon
To start the service again, type "net start netlogon". (You can also
stop and start the service from the Computer Management application,
accessible by right-clicking "My Computer" or "Computer", selecting
"Manage", then clicking on "Services".)
> As for elpa being created in the user's cache, that depends on whether the user has access to the gccjit
> infrastructure
If the user cannot use libgccjit on the user's system, then why are *.eln
files from external packages relevant? They will never appear,
because native compilation is not available.
So I don't think I understand what you are saying here.
If you have in mind ELPA packages that come with precompiled *.eln
files (are there packages like that?), then the user can place them in
several directories and adapt native-comp-eln-load-path accordingly.
So again I don't think I understand the problem you describe.
> this was one of the points mentioned in
> https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html as it related to the system lisp files.
Sorry, I don't see anything about the issue of eln-cache location
there. Could you be more specific and point to what was said there
that is relevant to this discussion?
* Re: native compilation units
2022-06-04 2:43 ` Lynn Winebarger
@ 2022-06-04 14:32 ` Stefan Monnier
2022-06-05 12:16 ` Lynn Winebarger
2022-06-08 6:56 ` Andrea Corallo
2022-06-08 6:46 ` Andrea Corallo
1 sibling, 2 replies; 46+ messages in thread
From: Stefan Monnier @ 2022-06-04 14:32 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
>> Performance issues with read access to directories containing less than
>> 10K files seems like something that was solved last century, so
>> I wouldn't worry very much about it.
> Per my response to Eli, I see (network) directories become almost unusable
> somewhere around 1000 files,
I don't doubt there are still (in the current century) cases where
largish directories get slow, but what I meant is that it's now
considered as a problem that should be solved by making those
directories fast rather than by avoiding making them so large.
>> [ But that doesn't mean we shouldn't try to compile several ELisp files
>> into a single ELN file, especially since the size of ELN files seems
>> to be proportionally larger for small ELisp files than for large
>> ones. ]
>
> Since I learned of the native compiler in 28.1, I decided to try it out and
> also "throw the spaghetti at the wall" with a bunch of packages that
> provide features similar to those found in more "modern" IDEs. In terms of
> startup time, the normal package system does not deal well with hundreds of
> directories on the load path, regardless of AOT native compilation, so I'm
> transforming the packages to install in the version-specific load path, and
> compiling that ahead of time. At least for the ones amenable to such
> treatment.
There are two load-paths at play (`load-path` and
`native-comp-eln-load-path`) and I'm not sure which one you're talking
about. OT1H `native-comp-eln-load-path` should not grow with the number
of packages so it typically contains exactly 2 entries, and definitely
not hundreds. OTOH `load-path` is unrelated to native compilation.
I also don't understand what you mean by "version-specific load path".
Also, what kind of startup time are you talking about?
E.g., are you using `package-quickstart`?
> Given I'm compiling all the files AOT for use in a common installation
> (this is on Linux, not Windows), the natural question for me is whether
> larger compilation units would be more efficient, particularly at startup.
It all depends where the slowdown comes from :-)
E.g. `package-quickstart` follows a similar idea to the one you propose
by collecting all the `<pkg>-autoloads.el` into one big file, which
saves us from having to load separately all those little files. It also
saves us from having to look for them through those hundreds
of directories.
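For reference, enabling that mechanism might look like the sketch below. It assumes the setting is made early enough in startup (for example in the early init file) to take effect before packages are activated.

```elisp
(require 'package)

;; Ask package.el to activate packages from one precomputed file.
(setq package-quickstart t)

;; Regenerate the combined file after installing or removing packages.
(package-quickstart-refresh)
```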
I suspect a long `load-path` can itself be a source of slowdown
especially during startup, but I haven't bumped into that yet.
There are ways we could speed it up, if needed:
- create "meta packages" (or just one containing all your packages),
which would bring together in a single directory the files of several
packages (and presumably also bring together their
`<pkg>-autoloads.el` into a larger combined one). Under GNU/Linux we
could have this metapackage be made of symlinks, making it fairly
efficient and non-obtrusive (e.g. `C-h o` could still get you to the
actual file rather than its metapackage-copy).
- Manage a cache of where our ELisp files are (i.e. a hash table
mapping relative ELisp file names to the absolute file name returned
by looking for them in `load-path`). This way we can usually avoid
scanning those hundred directories to find the .elc file we need, and
go straight to it.
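The second idea could be sketched roughly as follows. This is an illustration only, not an existing Emacs facility: the names `my-load-cache` and `my-cached-locate` are hypothetical, and the sketch does not handle invalidation when `load-path` or the file system changes.

```elisp
(defvar my-load-cache (make-hash-table :test #'equal)
  "Hypothetical cache mapping a library name to its absolute file name.")

(defun my-cached-locate (library)
  "Return the absolute file name of LIBRARY, consulting the cache first.
Fall back to `locate-library', which scans `load-path', on a cache miss."
  (or (gethash library my-load-cache)
      (let ((file (locate-library library)))
        ;; Remember successful lookups only, so that libraries installed
        ;; later can still be found.
        (when file
          (puthash library file my-load-cache))
        file)))
```

A second call such as `(my-cached-locate "subr")` is then answered from the hash table without touching the directories on `load-path`.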
> I posed the question to the list mostly to see if the approach (or similar)
> had already been tested for viability or effectiveness, so I can avoid
> unnecessary experimentation if the answer is already well-understood.
I don't think it has been tried, no.
> I don't know enough about modern library loading to know whether you'd
> expect N distinct but interdependent dynamic libraries to be loaded in as
> compact a memory region as a single dynamic library formed from the same
> underlying object code.
I think you're right here, but I'd expect the effect to be fairly small
except when the .elc/.eln files are themselves small.
> It's not clear to me whether those points are limited to call
> sites or not.
I believe it is: the optimization is to replace a call via `Ffuncall` to
a "symbol" (which looks up the value stored in the `symbol-function`
cell), with a direct call to the actual C function contained in the
"subr" object itself (expected to be) contained in the
`symbol-function` cell.
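A small sketch of the semantics at stake, using hypothetical function names. Under ordinary evaluation the call goes through the symbol's function cell, so a later redefinition is visible to existing callers. Per the description above, a direct call emitted at speed 3 is exactly what could bypass that lookup.

```elisp
;; Hypothetical names, for illustration only.
(defun my-helper () 'old)
(defun my-caller () (my-helper))

;; Redefine the helper after the caller already exists.
(fset 'my-helper (lambda () 'new))

;; With the normal lookup through `symbol-function', the caller
;; sees the new definition.
(my-caller)
```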
Andrea would know if there are other semantic-non-preserving
optimizations in the level 3 of the optimizations, but IIUC this is very
much the main one.
>> IIUC the current native-compiler will actually leave those
>> locally-defined functions in their byte-code form :-(
> That's not what I understood from
> https://akrl.sdf.org/gccemacs.html#org0f21a5b
> As you deduce below, I come from a Scheme background - cl-flet is the form
> I should have referenced, not let.
Indeed you're right that those functions can be native compiled, tho only if
they're closed (i.e. if they don't refer to surrounding lexical
variables).
[ I always forget that little detail :-( ]
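To make the distinction concrete, here is a sketch with hypothetical names, assuming `lexical-binding` is enabled. The first local function is closed, since it uses only its own argument. The second refers to a variable of the enclosing function, so it is the kind of function the caveat above is about.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun my-closed-example (x)
  ;; `double' refers only to its own parameter N.
  (cl-flet ((double (n) (* 2 n)))
    (double x)))

(defun my-closure-example (x)
  ;; `add-x' refers to X from the enclosing function.
  (cl-flet ((add-x (n) (+ n x)))
    (add-x 1)))
```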
> It seems more straightforward than trying to link the eln
> files into larger units after compilation.
That seems like too much trouble, indeed.
Stefan
* Re: native compilation units
2022-06-04 14:32 ` Stefan Monnier
@ 2022-06-05 12:16 ` Lynn Winebarger
2022-06-05 14:08 ` Lynn Winebarger
2022-06-05 14:20 ` Stefan Monnier
2022-06-08 6:56 ` Andrea Corallo
1 sibling, 2 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-05 12:16 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sat, Jun 4, 2022, 10:32 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> >> Performance issues with read access to directories containing less than
> >> 10K files seems like something that was solved last century, so
> >> I wouldn't worry very much about it.
> > Per my response to Eli, I see (network) directories become almost
> unusable
> > somewhere around 1000 files,
>
> I don't doubt there are still (in the current century) cases where
> largish directories get slow, but what I meant is that it's now
> considered as a problem that should be solved by making those
> directories fast rather than by avoiding making them so large.
>
Unfortunately sometimes we have to cope with the environment we use. And for
all I know some of the performance penalties may be inherent in the
(security related) infrastructure requirements in a highly regulated
industry.
Not that that should be a primary concern for the development team, but it
is something a local packager might be stuck with.
> >> [ But that doesn't mean we shouldn't try to compile several ELisp files
> >> into a single ELN file, especially since the size of ELN files seems
> >> to be proportionally larger for small ELisp files than for large
> >> ones. ]
> >
> > Since I learned of the native compiler in 28.1, I decided to try it out
> and
> > also "throw the spaghetti at the wall" with a bunch of packages that
> > provide features similar to those found in more "modern" IDEs. In terms
> of
> > startup time, the normal package system does not deal well with hundreds
> of
> > directories on the load path, regardless of AOT native compilation, so
> I'm
> > transforming the packages to install in the version-specific load path,
> and
> > compiling that ahead of time. At least for the ones amenable to such
> > treatment.
>
> There are two load-paths at play (`load-path` and
> `native-comp-eln-load-path`) and I'm not sure which one you're talking
> about. OT1H `native-comp-eln-load-path` should not grow with the number
> of packages so it typically contains exactly 2 entries, and definitely
> not hundreds. OTOH `load-path` is unrelated to native compilation.
>
Not entirely - as I understand it, the load system first finds the source
file and computes a hash before determining if there is an ELN file
corresponding to it.
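One way to see that mapping from source file to ELN file is the sketch below, which assumes an Emacs built with native compilation and with the Lisp sources installed. `simple.el` is just an example library.

```elisp
(when (and (fboundp 'native-comp-available-p)
           (native-comp-available-p))
  ;; Find the source file through `load-path' first ...
  (let ((source (locate-library "simple.el")))
    (when source
      ;; ... then ask which .eln file name corresponds to it.
      (message "%s -> %s" source (comp-el-to-eln-filename source)))))
```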
Although I do wonder if there is some optimization for ELN files in the
system directory as opposed to the user's cache. I have one build where I
native compiled (but not byte compiled) all the el files in the lisp
directory, and another where I byte compiled and then native compiled the
same set of files. In both cases I used the flag to batch-native-compile
to put the ELN file in the system cache. In the first case a number of
files failed to compile, and in the second, they all compiled. I've also
observed another situation where a file will only (byte or native) compile
if one of its required files has been byte compiled ahead of time - but
only native compiling that dependency resulted in the same behavior as not
compiling it at all. I planned to send a separate mail to the list asking
whether it was intended behavior once I had reduced it to a simple case, or
if it should be submitted as a bug.
In any case, I noticed that the "browse customization groups" buffer is
noticeably faster in the second case. I need to try it again to confirm
that it wasn't just waiting on the relevant source files to compile in the
first case.
> I also don't understand what you mean by "version-specific load path".
>
In the usual unix installation, there will be a "site-lisp" one directory
above the version specific installation directory, and another site-lisp in
the version-specific installation directory. I'm referring to installing
the source (ultimately) in ..../emacs/28.1/site-lisp. During the build
it's just in the site-lisp subdirectory of the source root path.
> Also, what kind of startup time are you talking about?
> E.g., are you using `package-quickstart`?
>
That was the first alternative I tried. With 1250 packages, it did not
work. First, the file consisted of a series of "let" forms corresponding
to the package directories, and apparently the autoload forms are ignored
if they appear anywhere below top-level. At least I got a number of
warnings to that effect.
The other problem was that I got a "bytecode overflow error". I only got
the first error after chopping off the file approximately after the first
10k lines. Oddly enough, when I put all the files in the site-lisp
directory, and collect all the autoloads for that directory in a single
file, it has no problem with the 80k line file that results.
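The message does not say how that single autoloads file was produced. One possible way, shown purely as a sketch with hypothetical paths, is to use whichever generator the running Emacs provides.

```elisp
;; Sketch only: both paths are hypothetical placeholders.
(let ((dir "/path/to/site-lisp")
      (out "/path/to/site-lisp/site-autoloads.el"))
  (cond
   ;; Newer Emacs versions provide `loaddefs-generate'.
   ((fboundp 'loaddefs-generate)
    (loaddefs-generate dir out))
   ;; Emacs 28 provides `make-directory-autoloads'.
   ((fboundp 'make-directory-autoloads)
    (make-directory-autoloads dir out))))
```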
> > Given I'm compiling all the files AOT for use in a common installation
> > (this is on Linux, not Windows), the natural question for me is whether
> > larger compilation units would be more efficient, particularly at
> startup.
>
> It all depends where the slowdown comes from :-)
>
> E.g. `package-quickstart` follows a similar idea to the one you propose
> by collecting all the `<pkg>-autoloads.el` into one big file, which
> saves us from having to load separately all those little files. It also
> saves us from having to look for them through those hundreds
> of directories.
>
> I suspect a long `load-path` can itself be a source of slow down
> especially during startup, but I haven't bumped into that yet.
> There are ways we could speed it up, if needed:
>
> - create "meta packages" (or just one containing all your packages),
> which would bring together in a single directory the files of several
> packages (and presumably also bring together their
> `<pkg>-autoloads.el` into a larger combined one). Under GNU/Linux we
> could have this metapackage be made of symlinks, making it fairly
> efficient and non-obtrusive (e.g. `C-h o` could still get you to the
> actual file rather than its metapackage-copy).
> - Manage a cache of where are our ELisp files (i.e. a hash table
> mapping relative ELisp file names to the absolute file name returned
> by looking for them in `load-path`). This way we can usually avoid
> scanning those hundred directories to find the .elc file we need, and
> go straight to it.
>
I'm pretty sure the load-path is an issue with 1250 packages, even if half
of them consist of single files.
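A minimal sketch of the cache idea quoted above, assuming nothing beyond
`locate-library` and a hash table. The names `my/elisp-file-cache` and
`my/locate-library-cached` are hypothetical and not part of Emacs; this only
illustrates the lookup, it is not how `load` works today:

```elisp
;; Hypothetical cache: library name -> absolute file name found via `load-path'.
(defvar my/elisp-file-cache (make-hash-table :test #'equal)
  "Maps library names to the absolute file names `locate-library' returned.")

(defun my/locate-library-cached (library)
  "Return the file for LIBRARY, scanning `load-path' only on a cache miss."
  (let ((hit (gethash library my/elisp-file-cache 'miss)))
    (if (eq hit 'miss)
        ;; Slow path: walk the directories in `load-path' once, then remember.
        (puthash library (locate-library library) my/elisp-file-cache)
      hit)))
```

Entries go stale whenever `load-path` changes or files move, so a real
implementation would need an invalidation rule; in this sketch
`(clrhash my/elisp-file-cache)` is the only one.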
Since I'm preparing this for a custom installation that will be accessible
for multiple users, I decided to try putting everything in site-lisp and
native compile everything AOT. Most of the other potential users are not
experienced Unix users, which is why I'm trying to make everything work
smoothly up front and have features they would find familiar from other
editors.
One issue with this approach is that the package selection mechanism
doesn't recognize the modules as being installed, or provide any assistance
in selectively activating modules.
Other places where there is a noticeable slowdown with large numbers of
packages:
* Browsing customization groups - just unfolding a single group can take
minutes (this is on fast server hardware with a lot of free memory)
* Browsing custom themes with many theme packages installed
I haven't gotten to the point that I can test the same situation by
explicitly loading the same modules from the site-lisp directory that had
been activated as packages. Installing the themes in the system directory
does skip the "suspicious files" check that occurs when loading them from
the user configuration.
> > I posed the question to the list mostly to see if the approach (or similar)
> > had already been tested for viability or effectiveness, so I can avoid
> > unnecessary experimentation if the answer is already well-understood.
>
> I don't think it has been tried, no.
>
> > I don't know enough about modern library loading to know whether you'd
> > expect N distinct but interdependent dynamic libraries to be loaded in as
> > compact a memory region as a single dynamic library formed from the same
> > underlying object code.
>
> I think you're right here, but I'd expect the effect to be fairly small
> except when the .elc/.eln files are themselves small.
>
There are a lot of packages that have fairly small source files, just
because they've factored their code the same way it would be in languages
where the shared libraries are not in 1-1 correspondence with source files.
>
> > It's not clear to me whether those points are limited to call
> > sites or not.
>
> I believe it is: the optimization is to replace a call via `Ffuncall` to
> a "symbol" (which looks up the value stored in the `symbol-function`
> cell), with a direct call to the actual C function contained in the
> "subr" object itself (expected to be) contained in the
> `symbol-function` cell.
>
> Andrea would know if there are other semantic-non-preserving
> optimizations in the level 3 of the optimizations, but IIUC this is very
> much the main one.
>
> >> IIUC the current native-compiler will actually leave those
> >> locally-defined functions in their byte-code form :-(
> > That's not what I understood from
> > https://akrl.sdf.org/gccemacs.html#org0f21a5b
> > As you deduce below, I come from a Scheme background - cl-flet is the form
> > I should have referenced, not let.
>
> Indeed you're right that those functions can be native compiled, tho only
> if they're closed (i.e. if they don't refer to surrounding lexical
> variables).
> [ I always forget that little detail :-( ]
>
I would expect this would apply to most top-level defuns in elisp
packages/modules. From my cursory review, it looks like the ability to
redefine these defuns is mostly useful when developing the packages
themselves, and "sealing" them for use would be appropriate.
I'm not clear on whether this optimization is limited to the case of
calling functions defined in the compilation unit, or applied more broadly.
Thanks,
Lynn
* Re: native compilation units
2022-06-04 5:57 ` Eli Zaretskii
@ 2022-06-05 13:53 ` Lynn Winebarger
0 siblings, 0 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-05 13:53 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Andrea Corallo, emacs-devel
On Sat, Jun 4, 2022, 1:57 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Lynn Winebarger <owinebar@gmail.com>
> > Date: Fri, 3 Jun 2022 15:17:51 -0400
> >
> > Unfortunately most of my "productive" experience in a Windows
> > environment has been in a corporate environment where the
> > configuration is opaque to end users. For all I know, it's not just
> > a network issue but could also involve the security/antivirus
> > infrastructure.
> > I can tell you that at approximately 1000 files in a directory, any
> > process I've designed that uses said directory slows down
> > dramatically. Just displaying the contents in file explorer exhibits
> > quadratic behavior as the process appears to start refreshing the
> > listing before completing one pass.
>
> You can try setting the w32-get-true-file-attributes variable to the
> value 'local.
>
> Or maybe the following entry from etc/PROBLEMS will help:
>
> ** A few seconds delay is seen at startup and for many file operations
>
> This happens when the Net Logon service is enabled. During Emacs
> startup, this service issues many DNS requests looking up for the
> Windows Domain Controller. When Emacs accesses files on networked
> drives, it automatically logs on the user into those drives, which
> again causes delays when Net Logon is running.
>
> The solution seems to be to disable Net Logon with this command typed
> at the Windows shell prompt:
>
> net stop netlogon
>
> To start the service again, type "net start netlogon". (You can also
> stop and start the service from the Computer Management application,
> accessible by right-clicking "My Computer" or "Computer", selecting
> "Manage", then clicking on "Services".)
>
I was only intending to illustrate a situation in which a local packager
(internal to an organization) might want to (a) provide pre-compiled
versions of elisp files that may or may not be from files installed in the
"lisp" directory, while (b) not wanting to have huge numbers of files in a
particular directory for performance reasons.
The performance issues I've experienced are not particular to any
individual application, and the way the Windows systems are configured I
may not even reliably be able to tell if a given application is stored on a
local or network drive (although performance may lead me to believe it is
one or the other). They do appear to be particular to the context in which
I have been using Windows, though.
> > As for elpa being created in the user's cache, that depends on whether
> > the user has access to the gccjit infrastructure
>
> If the user cannot use libgccjit on the user's system, then why are
> *.eln files from external packages relevant? They will never appear,
> because native compilation is not available.
>
> So I don't think I understand what you are saying here.
>
> If you have in mind ELPA packages that come with precompiled *.eln
> files (are there packages like that?), then the user can place them in
> several directories and adapt native-comp-eln-load-path accordingly.
> So again I don't think I understand the problem you describe.
>
A local packager can precompile anything they like and put it in the system
native-lisp directory, no?
I'm not sure if the package system would find it if installed as a package
by the user, but many packages are just single files that can just be
placed directly in site-lisp and used directly.
> > this was one of the points mentioned in
> > https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html as
> > it related to the system lisp files.
>
> Sorry, I don't see anything about the issue of eln-cache location
> there. Could you be more specific and point to what was said there
> that is relevant to this discussion?
>
I was thinking of this message:
https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html
particularly this passage:
  I don't understand yet the packaging requirements, is it not possible to
  copy additionally the native-lisp/ folder to the package?
and then the points made in these follow-ups:
https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01009.html
https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01020.html
Lynn
* Re: native compilation units
2022-06-05 12:16 ` Lynn Winebarger
@ 2022-06-05 14:08 ` Lynn Winebarger
2022-06-05 14:46 ` Stefan Monnier
2022-06-05 14:20 ` Stefan Monnier
1 sibling, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-05 14:08 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sun, Jun 5, 2022 at 8:16 AM Lynn Winebarger <owinebar@gmail.com> wrote:
> On Sat, Jun 4, 2022, 10:32 AM Stefan Monnier <monnier@iro.umontreal.ca>
> wrote:
>
>> >> [ But that doesn't mean we shouldn't try to compile several ELisp files
>> >> into a single ELN file, especially since the size of ELN files seems
>> >> to be proportionally larger for small ELisp files than for large
>> >> ones. ]
>> >
>> > Since I learned of the native compiler in 28.1, I decided to try it
>> > out and also "throw the spaghetti at the wall" with a bunch of
>> > packages that provide features similar to those found in more
>> > "modern" IDEs. In terms of startup time, the normal package system
>> > does not deal well with hundreds of directories on the load path,
>> > regardless of AOT native compilation, so I'm transforming the
>> > packages to install in the version-specific load path, and compiling
>> > that ahead of time. At least for the ones amenable to such
>> > treatment.
>>
>> There are two load-paths at play (`load-path` and
>> `native-comp-eln-load-path`) and I'm not sure which one you're talking
>> about. OT1H `native-comp-eln-load-path` should not grow with the number
>> of packages so it typically contains exactly 2 entries, and definitely
>> not hundreds. OTOH `load-path` is unrelated to native compilation.
>>
>
> Not entirely - as I understand it, the load system first finds the source
> file and computes a hash before determining if there is an ELN file
> corresponding to it.
> Although I do wonder if there is some optimization for ELN files in the
> system directory as opposed to the user's cache. I have one build where I
> native compiled (but not byte compiled) all the el files in the lisp
> directory, and another where I byte compiled and then native compiled the
> same set of files. In both cases I used the flag to batch-native-compile
> to put the ELN file in the system cache. In the first case a number of
> files failed to compile, and in the second, they all compiled. I've also
> observed another situation where a file will only (byte or native) compile
> if one of its required files has been byte compiled ahead of time - but
> only native compiling that dependency resulted in the same behavior as not
> compiling it at all. I planned to send a separate mail to the list asking
> whether it was intended behavior once I had reduced it to a simple case, or
> if it should be submitted as a bug.
>
Unrelated, but the one type of file I don't seem to be able to produce AOT
in the system directory (because I have no way to specify them) is the
subr/trampoline files. Any hints on how to make those AOT in the system
directory?
>
>> Also, what kind of startup time are you talking about?
>> E.g., are you using `package-quickstart`?
>>
> That was the first alternative I tried. With 1250 packages, it did not
> work. First, the file consisted of a series of "let" forms corresponding
> to the package directories, and apparently the autoload forms are ignored
> if they appear anywhere below top-level. At least I got a number of
> warnings to that effect.
> The other problem was that I got a "bytecode overflow error". I only got
> the first error after chopping off the file approximately after the first
> 10k lines. Oddly enough, when I put all the files in the site-lisp
> directory, and collect all the autoloads for that directory in a single
> file, it has no problem with the 80k line file that results.
Also, I should have responded to the first question - "minutes" on recent
server-grade hardware with 24 cores and >100GB of RAM. That was with 1193
enabled packages in my .emacs file.
* Re: native compilation units
2022-06-05 12:16 ` Lynn Winebarger
2022-06-05 14:08 ` Lynn Winebarger
@ 2022-06-05 14:20 ` Stefan Monnier
2022-06-06 4:12 ` Lynn Winebarger
2022-06-14 4:19 ` Lynn Winebarger
1 sibling, 2 replies; 46+ messages in thread
From: Stefan Monnier @ 2022-06-05 14:20 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> Unfortunately sometimes we have to cope with the environment we use. And for
> all I know some of the performance penalties may be inherent in the
> (security related) infrastructure requirements in a highly regulated
> industry.
What we learned at the end of last century is exactly that there aren't
any such *inherent* performance penalties. It may take extra coding
work in the file-system to make it fast with 10k entries. It may take
yet more work to make it fast with 10G entries. But it can be done (and
has been done), and compared to the overall complexity of current
kernels, it's a drop in the bucket.
So nowadays if it's slow with 10k entries you should treat it as a bug
(could be a configuration problem, or some crap software (anti-virus?)
getting in the way, or ...).
> Not that that should be a primary concern for the development team, but it
> is something a local packager might be stuck with.
Indeed. Especially if it only affects a few rare Emacs users who
don't have much leverage with the MS-certified sysadmins.
>> >> [ But that doesn't mean we shouldn't try to compile several ELisp files
>> >> into a single ELN file, especially since the size of ELN files seems
>> >> to be proportionally larger for small ELisp files than for large
>> >> ones. ]
>> >
>> > Since I learned of the native compiler in 28.1, I decided to try it
>> > out and also "throw the spaghetti at the wall" with a bunch of
>> > packages that provide features similar to those found in more
>> > "modern" IDEs. In terms of startup time, the normal package system
>> > does not deal well with hundreds of directories on the load path,
>> > regardless of AOT native compilation, so I'm transforming the
>> > packages to install in the version-specific load path, and compiling
>> > that ahead of time. At least for the ones amenable to such
>> > treatment.
>>
>> There are two load-paths at play (`load-path` and
>> `native-comp-eln-load-path`) and I'm not sure which one you're talking
>> about. OT1H `native-comp-eln-load-path` should not grow with the number
>> of packages so it typically contains exactly 2 entries, and definitely
>> not hundreds. OTOH `load-path` is unrelated to native compilation.
>>
>
> Not entirely - as I understand it, the load system first finds the source
> file and computes a hash before determining if there is an ELN file
> corresponding to it.
`load-path` is used for native-compiled files, yes. But it's used
in exactly the same way (and should hence cost the same) for:
- No native compilation
- AOT native compilation
- lazy native compilation
Which is what I meant by "unrelated to native compilation".
> Although I do wonder if there is some optimization for ELN files in the
> system directory as opposed to the user's cache. I have one build where I
> native compiled (but not byte compiled) all the el files in the lisp
> directory,
IIUC current code only loads an ELN file if there is a corresponding ELC
file, so natively compiling a file without also byte-compiling it is
definitely not part of the expected situation. Buyer beware.
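As a hedged illustration of that expectation, a packager could byte-compile
each file before native-compiling it. The helper name `my/compile-both` is
hypothetical; `byte-compile-file`, `native-comp-available-p` and
`native-compile` are the ordinary entry points in Emacs 28:

```elisp
(defun my/compile-both (file)
  "Byte-compile FILE, then native-compile it if native compilation is available.
Return the value of `native-compile', or nil if either step was skipped."
  (when (and (byte-compile-file file)            ; writes the .elc next to FILE
             (fboundp 'native-comp-available-p)  ; absent in builds before 28
             (native-comp-available-p))
    (native-compile file)))                      ; writes the matching .eln
```

This only sketches the ordering (ELC first, then ELN); where the ELN file
ends up still depends on `native-comp-eln-load-path`.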
>> I also don't understand what you mean by "version-specific load path".
> In the usual unix installation, there will be a "site-lisp" one directory
> above the version specific installation directory, and another site-lisp in
> the version-specific installation directory. I'm referring to installing
> the source (ultimately) in ..../emacs/28.1/site-lisp. During the build
> it's just in the site-lisp subdirectory of the source root path.
I'm not following you. Are you talking about compiling third-party
packages during the compilation of Emacs itself by placing them into
a `site-lisp` subdirectory inside Emacs's own source code tree, and then
moving the resulting `.el` and `.elc` files to the `../NN.MM/site-lisp`
subdirectory in Emacs's installation target directory?
And you're saying that whether you place them in `../NN.MM/site-lisp`
rather than in `../site-lisp` makes a significant performance difference?
>> Also, what kind of startup time are you talking about?
>> E.g., are you using `package-quickstart`?
> That was the first alternative I tried. With 1250 packages, it did not
> work.
Please `M-x report-emacs-bug` (and put me in `X-Debbugs-Cc`).
> First, the file consisted of a series of "let" forms corresponding
> to the package directories, and apparently the autoload forms are ignored
> if they appear anywhere below top-level. At least I got a number of
> warnings to that effect.
> The other problem was that I got a "bytecode overflow error". I only got
> the first error after chopping off the file approximately after the first
> 10k lines. Oddly enough, when I put all the files in the site-lisp
> directory, and collect all the autoloads for that directory in a single
> file, it has no problem with the 80k line file that results.
We need to fix those problems. Please try and give as much detail as
possible in your bug report so we can try and reproduce it on our end
(both for the warnings about non-top-level forms and for the bytecode
overflow).
> I'm pretty sure the load-path is an issue with 1250 packages, even if half
> of them consist of single files.
I'm afraid so, indeed.
> One issue with this approach is that the package selection mechanism
> doesn't recognize the modules as being installed, or provide any assistance
> in selectively activating modules.
Indeed, since the selective activation relies crucially on the
`load-path` for that.
> Other places where there is a noticeable slowdown with large numbers of
> packages:
> * Browsing customization groups - just unfolding a single group can take
> minutes (this is on fast server hardware with a lot of free memory)
Hmm... can't think of why that would be. You might want to make
a separate bug-report for that.
> * Browsing custom themes with many theme packages installed
> I haven't gotten to the point that I can test the same situation by
> explicitly loading the same modules from the site-lisp directory that had
> been activated as packages. Installing the themes in the system directory
> does skip the "suspicious files" check that occurs when loading them from
> the user configuration.
Same here. I'm not very familiar with the custom-theme code, but it
does seem "unrelated" in the sense that I don't think fixing some of the
other problems you've encountered will fix this one.
>> I think you're right here, but I'd expect the effect to be fairly small
>> except when the .elc/.eln files are themselves small.
> There are a lot of packages that have fairly small source files, just
> because they've factored their code the same way it would be in languages
> where the shared libraries are not in 1-1 correspondence with source files.
Oh, indeed, small source files are quite common.
> I would expect this would apply to most top-level defuns in elisp
> packages/modules. From my cursory review, it looks like the ability to
> redefine these defuns is mostly useful when developing the packages
> themselves, and "sealing" them for use would be appropriate.
Advice is not used very often, but it's very hard to predict on which
function(s) they may end up being needed, and sealing would make advice
ineffective. I would personally recommend to just stay away from the
level 3 of the native compiler's optimization. Or at least, only use it
in targeted ways, i.e. only at the very rare few spots where you've
clearly found it to have a noticeable performance benefit.
In lower levels of optimization, those same calls are still optimized
but just less aggressively, which basically means they turn into:
    if (<symbol unchanged>)
      <call the C function directly>;
    else
      <use the old slow but correct code path>;
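For the targeted use suggested above, a minimal sketch: keep the global level
at its default and raise it for a single function through the `speed`
declaration (available in Emacs 28 and later). `my/hot-loop` is a made-up
example, not a function from any package:

```elisp
;;; -*- lexical-binding: t -*-
;; Keep the default optimization level for everything else.
(setq native-comp-speed 2)

(defun my/hot-loop (n)
  "Return the sum of the integers from 0 below N."
  ;; Ask the native compiler for level 3 on this function only.
  (declare (speed 3))
  (let ((acc 0))
    (dotimes (i n acc)
      (setq acc (+ acc i)))))
```

The trade-off stays the one described above: code compiled at level 3 may
skip the `symbol-function` lookup, so later advice or redefinition may not
take effect for it.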
Stefan
* Re: native compilation units
2022-06-05 14:08 ` Lynn Winebarger
@ 2022-06-05 14:46 ` Stefan Monnier
0 siblings, 0 replies; 46+ messages in thread
From: Stefan Monnier @ 2022-06-05 14:46 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> Unrelated, but the one type of file I don't seem to be able to produce AOT
> (because I have no way to specify them) in the system directory are the
> subr/trampoline files. Any hints on how to make those AOT in the system
> directory?
[ No idea, sorry. ]
> Also, I should have responded to the first question - "minutes" on recent
> server-grade hardware with 24 cores and >100GB of RAM. That was with 1193
> enabled packages in my .emacs file.
And those minutes are all spent in `package-activate-all` or are they
spent in other parts of the init file?
[ Also, in my experience several packages are poorly behaved in the sense
that they presume that if you install them you will probably use them in
all Emacs sessions so they eagerly load/execute a lot of code
during startup (some even enable themselves unconditionally).
In those cases `package-quickstart` doesn't help very much. ]
Stefan
* Re: native compilation units
2022-06-05 14:20 ` Stefan Monnier
@ 2022-06-06 4:12 ` Lynn Winebarger
2022-06-06 6:12 ` Stefan Monnier
2022-06-14 4:19 ` Lynn Winebarger
1 sibling, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-06 4:12 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sun, Jun 5, 2022, 10:20 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
>
> >> >> [ But that doesn't mean we shouldn't try to compile several ELisp files
> >> >> into a single ELN file, especially since the size of ELN files seems
> >> >> to be proportionally larger for small ELisp files than for large
> >> >> ones. ]
Not sure if these general statistics are of much use, but of 4324 source
files successfully compiled (1557 from the lisp directory), with a total
size of 318MB, including 13 trampolines:
- The smallest 450 are 17632 bytes or less, with the trampolines at 16744
  bytes, total of 7.4M
- The smallest 1000 are under 25700 bytes, totaling 20M
- The smallest 2000 are under 38592 bytes, totaling 48M
- The smallest 3000 are under 62832 bytes, totaling 95M
- The smallest 4000 are under 188440 bytes, totaling 194M
- There are only 58 over 500k in size, and only 13 over 1M (max is 3.1M)
  Those last 58 total about 52M in size.
I am curious as to why the system doesn't just produce trampolines for all
the system calls AOT in a single module.
> `load-path` is used for native-compiled files, yes. But it's used
> in exactly the same way (and should hence cost the same) for:
> - No native compilation
> - AOT native compilation
> - lazy native compilation
> Which is what I meant by "unrelated to native compilation".
>
True, but it does lead to a little more disappointment when that 2.5-5x
speedup is dominated by the load-path length while starting up.
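One rough way to check how much of the startup cost comes from scanning the load path rather than from the compiled code is sketched below. It is an illustration under assumptions, not a measurement from this thread: `my/load-path-report` and the dummy library name are hypothetical, and timing a lookup that fails is only a crude stand-in for the worst case of a single `require`.

```elisp
(require 'benchmark)

(defun my/load-path-report ()
  "Return a plist describing the load paths and one failed library lookup.
Looking up a library that does not exist has to visit every
directory on `load-path', so its time is a rough worst case for a
single lookup."
  (list :load-path-length (length load-path)
        ;; The variable is only checked if this Emacs defines it.
        :eln-load-path-length (and (boundp 'native-comp-eln-load-path)
                                   (length native-comp-eln-load-path))
        ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-TIME).
        :failed-lookup-seconds
        (car (benchmark-run 1
               (locate-library "my-no-such-library-for-timing")))))
```

Evaluating `(my/load-path-report)` once with the packages activated and once with `emacs -Q` gives two numbers to compare.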
> > Although I do wonder if there is some optimization for ELN files in the
> > system directory as opposed to the user's cache. I have one build where I
> > native compiled (but not byte compiled) all the el files in the lisp
> > directory,
>
> IIUC current code only loads an ELN file if there is a corresponding ELC
> file, so natively compiling a file without also byte-compiling it is
> definitely not part of the expected situation. Buyer beware.
>
That would explain the behavior I've seen. If that's the case, shouldn't
batch-native-compile produce the byte-compiled file if it doesn't exist?
> I'm not following you. Are you talking about compiling third-party
> packages during the compilation of Emacs itself by placing them into
> a `site-lisp` subdirectory inside Emacs's own source code tree, and then
> moving the resulting `.el` and `.elc` files to the `../NN.MM/site-lisp`
> subdirectory in Emacs's installation target directory?
>
That's the way I'm doing it. Compatibility of these packages with Emacs
versions varies too much for me to want to treat them as
version-independent. I got burned in an early attempt where I didn't set
the prefix, and emacs kept adding the /usr/share site-lisp paths even
running from the build directory, and the version of auctex that is
installed there is compatible with 24.3 but not 28.1, so I kept getting
mysterious compile errors for the auctex packages until I realized what was
going on.
> And you're saying that whether you place them in `../NN.MM/site-lisp`
> rather than in `../site-lisp` makes a significant performance difference?
>
Sorry, no. I meant I'm curious if having them in the user's cache versus
the system ELN cache would make any difference in start-up time, ignoring
the initial async native compilation. In particular whether the checksum
calculation is bypassed in one case but not the other (by keeping a
permanent mapping from the system load-path to the system cache, say).
> > [...] other problem was that I got a "bytecode overflow error". I only got
> > the first error after chopping off the file approximately after the first
> > 10k lines. Oddly enough, when I put all the files in the site-lisp
> > directory, and collect all the autoloads for that directory in a single
> > file, it has no problem with the 80k line file that results.
>
> We need to fix those problems. Please try and give as much detail as
> possible in your bug report so we can try and reproduce it on our end
> (both for the warnings about non-top-level forms and for the bytecode
> overflow).
>
> > I'm pretty sure the load-path is an issue with 1250 packages, even if half
> > of them consist of single files.
>
> I'm afraid so, indeed.
>
> > One issue with this approach is that the package selection mechanism
> > doesn't recognize the modules as being installed, or provide any assistance
> > in selectively activating modules.
>
> Indeed, since the selective activation relies crucially on the
> `load-path` for that.
>
> > Other places where there is a noticeable slowdown with large numbers of
> > packages:
> > * Browsing customization groups - just unfolding a single group can take
> > minutes (this is on fast server hardware with a lot of free memory)
>
> Hmm... can't think of why that would be. You might want to make
> a separate bug-report for that.
>
> > * Browsing custom themes with many theme packages installed
> > I haven't gotten to the point that I can test the same situation by
> > explicitly loading the same modules from the site-lisp directory that had
> > been activated as packages. Installing the themes in the system directory
> > does skip the "suspicious files" check that occurs when loading them from
> > the user configuration.
>
> Same here. I'm not very familiar with the custom-theme code, but it
> does seem "unrelated" in the sense that I don't think fixing some of the
> other problems you've encountered will fix this one.
>
I agree, but there was the possibility the compilation process (I'm assuming
the byte-compile stage would do this, if it were done at all) would
precompute things like customization groups for the compilation unit. Then
aggregating the source of compilation units into larger libraries might be
expected to significantly decrease the amount of dynamic computation
currently required.
I know there's no inherent link to native compilation, it's more a case of
if NC makes the implementation fast enough to make these additional
packages attractive, you're more likely to see the consequences of design
choices made assuming the byte code interpreter would be the bottleneck,
etc.
> > I would expect this would apply to most top-level defuns in elisp
> > packages/modules. From my cursory review, it looks like the ability to
> > redefine these defuns is mostly useful when developing the packages
> > themselves, and "sealing" them for use would be appropriate.
>
> Advice are not used very often, but it's very hard to predict on which
> function(s) they may end up being needed, and sealing would make advice
> ineffective. I would personally recommend to just stay away from the
> level 3 of the native compiler's optimization. Or at least, only use it
> in targeted ways, i.e. only at the very rare few spots where you've
> clearly found it to have a noticeable performance benefit.
>
> In lower levels of optimization, those same calls are still optimized
> but just less aggressively, which basically means they turn into:
>
> if (<symbol unchanged)
> <call the C function directly>;
> else
> <use the old slow but correct code path>;
I'm guessing the native compiled code is making the GC's performance a more
noticeable chunk of overhead. I'd really love to see something like
Chromium's concurrent gc integrated into emacs.
If I do any rigorous experiments to see if there's anything resembling a
virtuous cycle in larger compilation units + higher intraprocedural
optimizations, I'll report back.
Lynn
* Re: native compilation units
2022-06-06 4:12 ` Lynn Winebarger
@ 2022-06-06 6:12 ` Stefan Monnier
2022-06-06 10:39 ` Eli Zaretskii
2022-06-06 16:13 ` Lynn Winebarger
0 siblings, 2 replies; 46+ messages in thread
From: Stefan Monnier @ 2022-06-06 6:12 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> Not sure if these general statistics are of much use, but of 4324 source
> files successfully compiled (1557 from the lisp directory), with a total
> size of 318MB, including 13 trampolines,
> The smallest 450 are 17632 bytes or less, with the trampolines at 16744
> bytes, total of 7.4M
> The smallest 1000 are under 25700 bytes, totaling 20M
> The smallest 2000 are under 38592 bytes, totaling 48M
> The smallest 3000 are under 62832 bytes, totaling 95M
> The smallest 4000 are under 188440 bytes, totaling 194M
> There are only 58 over 500k in size, and only 13 over 1M (max is 3.1M)
> Those last 58 total about 52M in size.
The way I read this, the small files don't dominate, so bundling them
may still be a good idea but it's probably not going to make
a big difference.
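The reading above can be checked with a little arithmetic on the quoted figures. The sketch below is only an illustration: `my/eln-size-shares` is a hypothetical name, and it assumes the "M" totals and the 318MB grand total use the same unit.

```elisp
;; Figures copied from the statistics quoted above: 4324 files, 318MB
;; in total, and the cumulative size of the N smallest files.
(defun my/eln-size-shares ()
  "Return a list of (COUNT FILE-PERCENT SIZE-PERCENT), rounded to integers."
  (let ((total-files 4324)
        (total-mb 318.0)
        (smallest '((450 . 7.4) (1000 . 20) (2000 . 48)
                    (3000 . 95) (4000 . 194))))
    (mapcar (lambda (entry)
              (list (car entry)
                    ;; Share of the file count.
                    (round (* 100.0 (/ (car entry) (float total-files))))
                    ;; Share of the total size.
                    (round (* 100.0 (/ (cdr entry) total-mb)))))
            smallest)))
```

With these inputs the smallest 2000 files are about 46% of the files but only about 15% of the bytes, and the smallest 4000 are about 93% of the files and about 61% of the bytes, so roughly 39% of the total sits in the largest 324 files.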
> I am curious as to why the system doesn't just produce trampolines for all
> the system calls AOT in a single module.
Trampolines are needed for any native-compiled function which
gets redefined. We could try to build them eagerly when the
native-compiled function is compiled, and there could be various other
ways to handle this. There's room for improvement here, but the current
system works well enough for a first version.
> True, but it does lead to a little more disappointment when that 2.5-5x
> speedup is dominated by the load-path length while starting up.
I don't know where you got that 2.5-5x expectation, but native
compilation will often result in "no speed up at all".
> That would explain the behavior I've seen. If that's the case, shouldn't
> batch-native-compile produce the byte-compiled file if it doesn't exist?
Sounds about right, tho maybe there's a good reason for the current
behavior, I don't know. Maybe you should `M-x report-emacs-bug`.
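Until that is settled, the behavior asked about can be approximated in user code. The following is a minimal sketch, not a proposed change to `batch-native-compile`: the wrapper name `my/byte+native-compile-if-needed` is hypothetical, while `byte-compile-dest-file`, `byte-compile-file`, `native-comp-available-p` and `native-compile` are the existing functions.

```elisp
(require 'bytecomp)

(defun my/byte+native-compile-if-needed (el-file)
  "Byte-compile EL-FILE only if its .elc is missing, then native-compile it.
Return the value of `native-compile', or nil when native
compilation is not available in this Emacs."
  (let ((elc (byte-compile-dest-file el-file)))
    ;; Leave an existing .elc alone, e.g. one shipped in a tarball.
    (unless (file-exists-p elc)
      (byte-compile-file el-file))
    ;; Only attempt native compilation when this build supports it.
    (when (and (fboundp 'native-comp-available-p)
               (native-comp-available-p))
      (native-compile el-file))))
```

Note that this never refreshes a stale .elc; it only covers the "does not exist" case discussed here.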
> Sorry, no. I meant I'm curious if having them in the user's cache versus
> the system ELN cache would make any difference in start-up time, ignoring
> the initial async native compilation. In particular whether the checksum
> calculation is bypassed in one case but not the other (by keeping a
> permanent mapping from the system load-path to the system cache, say).
No, I don't think it should make any difference in this respect.
> I'm guessing the native compiled code is making the GC's performance a more
> noticeable chunk of overhead.
Indeed, the GC is the same and the native compiler does not make many
efforts to reduce memory allocations, so fraction of time spent in GC
tends to increase.
> I'd really love to see something like Chromium's concurrent gc
> integrated into Emacs.
Our GC is in serious need of improvement, yes. Bolting some existing GC
onto Emacs won't be easy, tho.
> If I do any rigorous experiments to see if there's anything resembling a
> virtuous cycle in larger compilation units + higher intraprocedural
> optimizations, I'll report back.
Looking forward to it, thanks,
I'd be interested as well in seeing a `profiler-report` output covering
your minute-long startup.
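A minimal sketch of one way to capture such a report, assuming the forms are placed at the very top of the init file. It uses the built-in profiler.el commands `profiler-start`, `profiler-report` and `profiler-stop`; nothing here comes from the thread itself.

```elisp
;; Start sampling as early as possible in the init file.
(require 'profiler)
(profiler-start 'cpu)

;; Once startup has finished, show the report and then stop sampling.
;; The report is requested before `profiler-stop' so that it is taken
;; while the profiler is still running.
(add-hook 'emacs-startup-hook
          (lambda ()
            (profiler-report)
            (profiler-stop)))
```

Since package activation normally happens before the init file is read (Emacs 27 and later), the first two forms may need to go into early-init.el instead if the time spent in `package-activate-all` is to be included.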
Stefan
* Re: native compilation units
2022-06-06 6:12 ` Stefan Monnier
@ 2022-06-06 10:39 ` Eli Zaretskii
2022-06-06 16:23 ` Lynn Winebarger
2022-06-06 16:13 ` Lynn Winebarger
1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-06-06 10:39 UTC (permalink / raw)
To: Stefan Monnier; +Cc: owinebar, akrl, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
> Date: Mon, 06 Jun 2022 02:12:30 -0400
>
> > That would explain the behavior I've seen. If that's the case, shouldn't
> > batch-native-compile produce the byte-compiled file if it doesn't exist?
>
> Sounds about right, tho maybe there's a good reason for the current
> behavior, I don't know.
Of course, there is: that function is what is invoked when building a
release tarball, where the *.elc files are already present. See
lisp/Makefile.in.
* Re: native compilation units
2022-06-06 6:12 ` Stefan Monnier
2022-06-06 10:39 ` Eli Zaretskii
@ 2022-06-06 16:13 ` Lynn Winebarger
2022-06-07 2:39 ` Lynn Winebarger
1 sibling, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-06 16:13 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Mon, Jun 6, 2022 at 2:12 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
>
> Trampolines are needed for any native-compiled function which
> gets redefined. We could try to build them eagerly when the
> native-compiled function is compiled, and there could be various other
> ways to handle this. There's room for improvement here, but the current
> system works well enough for a first version.
>
Yes, I agree. As I wrote in the initial email, my questions are primarily
curiosity about how the new capability can be further exploited. When I'm not
loading the build down with a ridiculous number of packages, it performs
very well.
> > True, but it does lead to a little more disappointment when that 2.5-5x
> > speedup is dominated by the load-path length while starting up.
>
> I don't know where you got that 2.5-5x expectation, but native
> compilation will often result in "no speed up at all".
>
That's a good question - it was one of the articles I read when I first
learned about this new capability. It was in the context of overall emacs
performance with the feature enabled, rather than any particular piece of
code.
> > Sorry, no. I meant I'm curious if having them in the user's cache versus
> > the system ELN cache would make any difference in start-up time, ignoring
> > the initial async native compilation. In particular whether the checksum
> > calculation is bypassed in one case but not the other (by keeping a
> > permanent mapping from the system load-path to the system cache, say).
>
> No, I don't think it should make any difference in this respect.
>
> > I'm guessing the native compiled code is making the GC's performance a more
> > noticeable chunk of overhead.
>
> Indeed, the GC is the same and the native compiler does not make many
> efforts to reduce memory allocations, so fraction of time spent in GC
> tends to increase.
>
> > I'd really love to see something like Chromium's concurrent gc
> > integrated into Emacs.
>
> Our GC is in serious need of improvement, yes. Bolting some existing GC
> onto Emacs won't be easy, tho.
>
Chromium came to mind primarily because I've been tracking V8's refactoring
of the "Oilpan" gc for use as a stand-alone collector for other projects
I'm interested in. Though I believe V8 uses a type-tagging system treated
specially by the collector, separately from the C++ classes managed by the
stand-alone collector. That's the piece I think would be adapted for lisp
GC, with the added benefit of offering integrated GC for types using the
cppgc interface for additional modules.
I did see a thread in the archives of emacs-devel where someone hacked
SpiderMonkey's collector into emacs a few years ago (2017 I believe) as a
proof of concept. My very cursory inspection of the memory allocation bits
of the emacs core gives me the impression the abstraction boundaries set by
the simple interface are not rampantly violated. I would hope that at this
point adapting the V8 (or similar) collector would be more straightforward
than that effort was.
I'm also not sure whether code derived from V8 would be eligible for
incorporation into emacs directly, given the legal requirements for
explicit copyright assignment. Maybe the best bet would be to define a
rigorous interface and allow alternative GC implementations to be plugged
in. That would make it easier to experiment with alternative garbage
collectors, which would probably be a positive if you were looking to
improve that part of the system while maintaining the current safe
implementation.
Lynn
* Re: native compilation units
2022-06-06 10:39 ` Eli Zaretskii
@ 2022-06-06 16:23 ` Lynn Winebarger
2022-06-06 16:58 ` Eli Zaretskii
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-06 16:23 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, Andrea Corallo, emacs-devel
On Mon, Jun 6, 2022 at 6:39 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Stefan Monnier <monnier@iro.umontreal.ca>
> > Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
> > Date: Mon, 06 Jun 2022 02:12:30 -0400
> >
> > > That would explain the behavior I've seen. If that's the case, shouldn't
> > > batch-native-compile produce the byte-compiled file if it doesn't exist?
> >
> > Sounds about right, tho maybe there's a good reason for the current
> > behavior, I don't know.
>
> Of course, there is: that function is what is invoked when building a
> release tarball, where the *.elc files are already present. See
> lisp/Makefile.in.
>
That's what I expected was the case, but the question is whether it "should"
check for those .elc files and create them only if they do not exist, as
opposed to batch-byte+native-compile, which creates both unconditionally. Or
perhaps just note the possible hiccup in the docstring for
batch-native-compile?
However, since the eln file can be generated without the elc file, it also
begs the question of why the use of the eln file is conditioned on the
existence of the elc file in the first place. Are there situations where the
eln file would be incorrect to use without the byte-compiled file in place?
Lynn
* Re: native compilation units
2022-06-06 16:23 ` Lynn Winebarger
@ 2022-06-06 16:58 ` Eli Zaretskii
2022-06-07 2:14 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2022-06-06 16:58 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: monnier, akrl, emacs-devel
> From: Lynn Winebarger <owinebar@gmail.com>
> Date: Mon, 6 Jun 2022 12:23:49 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
>
> Of course, there is: that function is what is invoked when building a
> release tarball, where the *.elc files are already present. See
> lisp/Makefile.in.
>
> That's what I expected was the case, but the question is whether it "should"
> check for those .elc files and create them only if they do not exist, as opposed
> to batch-byte+native-compile, which creates both unconditionally. Or perhaps
> just note the possible hiccup in the docstring for batch-native-compile?
You are describing a different function. batch-native-compile was
explicitly written to support the build of a release tarball, where
the *.elc files are always present, and regenerating them is just a
waste of cycles, and also runs the risk of creating a .elc file that
is not fully functional, due to some peculiarity of the platform or
the build environment.
> However, since the eln file can be generated without the elc file, it also begs the question
> of why the use of the eln file is conditioned on the existence of the elc file in the
> first place. Are there situations where the eln file would be incorrect to use
> without the byte-compiled file in place?
Andrea was asked this question several times and explained his design,
you can find it in the archives. Basically, native compilation is
driven by byte compilation, and is a kind of side effect of it.
* Re: native compilation units
2022-06-06 16:58 ` Eli Zaretskii
@ 2022-06-07 2:14 ` Lynn Winebarger
2022-06-07 10:53 ` Eli Zaretskii
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-07 2:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, Andrea Corallo, emacs-devel
On Mon, Jun 6, 2022 at 12:58 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Lynn Winebarger <owinebar@gmail.com>
> > Date: Mon, 6 Jun 2022 12:23:49 -0400
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
> >
> > Of course, there is: that function is what is invoked when building a
> > release tarball, where the *.elc files are already present. See
> > lisp/Makefile.in.
> >
> > That's what I expected was the case, but the question is whether it "should"
> > check for those .elc files and create them only if they do not exist, as opposed
> > to batch-byte+native-compile, which creates both unconditionally. Or perhaps
> > just note the possible hiccup in the docstring for batch-native-compile?
>
> You are describing a different function. batch-native-compile was
> explicitly written to support the build of a release tarball, where
> the *.elc files are always present, and regenerating them is just a
> waste of cycles, and also runs the risk of creating a .elc file that
> is not fully functional, due to some peculiarity of the platform or
> the build environment.
>
Ok - I'm not sure why only generating the .elc in the case that it does not
already exist is inconsistent with the restriction you describe.
Ignoring that, according to
https://github.com/emacs-mirror/emacs/blob/master/lisp/emacs-lisp/comp.el the
signature and docstring are:
(defun batch-native-compile (&optional for-tarball)
"Perform batch native compilation of remaining command-line arguments.
Native compilation equivalent of `batch-byte-compile'.
Use this from the command line, with `-batch'; it won't work
in an interactive Emacs session.
Optional argument FOR-TARBALL non-nil means the file being compiled
as part of building the source tarball, in which case the .eln file
will be placed under the native-lisp/ directory (actually, in the
last directory in `native-comp-eln-load-path')."
If the restriction you describe is the intent, why not
(1) make "for-tarball" non-optional and remove that argument, and
(2) put that intent in the documentation so we would know not to use it?
> > However, since the eln file can be generated without the elc file, it also begs the question
> > of why the use of the eln file is conditioned on the existence of the elc file in the
> > first place. Are there situations where the eln file would be incorrect to use
> > without the byte-compiled file in place?
>
> Andrea was asked this question several times and explained his design,
> you can find it in the archives. Basically, native compilation is
> driven by byte compilation, and is a kind of side effect of it.
>
I understood that already - the question was why the .elc file, as an
artifact, was required to exist in addition to the .eln file.
I did follow your (implied?) suggestion and went back through the archives
for 2021 and 2020 and saw some relevant discussions.
The last relevant post I saw was from Andrea indicating he thought it
shouldn't be required, but then it was just dropped:
https://lists.gnu.org/archive/html/emacs-devel/2020-08/msg00561.html
> I have an experimental branch where the .elc are not produced at all by
> make bootstrap. The only complication is that for the Emacs build I had
> to modify the process to depose files containing the doc so
> make-docfile.c can eat those instead of the .elc files. Other than that
> we should re-add .eln to load-suffixes. But as I'm not sure this is a
> requirement I'd prefer first to converge with the current setup. Unless
> I get some specific input on that I think I'll keep this idea and its
> branch aside for now :)
I may have missed a relevant subsequent post.
Lynn
* Re: native compilation units
2022-06-06 16:13 ` Lynn Winebarger
@ 2022-06-07 2:39 ` Lynn Winebarger
2022-06-07 11:50 ` Stefan Monnier
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-07 2:39 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Mon, Jun 6, 2022 at 2:12 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> > I am curious as to why the system doesn't just produce trampolines for all
> > the system calls AOT in a single module.
>
> Trampolines are needed for any native-compiled function which
> gets redefined. We could try to build them eagerly when the
> native-compiled function is compiled, and there could be various other
> ways to handle this. There's room for improvement here, but the current
> system works well enough for a first version.
>
While I was going over the archives for answers to my questions (following
Eli's observation), I found these gems:
https://lists.gnu.org/archive/html/emacs-devel/2021-02/msg00599.html
https://lists.gnu.org/archive/html/emacs-devel/2021-02/msg00724.html
I have the impression these ideas/concerns got lost in all the other work
required to get the first release ready, but I could have missed a follow-up
definitively knocking them down.
Lynn
* Re: native compilation units
2022-06-07 2:14 ` Lynn Winebarger
@ 2022-06-07 10:53 ` Eli Zaretskii
0 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2022-06-07 10:53 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: monnier, akrl, emacs-devel
> From: Lynn Winebarger <owinebar@gmail.com>
> Date: Mon, 6 Jun 2022 22:14:00 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
>
> > Of course, there is: that function is what is invoked when building a
> > release tarball, where the *.elc files are already present. See
> > lisp/Makefile.in.
> >
> > That's what I expected was the case, but the question is whether it "should"
> > check for those .elc files and create them only if they do not exist, as opposed
> > to batch-byte+native-compile, which creates both unconditionally. Or perhaps
> > just note the possible hiccup in the docstring for batch-native-compile?
>
> You are describing a different function. batch-native-compile was
> explicitly written to support the build of a release tarball, where
> the *.elc files are always present, and regenerating them is just a
> waste of cycles, and also runs the risk of creating a .elc file that
> is not fully functional, due to some peculiarity of the platform or
> the build environment.
>
> Ok - I'm not sure why only generating the .elc in the case that it does not already exist is inconsistent with the
> restriction you describe.
Because this function is for the case where producing *.elc files is
not wanted.
> Ignoring that, according to https://github.com/emacs-mirror/emacs/blob/master/lisp/emacs-lisp/comp.el the
> signature and docstring are:
>
> (defun batch-native-compile (&optional for-tarball) "Perform batch native compilation of remaining
> command-line arguments.
>
> Native compilation equivalent of `batch-byte-compile'.
> Use this from the command line, with `-batch'; it won't work
> in an interactive Emacs session.
> Optional argument FOR-TARBALL non-nil means the file being compiled
> as part of building the source tarball, in which case the .eln file
> will be placed under the native-lisp/ directory (actually, in the
> last directory in `native-comp-eln-load-path')."
> If the restriction you describe is the intent, why not
> (1) make "for-tarball" non-optional and remove that argument, and
> (2) put that intent in the documentation so we would know not to use it
Because that function could be used in contexts other than building a
release tarball, and I see no need to restrict it.
And I don't think I understand the use case you want to support.
When is it useful to produce *.eln files for all the *.el files, but
*.elc files only for those *.el files that were modified or for which
*.elc doesn't exist?
> > However, since the eln file can be generated without the elc file, it also begs the question
> > of why the use of the eln file is conditioned on the existence of the elc file in the
> > first place. Are there situations where the eln file would be incorrect to use
> > without the byte-compiled file in place?
>
> Andrea was asked this question several times and explained his design,
> you can find it in the archives. Basically, native compilation is
> driven by byte compilation, and is a kind of side effect of it.
>
> I understood that already - the question was why the .elc file, as an artifact, was required to exist in addition
> to the .eln file.
Where do you see that requirement?
* Re: native compilation units
2022-06-07 2:39 ` Lynn Winebarger
@ 2022-06-07 11:50 ` Stefan Monnier
2022-06-07 13:11 ` Eli Zaretskii
0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-07 11:50 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> I have the impression these ideas/concerns got lost in all the other
> work required to get the first release ready, but I could have missed
> a follow-up definitively knocking them down.
I don't think they got lost. They have simply been put
aside temporarily.
Stefan
* Re: native compilation units
2022-06-07 11:50 ` Stefan Monnier
@ 2022-06-07 13:11 ` Eli Zaretskii
0 siblings, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2022-06-07 13:11 UTC (permalink / raw)
To: Stefan Monnier; +Cc: owinebar, akrl, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
> Date: Tue, 07 Jun 2022 07:50:56 -0400
>
> > I have the impression these ideas/concerns got lost in all the other
> > work required to get the first release ready, but I could have missed
> > a follow-up definitively knocking them down.
>
> I don't think they got lost. They have simply been put
> aside temporarily.
More accurately, they are waiting for Someone(TM) to work on them. As
always happens in Emacs with useful ideas that got "put aside".
* Re: native compilation units
2022-06-04 2:43 ` Lynn Winebarger
2022-06-04 14:32 ` Stefan Monnier
@ 2022-06-08 6:46 ` Andrea Corallo
1 sibling, 0 replies; 46+ messages in thread
From: Andrea Corallo @ 2022-06-08 6:46 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Stefan Monnier, emacs-devel
Lynn Winebarger <owinebar@gmail.com> writes:
[...]
> From Andrea's description, this would be the primary "unsafe" aspect of intraprocedural optimizations applied to one of
> these aggregated compilation units. That is, that the semantics of redefining function symbols would not apply to points
> in the code at which the compiler had made optimizations based on assuming the function definitions were constants. It's
> not clear to me whether those points are limited to call sites or not.
Yes, they are limited to the call site.
Andrea
* Re: native compilation units
2022-06-04 14:32 ` Stefan Monnier
2022-06-05 12:16 ` Lynn Winebarger
@ 2022-06-08 6:56 ` Andrea Corallo
2022-06-11 16:13 ` Lynn Winebarger
1 sibling, 1 reply; 46+ messages in thread
From: Andrea Corallo @ 2022-06-08 6:56 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Lynn Winebarger, emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>> Performance issues with read access to directories containing less than
>>> 10K files seems like something that was solved last century, so
>>> I wouldn't worry very much about it.
>> Per my response to Eli, I see (network) directories become almost unusable
>> somewhere around 1000 files,
>
> I don't doubt there are still (in the current century) cases where
> largish directories get slow, but what I meant is that it's now
> considered as a problem that should be solved by making those
> directories fast rather than by avoiding making them so large.
>
>>> [ But that doesn't mean we shouldn't try to compile several ELisp files
>>> into a single ELN file, especially since the size of ELN files seems
>>> to be proportionally larger for small ELisp files than for large
>>> ones. ]
>>
>> Since I learned of the native compiler in 28.1, I decided to try it out and
>> also "throw the spaghetti at the wall" with a bunch of packages that
>> provide features similar to those found in more "modern" IDEs. In terms of
>> startup time, the normal package system does not deal well with hundreds of
>> directories on the load path, regardless of AOT native compilation, so I'm
>> transforming the packages to install in the version-specific load path, and
>> compiling that ahead of time. At least for the ones amenable to such
>> treatment.
>
> There are two load-paths at play (`load-path` and
> `native-comp-eln-load-path`) and I'm not sure which one you're talking
> about. OT1H `native-comp-eln-load-path` should not grow with the number
> of packages so it typically contains exactly 2 entries, and definitely
> not hundreds. OTOH `load-path` is unrelated to native compilation.
>
> I also don't understand what you mean by "version-specific load path".
>
> Also, what kind of startup time are you talking about?
> E.g., are you using `package-quickstart`?
>
>> Given I'm compiling all the files AOT for use in a common installation
>> (this is on Linux, not Windows), the natural question for me is whether
>> larger compilation units would be more efficient, particularly at startup.
>
> It all depends where the slowdown comes from :-)
>
> E.g. `package-quickstart` follows a similar idea to the one you propose
> by collecting all the `<pkg>-autoloads.el` into one big file, which
> saves us from having to load separately all those little files. It also
> saves us from having to look for them through those hundreds
> of directories.
>
> I suspect a long `load-path` can itself be a source of slow down
> especially during startup, but I haven't bumped into that yet.
> There are ways we could speed it up, if needed:
>
> - create "meta packages" (or just one containing all your packages),
> which would bring together in a single directory the files of several
> packages (and presumably also bring together their
> `<pkg>-autoloads.el` into a larger combined one). Under GNU/Linux we
> could have this metapackage be made of symlinks, making it fairly
> efficient and non-obtrusive (e.g. `C-h o` could still get you to the
> actual file rather than its metapackage-copy).
> - Manage a cache of where are our ELisp files (i.e. a hash table
> mapping relative ELisp file names to the absolute file name returned
> by looking for them in `load-path`). This way we can usually avoid
> scanning those hundred directories to find the .elc file we need, and
> go straight to it.
>
>> I posed the question to the list mostly to see if the approach (or similar)
>> had already been tested for viability or effectiveness, so I can avoid
>> unnecessary experimentation if the answer is already well-understood.
>
> I don't think it has been tried, no.
>
>> I don't know enough about modern library loading to know whether you'd
>> expect N distinct but interdependent dynamic libraries to be loaded in as
>> compact a memory region as a single dynamic library formed from the same
>> underlying object code.
>
> I think you're right here, but I'd expect the effect to be fairly small
> except when the .elc/.eln files are themselves small.
>
>> It's not clear to me whether those points are limited to call
>> sites or not.
>
> I believe it is: the optimization is to replace a call via `Ffuncall` to
> a "symbol" (which looks up the value stored in the `symbol-function`
> cell), with a direct call to the actual C function contained in the
> "subr" object itself (expected to be) contained in the
> `symbol-function` cell.
>
> Andrea would know if there are other semantic-non-preserving
> optimizations in the level 3 of the optimizations, but IIUC this is very
> much the main one.
Correct, that's the main one: it does that for all calls to C primitives
and for all calls to Lisp functions defined in the same compilation unit.
Other than that, speed 3 enables pure-function optimization and self
tail-recursion optimization.
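A minimal sketch of what the direct-call optimization can mean in practice, assuming Emacs 28 or later. The function names are hypothetical and chosen only for illustration; the per-function `(declare (speed 3))` form is used here instead of setting `native-comp-speed` globally.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical names; both functions live in the same compilation unit.
(defun my-unit-helper (x)
  "Return twice X."
  (declare (speed 3))
  (* 2 x))

(defun my-unit-caller (x)
  "Return one more than the result of `my-unit-helper' on X."
  (declare (speed 3))
  (1+ (my-unit-helper x)))

(defun my-unit-demo ()
  "Redefine the helper at run time, then call the caller again."
  (fset 'my-unit-helper (lambda (x) (* 3 x)))
  (my-unit-caller 1))
```

When this code is interpreted or byte-compiled, `(my-unit-demo)` goes through the function cell of `my-unit-helper`, sees the new definition, and returns 4. If the file were natively compiled at speed 3, the call inside `my-unit-caller` may have been turned into a direct call to the original helper, in which case the redefinition would not be seen at that call site.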
Andrea
* Re: native compilation units
2022-06-08 6:56 ` Andrea Corallo
@ 2022-06-11 16:13 ` Lynn Winebarger
2022-06-11 16:37 ` Stefan Monnier
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-11 16:13 UTC (permalink / raw)
To: Andrea Corallo; +Cc: Stefan Monnier, emacs-devel
On Wed, Jun 8, 2022, 2:56 AM Andrea Corallo <akrl@sdf.org> wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>
> >> It's not clear to me whether those points are limited to call
> >> sites or not.
> >
> > I believe it is: the optimization is to replace a call via `Ffuncall` to
> > a "symbol" (which looks up the value stored in the `symbol-function`
> > cell), with a direct call to the actual C function contained in the
> > "subr" object itself (expected to be) contained in the
> > `symbol-function` cell.
> >
> > Andrea would know if there are other semantic-non-preserving
> > optimizations in the level 3 of the optimizations, but IIUC this is very
> > much the main one.
>
> Correct that's the main one: it does that for all calls to C primitives
> and for all calls to lisp function defined in the same compilation unit.
>
> Other than that speed 3 enables pure function optimization and self tail
> recursion optimization.
>
Would it make sense to add a feature for declaring a function symbol value
is constant and non-advisable, at least within some notion of explicitly
named scope(s)? That would allow developers to be more selective about
which functions are "exported" to library users, and which are defined as
global function symbols because it's more convenient than wrapping
everything in a package/module/namespace in a giant cl-flet and then
explicitly "exporting" functions and macros via fset.
Then intraprocedural optimization within the named scopes would be
consistent with the language.
I'm thinking of using semantic/wisent for a mode for a proprietary
language. I am curious whether these optimizations are used or usable in
that context.
Lynn
* Re: native compilation units
2022-06-11 16:13 ` Lynn Winebarger
@ 2022-06-11 16:37 ` Stefan Monnier
2022-06-11 17:49 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-11 16:37 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> Would it make sense to add a feature for declaring a function symbol value
> is constant and non-advisable, at least within some notion of explicitly
> named scope(s)? That would allow developers to be more selective about
> which functions are "exported" to library users, and which are defined as
> global function symbols because it's more convenient than wrapping
> everything in a package/module/namespace in a giant cl-flet and then
> explicitly "exporting" functions and macros via fset.
In which sense would it be different from:
(cl-flet
...
(defun ...)
(defun ...)
...)
-- Stefan
* Re: native compilation units
2022-06-11 16:37 ` Stefan Monnier
@ 2022-06-11 17:49 ` Lynn Winebarger
2022-06-11 20:34 ` Stefan Monnier
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-11 17:49 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sat, Jun 11, 2022 at 12:37 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> > Would it make sense to add a feature for declaring a function symbol value
> > is constant and non-advisable, at least within some notion of explicitly
> > named scope(s)? That would allow developers to be more selective about
> > which functions are "exported" to library users, and which are defined as
> > global function symbols because it's more convenient than wrapping
> > everything in a package/module/namespace in a giant cl-flet and then
> > explicitly "exporting" functions and macros via fset.
>
> In which sense would it be different from:
>
> (cl-flet
> ...
> (defun ...)
> (defun ...)
> ...)
>
>
Good point - it's my scheme background confusing me. I was thinking defun
would operate with similar scoping rules as defvar and establish a local
binding, where fset (like setq) would not create any new bindings.
(1) I don't know how much performance difference (if any) there is between
(fsetq exported-fxn #'internal-implementation)
and
(defun exported-fxn (x y ...) (internal-implementation x y ...))
(2) I'm also thinking of more aggressively forcing const-ness at run-time
with something like:
(eval-when-compile
(cl-flet ((internal-implementation (x y ...) body ...))
(fset exported-fxn #'internal-implementation)))
(fset exported-fxn (eval-when-compile #'exported-fxn))
If that makes sense, is there a way to do the same thing with defun?
Or perhaps cl-labels instead of cl-flet, assuming they are both optimized
the same way.
Lynn
* Re: native compilation units
2022-06-11 17:49 ` Lynn Winebarger
@ 2022-06-11 20:34 ` Stefan Monnier
2022-06-12 17:38 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-11 20:34 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
>> In which sense would it be different from:
>>
>> (cl-flet
>> ...
>> (defun ...)
>> (defun ...)
>> ...)
>>
>>
> Good point - it's my scheme background confusing me. I was thinking defun
> would operate with similar scoping rules as defvar and establish a local
> binding, where fset (like setq) would not create any new bindings.
I was not talking about performance but about semantics (under the
assumption that if the semantics is the same then it should be possible
to get the same performance somehow).
> (1) I don't know how much performance difference (if any) there is between
> (fsetq exported-fxn #'internal-implementation)
> and
> (defun exported-fxn (x y ...) (internal-implementation x y ...))
If you don't want the indirection, then use `defalias` (which is like
`fset` but registers the action as one that *defines* the function, for
the purpose of `C-h f` and the likes, and they also have slightly
different semantics w.r.t advice).
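As a minimal sketch of the difference (the function names below are hypothetical, chosen only for illustration): a wrapper `defun` adds one extra call, whereas `defalias` stores the target directly in the function cell.

```elisp
;; Hypothetical internal function, for illustration only.
(defun my--internal-implementation (x y)
  "Add X and Y."
  (+ x y))

;; Variant 1: a wrapper; every call goes through one extra function call.
(defun my-exported-wrapper (x y)
  "Call `my--internal-implementation' on X and Y."
  (my--internal-implementation x y))

;; Variant 2: an alias; the function cell of `my-exported-alias' holds
;; the symbol `my--internal-implementation' itself, with no wrapper.
(defalias 'my-exported-alias #'my--internal-implementation)
```

Both `(my-exported-wrapper 1 2)` and `(my-exported-alias 1 2)` return 3. Because the alias stores the symbol rather than the function object, it still follows later redefinitions of the internal function.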
> (2) I'm also thinking of more aggressively forcing const-ness at run-time
> with something like:
> (eval-when-compile
> (cl-flet ((internal-implementation (x y ...) body ...))
> (fset exported-fxn #'internal-implementation)))
> (fset exported-fxn (eval-when-compile #'exported-fxn))
>
> If that makes sense, is there a way to do the same thing with defun?
I don't know what the above code snippet is intended to show/do, sorry :-(
Stefan
* Re: native compilation units
2022-06-11 20:34 ` Stefan Monnier
@ 2022-06-12 17:38 ` Lynn Winebarger
2022-06-12 18:47 ` Stefan Monnier
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-12 17:38 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sat, Jun 11, 2022 at 4:34 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> >> In which sense would it be different from:
> >>
> >> (cl-flet
> >> ...
> >> (defun ...)
> >> (defun ...)
> >> ...)
> >>
> >>
> > Good point - it's my scheme background confusing me. I was thinking defun
> > would operate with similar scoping rules as defvar and establish a local
> > binding, where fset (like setq) would not create any new bindings.
>
> I was not talking about performance but about semantics (under the
> assumption that if the semantics is the same then it should be possible
> to get the same performance somehow).
>
I'm trying to determine if there's a set of expressions for which it is
semantically sound to perform the intraprocedural optimizations by -O3 - that
is, where it is correct to treat functions in operator position as constants
rather than a reference through a symbol's function cell.
>
> > (1) I don't know how much performance difference (if any) there is between
> > (fsetq exported-fxn #'internal-implementation)
> > and
> > (defun exported-fxn (x y ...) (internal-implementation x y ...))
>
> If you don't want the indirection, then use `defalias` (which is like
> `fset` but registers the action as one that *defines* the function, for
> the purpose of `C-h f` and the likes, and they also have slightly
> different semantics w.r.t advice).
>
What I'm looking for is a function as a first class value, whether as a
byte-code vector, a symbolic reference to a position in the .text section (or
equivalent) of a shared object that may or may not have been loaded, or a
pointer to a region that is allowed to be executed.
> > (2) I'm also thinking of more aggressively forcing const-ness at run-time
> > with something like:
> > (eval-when-compile
> > (cl-flet ((internal-implementation (x y ...) body ...))
> > (fset exported-fxn #'internal-implementation)))
> > (fset exported-fxn (eval-when-compile #'exported-fxn))
> >
> > If that makes sense, is there a way to do the same thing with defun?
>
> I don't know what the above code snippet is intended to show/do, sorry :-(
>
I'm trying to capture a function as a first class value.
Better example - I put the following in ~/test1.el and byte compiled it
(with emacs 28.1 running on cygwin).
-------------
(require 'cl-lib)
(eval-when-compile
(cl-labels ((my-evenp (n) (if (= n 0) t (my-oddp (1- n))))
(my-oddp (n) (if (= n 0) nil (my-evenp (1- n)))))
(defun my-global-evenp (n) (my-evenp n))
(defun my-global-oddp (n) (my-oddp n))))
-----------------
I get the following (expected) error when running in batch (or
interactively, if only loading the compiled file)
$ emacs -batch --eval '(load "~/test1.elc")' --eval '(message "%s"
(my-global-evenp 5))'
Loading ~/test1.elc...
Debugger entered--Lisp error: (void-function my-global-evenp)
(my-global-evenp 5)
(message "%s" (my-global-evenp 5))
eval((message "%s" (my-global-evenp 5)) t)
command-line-1(("--eval" "(load \"~/test1.elc\")" "--eval" "(message
\"%s\" (my-global-evenp 5))"))
command-line()
normal-top-level()
The function symbol is only defined at compile time by the defun, so it is
undefined when the byte-compiled file is loaded in a clean environment.
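A minimal sketch of the distinction at play here, with hypothetical function names: `eval-when-compile` runs its body inside the byte compiler and leaves only the value of the last form in the compiled file, while `eval-and-compile` both runs the body at compile time and emits it into the compiled file.

```elisp
;; Evaluated by the byte compiler only.  The compiled file keeps just the
;; value of the last form (here, the symbol returned by `defun'), so the
;; function is not defined when the .elc is loaded in a fresh session.
(eval-when-compile
  (defun my-ct-only (n)
    "Return twice N; only available while compiling."
    (* 2 n)))

;; Evaluated at compile time and also emitted into the compiled file,
;; so the function exists both while compiling and after loading.
(eval-and-compile
  (defun my-ct-and-rt (n)
    "Return twice N; available at compile time and at load time."
    (* 2 n)))
```

Note that when the uncompiled source is loaded, `eval-when-compile` behaves like `progn`, so both functions end up defined; the difference only shows up when loading the compiled file.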
When I tried using (fset 'my-global-evenp (eval-when-compile
#'my-ct-global-evenp)) it just produced a symbol indirection, which was
disappointing.
So in the next attempt, global compile-time variables are assigned
trampolines to the local functions as values at compile time.
-------------------------------
(require 'cl-lib)
(eval-when-compile
(defvar my-ct-global-evenp nil)
(defvar my-ct-global-oddp nil)
(cl-labels ((my-evenp (n) (if (= n 0) t (my-oddp (1- n))))
(my-oddp (n) (if (= n 0) nil (my-evenp (1- n)))))
(setq my-ct-global-evenp (lambda (n) (my-evenp n)))
(setq my-ct-global-oddp (lambda (n) (my-oddp n)))))
(fset 'my-global-evenp (eval-when-compile my-ct-global-evenp))
(fset 'my-global-oddp (eval-when-compile my-ct-global-oddp))
-------------------------------
Then I get
$ emacs -batch --eval '(load "~/test2.elc")' --eval '(message "%s"
(my-global-evenp 5))'
Loading ~/test2.elc...
Debugger entered--Lisp error: (void-variable --cl-my-evenp--)
my-global-evenp(5)
(message "%s" (my-global-evenp 5))
eval((message "%s" (my-global-evenp 5)) t)
command-line-1(("--eval" "(load \"~/test2.elc\")" "--eval" "(message
\"%s\" (my-global-evenp 5))"))
command-line()
normal-top-level()
This I did not expect. Maybe the variable name is just an artifact of the
way cl-labels is implemented and not a fundamental limitation.
Third attempt to express a statically allocated closure with constant code
(which is one way of viewing an ELF shared object):
--------------------------------
(require 'cl-lib)
(eval-when-compile
(defvar my-ct-global-evenp nil)
(defvar my-ct-global-oddp nil)
(let (my-evenp my-oddp)
(setq my-evenp (lambda (n) (if (= n 0) t (funcall my-oddp (1- n)))))
(setq my-oddp (lambda (n) (if (= n 0) nil (funcall my-evenp (1- n)))))
(setq my-ct-global-evenp (lambda (n) (funcall my-evenp n)))
(setq my-ct-global-oddp (lambda (n) (funcall my-oddp n)))))
(fset 'my-global-evenp (eval-when-compile my-ct-global-evenp))
(fset 'my-global-oddp (eval-when-compile my-ct-global-oddp))
--------------------------------
And the result is worse:
$ emacs -batch --eval '(load "~/test3.elc")' --eval '(message "%s"
(my-global-evenp 5))'
Loading ~/test3.elc...
Debugger entered--Lisp error: (void-variable my-evenp)
my-global-evenp(5)
(message "%s" (my-global-evenp 5))
eval((message "%s" (my-global-evenp 5)) t)
command-line-1(("--eval" "(load \"~/test3.elc\")" "--eval" "(message
\"%s\" (my-global-evenp 5))"))
command-line()
normal-top-level()
This was not expected with lexical scope.
$ emacs -batch --eval '(load "~/test3.elc")' --eval "(message \"%s\"
(symbol-function 'my-global-evenp))"
Loading ~/test3.elc...
#[(n) !\207 [my-evenp n] 2]
At least my-global-evenp has byte-code as a value, not a symbol, which was
the intent. I get the same result if I wrap the two lambdas stored in the
my-ct-* variables with "byte-compile", which is what I intended (for the
original to be equivalent to explicitly compiling the form).
However, what I expected would have been the byte-code equivalent of an ELF
object with 2 symbols defined for relocation.
So why is the compiler producing code that would correspond to the "let"
binding my-evenp and my-oddp being dynamically scoped?
That made me curious, so I found https://rocky.github.io/elisp-bytecode.pdf
and reviewed it.
I believe I see the issue now. With the current byte-codes, there's just no
way to express a call to an offset in the current byte-vector. There's not
even a way to reference the address of the current byte vector to use as an
argument to funcall. There's no way to reference symbols that were resolved
at compile-time at all, which would require the equivalent of dl symbols
embedded in a code vector that would be patched at load time. That forces
the compiler to emit a call to a symbol. And when the manual talks about
lexical scope, it's only for "variables", not function symbols.
That explains a lot. The reason Andrea had to use LAP as the starting point
for optimizations, for example. I can't find a spec for Emacs's version of
LAP, but I'm guessing it can still express symbolic names for local function
expressions in a way byte-code simply cannot.
I don't see how the language progresses without resolving the inconsistency
between what's expressible in ELF and what's expressible in a byte-code
object.
One possible set of changes to make the two compatible - and I'd use the
relative goto byte codes if they haven't been produced by emacs since v19.
I'd also add a few special registers. There's already one used to enable
GOTO (i.e. the program counter).
- Byte codes for call/returns directly into/from byte code objects
  - CALL-RELATIVE - execute a function call to the current byte-vector
    object with the pc set to the pc+operand0 - basically PIC code.
    If a return is required, the byte compiler should arrange for the
    return address to be pushed before other operands to the function
    being called.
    No additional manipulation of the stack is required, since funcall
    would just pop the arguments and then immediately push them again.
    Alternatively, you could have a byte-code that explicitly allocates a
    stack frame (if needed), pushes the return offset, then does a goto.
  - CALL-ABSOLUTE - execute a function call to a specified byte-vector
    object + pc as the first 2 operands. This is useless until the
    byte-code object supports a notion of relocation symbols, i.e. named
    compile-time constants that get patched on load in one way or another,
    e.g. directly by modifying the byte-string with the value at run-time
    (assuming eager loading), or indirectly by adding a "linkage table" of
    external symbols that will be filled in at load and specifying an
    index into that table.
  - RETURN-RELATIVE - operand is the number of items that have to be
    popped from the stack to get the return address, which is an offset in
    the current byte-vector object. Alternatively, could be implemented as
    "discardN <n>; goto".
  - RETURN-ABSOLUTE - same as return-relative, but the return address is
    given by two operands, a byte-vector and offset in the byte-vector.
- Alternate formulation
  - RESERVE-STACK - operand is a byte-vector object (reference) that will
    be used to determine how much total stack space will be required for
    safety, and ensure enough space is allocated.
  - GOTO-ABSOLUTE - operand is a byte-vector object and an offset.
    Immediate control transfer to the specified context.
  - These two are adequate to implement the above.
- Additional registers and related instructions
  - PC - register already exists
    - PUSH-PC - the opposite of goto, which pops the stack into the PC
      register.
  - GOT - a table of byte-vectors + offsets corresponding to a PLT section
    of the byte-vector specifying the compile-time symbols that have to be
    resolved
    - The byte-vector references + offset in the "absolute" instructions
      above would be specified as an index into this table. Otherwise the
      byte-vector could not be saved and directly loaded for later
      execution.
  - STATIC - a table for the lexical variables allocated and accessible to
    the closures at compile-time. Compiler should treat all sexp as
    occurring at the top-level with regard to the run-time lexical
    environment. A form like
      (let ((x 5)) (byte-compile (lambda (n) (+ n (eval-when-compile x)))))
    should produce byte-code with the constant 5, while
      (let ((x 5)) (byte-compile (lambda (n) (+ n x))))
    should produce byte code adding the argument n to the value of the
    global variable x at run-time.
    - PUSH-STATIC
    - POP-STATIC
  - ENV - the environment register.
    - ENV-PUSH-FRAME - operand is number of stack items to capture as a
      (freshly allocated) frame, which is then added as a rib to a new
      environment pointed to by the ENV register
    - PUSH-ENV - push the value of ENV onto the stack
    - POP-ENV - pop the top of the stack into ENV, discarding any value
      there
- Changes to byte-code object
  - IMPORTS table of symbols defined at compile-time requiring resolution
    to constants at load-time, particularly for references to compilation
    units (byte-vector or native code) and exported symbols bound to
    constants (really immutable).
    Note - the "relative" versions of call and return above could be
    eliminated if "IMPORTS" includes self-references into the byte-vector
    object itself.
  - EXPORTS table of symbols available to be called or referenced
    externally
  - Static table with values initialized from the values in the closure at
    compile-time
  - Constant table and byte string remain
- Changes to byte-code loader
  - Read the new format
  - Resolve symbols - should link to specific compilation units rather
    than "features", as compilation units will define specific exported
    symbols, while features do not support that detail. Source could still
    use "require", but the symbols referenced from the compile-time
    environment would have to be traced back to the compilation unit
    supplying them (unless they are recorded as constants by an expression
    like (eval-when-compile (setq v (eval-when-compile
    some-imported-symbol))))
  - Allocate and initialize the static segment
  - Create a "static closure" for the compilation unit = loaded object +
    GOT + static frame - record as singleton entry mapping compilation
    units to closures (hence "static")
- Changes to funcall
  - invoking a function from a compilation unit would require setting the
    GOT, STATIC and setting the ENV register to point to STATIC as the
    first rib (directly or indirectly)
  - invoking a closure with a "code" element pointing to an "exported"
    symbol from a compilation unit + an environment pointer
    - Set GOT and STATIC according to the byte-vector's static closure
    - Dispatch according to whether compilation unit is native or
      byte-compiled, but both have the above elements
- Changes to byte-compiler
  - Correct the issues with compile-time evaluation + lexical scope of
    function names above
  - Emit additional sections in byte-code
  - Should be able to implement the output of native-compiler pass
    (pre-libgccjit) with "-O3" flags in byte-code correctly
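The "linkage table filled in at load" idea can be illustrated with a toy model in plain Emacs Lisp. This is only a sketch under stated assumptions: it does not touch the byte-code format, and every `my-toy-` name is hypothetical. The table captures the definitions that exist at "link" time, and calls made through an index into the table no longer go through the symbol's function cell.

```elisp
;;; -*- lexical-binding: t -*-
;; Toy model only: a vector stands in for the proposed GOT/linkage table.

(defun my-toy-link (imports)
  "Return a vector holding the current definitions of the symbols in IMPORTS."
  (vconcat (mapcar #'symbol-function imports)))

(defun my-toy-double (x)
  "Return twice X."
  (* 2 x))

;; "Load time": resolve the imported symbol to a function object once.
(defvar my-toy-got (my-toy-link '(my-toy-double)))

(defun my-toy-call (index &rest args)
  "Call entry INDEX of the linkage table `my-toy-got' with ARGS."
  (apply (aref my-toy-got index) args))

;; Later redefinition changes the symbol's function cell, but not the
;; entry that was already linked into the table.
(defun my-toy-double (x)
  "Return three times X (redefinition)."
  (* 3 x))
```

After the redefinition, `(my-toy-double 21)` uses the new definition, while `(my-toy-call 0 21)` still reaches the definition captured when the table was built. That is the "immutable once linked" behavior the proposal is after.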
Lynn
* Re: native compilation units
2022-06-12 17:38 ` Lynn Winebarger
@ 2022-06-12 18:47 ` Stefan Monnier
2022-06-13 16:33 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-12 18:47 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
>> >> In which sense would it be different from:
>> >>
>> >> (cl-flet
>> >> ...
>> >> (defun ...)
>> >> (defun ...)
>> >> ...)
>> >>
>> >>
>> > Good point - it's my scheme background confusing me. I was thinking defun
>> > would operate with similar scoping rules as defvar and establish a local
>> > binding, where fset (like setq) would not create any new bindings.
>>
>> I was not talking about performance but about semantics (under the
>> assumption that if the semantics is the same then it should be possible
>> to get the same performance somehow).
>
> I'm trying to determine if there's a set of expressions for which it
> is semantically sound to perform the intraprocedural optimizations
The cl-flet above is such an example, AFAIK. Or maybe I don't
understand what you mean.
> I'm trying to capture a function as a first class value.
Functions are first class values and they can be trivially captured via
things like (setq foo (lambda ...)), (defalias 'foo (lambda ...)) and
a lot more, so I suspect there's some additional constraint you're expecting but
I don't know what that is.
> This was not expected with lexical scope.
You explicitly write `(require 'cl-lib)` but I don't see any
-*- lexical-binding:t -*-
anywhere, so I suspect you forgot to add those cookies that are needed
to get proper lexical scoping.
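For illustration, here is a minimal sketch of a file with the cookie in place, loosely based on the third test above but without the `eval-when-compile` wrapping. The cookie has to be on the first line of the file; the `my-lex-` names are hypothetical.

```elisp
;;; -*- lexical-binding: t -*-
;; With the cookie above, `let' creates lexical bindings, and the lambdas
;; below capture them as closures instead of looking up global variables.
(let (my-evenp my-oddp)
  (setq my-evenp (lambda (n) (if (= n 0) t (funcall my-oddp (1- n)))))
  (setq my-oddp (lambda (n) (if (= n 0) nil (funcall my-evenp (1- n)))))
  (defalias 'my-lex-evenp (lambda (n) (funcall my-evenp n)))
  (defalias 'my-lex-oddp (lambda (n) (funcall my-oddp n))))
```

Without the cookie, the same code binds `my-evenp` and `my-oddp` dynamically, the bindings are gone once the `let` exits, and calling the exported functions signals a void-variable error like the one reported for test3.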
> With the current byte-codes, there's just no way to express a call to
> an offset in the current byte-vector.
Indeed, but you can call a byte-code object instead.
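A minimal sketch of that point, with hypothetical names: a byte-code object is an ordinary value that can be stored in a variable and invoked with `funcall`, without going through any symbol's function cell.

```elisp
;; Compile a lambda form into a byte-code function object and keep it
;; as a plain value.
(defvar my-inc-code (byte-compile '(lambda (n) (1+ n)))
  "A byte-code function object that adds one to its argument.")

(defun my-call-inc (n)
  "Call the byte-code object stored in `my-inc-code' on N."
  (funcall my-inc-code n))
```

Here `(byte-code-function-p my-inc-code)` is non-nil and `(my-call-inc 41)` returns 42.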
Stefan
* Re: native compilation units
2022-06-12 18:47 ` Stefan Monnier
@ 2022-06-13 16:33 ` Lynn Winebarger
2022-06-13 17:15 ` Stefan Monnier
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-13 16:33 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sun, Jun 12, 2022 at 2:47 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> >> >> In which sense would it be different from:
> >> >>
> >> >> (cl-flet
> >> >> ...
> >> >> (defun ...)
> >> >> (defun ...)
> >> >> ...)
> >> >>
> > I'm trying to determine if there's a set of expressions for which it
> > is semantically sound to perform the intraprocedural optimizations
>
> The cl-flet above is such an example, AFAIK. Or maybe I don't
> understand what you mean.
>
To be clear, I'm trying to first understand what Andrea means by "safe".
I'm assuming it means the result agrees with whatever the byte compiler and
VM would produce for the same code. I doubt I'm bringing up topics or ideas
that are new to you. But if I do make use of semantic/wisent, I'd like to
know the result can be fast (modulo garbage collection, anyway).
I've been operating under the assumption that
- Compiled code objects should be first class in the sense that they can be
  serialized just by using print and read. That seems to have been important
  historically, and was true for byte-code vectors for dynamically scoped
  functions. It's still true for byte-code vectors of top-level functions,
  but is not true for byte-code vectors for closures (and hasn't been for at
  least a decade, apparently).
- It's still worthwhile to have a class of code objects that are immutable
  in the VM semantics, but now because there are compiler passes implemented
  that can make use of that as an invariant.
- cl-flet doesn't allow mutual recursion, and there is no shared state
  above, so there's nothing to optimize intraprocedurally.
- cl-labels is implemented with closures, so (as I understand it) the native
  compiler would not be able to produce code if you asked it to compile the
  closure returned by a form like (cl-labels ((f ..) (g...) ...) f)
I also mistakenly thought byte-code-vectors of the sort saved in ".elc"
files would not be able to represent closures without being consed, as the
components (at least the first 4) are nominally constant. But I see that
closures are being implemented by calling an ordinary function that
side-effects the "constants" vector. That's unfortunate because it means the
optimizer cannot assume byte-vectors are constants that can be freely
propagated. OTOH, prior to commit
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=d0c47652e527397cae96444c881bf60455c763c1
it looks like the closures were constructed at compile time rather than by
side-effect, which would mean the VM would be expected to treat them as
immutable, at least.
Wedging closures into the byte-code format that works for dynamic scoping
could be made to work with shared structures, but you'd need to modify
print to always capture shared structure (at least for byte-code vectors),
not just when there's a cycle. The approach that's been implemented only
works at run-time when there's shared state between closures, at least as
far as I can tell.
However, it's a hack that will never really correspond closely to the
semantics of shared objects with explicit tracking and load-time linking of
compile-time symbols, because the relocations are already performed and
there's no way to back out where they occurred from the value itself. If a
goal is to have a semantics in which you can
1. unambiguously specify that at load/run time a function or variable name
   is resolved in the compile time environment provided by a separate
   compilation unit as an immutable constant at run-time
2. serialize compiled closures as compilation units that provide a
   well-defined compile-time environment for linking
3. reduce the headaches of the compiler writer by making it easy to produce
   code that is eligible for their optimizations
Then I think the current approach is suboptimal. The current byte-code
representation is analogous to the a.out format. Because the .elc files run
code on load you can put an arbitrary amount of infrastructure in there to
support an implementation of compilation units with exported compile-time
symbols, but it puts a lot more burden on the compiler and linker/loader
writers than just being explicit would.
And I'm not sure what the payoff is. When there wasn't a native compiler
(and associated optimization passes), I suppose there was no pressing
reason to upend backward compatibility. Then again, I've never been
responsible for maintaining a 3-4 decade old application with an installed
user base of I don't know what size, running on machines ranging from chips
in "smart" electric switches to (I assume) the biggest of "big iron",
whatever that means these days.
> > I'm trying to capture a function as a first class value.
>
> Functions are first class values and they can be trivially captured via
> things like (setq foo (lambda ...)), (defalias 'foo (lambda ...)) and
> a lot more, so I think there's some additional constraint you're expecting but
> I don't know what that is.
>
Yes, I thought byte-code would be treated as constant. I still think it
makes a lot of sense
to make it so.
>
> > This was not expected with lexical scope.
>
> You explicitly write `(require 'cl-lib)` but I don't see any
>
> -*- lexical-binding:t -*-
>
> anywhere, so I suspect you forgot to add those cookies that are needed
> to get proper lexical scoping.
>
Ok, wow, I really misread the NEWS for 28.1 where it said
    The 'lexical-binding' local variable is always enabled.
as meaning "always set". My fault.
> > With the current byte-codes, there's just no way to express a call to
> > an offset in the current byte-vector.
>
> Indeed, but you can call a byte-code object instead.
>
Creating the byte code with shared structure was what I meant by one of
the solutions being to
"patch compile-time constants" at load, i.e. perform the relocations
directly. The current
implementation effectively inlines copies of the constants (byte-code
objects), which is fine for shared code but not
for shared variables. That is, the values that are assigned to
my-global-oddp and my-global-evenp (for test2 after
correcting the lexical-binding setting) do not reference each other. Each
is created with an independent copy of
the other.
* Re: native compilation units
2022-06-13 16:33 ` Lynn Winebarger
@ 2022-06-13 17:15 ` Stefan Monnier
2022-06-15 3:03 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-13 17:15 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> To be clear, I'm trying to first understand what Andrea means by "safe".
> I'm assuming it means the result agrees with whatever the byte
> compiler and VM would produce for the same code.
Not directly. It means that it agrees with the intended semantics.
That semantics is sometimes accidentally defined by the actual
implementation in the Lisp interpreter or the bytecode compiler, but
that's secondary.
The semantic issue is that if you call
(foo bar baz)
it normally (when `foo` is a global function) means you're calling the
function contained in the `symbol-function` of the `foo` symbol *at the
time of the function call*. So compiling this to jump directly to the
code that happens to be contained there during compilation (or the code
which the compiler expects to be there at that point) is unsafe in
the sense that you don't know whether that symbol's `symbol-function`
will really have that value when we get to executing that function call.
The use of `cl-flet` (or `cl-labels`) circumvents this problem since the
call to `foo` is now to a lexically-scoped function `foo`, so the
compiler knows that the code that is called is always that same one
(there is no way to modify it between the compilation time and the
runtime).
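As a small illustration of the difference (a sketch only; `my-foo`,
`my-global-caller`, `my-local-caller` and `my-demo-results` are made-up
names, not anything from the thread), a call through a global symbol picks
up a redefinition made after the caller was defined, while a `cl-flet`
binding cannot be changed from outside:

```elisp
(require 'cl-lib)

;; Global function: callers look it up through the symbol at call time.
(defun my-foo (x) (* 2 x))
(defun my-global-caller (x) (my-foo x))

;; Lexically scoped function: every use of `foo' is visible right here.
(defun my-local-caller (x)
  (cl-flet ((foo (y) (* 2 y)))
    (foo x)))

(defvar my-demo-results
  (list (my-global-caller 3)                           ; uses the original my-foo
        (progn (fset 'my-foo (lambda (x) (* 10 x)))    ; redefine my-foo
               (my-global-caller 3))                   ; now sees the new definition
        (my-local-caller 3)))                          ; unaffected by the fset
;; my-demo-results is (6 30 6)
```

This is why jumping straight to the code that `my-foo` had at compile time
would not be safe in general, whereas the local `foo` could be called
directly.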
> I doubt I'm bringing up topics or ideas that are new to you. But if
> I do make use of semantic/wisent, I'd like to know the result can be
> fast (modulo garbage collection, anyway).
It's also "modulo enough work on the compiler (and potentially some
primitive functions) to make the code fast".
> I've been operating under the assumption that
>
> - Compiled code objects should be first class in the sense that
> they can be serialized just by using print and read. That seems to
> have been important historically, and was true for byte-code
> vectors for dynamically scoped functions. It's still true for
> byte-code vectors of top-level functions, but is not true for
> byte-code vectors for closures (and hasn't been for at least
> a decade, apparently).
It's also true for byte-compiled closures, although, inevitably, this
holds only for closures that capture only serializable values.
> But I see that closures are being implemented by calling an ordinary
> function that side-effects the "constants" vector.
I don't think that's the case. Where do you see that?
The constants vector is implemented as a normal vector, so strictly
speaking it is mutable, but the compiler will never generate code that
mutates it, AFAIK, so you'd have to write ad-hoc code that digs inside
a byte-code closure and mutates the constants vector for that to happen
(and I don't know of such code out in the wild).
> OTOH, prior to commit
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=d0c47652e527397cae96444c881bf60455c763c1
> it looks like the closures were constructed at compile time rather than by
> side-effect,
No, this commit only changes the *way* they're constructed but not the
when, and both the before and the after result in constant vectors which
are not side-effected (every byte-code closure gets its own fresh
constants-vector).
> Wedging closures into the byte-code format that works for dynamic scoping
> could be made to work with shared structures, but you'd need to modify
> print to always capture shared structure (at least for byte-code vectors),
> not just when there's a cycle.
It already does.
> The approach that's been implemented only works at run-time when
> there's shared state between closures, at least as far asI can tell.
There can be problems if two *toplevel* definitions are serialized and
they share common objects, indeed. The byte-compiler may fail to
preserve the shared structure in that case, IIRC. I have some vague
recollection of someone bumping into that limitation at some point, but
it should be easy to circumvent.
> Then I think the current approach is suboptimal. The current
> byte-code representation is analogous to the a.out format.
> Because the .elc files run code on load you can put an arbitrary
> amount of infrastructure in there to support an implementation of
> compilation units with exported compile-time symbols, but it puts
> a lot more burden on the compiler and linker/loader writers than just
> being explicit would.
I think the practical performance issues with ELisp code are very far
removed from these problems. Maybe some day we'll have to face them,
but we still have a long way to go.
>> You explicitly write `(require 'cl-lib)` but I don't see any
>>
>> -*- lexical-binding:t -*-
>>
>> anywhere, so I suspect you forgot to add those cookies that are needed
>> to get proper lexical scoping.
> Ok, wow, I really misread the NEWS for 28.1 where it said
> The 'lexical-binding' local variable is always enabled.
Are you sure? How do you do that?
Some of the errors you showed seem to point very squarely towards the
code being compiled as dyn-bound ELisp.
Stefan
* Re: native compilation units
2022-06-05 14:20 ` Stefan Monnier
2022-06-06 4:12 ` Lynn Winebarger
@ 2022-06-14 4:19 ` Lynn Winebarger
2022-06-14 12:23 ` Stefan Monnier
1 sibling, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-14 4:19 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1738 bytes --]
On Sun, Jun 5, 2022 at 10:20 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
>
> >> Also, what kind of startup time are you talking about?
> >> E.g., are you using `package-quickstart`?
> > That was the first alternative I tried. With 1250 packages, it did not
> > work.
>
> Please `M-x report-emacs-bug` (and put me in `X-Debbugs-Cc`).
>
>
I was able to reproduce this at home on cygwin with 940 packages. I had
tried to install a few more than that (maybe 945 or something), so I just
removed the corresponding sections of the last few until it did compile.
Then I verified that it didn't matter whether I removed the first or the
last package autoloads, I would get the overflow regardless.
After spending the weekend going over byte code examples, I looked
at the output, and it's literally just hitting 64k instructions. Each
package uses 17 instructions just putting itself on the load-path, which
accounts for ~15000 instructions. That means every package uses about 50
instructions on average, so (if that's representative), you wouldn't
expect to be able to do much more than 300 or so additional packages just
from putting those paths in an array and looping over them.
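The back-of-the-envelope arithmetic can be spelled out as follows. This is
only a sketch using the round figures above (940 packages, 17 instructions
each for the load-path, a 64k limit); the variable names are made up, and
the loop's own overhead is ignored:

```elisp
(defvar my-qs-limit 65536)      ; 64k instruction limit
(defvar my-qs-packages 940)     ; packages in the failing quickstart file
(defvar my-qs-path-cost 17)     ; instructions per package for load-path

;; Instructions spent only on load-path setup.
(defvar my-qs-path-total (* my-qs-packages my-qs-path-cost))

;; Average cost of everything else, per package.
(defvar my-qs-other-per-package
  (/ (float (- my-qs-limit my-qs-path-total)) my-qs-packages))

;; Packages that would fit in the space freed by moving the paths
;; into an array and looping over them.
(defvar my-qs-extra-packages
  (floor (/ my-qs-path-total my-qs-other-per-package)))
```

With these inputs the load-path setup comes to 15980 instructions, the
remainder is a bit under 53 instructions per package, and the freed space
is worth about 303 more packages.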
Most of the forms are just calls to a handful of operators with constant
arguments,
so I would assume you could just create arrays for the most common
instruction types, put the
argument lists in a giant vector, and then just loop over those vectors
performing the operator.
Then there'd be a handful of oddball expressions to handle.
Or, you could just create a vector with one thunk for each package and loop
through it
invoking each one. It wouldn't be as space efficient, but it would be
trivially correct.
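A minimal sketch of the thunk idea, assuming each package's setup can be
wrapped in its own lambda. The names `my-demo-load-path`,
`my-quickstart-thunks` and `my-run-quickstart-thunks` and the paths are
hypothetical; this is not what package-quickstart generates:

```elisp
;; Stand-in for `load-path' so the sketch has no side effects on a session.
(defvar my-demo-load-path nil)

;; One thunk per package.  Each lambda is its own function object, so no
;; single function body has to hold the code for every package.
(defvar my-quickstart-thunks
  (vector (lambda () (push "/hypothetical/pkg-a" my-demo-load-path))
          (lambda () (push "/hypothetical/pkg-b" my-demo-load-path))))

(defun my-run-quickstart-thunks (thunks)
  "Call each function in the vector THUNKS in order."
  (mapc #'funcall thunks))

(my-run-quickstart-thunks my-quickstart-thunks)
;; my-demo-load-path is now ("/hypothetical/pkg-b" "/hypothetical/pkg-a")
```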
I'll put this in a bug report.
Lynn
* Re: native compilation units
2022-06-14 4:19 ` Lynn Winebarger
@ 2022-06-14 12:23 ` Stefan Monnier
2022-06-14 14:55 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-14 12:23 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> Or, you could just create a vector with one thunk for each package and
> loop through it invoking each one. It wouldn't be as space
> efficient, but it would be trivially correct.
IIRC the compiler has code to split a bytecode object into two to try
and circumvent the 64k limit and it should definitely be applicable here
(it's more problematic when it's inside a loop), which is why I think
it's a plain bug.
Stefan
* Re: native compilation units
2022-06-14 12:23 ` Stefan Monnier
@ 2022-06-14 14:55 ` Lynn Winebarger
0 siblings, 0 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-14 14:55 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1636 bytes --]
I think you may be remembering an intention to implement that. The issue
came up in 2009 (before the byte-compiler
even caught the error), and there's only the initial patch you committed to
signal the error:
https://git.savannah.gnu.org/cgit/emacs.git/commit/lisp/emacs-lisp/bytecomp.el?id=8476cfaf3dadf04379fde65cd7e24820151f78a9
and one more changing a variable name:
https://git.savannah.gnu.org/cgit/emacs.git/commit/lisp/emacs-lisp/bytecomp.el?id=d9bbf40098801a859f4625c4aa7a8cbe99949705
so lines 954-961 of bytecomp.el still read:
(dolist (bytes-tail patchlist)
(setq pc (caar bytes-tail)) ; Pick PC from goto's tag.
;; Splits PC's value into 2 bytes. The jump address is
;; "reconstructed" by the `FETCH2' macro in `bytecode.c'.
(setcar (cdr bytes-tail) (logand pc 255))
(setcar bytes-tail (ash pc -8))
;; FIXME: Replace this by some workaround.
(or (<= 0 (car bytes-tail) 255) (error "Bytecode overflow")))
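To make the limit concrete, here is a small standalone sketch of the same
two-byte split. The function names are made up for illustration and are
not part of bytecomp.el; the decoder just shows the arithmetic that
recombines the two bytes:

```elisp
(defun my-encode-jump-target (pc)
  "Split PC into a cons (HIGH . LOW) of bytes, or signal an error."
  (let ((low (logand pc 255))
        (high (ash pc -8)))
    ;; Same check as in the quoted code: the high part must fit in a byte.
    (unless (<= 0 high 255)
      (error "Bytecode overflow"))
    (cons high low)))

(defun my-decode-jump-target (bytes)
  "Rebuild the jump target from BYTES, a cons (HIGH . LOW)."
  (+ (ash (car bytes) 8) (cdr bytes)))

(my-encode-jump-target 300)     ; (1 . 44)
(my-encode-jump-target 65535)   ; (255 . 255), the largest target that fits
;; (my-encode-jump-target 65536) signals "Bytecode overflow"
```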
I mainly quote this to say: see what I mean about losing track of things
that start off as being put aside temporarily? :-)
I sent in the bug report.
On Tue, Jun 14, 2022 at 8:23 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> > Or, you could just create a vector with one thunk for each package and
> > loop through it invoking each one. It wouldn't be as space
> > efficient, but it would be trivially correct.
>
> IIRC the compiler has code to split a bytecode object into two to try
> and circumvent the 64k limit and it should definitely be applicable here
> (it's more problematic when it's inside a loop), which is why I think
> it's a plain bug.
>
>
> Stefan
>
>
* Re: native compilation units
2022-06-13 17:15 ` Stefan Monnier
@ 2022-06-15 3:03 ` Lynn Winebarger
2022-06-15 12:23 ` Stefan Monnier
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-15 3:03 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 10140 bytes --]
On Mon, Jun 13, 2022 at 1:15 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> > To be clear, I'm trying to first understand what Andrea means by "safe".
> > I'm assuming it means the result agrees with whatever the byte
> > compiler and VM would produce for the same code.
>
> Not directly. It means that it agrees with the intended semantics.
> That semantics is sometimes accidentally defined by the actual
> implementation in the Lisp interpreter or the bytecode compiler, but
> that's secondary.
>
What I mean is, there's not really a spec defining the semantics to
judge against. But every emacs has a working byte compiler, and only
some have a native compiler. If the users with the byte compiler get a
different result than the users that have the native compiler, my guess
is that the code would be expected to be rewritten so that it produces the
expected result from the byte compiler (at least until the byte compiler is
revised). To the extent the byte compiler is judged to produce an
incorrect
result, it's probably an area of the language that was not considered
well-defined
enough (or useful enough) to have been used previously. Or it was known
that the byte compiler's semantics weren't very useful for a particular
family
of expressions.
> The semantic issue is that if you call
>
> (foo bar baz)
>
> it normally (when `foo` is a global function) means you're calling the
> function contained in the `symbol-function` of the `foo` symbol *at the
> time of the function call*. So compiling this to jump directly to the
> code that happens to be contained there during compilation (or the code
> which the compiler expects to be there at that point) is unsafe in
> the sense that you don't know whether that symbol's `symbol-function`
> will really have that value when we get to executing that function call.
>
> The use of `cl-flet` (or `cl-labels`) circumvents this problem since the
> call to `foo` is now to a lexically-scoped function `foo`, so the
> compiler knows that the code that is called is always that same one
> (there is no way to modify it between the compilation time and the
> runtime).
>
The fact that cl-flet (and cl-labels) are defined to provide immutable
bindings is really a surprise to me.
However, what I was trying to do originally was figure out if there was
any situation where Andrea's statement (in another reply) does not hold:
> the compiler can't take advantage of interprocedural optimizations (such
> as inline etc) as every function in Lisp can be redefined in every
> moment.
Remember, I was asking whether concatenating a bunch of files
together as a library would have the same meaning as compiling and linking
the object files.
There is one kind of expression where Andrea isn't quite correct, and that
is with respect to (eval-when-compile ...). Those *can* be treated as
constants,
even without actually compiling them first. If I understand the
CL-Hyperspec/Emacs Lisp manual, the following expression:
------------------------------------
(let ()
  (eval-when-compile (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (eval-when-compile (defvar b (lambda (y) (* y 3))))
  (let ((f (eval-when-compile (a b))))
    (lambda (z)
      (pow z (f 6)))))
------------------------------------
can be rewritten (using a new form "define-eval-time-constant") as
------------------------------------
(eval-when-compile
  (define-eval-time-constant ct-r1 (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (define-eval-time-constant ct-r2 (defvar b (lambda (y) (* y 3))))
  (define-eval-time-constant ct-r3 (a b)))
(let ()
  ct-r1
  ct-r2
  (let ((f ct-r3))
    (lambda (z)
      (pow z (f 6)))))
------------------------------------
Now the optimizer can treat ct-r1, ct-r2, and ct-r3 as constants for the
purpose of propagation,
*without actually determining their value*. So this could be rewritten as
-------------------------------------------
(eval-when-compile
  (define-eval-time-constant ct-r1 (defvar a (lambda (f) (lambda (x) (f (+ x 5))))))
  (define-eval-time-constant ct-r2 (defvar b (lambda (y) (* y 3))))
  (define-eval-time-constant ct-r3 (a b)))
(let ()
  (lambda (z)
    (pow z (ct-r3 6))))
------------------------------------------------
If I wanted to "link" files A, B, and C together, with A exporting symbols
a1,..., and B exporting symbols b1,...,
I could do the following:
(eval-when-compile
  (eval-when-compile
    <text of A>
  )
  <text of B with a1,... replaced by (eval-when-compile a1),...>
)
<text of C with a1,... replaced by (eval-when-compile (eval-when-compile a1)),...
 and b1,... replaced by (eval-when-compile b1),...>
And now the (eval-when-compile) expressions can be freely propagated within
the code of each file,
as they are constant expressions.
I don't know how the native compiler is handling "eval-when-compile"
expressions now, but this should
give that optimizer pass a class of expressions where "-O3" is in fact safe
to apply.
Then it's just a matter of creating the macros to make producing those
expressions in appropriate contexts
convenient to do in practice.
> > I doubt I'm bringing up topics or ideas that are new to you. But if
> > I do make use of semantic/wisent, I'd like to know the result can be
> > fast (modulo garbage collection, anyway).
>
> It's also "modulo enough work on the compiler (and potentially some
> primitive functions) to make the code fast".
>
Absolutely, it just doesn't look to me like a very big lift compared to,
say, what Andrea did.
> > I've been operating under the assumption that
> >
> > - Compiled code objects should be first class in the sense that
> > they can be serialized just by using print and read. That seems to
> > have been important historically, and was true for byte-code
> > vectors for dynamically scoped functions. It's still true for
> > byte-code vectors of top-level functions, but is not true for
> > byte-code vectors for closures (and hasn't been for at least
> > a decade, apparently).
>
> It's also true for byte-compiled closures, although, inevitably, this
> holds only for closures that capture only serializable values.
>
> > But I see that closures are being implemented by calling an ordinary
> > function that side-effects the "constants" vector.
>
> I don't think that's the case. Where do you see that?
>
My misreading, unfortunately.
That does seem like a lot of copying for anyone relying on efficient
closures.
Does this mean the native compiled code can only produce closures in
byte-code
form? Assuming dlopen loads the shared object into read-only memory for
execution.
> > Wedging closures into the byte-code format that works for dynamic scoping
> > could be made to work with shared structures, but you'd need to modify
> > print to always capture shared structure (at least for byte-code vectors),
> > not just when there's a cycle.
>
> It already does.
>
Ok, I must be missing it. I know eval_byte_code *creates* the result shown
below with shared structure (the '(5)), but I don't see anything in the
printed text to indicate it if read back in.
(defvar z
  (byte-compile-sexp
   '(let ((lx 5))
      (let ((f (lambda () lx))
            (g (lambda (ly) (setq lx ly))))
        `(,f ,g)))))
(ppcb z)
(byte-code "\300C\301\302 \"\301\303 \" D\207"
[5 make-closure
#[0 "\300\242\207"
[V0]
1]
#[257 "\300 \240\207"
[V0]
3 "\n\n(fn LY)"]]
5)
(defvar zv (eval z))
(ppcb zv)
(#[0 "\300\242\207"
[(5)]
1]
#[257 "\300 \240\207"
[(5)]
3 "\n\n(fn LY)"])
(defvar zvs (prin1-to-string zv))
(ppcb zvs)
"(#[0 \"\\300\\242\\207\" [(5)] 1] #[257 \"\\300 \\240\\207\" [(5)] 3
\"\n\n(fn LY)\"])"
(defvar zz (car (read-from-string zvs)))
(ppcb zz)
(#[0 "\300\242\207"
[(5)]
1]
#[257 "\300 \240\207"
[(5)]
3 "\n\n(fn LY)"])
(let ((f (car zz)) (g (cadr zz)))
  (print (eq (aref (aref f 2) 0) (aref (aref g 2) 0)) (current-buffer)))
nil
Of course, those last bindings of f and g were just vectors, not byte-code
vectors, but
the (5) is no longer shared state.
> > Then I think the current approach is suboptimal. The current
> > byte-code representation is analogous to the a.out format.
> > Because the .elc files run code on load you can put an arbitrary
> > amount of infrastructure in there to support an implementation of
> > compilation units with exported compile-time symbols, but it puts
> > a lot more burden on the compiler and linker/loader writers than just
> > being explicit would.
>
> I think the practical performance issues with ELisp code are very far
> removed from these problems. Maybe some day we'll have to face them,
> but we still have a long way to go.
>
I'm sure you're correct in terms of the current code base. But isn't the
history
of these kinds of improvements in compilers for functional languages that
coding styles that had been avoided in the past can be adopted and produce
faster code than the original? In this case, it would be enabling the
pervasive
use of recursion and less reliance on side-effects. Improvements in the gc
wouldn't hurt, either.
> >> You explicitly write `(require 'cl-lib)` but I don't see any
> >>
> >> -*- lexical-binding:t -*-
> >>
> >> anywhere, so I suspect you forgot to add those cookies that are needed
> >> to get proper lexical scoping.
> > Ok, wow, I really misread the NEWS for 28.1 where it said
> > The 'lexical-binding' local variable is always enabled.
>
> Are you sure? How do you do that?
> Some of the errors you showed seem to point very squarely towards the
> code being compiled as dyn-bound ELisp.
>
My quoting wasn't very effective. That last line was actually line 2902
of NEWS.28:
"** The 'lexical-binding' local variable is always enabled.
Previously, if 'enable-local-variables' was nil, a 'lexical-binding'
local variable would not be heeded. This has now changed, and a file
with a 'lexical-binding' cookie is always heeded. To revert to the
old behavior, set 'permanently-enabled-local-variables' to nil."
I feel a little less silly about my optimistic misreading of the first
line, at least.
Lynn
* Re: native compilation units
2022-06-15 3:03 ` Lynn Winebarger
@ 2022-06-15 12:23 ` Stefan Monnier
2022-06-19 17:52 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-15 12:23 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> The fact that cl-flet (and cl-labels) are defined to provide immutable
> bindings is really a surprise to me.
Whether they are mutable or not is not directly relevant, tho: the
important part is that being lexically scoped, the compiler gets to see all
the places where it's used and can thus determine that it's
never mutated.
> There is one kind of expression where Andrea isn't quite correct, and that
> is with respect to (eval-when-compile ...).
You don't need `eval-when-compile`. It's already "not quite correct"
for lambda expressions. What he meant is that the function associated
with a symbol can be changed in every moment. But if you call
a function without going through such a globally-mutable indirection the
problem vanishes.
> Now the optimizer can treat ct-r1,ct-r2, and ct-r3 as constants for the
> purpose of propagation,
Same holds for
(let* ((a (lambda (f) (lambda (x) (f (+ x 5)))))
(b (lambda (y) (* y 3)))
(f (funcall a b)))
(lambda (z)
(pow z (funcall f 6))))
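For anyone who wants to evaluate this, here is a runnable variant. It is
only a sketch with two substitutions that are mine: `expt` stands in for
`pow` (which is not an Emacs Lisp function), and the inner call to `f`
goes through `funcall`. The form is passed to `eval` with a non-nil second
argument so that it is evaluated with lexical binding regardless of how
the snippet is loaded; `my-power-fn` is a made-up name:

```elisp
(defvar my-power-fn
  (eval
   '(let* ((a (lambda (f) (lambda (x) (funcall f (+ x 5)))))
           (b (lambda (y) (* y 3)))
           (f (funcall a b)))            ; f adds 5, then multiplies by 3
      (lambda (z)
        (expt z (funcall f 6))))         ; (funcall f 6) is (* (+ 6 5) 3) = 33
   t))                                   ; t = evaluate with lexical binding

(funcall my-power-fn 2)                  ; 2 to the 33rd power, 8589934592
```

Everything here is lexically bound, so a compiler can see that `a`, `b`
and `f` are never reassigned and is free to fold the exponent down to 33.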
>> It's also "modulo enough work on the compiler (and potentially some
>> primitive functions) to make the code fast".
> Absolutely, it just doesn't look to me like a very big lift compared to,
> say, what Andrea did.
It very much depends on the specifics, but it's definitely not obviously true.
ELisp, like Python, has grown around a "slow language" so its code is
structured in such a way that most of the time the majority of the code
that's executed is actually not ELisp but C, over which the native
compiler has no impact.
> Does this mean the native compiled code can only produce closures in
> byte-code form?
Not directly, no. But currently that's the case, yes.
> below with shared structure (the '(5)), but I don't see anything in
> the printed text to indicate it if read back in.
You need to print with `print-circle` bound to t, like the compiler does
when writing to a `.elc` file.
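A small sketch of the difference, using plain vectors in place of
byte-code objects (the variable name `my-shared-demo` is made up):

```elisp
(defvar my-shared-demo
  (let* ((cell (list 5))
         ;; Two vectors that share the very same cons cell.
         (pair (list (vector cell) (vector cell)))
         ;; Printed without and with `print-circle'.
         (plain (prin1-to-string pair))
         (circ (let ((print-circle t)) (prin1-to-string pair)))
         (plain-back (car (read-from-string plain)))
         (circ-back (car (read-from-string circ))))
    (list
     ;; Without print-circle the reader builds two separate lists.
     (eq (aref (car plain-back) 0) (aref (cadr plain-back) 0))
     ;; With print-circle the sharing is recorded and restored by `read'.
     (eq (aref (car circ-back) 0) (aref (cadr circ-back) 0)))))
;; my-shared-demo is (nil t)
```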
> I'm sure you're correct in terms of the current code base. But isn't
> the history of these kinds of improvements in compilers for functional
> languages that coding styles that had been avoided in the past can be
> adopted and produce faster code than the original?
Right, but it's usually a slow co-evolution.
> In this case, it would be enabling the pervasive use of recursion and
> less reliance on side-effects.
Not everyone would agree that "pervasive use of recursion" is an improvement.
> Improvements in the gc wouldn't hurt, either.
Actually, nowadays lots of benchmarks are already bumping into the GC as
the main bottleneck.
> ** The 'lexical-binding' local variable is always enabled.
Indeed, that's misleading. Not sure how none of us noticed it before.
Stefan
* Re: native compilation units
2022-06-15 12:23 ` Stefan Monnier
@ 2022-06-19 17:52 ` Lynn Winebarger
2022-06-19 23:02 ` Stefan Monnier
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-19 17:52 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 7169 bytes --]
On Wed, Jun 15, 2022 at 8:23 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> > There is one kind of expression where Andrea isn't quite correct, and that
> > is with respect to (eval-when-compile ...).
>
> You don't need `eval-when-compile`. It's already "not quite correct"
> for lambda expressions. What he meant is that the function associated
> with a symbol can be changed in every moment. But if you call
> a function without going through such a globally-mutable indirection the
> problem vanishes.
>
I'm not sure what the point here is. If all programs were written with
every variable
and function name lexically bound, then there wouldn't be an issue.
After Andrea's response to my original question, I was curious if the kind
of
semantic object that an ELF shared-object file *is* can be captured
(directly) in the
semantic model of emacs lisp, including the fact that some symbols in ELF
are bound
to truly immutable constants at runtime by the loader. Also, if someone
were to
rewrite some of the primitives now in C in Lisp and rely on the compiler
for their use,
would there be a way to write them with the same semantics they have now
(not
referencing the run-time bindings of other primitives).
Based on what I've observed in this thread, I think the answer is either
yes or almost yes.
The one sticking point is that there is no construct for retaining the
compile-time environment.
If I "link" files by concatenating the source together, it's not an issue,
but I can't replicate
that with the results the byte-compiler currently produces.
What would also be useful is some analogue to Common Lisp's package
construct, but extended
so that symbols could be imported from compile-time environments as
immutable bindings.
Now, that would be a change in the current semantics of symbols,
unquestionably, but
not one that would break the existing code base. It would only come into
play compiling
a file as a library, with semantics along the lines of:
(eval-when-compile
(namespace <name of library obstack>)
<library code> ...
(export <symbol> ...)
)
Currently compiling a top-level expression wrapped in eval-when-compile by
itself leaves
no residue in the compiled output, but I would want to make the above
evaluate
to an object at run-time where the exported symbols in the obstack are
immutable.
Since no existing code uses the above constructs - because they are not
currently defined -
it would only be an extension.
I don't want to restart the namespace debates - I'm not suggesting anything
to do with the reader parsing namespaces from prefixes in the symbol name.
> >> It's also "modulo enough work on the compiler (and potentially some
> >> primitive functions) to make the code fast".
> > Absolutely, it just doesn't look to me like a very big lift compared to,
> > say, what Andrea did.
>
> It very much depends on the specifics, but it's definitely not obviously true.
> ELisp, like Python, has grown around a "slow language" so its code is
> structured in such a way that most of the time the majority of the code
> that's executed is actually not ELisp but C, over which the native
> compiler has no impact.
>
That's why I said "look[s] to me", and inquired here before proceeding.
Having looked more closely, it appears the most obvious safe approach,
that doesn't require any ability to manipulate the C call stack, is to
introduce
another manually managed call stack as is done for the specpdl stack, but
malloced (I haven't reviewed that implementation closely enough to tell if
it
is stack or heap allocated). That does complicate matters.
That part would be for allowing calls to (and returns from) arbitrary
points in
byte-code (or native-code) instruction arrays. This would in turn enable
implementing proper tail recursion as "goto with arguments".
These changes would be one way to address the items in the TODO file for
28.1, starting at line 173:
> * Important features
> ** Speed up Elisp execution [...]
> *** Speed up function calls [..]
> ** Add an "indirect goto" byte-code [...]
> *** Compile efficiently local recursive functions [...]
As for the other elements - introducing additional registers to facilitate
efficient lexical closures and namespaces - it still doesn't look like a
huge lift
to introduce them into the bytecode interpreter, although there is still
the work
to make effective use of them in the output of the compilers.
I have been thinking that some additional reader syntax for what might be
called "meta-evaluation quasiquotation" (better name welcome) could be
useful.
I haven't worked out the details yet, though. I would make #, and #,@
effectively
be shorthand for eval-when-compile. Using #` inside eval-when-compile
should
produce an expression that, after compilation, would provide the meta-quoted
expression with the semantics it would have outside an eval-when-compile
form.
> > Does this mean the native compiled code can only produce closures in
> > byte-code form?
>
> Not directly, no. But currently that's the case, yes.
>
> > below with shared structure (the '(5)), but I don't see anything in
> > the printed text to indicate it if read back in.
>
> You need to print with `print-circle` bound to t, like the compiler does
> when writing to a `.elc` file.
>
I feel silly again. I've *used* emacs for years, but have (mostly) avoided
using
emacs lisp for programming because of the default dynamic scoping and the
implications that has for the efficiency of lexical closures.
>
> > I'm sure you're correct in terms of the current code base. But isn't
> > the history of these kinds of improvements in compilers for functional
> > languages that coding styles that had been avoided in the past can be
> > adopted and produce faster code than the original?
>
> Right, but it's usually a slow co-evolution.
>
I don't think I've suggested anything else. I don't think my proposed
changes to the byte-code
VM would change the semantics of emacs LISP, just the semantics of the
byte-code
VM. Which you've already stated do not dictate the semantics of emacs LISP.
> > In this case, it would be enabling the pervasive use of recursion and
> > less reliance on side-effects.
>
> Not everyone would agree that "pervasive use of recursion" is an
> improvement.
>
True, but it's still a lisp - no one is required to write code in any
particular style. It would be peculiar (these days, anyway) to expect a lisp
compiler to optimize imperative-style code more effectively than code
employing recursion.
> > Improvements in the gc wouldn't hurt, either.
>
> Actually, nowadays lots of benchmarks are already bumping into the GC as
> the main bottleneck.
>
I'm not familiar with emacs's profiling facilities. Is it possible to tell
how much of the allocated space/time spent in gc is due to the constant
vectors of lexical closures? In particular, how much of the constant vectors
are copied elements independent of the lexical environment?
That would provide some measure of any gc-related benefit that *might* be
gained from using an explicit environment register for closures, instead of
embedding it in the byte-code vector.
Lynn
* Re: native compilation units
From: Stefan Monnier @ 2022-06-19 23:02 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel
> Currently compiling a top-level expression wrapped in
> eval-when-compile by itself leaves no residue in the compiled output,
`eval-when-compile` has 2 effects:
1- Run the code within the compiler's process.
   E.g. (eval-when-compile (require 'cl-lib)).
   This is somewhat comparable to loading a gcc plugin during
   a compilation: it affects the GCC process itself, rather than the
   code it emits.
2- It replaces the (eval-when-compile ...) thingy with the value
   returned by the evaluation of this code. So you can do (defvar
   my-str (eval-when-compile (concat "foo" "bar"))) and you know that
   the concatenation will be done during compilation.
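Spelled out as file contents, the two effects might look like the sketch below. It only restates the two forms already mentioned in the list; the comments describe what happens when the file is byte-compiled, and when the source is loaded uncompiled both bodies are simply evaluated normally.

```elisp
;; Effect 1: run code in the compiler's process.  cl-lib's macros
;; become available while compiling, without the compiled file
;; requiring the library when it is loaded.
(eval-when-compile (require 'cl-lib))

;; Effect 2: the form is replaced by its value.  The compiler
;; evaluates the `concat' call and embeds the string "foobar" as a
;; constant in the compiled output.
(defvar my-str (eval-when-compile (concat "foo" "bar")))
```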
> but I would want to make the above evaluate to an object at run-time
> where the exported symbols in the obstack are immutable.
Then it wouldn't be called `eval-when-compile` because it would do
something quite different from what `eval-when-compile` does :-)
> byte-code (or native-code) instruction arrays. This would in turn enable
> implementing proper tail recursion as "goto with arguments".
Proper tail recursion elimination would require changing the *normal*
function call protocol. I suspect you're thinking of a smaller-scale
version of it specifically tailored to self-recursion, kind of like
what `named-let` provides. Note that such ad-hoc TCO tends to hit the same
semantic issues as the -O3 optimization of the native compiler.
E.g. in code like the following:
    (defun vc-foo-register (file)
      (when (some-hint-is-true)
        (load "vc-foo")
        (vc-foo-register file)))
the final call to `vc-foo-register` is in tail position but is not
a self call because loading `vc-foo` is expected to redefine
`vc-foo-register` with the real implementation.
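A self-contained way to observe this, as a sketch with made-up names: instead of loading a file, the stub below installs the "real" definition with `fset` and then calls itself. When the code is evaluated or byte-compiled, the call goes through the symbol's function cell, so it reaches the new definition; an optimization that turned the tail call into a jump back to the top of the stub would change that behavior.

```elisp
;; Hypothetical stand-in for the stub above: the `fset' plays the
;; role of (load "vc-foo") redefining the function.
(defun my-demo-register (file)
  (fset 'my-demo-register
        (lambda (file) (list 'real-implementation file)))
  ;; Tail position, but not a self-call: the function cell of
  ;; `my-demo-register' now holds the new definition.
  (my-demo-register file))

(my-demo-register "a.txt") ; (real-implementation "a.txt")
```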
> I'm not familiar with emacs's profiling facilities. Is it possible to
> tell how much of the allocated space/time spent in gc is due to the
> constant vectors of lexical closures? In particular, how much of the
> constant vectors are copied elements independent of the lexical
> environment? That would provide some measure of any gc-related
> benefit that *might* be gained from using an explicit environment
> register for closures, instead of embedding it in the
> byte-code vector.
No, I can't think of any profiling tool we currently have that can help
with that, sorry :-(
Note that when support for native closures is added to the native
compiler, it will hopefully not be using this clunky representation
where capture vars are mixed in with the vector of constants, so that
might be a more promising direction (may be able to skip the step where
we need to change the bytecode).
Stefan
* Re: native compilation units
From: Lynn Winebarger @ 2022-06-20 1:39 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sun, Jun 19, 2022 at 7:02 PM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:
> > Currently compiling a top-level expression wrapped in
> > eval-when-compile by itself leaves no residue in the compiled output,
>
> `eval-when-compile` has 2 effects:
>
> 1- Run the code within the compiler's process.
> E.g. (eval-when-compile (require 'cl-lib)).
> This is somewhat comparable to loading a gcc plugin during
> a compilation: it affects the GCC process itself, rather than the
> code it emits.
>
> 2- It replaces the (eval-when-compile ...) thingy with the value
> returned by the evaluation of this code. So you can do (defvar
> my-str (eval-when-compile (concat "foo" "bar"))) and you know that
> the concatenation will be done during compilation.
>
> > but I would want to make the above evaluate to an object at run-time
> > where the exported symbols in the obstack are immutable.
>
> Then it wouldn't be called `eval-when-compile` because it would do
> something quite different from what `eval-when-compile` does :-)
>
>
The informal semantics of "eval-when-compile" from the elisp info file are
that
    This form marks BODY to be evaluated at compile time but not when
    the compiled program is loaded. The result of evaluation by the
    compiler becomes a constant which appears in the compiled program.
    If you load the source file, rather than compiling it, BODY is
    evaluated normally.
I'm not sure what I have proposed that would be inconsistent with "the
result of evaluation by the compiler becomes a constant which appears in the
compiled program". The exact form of that appearance in the compiled program
is not specified. For example, the byte-compile of
(eval-when-compile (cl-labels ((f...) (g ...)))
currently produces a byte-code vector in which f and g are byte-code vectors
with shared structure. However, that representation is only one choice.
It is inconsistent with the semantics of *symbols* as they currently stand,
as I have already admitted.
Even there, you could advance a model where it is not inconsistent. For
example, if you view the binding of symbol to value as having two
components - the binding and the cell holding the mutable value during the
extent of the symbol as a global/dynamically scoped variable, then having
the binding of the symbol to the final value of the cell before the dynamic
extent of the variable terminates would be consistent. That's not how it's
currently implemented, because there is no way to express the final
compile-time environment as a value after compilation has completed with the
current semantics.
The part that's incompatible with current semantics of symbols is importing
that symbol as an immutable symbolic reference. Not really a "variable"
reference, but as a binding of a symbol to a value in the run-time namespace
(or package in CL terminology, although CL did not allow any way to specify
what I'm suggesting either, as far as I know).
However, that would capture the semantics of ELF shared objects with the
text and ro_data segments loaded into memory that is in fact immutable for a
userspace program.
> > byte-code (or native-code) instruction arrays. This would in turn enable
> > implementing proper tail recursion as "goto with arguments".
>
> Proper tail recursion elimination would require changing the *normal*
> function call protocol. I suspect you're thinking of a smaller-scale
> version of it specifically tailored to self-recursion, kind of like
> what `named-let` provides. Note that such ad-hoc TCO tends to hit the same
> semantic issues as the -O3 optimization of the native compiler.
> E.g. in code like the following:
>
>     (defun vc-foo-register (file)
>       (when (some-hint-is-true)
>         (load "vc-foo")
>         (vc-foo-register file)))
>
> the final call to `vc-foo-register` is in tail position but is not
> a self call because loading `vc-foo` is expected to redefine
> `vc-foo-register` with the real implementation.
>
I'm only talking about the steps that are required to allow the compiler to
produce code that implements proper tail recursion.
With the abstract machine currently implemented by the byte-code VM,
the "call[n]" instructions will always be needed to call out according to
the C calling conventions.
The call[-absolute/relative] or [goto-absolute] instructions I suggested
*would be* used in the "normal" function-call protocol in place of the
current funcall dispatch, at least to functions defined in lisp.
This is necessary but not sufficient for proper tail recursion.
To actually get proper tail recursion requires the compiler to use the
instructions for implementing the appropriate function call protocol,
especially if "goto-absolute" is the instruction provided for changing the
PC register. Other instructions would have to be issued to manage the stack
frame explicitly if that were the route taken. Or, a more CISCish
call-absolute type of instruction could be used that would perform that
stack frame management implicitly.
Either way, it's the compiler that has to determine whether a return
instruction following a control transfer can be safely eliminated or not.
If the "goto-absolute" instruction were used, the compiler would
have to decide whether the address following the "goto-absolute"
should be pushed in a new frame, or if it can be "pre-emptively
garbage collected" at compile time because it's a tail call.
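For comparison, the restricted self-recursive case that can already be written today looks roughly like this with `named-let` (from subr-x, Emacs 28 and later). This is only a sketch of that special case, assuming the file uses lexical-binding; it does not involve any of the proposed instructions, and the function name is made up.

```elisp
;;; -*- lexical-binding: t -*-
(require 'subr-x)

(defun my-demo-sum-to (n)
  "Return 0 + 1 + ... + N using a self-recursive loop."
  (named-let next ((i n) (acc 0))
    (if (= i 0)
        acc
      ;; Self-call in tail position: the case `named-let' is
      ;; meant to turn into a loop rather than a nested call.
      (next (1- i) (+ acc i)))))

(my-demo-sum-to 10) ; 55
```

A call to any other function in tail position, or to a global function that may be redefined, is outside what this form covers, which is where the compiler decision described above comes in.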
> > I'm not familiar with emacs's profiling facilities. Is it possible to
> > tell how much of the allocated space/time spent in gc is due to the
> > constant vectors of lexical closures? In particular, how much of the
> > constant vectors are copied elements independent of the lexical
> > environment? That would provide some measure of any gc-related
> > benefit that *might* be gained from using an explicit environment
> > register for closures, instead of embedding it in the
> > byte-code vector.
>
> No, I can't think of any profiling tool we currently have that can help
> with that, sorry :-(
>
> Note that when support for native closures is added to the native
> compiler, it will hopefully not be using this clunky representation
> where capture vars are mixed in with the vector of constants, so that
> might be a more promising direction (may be able to skip the step where
> we need to change the bytecode).
>
>
The trick is to make the implementation of the abstract machine by each of
the compilers have enough in common to support calling one from the other.
The extensions I've suggested for the byte-code VM and lisp semantics
are intended to support that interoperation, so the semantics of the
byte-code implementation won't unnecessarily constrain the semantics of the
native-code implementation.
Lynn
* Re: native compilation units
From: Lynn Winebarger @ 2022-06-20 12:14 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sun, Jun 19, 2022, 9:39 PM Lynn Winebarger <owinebar@gmail.com> wrote:
> The part that's incompatible with current semantics of symbols is
> importing that symbol as
> an immutable symbolic reference. Not really a "variable" reference, but
> as a binding
> of a symbol to a value in the run-time namespace (or package in CL
> terminology, although
> CL did not allow any way to specify what I'm suggesting either, as far as
> I know).
>
An alternative would be to extend the semantics of symbols with two
additional immutable bindings - one for constant values and another for
constant functions. These would be shadowed by the mutable bindings during
evaluation, then (if unset) be bound to the final value assigned to the
mutable bindings when the namespace is finalized. Then, when a symbol is
imported from a compile time environment, the import would be to the
constant (value or function) binding, which could be shadowed by an
evaluation-time variable/function.
That should qualify as a consistent extension of the current semantics
rather than a modification. It would be a lisp-4 instead of a lisp-2.
Personally, I'd also like to have a way to define a global variable that
does not modify the lexical scoping of let for that variable. Say,
"defstatic" - corresponding to a variable with static global storage. I
kind of hate that the semantics of "let" (or lambda parameters) are
determined by the global state at evaluation time.
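The complaint about `let` can be made concrete with a short sketch (all names are hypothetical). Both `let` forms below are evaluated with lexical binding enabled and differ only in whether the variable was previously declared with `defvar`; the declaration alone switches the `let` from a lexical to a dynamic binding.

```elisp
(defvar my-demo-special 'global)        ; declared special

(defun my-demo-read (symbol)
  "Return SYMBOL's dynamic value, or `unbound' if it has none."
  (if (boundp symbol) (symbol-value symbol) 'unbound))

;; The second argument to `eval' selects lexical binding.
(eval '(let ((my-demo-special 'local))
         (my-demo-read 'my-demo-special))
      t)                                ; local

(eval '(let ((my-demo-plain 'local))
         (my-demo-read 'my-demo-plain))
      t)                                ; unbound
```

A "defstatic" as described would presumably give the variable global storage while leaving the second behavior in place for `let`; that form does not exist, so it is not shown.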
Lynn
>
* Re: native compilation units
From: Lynn Winebarger @ 2022-06-20 12:34 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sun, Jun 19, 2022 at 9:39 PM Lynn Winebarger <owinebar@gmail.com> wrote:
> On Sun, Jun 19, 2022 at 7:02 PM Stefan Monnier <monnier@iro.umontreal.ca>
> wrote:
>
>>
>> Proper tail recursion elimination would require changing the *normal*
>> function call protocol. I suspect you're thinking of a smaller-scale
>> version of it specifically tailored to self-recursion, kind of like
>> what `named-let` provides. Note that such ad-hoc TCO tends to hit the
>> same semantic issues as the -O3 optimization of the native compiler.
>> E.g. in code like the following:
>>
>>     (defun vc-foo-register (file)
>>       (when (some-hint-is-true)
>>         (load "vc-foo")
>>         (vc-foo-register file)))
>>
>> the final call to `vc-foo-register` is in tail position but is not
>> a self call because loading `vc-foo` is expected to redefine
>> `vc-foo-register` with the real implementation.
>>
> I'm only talking about the steps that are required to allow the compiler to
> produce code that implements proper tail recursion.
> With the abstract machine currently implemented by the byte-code VM,
> the "call[n]" instructions will always be needed to call out according to
> the C calling conventions.
> The call[-absolute/relative] or [goto-absolute] instructions I suggested
> *would be* used in the "normal" function-call protocol in place of the
> current funcall dispatch, at least to functions defined in lisp.
> This is necessary but not sufficient for proper tail recursion.
> To actually get proper tail recursion requires the compiler to use the
> instructions for implementing the appropriate function call protocol,
> especially if "goto-absolute" is the instruction provided for changing the
> PC register. Other instructions would have to be issued to manage the stack
> frame explicitly if that were the route taken. Or, a more CISCish
> call-absolute type of instruction could be used that would perform that
> stack frame management implicitly.
> Either way, it's the compiler that has to determine whether a return
> instruction following a control transfer can be safely eliminated or not.
> If the "goto-absolute" instruction were used, the compiler would
> have to decide whether the address following the "goto-absolute"
> should be pushed in a new frame, or if it can be "pre-emptively
> garbage collected" at compile time because it's a tail call.
>
>
For the record, my point of reference for a classic implementation of
efficient lexical closures and proper tail recursion is Clinger's TwoBit
compiler for Larceny Scheme, and the associated "MacScheme" abstract machine:
https://www.larcenists.org/twobit.html. That system is implemented
in several variants. Each has a well-defined mapping of the MacScheme
machine state to the actual machine state for compiled code.
That system does not have the constraint of having a byte-code interpreter
and native-code implementation co-existing, but if they do coexist and
are expected to be able to call each other with the "normal" (lisp, not C)
calling conventions, defining the abstract machine state that has to be
maintained between calls would be a key step.
If calling between byte-code and native-code is expected to have the same
overhead as calling between lisp and C, then I suppose that's not necessary.
Lynn
* Re: native compilation units
From: Lynn Winebarger @ 2022-06-25 18:12 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sun, Jun 19, 2022, 9:39 PM Lynn Winebarger <owinebar@gmail.com> wrote:
>
>
> On Sun, Jun 19, 2022 at 7:02 PM Stefan Monnier <monnier@iro.umontreal.ca>
> wrote:
>
>> > Currently compiling a top-level expression wrapped in
>> > eval-when-compile by itself leaves no residue in the compiled output,
>>
>> `eval-when-compile` has 2 effects:
>>
>> 1- Run the code within the compiler's process.
>> E.g. (eval-when-compile (require 'cl-lib)).
>> This is somewhat comparable to loading a gcc plugin during
>> a compilation: it affects the GCC process itself, rather than the
>> code it emits.
>>
>> 2- It replaces the (eval-when-compile ...) thingy with the value
>> returned by the evaluation of this code. So you can do (defvar
>> my-str (eval-when-compile (concat "foo" "bar"))) and you know that
>> the concatenation will be done during compilation.
>>
>> > but I would want to make the above evaluate to an object at run-time
>> > where the exported symbols in the obstack are immutable.
>>
>> Then it wouldn't be called `eval-when-compile` because it would do
>> something quite different from what `eval-when-compile` does :-)
>>
>>
> The informal semantics of "eval-when-compile" from the elisp info file are
> that
> This form marks BODY to be evaluated at compile time but not when
> the compiled program is loaded. The result of evaluation by the
> compiler becomes a constant which appears in the compiled program.
> If you load the source file, rather than compiling it, BODY is
> evaluated normally.
> I'm not sure what I have proposed that would be inconsistent with "the
> result of evaluation
> by the compiler becomes a constant which appears in the compiled program".
> The exact form of that appearance in the compiled program is not specified.
> For example, the byte-compile of (eval-when-compile (cl-labels ((f...) (g
> ...)))
> currently produces a byte-code vector in which f and g are byte-code
> vectors with
> shared structure. However, that representation is only one choice.
>
> It is inconsistent with the semantics of *symbols* as they currently
> stand, as I have already admitted.
> Even there, you could advance a model where it is not inconsistent. For
> example,
> if you view the binding of symbol to value as having two components - the
> binding and the cell
> holding the mutable value during the extent of the symbol as a
> global/dynamically scoped variable,
> then having the binding of the symbol to the final value of the cell
> before the dynamic extent of the variable
> terminates would be consistent. That's not how it's currently
> implemented, because there is no way to
> express the final compile-time environment as a value after compilation
> has completed with the
> current semantics.
>
> The part that's incompatible with current semantics of symbols is
> importing that symbol as
> an immutable symbolic reference. Not really a "variable" reference, but
> as a binding
> of a symbol to a value in the run-time namespace (or package in CL
> terminology, although
> CL did not allow any way to specify what I'm suggesting either, as far as
> I know).
>
> However, that would capture the semantics of ELF shared objects with the
> text and ro_data
> segments loaded into memory that is in fact immutable for a userspace
> program.
>
It looks to me like the portable dump code/format could be adapted to serve
the purpose I have in mind here. What needs to be added is a way to limit
the scope of the dump so only the appropriate set of objects are captured.
There would probably also need to be a separate load-path for these
libraries similar to the approach employed for native compiled files.
It could be neat if all LISP code and constants eventually lived in some
larger associated compilation units (scope-limited pdmp file), to have a
residual dump at any time of the remaining live objects, most corresponding
to the space of global/dynamic variables. That could in turn be used for
local debugging or in actual bug reporting.
Lynn
* Re: native compilation units
From: Lynn Winebarger @ 2022-06-26 14:14 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel
On Sat, Jun 25, 2022, 2:12 PM Lynn Winebarger <owinebar@gmail.com> wrote:
>> The part that's incompatible with current semantics of symbols is
>> importing that symbol as
>> an immutable symbolic reference. Not really a "variable" reference, but
>> as a binding
>> of a symbol to a value in the run-time namespace (or package in CL
>> terminology, although
>> CL did not allow any way to specify what I'm suggesting either, as far as
>> I know).
>>
>> However, that would capture the semantics of ELF shared objects with the
>> text and ro_data
>> segments loaded into memory that is in fact immutable for a userspace
>> program.
>>
>
> It looks to me like the portable dump code/format could be adapted to
> serve the purpose I have in mind here. What needs to be added is a way to
> limit the scope of the dump so only the appropriate set of objects are
> captured.
>
I'm going to start with a copy of pdumper.c and pdumper.h renamed to
ndumper (n for namespace). The pdmp format conceptually organizes the
emacs executable space into a graph with three nodes - an "Emacs
executable" node (or the temacs text and ro sections), "Emacs static"
(sections of the executable loaded into writeable memory), and a "dump"
node, corresponding to heap-allocated objects that were live at the time of
the dump. The dump node has relocations that can point into itself or to
the emacs executable, and "discardable" relocations for values instantiated
into the "Emacs static". While the data structure doesn't require it, the
only values saved from the Emacs static data are symbols, primitive subrs
(not native compiled), and the thread structure for the main thread.
There can be cycles between these nodes in the memory graph, but cutting
the edge[s] between the emacs executable and the Emacs static nodes yields
a DAG.
Note, pdumper does not make the partition I'm describing explicitly. I'm
inferring that there must be such a partition. The discardable relocations
should be ones that instantiate into static data of the temacs executable.
My plan is to refine the structure of the Emacs process introduced by
pdumper to yield a namespace graph structure with the same property -
cutting the edge from executable to runtime state yields a DAG whose only
root is the emacs executable.
Each ndmp namespace (or module or cl-package) would have its own symbol
table and a unique namespace identifier, with a runtime mapping to the file
backing it (if loaded from a file).
Interned symbols will be extended with three additional properties: static
value, constant value and constant function. For variables, scope
resolution will be done at compile time:
* Value if not void (undefined), else
* Static value
A constant symbol is referenced by importing a constant symbol, either from
another namespace or a variable in the current namespace's compile-time
environment. The attempt at run-time to rebind a symbol bound by an import
form will signal an error. Multiple imports binding a particular symbol at
run-time will effectively cause the shadowing of an earlier binding by the
later binding. Any sequence of imports and other forms that would result
in the ambiguity of the resolution of a particular variable at compile time
will signal an error. That is, a given symbol will have only one
associated binding in the namespace scope during a particular evaluation
time (eval, compile, compile-compile, etc.).
A static value binding will be global but not dynamic. A constant value
binding will result from an export form in an eval-when-compile form
encountered while compiling the source of the ndmp module. Since static
bindings capture the "global" aspect of the current semantics of special
variable bindings, dynamic scope can be safely restricted to provide
thread-local semantics. Instantiation of a compiled ndmp object will
initialize the bindings to be consistent with the current semantics of
defvar and setq in global scope, as well as the separation of compile-time
and eval-time variable bindings. [I am not certain what the exact approach
to ensuring that will be.] Note constant bindings are only created
by "importing" from the compile-time environment through eval-when-compile
under the current semantics model. This approach simply avoids the beta
substitution of compile-time variable references performed in the current
implementation of eval-when-compile semantics. Macro expansion is still
available to insert such values directly in forms from the compile-time
environment.
A function symbol will resolve to the function property if not void, and
the constant function property otherwise.
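The two lookup rules (value, else static value; function, else constant function) can be sketched as ordinary functions. Everything here is an assumption made for illustration: the property names are invented, the extra slots are emulated with symbol plists, and the lookup runs at run time, whereas the proposal has variable resolution done at compile time.

```elisp
;; Hypothetical property names standing in for the proposed extra
;; symbol slots; nothing here exists in Emacs.
(defun my-ndmp-resolve-variable (symbol)
  "Return SYMBOL's ordinary value if bound, else its static value."
  (cond ((boundp symbol) (symbol-value symbol))
        ((plist-member (symbol-plist symbol) 'my-ndmp-static-value)
         (get symbol 'my-ndmp-static-value))
        (t (signal 'void-variable (list symbol)))))

(defun my-ndmp-resolve-function (symbol)
  "Return SYMBOL's function definition if any, else its constant function."
  (cond ((fboundp symbol) (symbol-function symbol))
        ((plist-member (symbol-plist symbol) 'my-ndmp-constant-function)
         (get symbol 'my-ndmp-constant-function))
        (t (signal 'void-function (list symbol)))))

(put 'my-ndmp-demo 'my-ndmp-static-value 42)
(my-ndmp-resolve-variable 'my-ndmp-demo)  ; 42, no ordinary value yet
(set 'my-ndmp-demo 7)
(my-ndmp-resolve-variable 'my-ndmp-demo)  ; 7, the ordinary value shadows
```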
Each ndmp module will explicitly identify the symbols it exports, and those
it imports. The storage of variable bindings for unexported symbols will
not be directly referenceable from any other namespace. Constant bindings
may be enforced by loading into a read-only page of memory, a write barrier
implemented by the system, or unenforced. In other words, attempting to set
a constant binding is an error with unspecified effect. Additional
declarations may be provided to require the signaling of an error, the
enforcement of constancy (without an error), both, or neither. The storage
of static and constant variables may or may not be incorporated directly in
the symbol object. For example, such storage may be allocated using
separate hash tables for static and constant symbol tables to reduce the
allocation of space for variables without a static or constant binding.
When compiling a form that imports a symbol from an ndmp module, importing
in an eval-when-compile context will resolve to the constant value binding
of the symbol, as though the source forms were concatenated during
compilation to have a single compile time environment. Otherwise, the
resolution will proceed as described above.
There will be a distinguished ndmp object that contains relocations
instantiated into the Emacs static nodes, serving the baseline function of
pdmp. There will also be a distinguished ndmp object "ELISP" that exports
all the primitives of Emacs lisp. The symbols of this namespace will be
implicitly imported into every ndmp unless overridden by a special form to
be specified. In this way, a namespace may use an alternative lisp
semantic model, e.g. CL. Additional forms for importing symbols from other
namespaces remain to be specified.
Ideally the byte code vm would be able to treat an ndmp object as an
extended byte code vector, but the restriction of the byte-codes to 16-bit
addressing is problematic.
For 64-bit machines, the ndmp format will restrict the (stored) addresses
to 32 bits, and use the remaining bits of relocs not already used for
administrative purposes as an index into a vector of imported namespaces in
the ndmp file itself, where the 0 value corresponds to an "un-interned"
namespace that is not backed by a (permanent) file. I don't know what the
split should be in 32-bit systems (without the wide-int option). The
interpretation of the bits is specific to file-backed compiled namespaces,
so it may restrict the number of namespace imports in a compiled object
without restricting the number of namespaces imported in the runtime
namespace.
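To make the addressing idea concrete, here is a sketch of packing a namespace index and a 32-bit stored address into one integer. The layout is an assumption for illustration only: it puts the namespace index directly above the low 32 bits and ignores the administrative bits that a real reloc would also carry; all names are made up.

```elisp
;; Assumed layout, for illustration only: low 32 bits hold the stored
;; address (offset), the bits above hold the namespace index, with
;; index 0 reserved for the "un-interned" namespace.
(defconst my-ndmp-offset-bits 32)
(defconst my-ndmp-offset-mask (1- (ash 1 my-ndmp-offset-bits)))

(defun my-ndmp-pack (namespace-index offset)
  "Combine NAMESPACE-INDEX and a 32-bit OFFSET into one integer."
  (unless (<= 0 offset my-ndmp-offset-mask)
    (error "Offset does not fit in %d bits: %S" my-ndmp-offset-bits offset))
  (logior (ash namespace-index my-ndmp-offset-bits) offset))

(defun my-ndmp-namespace-index (packed)
  "Return the namespace index stored in PACKED."
  (ash packed (- my-ndmp-offset-bits)))

(defun my-ndmp-offset (packed)
  "Return the 32-bit offset stored in PACKED."
  (logand packed my-ndmp-offset-mask))

(let ((r (my-ndmp-pack 3 #x1000)))
  (list (my-ndmp-namespace-index r) (my-ndmp-offset r))) ; (3 4096)
```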
Once implemented, this functionality should significantly reduce the need
for a monolithic dump or "redumping" functionality. Or rather, "dumping"
will be done incrementally.
My ultimate goal is to introduce a clean way to express a compiled object
that has multiple code labels, and a mechanism to call or jump to them
directly, so that the expressible control-flow structure of native and byte
compiled code will be equivalent (I believe the technical term is that
there will be a bisimulation between their operational semantics, but it's
been a while). An initial version might move in this direction by encoding
the namespaces using a byte-code vector to trampoline
to the code-entry points, but this would not provide a bisimulation.
Eventually, the byte-code VM and compiler will have to be modified to make
full use of ndmp objects as primary semantic objects without intermediation
through byte-code vectors as currently implemented.
If there's an error in my interpretation of the current implementation
(particularly pdumper), I'd be happy to find out about it now.
As a practical matter, I've been working with the 28.1 source. Am I better
off continuing with that, or starting from a more recent commit to the main
branch?
Lynn