On Sun, Jun 5, 2022, 10:20 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>> >> [ But that doesn't mean we shouldn't try to compile several ELisp files
>> >>   into a single ELN file, especially since the size of ELN files seems
>> >>   to be proportionally larger for small ELisp files than for large
>> >>   ones.  ]
>> >
Not sure if these general statistics are of much use, but of 4324 source files successfully compiled (1557 from the lisp directory), totaling 318MB of ELN files, including 13 trampolines:
- The smallest 450 are 17632 bytes or less (the trampolines are 16744 bytes each), totaling 7.4M.
- The smallest 1000 are under 25700 bytes, totaling 20M.
- The smallest 2000 are under 38592 bytes, totaling 48M.
- The smallest 3000 are under 62832 bytes, totaling 95M.
- The smallest 4000 are under 188440 bytes, totaling 194M.
- Only 58 are over 500k, and only 13 over 1M (the largest is 3.1M); those 58 total about 52M.

I am curious as to why the system doesn't just produce trampolines for all the system calls AOT in a single module.

`load-path` is used for native-compiled files, yes.  But it's used
in exactly the same way (and should hence cost the same) for:
- No native compilation
- AOT native compilation
- lazy native compilation
Which is what I meant by "unrelated to native compilation".
True, but it does lead to a little more disappointment when that 2.5-5x speedup is dominated by the load-path length at startup.


> Although I do wonder if there is some optimization for ELN files in the
> system directory as opposed to the user's cache.  I have one build where I
> native compiled (but not byte compiled) all the el files in the lisp
> directory,

IIUC current code only loads an ELN file if there is a corresponding ELC
file, so natively compiling a file without also byte-compiling it is
definitely not part of the expected situation.  Buyer beware.
That would explain the behavior I've seen.  If that's the case, shouldn't `batch-native-compile` produce the byte-compiled file if it doesn't exist?
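In the meantime, the build step can be made to emit both artifacts in one pass.  A sketch, assuming the `batch-byte+native-compile` entry point that Emacs 28 uses for its own AOT builds (the rule and file names are illustrative):

```make
# Makefile-style rule fragment: byte-compile and native-compile in
# one invocation, so the .elc exists and the .eln will actually be
# loaded at run time.
%.elc: %.el
	emacs -Q --batch -f batch-byte+native-compile $<
```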

I'm not following you.  Are you talking about compiling third-party
packages during the compilation of Emacs itself by placing them into
a `site-lisp` subdirectory inside Emacs's own source code tree, and then
moving the resulting `.el` and `.elc` files to the `../NN.MM/site-lisp`
subdirectory in Emacs's installation target directory?

That's the way I'm doing it.  Compatibility of these packages with Emacs versions varies too much for me to want to treat them as version-independent.  I got burned in an early attempt where I didn't set the prefix: Emacs kept adding the /usr/share site-lisp paths even when running from the build directory, and the AUCTeX installed there is compatible with 24.3 but not 28.1, so I kept getting mysterious compile errors for the AUCTeX packages until I realized what was going on.

And you're saying that whether you place them in `../NN.MM/site-lisp`
rather than in `../site-lisp` makes a significant performance difference?

Sorry, no.  I meant I'm curious whether having them in the user's cache versus the system ELN cache makes any difference in start-up time, ignoring the initial async native compilation.  In particular, whether the checksum calculation is bypassed in one case but not the other (say, by keeping a permanent mapping from the system load-path to the system cache).

> The other problem was that I got a "bytecode overflow" error.  I only got
> the first error after chopping off the file approximately after the first
> 10k lines.  Oddly enough, when I put all the files in the site-lisp
> directory, and collect all the autoloads for that directory in a single
> file, it has no problem with the 80k line file that results.

We need to fix those problems.  Please try and give as much detail as
possible in your bug report so we can try and reproduce it on our end
(both for the warnings about non-top-level forms and for the bytecode
overflow).

> I'm pretty sure the load-path is an issue with 1250 packages, even if half
> of them consist of single files.

I'm afraid so, indeed.

> One issue with this approach is that the package selection mechanism
> doesn't recognize the modules as being installed, or provide any assistance
> in selectively activating modules.

Indeed, since the selective activation relies crucially on the
`load-path` for that.

> Other places where there is a noticeable slowdown with large numbers of
> packages:
>   * Browsing customization groups - just unfolding a single group can take
> minutes (this is on fast server hardware with a lot of free memory)

Hmm... can't think of why that would be.  You might want to make
a separate bug-report for that.

>   * Browsing custom themes with many theme packages installed
> I haven't gotten to the point that I can test the same situation by
> explicitly loading the same modules from the site-lisp directory that had
> been activated as packages.  Installing the themes in the system directory
> does skip the "suspicious files" check that occurs when loading them from
> the user configuration.

Same here.  I'm not very familiar with the custom-theme code, but it
does seem "unrelated" in the sense that I don't think fixing some of the
other problems you've encountered will fix this one.

I agree, but there was the possibility that the compilation process (I'm assuming the byte-compile stage would do this, if it were done at all) would precompute things like customization groups for the compilation unit.  Then aggregating the sources of compilation units into larger libraries might be expected to significantly decrease the amount of dynamic computation currently required.

I know there's no inherent link to native compilation; it's more that if native compilation makes the implementation fast enough to make these additional packages attractive, you're more likely to see the consequences of design choices made when the bytecode interpreter was assumed to be the bottleneck, etc.

> I would expect this would apply to most top-level defuns in elisp
> packages/modules.  From my cursory review, it looks like the ability to
> redefine these defuns is mostly useful when developing the packages
> themselves, and "sealing" them for use would be appropriate.

Advice is not used very often, but it's very hard to predict which
function(s) it may end up being needed on, and sealing would make advice
ineffective.  I would personally recommend just staying away from
level 3 of the native compiler's optimization.  Or at least, only use it
in targeted ways, i.e. only at the very rare few spots where you've
clearly found it to have a noticeable performance benefit.

In lower levels of optimization, those same calls are still optimized
but just less aggressively, which basically means they turn into:

    if (<symbol unchanged>)
        <call the C function directly>;
    else
        <use the old slow but correct code path>;

I'm guessing native-compiled code makes the GC a more noticeable chunk of the overhead.  I'd really love to see something like Chromium's concurrent GC integrated into Emacs.

If I do any rigorous experiments to see whether there's anything resembling a virtuous cycle between larger compilation units and more aggressive intraprocedural optimization, I'll report back.

Lynn