From: Lynn Winebarger <owinebar@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org
Subject: Re: native compilation units
Date: Sun, 5 Jun 2022 08:16:25 -0400
Message-ID: <CAM=F=bDzv-=r-zZR+ti708aeg7_iXZMnqf-o_5a-kaXB=VY-pw@mail.gmail.com>
In-Reply-To: <jwvilpgwmuw.fsf-monnier+emacs@gnu.org>


On Sat, Jun 4, 2022, 10:32 AM Stefan Monnier <monnier@iro.umontreal.ca>
wrote:

> >> Performance issues with read access to directories containing less than
> >> 10K files seems like something that was solved last century, so
> >> I wouldn't worry very much about it.
> > Per my response to Eli, I see (network) directories become almost unusable
> > somewhere around 1000 files,
>
> I don't doubt there are still (in the current century) cases where
> largish directories get slow, but what I meant is that it's now
> considered as a problem that should be solved by making those
> directories fast rather than by avoiding making them so large.
>
Unfortunately, sometimes we have to cope with the environment we use.  And for
all I know some of the performance penalties may be inherent in the
(security related) infrastructure requirements in a highly regulated
industry.
Not that that should be a primary concern for the development team, but it
is something a local packager might be stuck with.


> >> [ But that doesn't mean we shouldn't try to compile several ELisp files
> >>   into a single ELN file, especially since the size of ELN files seems
> >>   to be proportionally larger for small ELisp files than for large
> >>   ones.  ]
> >
> > Since I learned of the native compiler in 28.1, I decided to try it out and
> > also "throw the spaghetti at the wall" with a bunch of packages that
> > provide features similar to those found in more "modern" IDEs.  In terms of
> > startup time, the normal package system does not deal well with hundreds of
> > directories on the load path, regardless of AOT native compilation, so I'm
> > transforming the packages to install in the version-specific load path, and
> > compiling that ahead of time.  At least for the ones amenable to such
> > treatment.
>
> There are two load-paths at play (`load-path` and
> `native-comp-eln-load-path`) and I'm not sure which one you're talking
> about.  OT1H `native-comp-eln-load-path` should not grow with the number
> of packages so it typically contains exactly 2 entries, and definitely
> not hundreds.  OTOH `load-path` is unrelated to native compilation.
>

Not entirely - as I understand it, the load system first finds the source
file and computes a hash before determining if there is an ELN file
corresponding to it.
Although I do wonder if there is some optimization for ELN files in the
system directory as opposed to the user's cache.  I have one build where I
native compiled (but not byte compiled) all the el files in the lisp
directory, and another where I byte compiled and then native compiled the
same set of files.  In both cases I used the flag to batch-native-compile
to put the ELN file in the system cache.  In the first case a number of
files failed to compile, and in the second, they all compiled.  I've also
observed another situation where a file will only (byte or native) compile
if one of its required files has been byte compiled ahead of time - but
only native compiling that dependency resulted in the same behavior as not
compiling it at all.  I planned to send a separate mail to the list asking
whether it was intended behavior once I had reduced it to a simple case, or
if it should be submitted as a bug.
In any case, I noticed that the "browse customization groups" buffer is
noticeably faster in the second case.  I need to try it again to confirm
that it wasn't just waiting on the relevant source files to compile in the
first case.
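
For reference, the source-to-ELN mapping mentioned above can be inspected
from Lisp.  This is only a sketch, assuming Emacs 28+ built with native
compilation; `my/find-eln-for-library' is a hypothetical helper, and it only
approximates the lookup the loader performs (it ignores the "preloaded"
subdirectory, for instance):

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my/find-eln-for-library (library)
  "Return an existing .eln file matching LIBRARY's source file, or nil.
LIBRARY is a name without extension, such as \"dired\"."
  (when (and (fboundp 'native-comp-available-p)
             (native-comp-available-p))
    (let* (;; The source file is located through `load-path' first.
           (src (locate-library (concat library ".el")))
           ;; The relative .eln name embeds hashes derived from the source.
           (rel (and src (comp-el-to-eln-rel-filename src)))
           ;; Non-absolute entries are relative to `invocation-directory'.
           (dirs (mapcar (lambda (dir)
                           (expand-file-name
                            comp-native-version-dir
                            (expand-file-name dir invocation-directory)))
                         native-comp-eln-load-path)))
      (and rel (locate-file rel dirs)))))
```

Evaluating (my/find-eln-for-library "dired") should show whether the hit
comes from the system directory or from the user's cache.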

> I also don't understand what you mean by "version-specific load path".
>
In the usual unix installation, there will be a "site-lisp" one directory
above the version specific installation directory, and another site-lisp in
the version-specific installation directory.  I'm referring to installing
the source (ultimately) in ..../emacs/28.1/site-lisp.  During the build
it's just in the site-lisp subdirectory of the source root path.


> Also, what kind of startup time are you talking about?
> E.g., are you using `package-quickstart`?
>
That was the first alternative I tried.  With 1250 packages, it did not
work.  First, the file consisted of a series of "let" forms corresponding
to the package directories, and apparently the autoload forms are ignored
if they appear anywhere below top-level.  At least I got a number of
warnings to that effect.
The other problem was that I got a "bytecode overflow error".  I only got
the first error after chopping off the file approximately after the first
10k lines.  Oddly enough, when I put all the files in the site-lisp
directory, and collect all the autoloads for that directory in a single
file, it has no problem with the 80k line file that results.
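
One way such a single autoloads file might be generated is sketched below.
It assumes Emacs 28, where `make-directory-autoloads' is provided by
autoload.el (later versions offer `loaddefs-generate' instead); the helper
name and the output file name "site-autoloads.el" are made up for
illustration:

```elisp
(require 'autoload)   ; provides `make-directory-autoloads' in Emacs 28

;; Hypothetical helper, not part of Emacs.
(defun my/collect-site-autoloads (dir)
  "Collect the autoloads of all Lisp files in DIR into one file.
Return the name of the generated file."
  (let ((output (expand-file-name "site-autoloads.el" dir)))
    ;; Scans DIR for autoload cookies and writes them to OUTPUT.
    (make-directory-autoloads dir output)
    output))
```

The generated file then only needs to be loaded once, for example from
site-start.el.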


> > Given I'm compiling all the files AOT for use in a common installation
> > (this is on Linux, not Windows), the natural question for me is whether
> > larger compilation units would be more efficient, particularly at startup.
>
> It all depends where the slowdown comes from :-)
>
> E.g. `package-quickstart` follows a similar idea to the one you propose
> by collecting all the `<pkg>-autoloads.el` into one big file, which
> saves us from having to load separately all those little files.  It also
> saves us from having to look for them through those hundreds
> of directories.
>
> I suspect a long `load-path` can itself be a source of slow down
> especially during startup, but I haven't bumped into that yet.
> There are ways we could speed it up, if needed:
>
> - create "meta packages" (or just one containing all your packages),
>   which would bring together in a single directory the files of several
>   packages (and presumably also bring together their
>   `<pkg>-autoloads.el` into a larger combined one).  Under GNU/Linux we
>   could have this metapackage be made of symlinks, making it fairly
>   efficient and non-obtrusive (e.g. `C-h o` could still get you to the
>   actual file rather than its metapackage-copy).
> - Manage a cache of where are our ELisp files (i.e. a hash table
>   mapping relative ELisp file names to the absolute file name returned
>   by looking for them in `load-path`).  This way we can usually avoid
>   scanning those hundred directories to find the .elc file we need, and
>   go straight to it.
>
I'm pretty sure the load-path is an issue with 1250 packages, even if half
of them consist of single files.
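
To make the cache idea from the second bullet concrete, here is a rough
sketch.  None of these names exist in Emacs; they are made up for
illustration, and a real implementation would also need invalidation when
`load-path' or the directories change:

```elisp
;; Hypothetical sketch of a `load-path' cache.
(defvar my/library-cache (make-hash-table :test #'equal)
  "Map library names (without extension) to absolute file names.")

(defun my/build-library-cache (&optional path)
  "Scan PATH (default `load-path') once and fill `my/library-cache'."
  (clrhash my/library-cache)
  (dolist (dir (or path load-path))
    (when (and (stringp dir) (file-directory-p dir))
      ;; Within one directory, prefer .elc over .el.
      (dolist (suffix '(".elc" ".el"))
        (dolist (file (directory-files
                       dir nil (concat (regexp-quote suffix) "\\'")))
          (let ((lib (file-name-sans-extension file)))
            ;; Earlier directories shadow later ones, as in a normal search.
            (unless (gethash lib my/library-cache)
              (puthash lib (expand-file-name file dir)
                       my/library-cache)))))))
  my/library-cache)

(defun my/locate-library-cached (library)
  "Return the cached absolute file name for LIBRARY, or nil."
  (gethash library my/library-cache))
```

After one call to `my/build-library-cache', each lookup is a single hash
probe instead of a scan over hundreds of directories.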

Since I'm preparing this for a custom installation that will be accessible
for multiple users, I decided to try putting everything in site-lisp and
native compiling everything AOT.  Most of the other potential users are not
experienced Unix users, which is why I'm trying to make everything work
smoothly up front and have features they would find familiar from other
editors.
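
As an illustration only (not necessarily how the build described here is
driven), ahead-of-time compilation of a whole directory could look like the
following.  It assumes Emacs 28+ with native compilation; the helper name is
made up, and ELN-DIR would still have to be listed in
`native-comp-eln-load-path' for the results to be found at load time:

```elisp
(require 'comp)

;; Hypothetical helper, not part of Emacs.
(defun my/native-compile-directory (dir eln-dir)
  "Natively compile each .el file in DIR, placing the output under ELN-DIR."
  ;; `native-compile-target-directory' forces where the .eln files go.
  (let ((native-compile-target-directory eln-dir))
    (dolist (file (directory-files dir t "\\.el\\'"))
      (native-compile file))))
```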

One issue with this approach is that the package selection mechanism
doesn't recognize the modules as being installed, or provide any assistance
in selectively activating modules.

Other places where there is a noticeable slowdown with large numbers of
packages:
  * Browsing customization groups - just unfolding a single group can take
minutes (this is on fast server hardware with a lot of free memory)
  * Browsing custom themes with many theme packages installed
I haven't gotten to the point that I can test the same situation by
explicitly loading the same modules from the site-lisp directory that had
been activated as packages.  Installing the themes in the system directory
does skip the "suspicious files" check that occurs when loading them from
the user configuration.


> > I posed the question to the list mostly to see if the approach (or similar)
> > had already been tested for viability or effectiveness, so I can avoid
> > unnecessary experimentation if the answer is already well-understood.
>
> I don't think it has been tried, no.
>
> > I don't know enough about modern library loading to know whether you'd
> > expect N distinct but interdependent dynamic libraries to be loaded in as
> > compact a memory region as a single dynamic library formed from the same
> > underlying object code.
>
> I think you're right here, but I'd expect the effect to be fairly small
> except when the .elc/.eln files are themselves small.
>

There are a lot of packages that have fairly small source files, just
because they've factored their code the same way it would be in languages
where the shared libraries are not in 1-1 correspondence with source files.

>
> > It's not clear to me whether those points are limited to call
> > sites or not.
>
> I believe it is: the optimization is to replace a call via `Ffuncall` to
> a "symbol" (which looks up the value stored in the `symbol-function`
> cell), with a direct call to the actual C function contained in the
> "subr" object itself (expected to be) contained in the
> `symbol-function` cell.
>
> Andrea would know if there are other semantic-non-preserving
> optimizations in the level 3 of the optimizations, but IIUC this is very
> much the main one.
>
> >> IIUC the current native-compiler will actually leave those
> >> locally-defined functions in their byte-code form :-(
> > That's not what I understood from
> > https://akrl.sdf.org/gccemacs.html#org0f21a5b
> > As you deduce below, I come from a Scheme background - cl-flet is the form
> > I should have referenced, not let.
>
> Indeed you're right that those functions can be native compiled, tho only if
> they're closed (i.e. if they don't refer to surrounding lexical variables).
> [ I always forget that little detail :-(  ]
>

I would expect this would apply to most top-level defuns in elisp
packages/modules.  From my cursory review, it looks like the ability to
redefine these defuns is mostly useful when developing the packages
themselves, and "sealing" them for use would be appropriate.
I'm not clear on whether this optimization is limited to the case of
calling functions defined in the compilation unit, or applied more broadly.
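
To make the closed/non-closed distinction concrete, here is a small sketch
with made-up function names.  It only shows which local functions capture
surrounding lexical variables; it makes no claim about what the compiler
actually emits for either one:

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun my/sum-squares (numbers)
  "Return the sum of the squares of NUMBERS.
The local function `square' is closed: it uses only its own argument."
  (cl-flet ((square (x) (* x x)))
    (apply #'+ (mapcar #'square numbers))))

(defun my/scale-all (factor numbers)
  "Return NUMBERS with each element multiplied by FACTOR.
The local function `scale' is not closed: it captures FACTOR."
  (cl-flet ((scale (x) (* factor x)))
    (mapcar #'scale numbers)))
```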

Thanks,
Lynn

