From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Lynn Winebarger
Newsgroups: gmane.emacs.devel
Subject: Re: native compilation units
Date: Mon, 6 Jun 2022 00:12:29 -0400
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000c2798905e0bfae4f"
Cc: Andrea Corallo, emacs-devel@gnu.org
To: Stefan Monnier
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:290778

--000000000000c2798905e0bfae4f
Content-Type: text/plain; charset="UTF-8"

On Sun, Jun 5, 2022, 10:20 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> >> >> [ But that doesn't mean we shouldn't try to compile several ELisp files
> >> >>   into a single ELN file, especially since the size of ELN files seems
> >> >>   to be proportionally larger for small ELisp files than for large
> >> >>   ones.  ]
> >> >

Not sure if these general statistics are of much use, but of 4324 source
files successfully compiled (1557 from the lisp directory), with a total
size of 318MB, including 13 trampolines:

The smallest 450 are 17632 bytes or less, with the trampolines at 16744 bytes, total of 7.4M
The smallest 1000 are under 25700 bytes, totaling 20M
The smallest 2000 are under 38592 bytes, totaling 48M
The smallest 3000 are under 62832 bytes, totaling 95M
The smallest 4000 are under 188440 bytes, totaling 194M
There are only 58 over 500k in size, and only 13 over 1M (max is 3.1M)
Those last 58 total about 52M in size.

I am curious as to why the system doesn't just produce trampolines for all
the system calls AOT in a single module.

> `load-path` is used for native-compiled files, yes.  But it's used
> in exactly the same way (and should hence cost the same) for:
> - No native compilation
> - AOT native compilation
> - lazy native compilation
> Which is what I meant by "unrelated to native compilation".

True, but it does lead to a little more disappointment when that 2.5-5x
speedup is dominated by the load-path length while starting up.

> > Although I do wonder if there is some optimization for ELN files in the
> > system directory as opposed to the user's cache.
> > I have one build where I
> > native compiled (but not byte compiled) all the el files in the lisp
> > directory,
>
> IIUC current code only loads an ELN file if there is a corresponding ELC
> file, so natively compiling a file without also byte-compiling it is
> definitely not part of the expected situation.  Buyer beware.

That would explain the behavior I've seen.  If that's the case, shouldn't
batch-native-compile produce the byte-compiled file if it doesn't exist?

> I'm not following you.  Are you talking about compiling third-party
> packages during the compilation of Emacs itself by placing them into
> a `site-lisp` subdirectory inside Emacs's own source code tree, and then
> moving the resulting `.el` and `.elc` files to the `../NN.MM/site-lisp`
> subdirectory in Emacs's installation target directory?

That's the way I'm doing it.  Compatibility of these packages with Emacs
versions varies too much for me to want to treat them as
version-independent.  I got burned in an early attempt where I didn't set
the prefix: Emacs kept adding the /usr/share site-lisp paths even when
running from the build directory.  The version of auctex installed there
is compatible with 24.3 but not 28.1, so I kept getting mysterious compile
errors for the auctex packages until I realized what was going on.

> And you're saying that whether you place them in `../NN.MM/site-lisp`
> rather than in `../site-lisp` makes a significant performance difference?

Sorry, no.  I meant I'm curious whether having them in the user's cache
versus the system ELN cache would make any difference in start-up time,
ignoring the initial async native compilation.  In particular, whether the
checksum calculation is bypassed in one case but not the other (by keeping
a permanent mapping from the system load-path to the system cache, say).

> > other problem was that I got a "bytecode overflow error".  I only got
> > the first error after chopping off the file approximately after the first
> > 10k lines.
> > Oddly enough, when I put all the files in the site-lisp
> > directory, and collect all the autoloads for that directory in a single
> > file, it has no problem with the 80k line file that results.
>
> We need to fix those problems.  Please try and give as much detail as
> possible in your bug report so we can try and reproduce it on our end
> (both for the warnings about non-top-level forms and for the bytecode
> overflow).
>
> > I'm pretty sure the load-path is an issue with 1250 packages, even if half
> > of them consist of single files.
>
> I'm afraid so, indeed.
>
> > One issue with this approach is that the package selection mechanism
> > doesn't recognize the modules as being installed, or provide any assistance
> > in selectively activating modules.
>
> Indeed, since the selective activation relies crucially on the
> `load-path` for that.
>
> > Other places where there is a noticeable slowdown with large numbers of
> > packages:
> >   * Browsing customization groups - just unfolding a single group can take
> > minutes (this is on fast server hardware with a lot of free memory)
>
> Hmm... can't think of why that would be.  You might want to make
> a separate bug-report for that.
>
> >   * Browsing custom themes with many theme packages installed
> > I haven't gotten to the point that I can test the same situation by
> > explicitly loading the same modules from the site-lisp directory that had
> > been activated as packages.  Installing the themes in the system directory
> > does skip the "suspicious files" check that occurs when loading them from
> > the user configuration.
>
> Same here.  I'm not very familiar with the custom-theme code, but it
> does seem "unrelated" in the sense that I don't think fixing some of the
> other problems you've encountered will fix this one.

I agree, but there was the possibility that the compilation process (I'm
assuming the byte-compile stage would do this, if it were done at all)
would precompute things like customization groups for the compilation
unit.  Then aggregating the source of compilation units into larger
libraries might be expected to significantly decrease the amount of
dynamic computation currently required.

I know there's no inherent link to native compilation.  It's more a case
of: if NC makes the implementation fast enough to make these additional
packages attractive, you're more likely to see the consequences of design
choices made assuming the byte code interpreter would be the bottleneck,
etc.

> > I would expect this would apply to most top-level defuns in elisp
> > packages/modules.  From my cursory review, it looks like the ability to
> > redefine these defuns is mostly useful when developing the packages
> > themselves, and "sealing" them for use would be appropriate.
>
> Advice are not used very often, but it's very hard to predict on which
> function(s) they may end up being needed, and sealing would make advice
> ineffective.  I would personally recommend to just stay away from the
> level 3 of the native compiler's optimization.  Or at least, only use it
> in targeted ways, i.e. only at the very rare few spots where you've
> clearly found it to have a noticeable performance benefit.
>
> In lower levels of optimization, those same calls are still optimized
> but just less aggressively, which basically means they turn into:
>
>     if (<symbol unchanged>)
>        <call the C function directly>;
>     else
>        <use the old slow but correct code path>;

I'm guessing the native compiled code is making the GC's performance a
more noticeable chunk of overhead.  I'd really love to see something like
Chromium's concurrent gc integrated into emacs.

If I do any rigorous experiments to see if there's anything resembling a
virtuous cycle in larger compilation units + higher intraprocedural
optimizations, I'll report back.
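
For anyone who wants to gather the same kind of size figures for their own
build, something along these lines should do.  This is only a sketch: it is
in Python rather than Elisp for brevity, the function names are made up for
illustration, and the directory to scan (the system native-lisp directory or
the user's eln-cache) is an assumption that depends on your installation.

```python
import os


def eln_sizes(root):
    """Return the sizes in bytes of every .eln file under root, smallest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".eln"):
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    return sorted(sizes)


def smallest_n_summary(sizes, cutoffs):
    """For each cutoff n, return (n, largest size among the n smallest, their total).

    Cutoffs that are not positive or exceed the number of files are skipped.
    """
    ordered = sorted(sizes)
    rows = []
    for n in cutoffs:
        if 0 < n <= len(ordered):
            # ordered[n - 1] is the largest of the n smallest files.
            rows.append((n, ordered[n - 1], sum(ordered[:n])))
    return rows
```

Calling `smallest_n_summary(eln_sizes(DIR), [450, 1000, 2000, 3000, 4000])`
gives one row per cutoff, in the same shape as the list near the top of
this message.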

Lynn

--000000000000c2798905e0bfae4f--