* native compilation units
@ 2022-05-31 1:02 Lynn Winebarger
2022-06-01 13:50 ` Andrea Corallo
0 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-05-31 1:02 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1802 bytes --]
Hi,
Since the native compiler does not support linking eln files, I'm curious
whether anyone has tried combining elisp files at the source level and
compiling the result as a single unit.
Has there been any testing to determine whether larger compilation units
would be more efficient, either in terms of loading or of increased
optimization opportunities visible to the compiler?
Just as a thought experiment, there are about 1500 .el files in the lisp
directory. Running from the root of the source tree, let's say I make 2
new directories, ct-lisp and lib-lisp, and then do
mkdir ct-lisp lib-lisp
cp -Rf lisp/* ct-lisp
echo "(provide 'lib-emacs)" > lib-lisp/lib-emacs.el
find lisp -name '*.el' | while read -r src; do
  cat "$src" >> lib-lisp/lib-emacs.el
done
EMACS_LOAD_PATH='' ./src/emacs -batch -nsl --no-site-file --eval "(progn
(setq load-path '(\"ct-lisp\" \"lib-lisp\")) (batch-native-compile 't))"
lib-lisp/lib-emacs.el
find lisp -name '*.el' | while read -r src; do
  cat > lib-lisp/"$(basename "$src")" <<EOF
;; -*-no-byte-compile: t; -*-
(require 'lib-emacs)
EOF
done
./src/emacs --eval "(setq load-path '(\"lib-lisp\"))" &
This is just a thought experiment, so assume the machine running this
compilation has infinite memory and completes the compilation within a
reasonable amount of time, and assume this sloppy approach doesn't yield
a divergent metarecursion.
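A self-contained rehearsal of the concatenation step above, using a scratch directory instead of the real Emacs source tree (the library names `foo`/`bar` and their contents are invented for illustration; the actual native-compile step is omitted since it needs a built Emacs):

```shell
# Stand-in source tree with two tiny elisp libraries.
TMP=$(mktemp -d)
mkdir -p "$TMP/lisp" "$TMP/lib-lisp"
printf "(defun foo-hello () 'hello)\n(provide 'foo)\n" > "$TMP/lisp/foo.el"
printf "(defun bar-bye () 'bye)\n(provide 'bar)\n"     > "$TMP/lisp/bar.el"

# Combine everything into one compilation unit...
echo "(provide 'lib-emacs)" > "$TMP/lib-lisp/lib-emacs.el"
find "$TMP/lisp" -name '*.el' | while read -r src; do
  cat "$src" >> "$TMP/lib-lisp/lib-emacs.el"
done

# ...and leave a stub per original file so (require 'foo) still resolves.
find "$TMP/lisp" -name '*.el' | while read -r src; do
  cat > "$TMP/lib-lisp/$(basename "$src")" <<EOF
;; -*-no-byte-compile: t; -*-
(require 'lib-emacs)
EOF
done

ls "$TMP/lib-lisp"
```

The result is one large lib-emacs.el holding every definition, plus one two-line stub per original file.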
If you actually loaded all 1500 modules at once, what would be the
difference between having 1500+ files versus the one large .so (assuming,
to be fair, that all 1500+ were compiled AOT)?
I'm assuming in practice you would want to choose units with a bit more
care, of course. It just seems like there would be some more optimal
approach for using the native compiler than having all these tiny
compilation units, especially once you get into any significant number of
packages.
Lynn
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units
  2022-06-01 13:50 ` Andrea Corallo
From: Andrea Corallo @ 2022-06-01 13:50 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: emacs-devel

Lynn Winebarger <owinebar@gmail.com> writes:

> Hi,
> Since the native compiler does not support linking eln files, I'm curious
> if anyone has tried combining elisp files as source code files and
> compiling the result as a unit?
> Has there been any testing to determine if larger compilation units would
> be more efficient either in terms of loading or increased optimization
> opportunities visible to the compiler?

Hi,

the compiler can't take advantage of interprocedural optimizations (such
as inlining etc.), as every function in Lisp can be redefined at any
moment.

You can trigger those optimizations anyway using native-comp-speed 3, but
each time one of the functions in the compilation unit is redefined
you'll have to recompile the whole CU to make sure all changes take
effect.

This strategy might be useful, but I guess limited to some specific
applications.

Best Regards

  Andrea
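For reference, the speed-3 knob Andrea mentions can be set globally or (as I understand it) per file. A minimal sketch — `native-comp-speed` is the real variable, but whether the file-local form is honored may depend on your Emacs version, so treat that part as an assumption to verify:

```elisp
;; Globally: every unit compiled afterwards gets speed 3, including the
;; unsafe interprocedural optimizations discussed in this thread.
(setq native-comp-speed 3)

;; Or scoped to a single compilation unit, at the end of lib-emacs.el
;; (assumption: file-local handling of this variable in your version):

;; Local Variables:
;; native-comp-speed: 3
;; End:
```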
* Re: native compilation units
  2022-06-03 14:17 ` Lynn Winebarger
From: Lynn Winebarger @ 2022-06-03 14:17 UTC (permalink / raw)
To: Andrea Corallo; +Cc: emacs-devel

Thanks.  There was a thread in January starting at
https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html
that gets at one scenario.

At least in pre-10 versions in my experience, Windows has not dealt well
with large numbers of files in a single directory, at least if it's on a
network drive.  There's some super-linear behavior just listing the
contents of a directory that makes having more than, say, a thousand
files in a directory impractical.  That makes packaging emacs with all
files on the system load path precompiled inadvisable.  If you add any
significant number of pre-compiled site-lisp libraries (e.g. a local
elpa mirror), it will get worse.

Aside from explicit interprocedural optimization, is it possible
libgccjit would lay out the code in a more optimal way in terms of
memory locality?

If the only concern for semantic safety with -O3 is the redefinability
of all symbols, that's already the case for emacs lisp primitives
implemented in C.  It should be similar to putting the code into a let
block with all defined functions bound in the block, then setting the
global definitions to the locally defined versions, except for any
variations in forms with semantics that depend on whether they appear at
top-level or in a lexical scope.  It might be interesting to extend the
language with a form that makes the unsafe optimizations safe with
respect to the compilation unit.
On Wed, Jun 1, 2022 at 9:50 AM Andrea Corallo <akrl@sdf.org> wrote: > Lynn Winebarger <owinebar@gmail.com> writes: > > > Hi, > > Since the native compiler does not support linking eln files, I'm > curious if anyone has tried combining elisp files as > > source code files and compiling the result as a unit? > > Has there been any testing to determine if larger compilation units > would be more efficient either in terms of loading or > > increased optimization opportunities visible to the compiler? > > Hi, > > the compiler can't take advantage of interprocedural optimizations (such > as inline etc) as every function in Lisp can be redefined in every > moment. > > You can trigger those optimizations anyway using native-comp-speed 3 but > each time one of the function in the compilation unit is redefined > you'll have to recompile the whole CU to make sure all changes take > effect. > > This strategy might be useful, but I guess limited to some specific > application. > > Best Regards > > Andrea > [-- Attachment #2: Type: text/html, Size: 3268 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-03 14:17 ` Lynn Winebarger @ 2022-06-03 16:05 ` Eli Zaretskii [not found] ` <CAM=F=bDxxyHurxM_xdbb7XJtP8rdK16Cwp30ti52Ox4nv19J_w@mail.gmail.com> 2022-06-03 18:15 ` Stefan Monnier 1 sibling, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-06-03 16:05 UTC (permalink / raw) To: Lynn Winebarger; +Cc: akrl, emacs-devel > From: Lynn Winebarger <owinebar@gmail.com> > Date: Fri, 3 Jun 2022 10:17:25 -0400 > Cc: emacs-devel@gnu.org > > There was a thread in January starting at > https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html that gets at one scenario. At least in > pre-10 versions in my experience, Windows has not dealt well with large numbers of files in a single > directory, at least if it's on a network drive. There's some super-linear behavior just listing the contents of a > directory that makes having more than, say, a thousand files in a directory impractical. Is this only on networked drives? I have a directory with almost 5000 files, and I see no issues there. Could you show a recipe for observing the slow-down you are describing? > That makes > packaging emacs with all files on the system load path precompiled inadvisable. If you add any significant > number of pre-compiled site-lisp libraries (eg a local elpa mirror), it will get worse. ELPA files are supposed to be compiled into the user's eln-cache directory, not into the native-lisp subdirectory of lib/emacs/, so we are okay there. And users can split their eln-cache directory into several ones (and update native-comp-eln-load-path accordingly) if needed. But I admit that I never saw anything like what you describe, so I'm curious what and why is going on in these cases, and how bad is the slow-down. > Aside from explicit interprocedural optimization, is it possible libgccjit would lay out the code in a more > optimal way in terms of memory locality? 
> > If the only concern for semantic safety with -O3 is the redefinability of all symbols, that's already the case for > emacs lisp primitives implemented in C. It should be similar to putting the code into a let block with all > defined functions bound in the block, then setting the global definitions to the locally defined versions, except > for any variations in forms with semantics that depend on whether they appear at top-level or in a lexical > scope. It might be interesting to extend the language with a form that makes the unsafe optimizations safe > with respect to the compilation unit. I believe this is an entirely different subject? ^ permalink raw reply [flat|nested] 46+ messages in thread
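Eli's point about splitting the eln cache across several directories can be sketched concretely. `native-comp-eln-load-path` is the real variable; the directory names below are invented, and the claim that new compilations land in the first entry is my reading of the documentation, so verify it for your version:

```elisp
;; Sketch: spread compiled .eln files over several directories so no
;; single directory grows too large (hypothetical paths).
(dolist (dir '("~/.emacs.d/eln-core/"
               "~/.emacs.d/eln-pkgs/"))
  (add-to-list 'native-comp-eln-load-path (expand-file-name dir)))

;; As documented, freshly compiled files are written to the first
;; entry, so order the list with the writable target directory first.
(car native-comp-eln-load-path)
```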
[parent not found: <CAM=F=bDxxyHurxM_xdbb7XJtP8rdK16Cwp30ti52Ox4nv19J_w@mail.gmail.com>]
* Re: native compilation units [not found] ` <CAM=F=bDxxyHurxM_xdbb7XJtP8rdK16Cwp30ti52Ox4nv19J_w@mail.gmail.com> @ 2022-06-04 5:57 ` Eli Zaretskii 2022-06-05 13:53 ` Lynn Winebarger 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-06-04 5:57 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel [Please use Reply All, to keep the mailing list and other interested people part of this discussion.] > From: Lynn Winebarger <owinebar@gmail.com> > Date: Fri, 3 Jun 2022 15:17:51 -0400 > > Unfortunately most of my "productive" experience in a Windows environment has been in a corporate > environment where the configuration is opaque to end users. For all I know, it's not just a network issue but > could also involve the security/antivirus infrastructure. > I can tell you that at approximately 1000 files in a directory, any process I've designed that uses said > directory slows down dramatically. Just displaying the contents in file explorer exhibits quadratic behavior as > the process appears to start refreshing the listing before completing one pass. You can try setting the w32-get-true-file-attributes variable to the value 'local. Or maybe the following entry from etc/PROBLEMS will help: ** A few seconds delay is seen at startup and for many file operations This happens when the Net Logon service is enabled. During Emacs startup, this service issues many DNS requests looking up for the Windows Domain Controller. When Emacs accesses files on networked drives, it automatically logs on the user into those drives, which again causes delays when Net Logon is running. The solution seems to be to disable Net Logon with this command typed at the Windows shell prompt: net stop netlogon To start the service again, type "net start netlogon". (You can also stop and start the service from the Computer Management application, accessible by right-clicking "My Computer" or "Computer", selecting "Manage", then clicking on "Services".) 
> As for elpa being created in the user's cache, that depends on whether the user has access to the gccjit > infrastructure If the user cannot use libgccjit on the user's system, then why *.eln files from external packages are relevant? They will never appear, because native compilation is not available. So I don't think I understand what you are saying here. If you have in mind ELPA packages that come with precompiled *.eln files (are there packages like that?), then the user can place them in several directories and adapt native-comp-eln-load-path accordingly. So again I don't think I understand the problem you describe. > this was one of the points mentioned in > https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html as it related to the system lisp files. Sorry, I don't see anything about the issue of eln-cache location there. Could you be more specific and point to what was said there that is relevant to this discussion? ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-04 5:57 ` Eli Zaretskii @ 2022-06-05 13:53 ` Lynn Winebarger 0 siblings, 0 replies; 46+ messages in thread From: Lynn Winebarger @ 2022-06-05 13:53 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 4177 bytes --] On Sat, Jun 4, 2022, 1:57 AM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Lynn Winebarger <owinebar@gmail.com> > > Date: Fri, 3 Jun 2022 15:17:51 -0400 > > > > Unfortunately most of my "productive" experience in a Windows > environment has been in a corporate > > environment where the configuration is opaque to end users. For all I > know, it's not just a network issue but > > could also involve the security/antivirus infrastructure. > > I can tell you that at approximately 1000 files in a directory, any > process I've designed that uses said > > directory slows down dramatically. Just displaying the contents in file > explorer exhibits quadratic behavior as > > the process appears to start refreshing the listing before completing > one pass. > > You can try setting the w32-get-true-file-attributes variable to the > value 'local. > > Or maybe the following entry from etc/PROBLEMS will help: > > ** A few seconds delay is seen at startup and for many file operations > > This happens when the Net Logon service is enabled. During Emacs > startup, this service issues many DNS requests looking up for the > Windows Domain Controller. When Emacs accesses files on networked > drives, it automatically logs on the user into those drives, which > again causes delays when Net Logon is running. > > The solution seems to be to disable Net Logon with this command typed > at the Windows shell prompt: > > net stop netlogon > > To start the service again, type "net start netlogon". 
(You can also > stop and start the service from the Computer Management application, > accessible by right-clicking "My Computer" or "Computer", selecting > "Manage", then clicking on "Services".) > I was only intending to illustrate a situation in which a local packager (internal to an organization) might want to (a) provide pre-compiled versions of elisp files that may or may not be from files installed in the "lisp" directory, while (b) not wanting to have huge numbers of files in a particular directory for performance reasons. The performance issues I've experienced are not particular to any individual application, and the way the Windows systems are configured I may not even reliably be able to tell if a given application is stored on a local or network drive (although performance may lead me to believe it is one or the other). They do appear to be particular to the context in which I have been using Windows, though. > As for elpa being created in the user's cache, that depends on whether > the user has access to the gccjit > > infrastructure > > If the user cannot use libgccjit on the user's system, then why *.eln > files from external packages are relevant? They will never appear, > because native compilation is not available. > > So I don't think I understand what you are saying here. > > If you have in mind ELPA packages that come with precompiled *.eln > files (are there packages like that?), then the user can place them in > several directories and adapt native-comp-eln-load-path accordingly. > So again I don't think I understand the problem you describe. > A local packager can precompile anything they like and put it in the system native-lisp directory, no? I'm not sure if the package system would find it if installed as a package by the user, but many packages are just single files that can just be placed directly in site-lisp and used directly. 
> > this was one of the points mentioned in > > https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html as > it related to the system lisp files. > > Sorry, I don't see anything about the issue of eln-cache location > there. Could you be more specific and point to what was said there > that is relevant to this discussion? > I was thinking of these: https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html particularly: I don't understand yet the packaging requirements, is it not possible to copy additionally the native-lisp/ folder to the package? and then these points: https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01009.html https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01020.html Lynn [-- Attachment #2: Type: text/html, Size: 6498 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-03 14:17 ` Lynn Winebarger 2022-06-03 16:05 ` Eli Zaretskii @ 2022-06-03 18:15 ` Stefan Monnier 2022-06-04 2:43 ` Lynn Winebarger 1 sibling, 1 reply; 46+ messages in thread From: Stefan Monnier @ 2022-06-03 18:15 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel > There was a thread in January starting at > https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html that > gets at one scenario. At least in pre-10 versions in my experience, > Windows has not dealt well with large numbers of files in a single > directory, at least if it's on a network drive. Hmm... I count a bit over 6K ELisp files in Emacs + (Non)GNU ELPA, so the ELN cache should presumably not go much past 10K files. Performance issues with read access to directories containing less than 10K files seems like something that was solved last century, so I wouldn't worry very much about it. [ But that doesn't mean we shouldn't try to compile several ELisp files into a single ELN file, especially since the size of ELN files seems to be proportionally larger for small ELisp files than for large ones. ] > Aside from explicit interprocedural optimization, is it possible libgccjit > would lay out the code in a more optimal way in terms of memory locality? Could be, but I doubt it because I don't think GCC gets enough info to make such a decision. For lazily-compiled ELN files I could imagine collecting some amount of profiling info to generate better code, but our code generation is definitely not that sophisticated. > If the only concern for semantic safety with -O3 is the redefinability of > all symbols, that's already the case for emacs lisp primitives implemented > in C. Not really: - Most ELisp primitives implemented in C can be redefined just fine. The problem is about *calls* to those primitives, where the redefinition may fail to apply to those calls that are made from C. 
- While the problem is similar the scope is very different. > It should be similar to putting the code into a let block with all > defined functions bound in the block, then setting the global > definitions to the locally defined versions, except for any variations > in forms with semantics that depend on whether they appear at > top-level or in a lexical scope. IIUC the current native-compiler will actually leave those locally-defined functions in their byte-code form :-( IOW, there are lower-hanging fruits to pick first. > It might be interesting to extend the language with a form that > makes the unsafe optimizations safe with respect to the compilation unit. Yes, in the context of Scheme I think this is called "sealing". Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
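The call-site hazard Stefan describes can be illustrated with a toy pair of functions (the names `my-add`/`my-calc` are invented for this sketch):

```elisp
;; At the default native-comp-speed, the call to `my-add' inside
;; `my-calc' goes through the symbol's function cell, so redefining
;; `my-add' takes effect immediately.  At speed 3 the compiler may
;; turn it into a direct call (or inline it), and the redefinition
;; below would not be seen by `my-calc' until its whole compilation
;; unit is recompiled.
(defun my-add (a b) (+ a b))
(defun my-calc (x) (my-add x 10))

(my-calc 1)                     ; 11 under either compilation mode

(defun my-add (a b) (* a b))    ; redefinition
(my-calc 1)                     ; 10 with default semantics; possibly
                                ; still 11 under speed-3 inlining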
* Re: native compilation units
  2022-06-04  2:43 ` Lynn Winebarger
From: Lynn Winebarger @ 2022-06-04 2:43 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Fri, Jun 3, 2022 at 2:15 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> > There was a thread in January starting at
> > https://lists.gnu.org/archive/html/emacs-devel/2022-01/msg01005.html that
> > gets at one scenario.  At least in pre-10 versions in my experience,
> > Windows has not dealt well with large numbers of files in a single
> > directory, at least if it's on a network drive.
>
> Hmm... I count a bit over 6K ELisp files in Emacs + (Non)GNU ELPA, so
> the ELN cache should presumably not go much past 10K files.
>
> Performance issues with read access to directories containing less than
> 10K files seems like something that was solved last century, so
> I wouldn't worry very much about it.

Per my response to Eli, I see (network) directories become almost
unusable somewhere around 1000 files, but it seems that's a consequence
of the network and/or security configuration.

> [ But that doesn't mean we shouldn't try to compile several ELisp files
>   into a single ELN file, especially since the size of ELN files seems
>   to be proportionally larger for small ELisp files than for large
>   ones. ]

Since I learned of the native compiler in 28.1, I decided to try it out
and also "throw the spaghetti at the wall" with a bunch of packages that
provide features similar to those found in more "modern" IDEs.  In terms
of startup time, the normal package system does not deal well with
hundreds of directories on the load path, regardless of AOT native
compilation, so I'm transforming the packages to install in the
version-specific load path, and compiling that ahead of time.
At least for the ones amenable to such treatment.

Given I'm compiling all the files AOT for use in a common installation
(this is on Linux, not Windows), the natural question for me is whether
larger compilation units would be more efficient, particularly at
startup.  Would there be advantages comparable to including packages in
the dump file, for example?  I posed the question to the list mostly to
see if the approach (or similar) had already been tested for viability
or effectiveness, so I can avoid unnecessary experimentation if the
answer is already well-understood.

> > Aside from explicit interprocedural optimization, is it possible
> > libgccjit would lay out the code in a more optimal way in terms of
> > memory locality?
>
> Could be, but I doubt it because I don't think GCC gets enough info to
> make such a decision.  For lazily-compiled ELN files I could imagine
> collecting some amount of profiling info to generate better code, but
> our code generation is definitely not that sophisticated.

I don't know enough about modern library loading to know whether you'd
expect N distinct but interdependent dynamic libraries to be loaded in
as compact a memory region as a single dynamic library formed from the
same underlying object code.

> > If the only concern for semantic safety with -O3 is the
> > redefinability of all symbols, that's already the case for emacs
> > lisp primitives implemented in C.
>
> Not really:
> - Most ELisp primitives implemented in C can be redefined just fine.
>   The problem is about *calls* to those primitives, where the
>   redefinition may fail to apply to those calls that are made from C.
> - While the problem is similar the scope is very different.

From Andrea's description, this would be the primary "unsafe" aspect of
interprocedural optimizations applied to one of these aggregated
compilation units.
That is, that the semantics of redefining function symbols would not apply to points in the code at which the compiler had made optimizations based on assuming the function definitions were constants. It's not clear to me whether those points are limited to call sites or not. > > It should be similar to putting the code into a let block with all > > defined functions bound in the block, then setting the global > > definitions to the locally defined versions, except for any variations > > in forms with semantics that depend on whether they appear at > > top-level or in a lexical scope. > > IIUC the current native-compiler will actually leave those > locally-defined functions in their byte-code form :-( > That's not what I understood from https://akrl.sdf.org/gccemacs.html#org0f21a5b As you deduce below, I come from a Scheme background - cl-flet is the form I should have referenced, not let. > > IOW, there are lower-hanging fruits to pick first. > This is mainly of interest if a simple transformation of the sort I originally suggested can provide benefits in either reducing startup time for large sets of preloaded packages, or by enabling additional optimizations. Primarily the former for me, but the latter would be interesting. It seems more straightforward than trying to link the eln files into larger units after compilation. > > It might be interesting to extend the language with a form that > > makes the unsafe optimizations safe with respect to the compilation unit. > > Yes, in the context of Scheme I think this is called "sealing". > > > Stefan > No [-- Attachment #2: Type: text/html, Size: 7505 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units
  2022-06-04 14:32 ` Stefan Monnier
From: Stefan Monnier @ 2022-06-04 14:32 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel

>> Performance issues with read access to directories containing less than
>> 10K files seems like something that was solved last century, so
>> I wouldn't worry very much about it.
> Per my response to Eli, I see (network) directories become almost unusable
> somewhere around 1000 files,

I don't doubt there are still (in the current century) cases where
largish directories get slow, but what I meant is that it's now
considered as a problem that should be solved by making those
directories fast rather than by avoiding making them so large.

>> [ But that doesn't mean we shouldn't try to compile several ELisp files
>>   into a single ELN file, especially since the size of ELN files seems
>>   to be proportionally larger for small ELisp files than for large
>>   ones. ]
>
> Since I learned of the native compiler in 28.1, I decided to try it out
> and also "throw the spaghetti at the wall" with a bunch of packages that
> provide features similar to those found in more "modern" IDEs.  In terms
> of startup time, the normal package system does not deal well with
> hundreds of directories on the load path, regardless of AOT native
> compilation, so I'm transforming the packages to install in the
> version-specific load path, and compiling that ahead of time.  At least
> for the ones amenable to such treatment.

There are two load-paths at play (`load-path` and
`native-comp-eln-load-path`) and I'm not sure which one you're talking
about.  OT1H `native-comp-eln-load-path` should not grow with the number
of packages so it typically contains exactly 2 entries, and definitely
not hundreds.  OTOH `load-path` is unrelated to native compilation.

I also don't understand what you mean by "version-specific load path".

Also, what kind of startup time are you talking about?
E.g., are you using `package-quickstart`?

> Given I'm compiling all the files AOT for use in a common installation
> (this is on Linux, not Windows), the natural question for me is whether
> larger compilation units would be more efficient, particularly at startup.

It all depends where the slowdown comes from :-)

E.g. `package-quickstart` follows a similar idea to the one you propose
by collecting all the `<pkg>-autoloads.el` into one big file, which
saves us from having to load separately all those little files.  It also
saves us from having to look for them through those hundreds of
directories.

I suspect a long `load-path` can itself be a source of slowdown
especially during startup, but I haven't bumped into that yet.
There are ways we could speed it up, if needed:

- create "meta packages" (or just one containing all your packages),
  which would bring together in a single directory the files of several
  packages (and presumably also bring together their `<pkg>-autoloads.el`
  into a larger combined one).  Under GNU/Linux we could have this
  metapackage be made of symlinks, making it fairly efficient and
  non-obtrusive (e.g. `C-h o` could still get you to the actual file
  rather than its metapackage-copy).

- Manage a cache of where our ELisp files are (i.e. a hash table mapping
  relative ELisp file names to the absolute file name returned by
  looking for them in `load-path`).  This way we can usually avoid
  scanning those hundred directories to find the .elc file we need, and
  go straight to it.

> I posed the question to the list mostly to see if the approach (or similar)
> had already been tested for viability or effectiveness, so I can avoid
> unnecessary experimentation if the answer is already well-understood.

I don't think it has been tried, no.
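Stefan's second idea — caching where each ELisp file was found — could be sketched as a simple memoization of `locate-library` (function and variable names below are invented; a real implementation would need invalidation when `load-path` or the files change):

```elisp
;; Rough sketch: a hash table mapping library names to the absolute
;; file name that a full `load-path' scan returned, so each name is
;; looked up through the (possibly long) `load-path' only once.
(defvar my/library-cache (make-hash-table :test #'equal)
  "Cache of library name -> absolute file name.")

(defun my/locate-library-cached (name)
  "Like `locate-library' for NAME, but memoized."
  (or (gethash name my/library-cache)
      (puthash name (locate-library name) my/library-cache)))
```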
> I don't know enough about modern library loading to know whether you'd > expect N distinct but interdependent dynamic libraries to be loaded in as > compact a memory region as a single dynamic library formed from the same > underlying object code. I think you're right here, but I'd expect the effect to be fairly small except when the .elc/.eln files are themselves small. > It's not clear to me whether those points are limited to call > sites or not. I believe it is: the optimization is to replace a call via `Ffuncall` to a "symbol" (which looks up the value stored in the `symbol-function` cell), with a direct call to the actual C function contained in the "subr" object itself (expected to be) contained in the `symbol-function` cell. Andrea would know if there are other semantic-non-preserving optimizations in the level 3 of the optimizations, but IIUC this is very much the main one. >> IIUC the current native-compiler will actually leave those >> locally-defined functions in their byte-code form :-( > That's not what I understood from > https://akrl.sdf.org/gccemacs.html#org0f21a5b > As you deduce below, I come from a Scheme background - cl-flet is the form > I should have referenced, not let. Indeed you're right that those functions can be native compiled, tho only if they're closed (i.e. if they don't refer to surrounding lexical variables). [ I always forget that little detail :-( ] > It seems more straightforward than trying to link the eln > files into larger units after compilation. That seems like too much trouble, indeed. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-04 14:32 ` Stefan Monnier @ 2022-06-05 12:16 ` Lynn Winebarger 2022-06-05 14:08 ` Lynn Winebarger 2022-06-05 14:20 ` Stefan Monnier 2022-06-08 6:56 ` Andrea Corallo 1 sibling, 2 replies; 46+ messages in thread From: Lynn Winebarger @ 2022-06-05 12:16 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 9912 bytes --] On Sat, Jun 4, 2022, 10:32 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote: > >> Performance issues with read access to directories containing less than > >> 10K files seems like something that was solved last century, so > >> I wouldn't worry very much about it. > > Per my response to Eli, I see (network) directories become almost > unusable > > somewhere around 1000 files, > > I don't doubt there are still (in the current century) cases where > largish directories get slow, but what I meant is that it's now > considered as a problem that should be solved by making those > directories fast rather than by avoiding making them so large. > Unfortunately sometimes we have to cope with environment we use. And for all I know some of the performance penalties may be inherent in the (security related) infrastructure requirements in a highly regulated industry. Not that that should be a primary concern for the development team, but it is something a local packager might be stuck with. > >> [ But that doesn't mean we shouldn't try to compile several ELisp files > >> into a single ELN file, especially since the size of ELN files seems > >> to be proportionally larger for small ELisp files than for large > >> ones. ] > > > > Since I learned of the native compiler in 28.1, I decided to try it out > and > > also "throw the spaghetti at the wall" with a bunch of packages that > > provide features similar to those found in more "modern" IDEs. 
In terms > of > > startup time, the normal package system does not deal well with hundreds > of > > directories on the load path, regardless of AOT native compilation, so > I'm > > transforming the packages to install in the version-specific load path, > and > > compiling that ahead of time. At least for the ones amenable to such > > treatment. > > There are two load-paths at play (`load-path` and > `native-comp-eln-load-path`) and I'm not sure which one you're talking > about. OT1H `native-comp-eln-load-path` should not grow with the number > of packages so it typically contains exactly 2 entries, and definitely > not hundreds. OTOH `load-path` is unrelated to native compilation. > Not entirely - as I understand it, the load system first finds the source file and computes a hash before determining if there is an ELN file corresponding to it. Although I do wonder if there is some optimization for ELN files in the system directory as opposed to the user's cache. I have one build where I native compiled (but not byte compiled) all the el files in the lisp directory, and another where I byte compiled and then native compiled the same set of files. In both cases I used the flag to batch-native-compile to put the ELN file in the system cache. In the first case a number of files failed to compile, and in the second, they all compiled. I've also observed another situation where a file will only (byte or native) compile if one of its required files has been byte compiled ahead of time - but only native compiling that dependency resulted in the same behavior as not compiling it at all. I planned to send a separate mail to the list asking whether it was intended behavior once I had reduced it to a simple case, or if it should be submitted as a bug. In any case, I noticed that the "browse customization groups" buffer is noticeably faster in the second case.
I also don't understand what you mean by "version-specific load path". > In the usual unix installation, there will be a "site-lisp" one directory above the version specific installation directory, and another site-lisp in the version-specific installation directory. I'm referring to installing the source (ultimately) in ..../emacs/28.1/site-lisp. During the build it's just in the site-lisp subdirectory of the source root path. > Also, what kind of startup time are you talking about? > E.g., are you using `package-quickstart`? > That was the first alternative I tried. With 1250 packages, it did not work. First, the file consisted of a series of "let" forms corresponding to the package directories, and apparently the autoload forms are ignored if they appear anywhere below top-level. At least I got a number of warnings to that effect. The other problem was that I got a "bytecode overflow error". I only got the first error after chopping off the file approximately after the first 10k lines. Oddly enough, when I put all the files in the site-lisp directory, and collect all the autoloads for that directory in a single file, it has no problem with the 80k line file that results. > > Given I'm compiling all the files AOT for use in a common installation > > (this is on Linux, not Windows), the natural question for me is whether > > larger compilation units would be more efficient, particularly at > startup. > > It all depends where the slowdown comes from :-) > > E.g. `package-quickstart` follows a similar idea to the one you propose > by collecting all the `<pkg>-autoloads.el` into one big file, which > saves us from having to load separately all those little files. It also > saves us from having to look for them through those hundreds > of directories. > > I suspect a long `load-path` can itself be a source of slowdown > especially during startup, but I haven't bumped into that yet.
> There are ways we could speed it up, if needed: > > - create "meta packages" (or just one containing all your packages), > which would bring together in a single directory the files of several > packages (and presumably also bring together their > `<pkg>-autoloads.el` into a larger combined one). Under GNU/Linux we > could have this metapackage be made of symlinks, making it fairly > efficient and non-obtrusive (e.g. `C-h o` could still get you to the > actual file rather than its metapackage-copy). > - Manage a cache of where our ELisp files are (i.e. a hash table > mapping relative ELisp file names to the absolute file name returned > by looking for them in `load-path`). This way we can usually avoid > scanning those hundred directories to find the .elc file we need, and > go straight to it. > I'm pretty sure the load-path is an issue with 1250 packages, even if half of them consist of single files. Since I'm preparing this for a custom installation that will be accessible for multiple users, I decided to try putting everything in site-lisp and native compile everything AOT. Most of the other potential users are not experienced Unix users, which is why I'm trying to make everything work smoothly up front and have features they would find familiar from other editors. One issue with this approach is that the package selection mechanism doesn't recognize the modules as being installed, or provide any assistance in selectively activating modules.
Installing the themes in the system directory does skip the "suspicious files" check that occurs when loading them from the user configuration. > > I posed the question to the list mostly to see if the approach (or > similar) > > had already been tested for viability or effectiveness, so I can avoid > > unnecessary experimentation if the answer is already well-understood. > > I don't think it has been tried, no. > > > I don't know enough about modern library loading to know whether you'd > > expect N distinct but interdependent dynamic libraries to be loaded in as > > compact a memory region as a single dynamic library formed from the same > > underlying object code. > > I think you're right here, but I'd expect the effect to be fairly small > except when the .elc/.eln files are themselves small. > There are a lot of packages that have fairly small source files, just because they've factored their code the same way it would be in languages where the shared libraries are not in 1-1 correspondence with source files. > > > It's not clear to me whether those points are limited to call > > sites or not. > > I believe it is: the optimization is to replace a call via `Ffuncall` to > a "symbol" (which looks up the value stored in the `symbol-function` > cell), with a direct call to the actual C function contained in the > "subr" object itself (expected to be) contained in the > `symbol-function` cell. > > Andrea would know if there are other semantic-non-preserving > optimizations in the level 3 of the optimizations, but IIUC this is very > much the main one. > > >> IIUC the current native-compiler will actually leave those > >> locally-defined functions in their byte-code form :-( > > That's not what I understood from > > https://akrl.sdf.org/gccemacs.html#org0f21a5b > > As you deduce below, I come from a Scheme background - cl-flet is the > form > > I should have referenced, not let. 
> > Indeed you're right that those functions can be native compiled, tho only > if > they're closed (i.e. if they don't refer to surrounding lexical > variables). > [ I always forget that little detail :-( ] > I would expect this would apply to most top-level defuns in elisp packages/modules. From my cursory review, it looks like the ability to redefine these defuns is mostly useful when developing the packages themselves, and "sealing" them for use would be appropriate. I'm not clear on whether this optimization is limited to the case of calling functions defined in the compilation unit, or applied more broadly. Thanks, Lynn > [-- Attachment #2: Type: text/html, Size: 13343 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
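[Stefan's symlink "metapackage" suggestion earlier in this message can be sketched in a few lines of shell. The directory names are hypothetical, and a real version would also need to regenerate the combined autoloads so their recorded paths stay valid:

```shell
# Sketch of a symlink "metapackage": flatten many package directories
# into one, so `load-path` needs a single entry instead of hundreds.
# ELPA_DIR and META_DIR are placeholder locations.
ELPA_DIR=${ELPA_DIR:-$HOME/.emacs.d/elpa}
META_DIR=${META_DIR:-$HOME/.emacs.d/meta-lisp}
mkdir -p "$META_DIR"
for pkg in "$ELPA_DIR"/*/; do
  for f in "$pkg"*.el "$pkg"*.elc; do
    # Symlinking (not copying) keeps C-h o jumping to the real file.
    if [ -e "$f" ]; then ln -sf "$f" "$META_DIR/"; fi
  done
done
# Combine the per-package autoloads into one file to load at startup.
cat "$ELPA_DIR"/*/*-autoloads.el > "$META_DIR/meta-autoloads.el" 2>/dev/null || true
```
]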
* Re: native compilation units 2022-06-05 12:16 ` Lynn Winebarger @ 2022-06-05 14:08 ` Lynn Winebarger 2022-06-05 14:46 ` Stefan Monnier 2022-06-05 14:20 ` Stefan Monnier 1 sibling, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-05 14:08 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 13967 bytes --] On Sun, Jun 5, 2022 at 8:16 AM Lynn Winebarger <owinebar@gmail.com> wrote: > On Sat, Jun 4, 2022, 10:32 AM Stefan Monnier <monnier@iro.umontreal.ca> > wrote: > >> >> [ But that doesn't mean we shouldn't try to compile several ELisp files >> > >> into a single ELN file, especially since the size of ELN files seems >> >> to be proportionally larger for small ELisp files than for large >> >> ones. ] >> > >> > Since I learned of the native compiler in 28.1, I decided to try it out >> and >> > also "throw the spaghetti at the wall" with a bunch of packages that >> > provide features similar to those found in more "modern" IDEs. In >> terms of >> > startup time, the normal package system does not deal well with >> hundreds of >> > directories on the load path, regardless of AOT native compilation, so >> I'm >> > transforming the packages to install in the version-specific load path, >> and >> > compiling that ahead of time. At least for the ones amenable to such >> > treatment. >> >> There are two load-paths at play (`load-path` and >> `native-comp-eln-load-path`) and I'm not sure which one you're talking >> about. OT1H `native-comp-eln-load-path` should not grow with the number >> of packages so it typically contains exactly 2 entries, and definitely >> not hundreds. OTOH `load-path` is unrelated to native compilation. >> > > Not entirely - as I understand it, the load system first finds the source > file and computes a hash before determining if there is an ELN file > corresponding to it.
> Although I do wonder if there is some optimization for ELN files in the > system directory as opposed to the user's cache. I have one build where I > native compiled (but not byte compiled) all the el files in the lisp > directory, and another where I byte compiled and then native compiled the > same set of files. In both cases I used the flag to batch-native-compile > to put the ELN file in the system cache. In the first case a number of > files failed to compile, and in the second, they all compiled. I've also > observed another situation where a file will only (byte or native) compile > if one of its required files has been byte compiled ahead of time - but > only native compiling that dependency resulted in the same behavior as not > compiling it at all. I planned to send a separate mail to the list asking > whether it was intended behavior once I had reduced it to a simple case, or > if it should be submitted as a bug. > Unrelated, but the one type of file I don't seem to be able to produce AOT (because I have no way to specify them) in the system directory are the subr/trampoline files. Any hints on how to make those AOT in the system directory? > >> Also, what kind of startup time are you talking about? >> E.g., are you using `package-quickstart`? >> > That was the first alternative I tried. With 1250 packages, it did not > work. First, the file consisted of a series of "let" forms corresponding > to the package directories, and apparently the autoload forms are ignored > if they appear anywhere below top-level. At least I got a number of > warnings to that effect. > The other problem was that I got a "bytecode overflow error". I only got > the first error after chopping off the file approximately after the first > 10k lines. Oddly enough, when I put all the files in the site-lisp > directory, and collect all the autoloads for that directory in a single > file, it has no problem with the 80k line file that results.
Also, I should have responded to the first question - "minutes" on recent server-grade hardware with 24 cores and >100GB of RAM. That was with 1193 enabled packages in my .emacs file. [-- Attachment #2: Type: text/html, Size: 19291 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-05 14:08 ` Lynn Winebarger @ 2022-06-05 14:46 ` Stefan Monnier 0 siblings, 0 replies; 46+ messages in thread From: Stefan Monnier @ 2022-06-05 14:46 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel > Unrelated, but the one type of file I don't seem to be able to produce AOT > (because I have no way to specify them) in the system directory are the > subr/trampoline files. Any hints on how to make those AOT in the system > directory? [ No idea, sorry. ] > Also, I should have responded to the first question - "minutes" on recent > server-grade hardware with 24 cores and >100GB of RAM. That was with 1193 > enabled packages in my .emacs file. And those minutes are all spent in `package-activate-all` or are they spent in other parts of the init file? [ Also, in my experience several packages are poorly behaved in the sense that they presume that if you install them you will probably use them in all Emacs sessions so they eagerly load/execute a lot of code during startup (some even enable themselves unconditionally). In those cases `package-quickstart` doesn't help very much. ] Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
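[For the eagerly-loading packages Stefan mentions, the usual mitigation in an init file is to keep only an autoloaded entry point and defer the rest of the setup. A generic sketch — `some-heavy-mode` and `some-heavy-mode-option` are made-up names:

```elisp
;; Nothing from the package is loaded at startup; its file is read
;; only when the command is first invoked.
(autoload 'some-heavy-mode "some-heavy-mode" "Turn on some-heavy-mode." t)
;; Configuration runs once, after the package actually loads.
(with-eval-after-load 'some-heavy-mode
  (setq some-heavy-mode-option t))
```
]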
* Re: native compilation units 2022-06-05 12:16 ` Lynn Winebarger 2022-06-05 14:08 ` Lynn Winebarger @ 2022-06-05 14:20 ` Stefan Monnier 2022-06-06 4:12 ` Lynn Winebarger 2022-06-14 4:19 ` Lynn Winebarger 1 sibling, 2 replies; 46+ messages in thread From: Stefan Monnier @ 2022-06-05 14:20 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel > Unfortunately sometimes we have to cope with the environment we use. And for > all I know some of the performance penalties may be inherent in the > (security related) infrastructure requirements in a highly regulated > industry. What we learned at the end of last century is exactly that there aren't any such *inherent* performance penalties. It may take extra coding work in the file-system to make it fast with 10k entries. It may take yet more work to make it fast with 10G entries. But it can be done (and has been done), and compared to the overall complexity of current kernels, it's a drop in the bucket. So nowadays if it's slow with 10k entries you should treat it as a bug (could be a configuration problem, or some crap software (anti-virus?) getting in the way, or ...). > Not that that should be a primary concern for the development team, but it > is something a local packager might be stuck with. Indeed. Especially if it only affects a few rare Emacs users who don't have much leverage with the MS-certified sysadmins. >> >> [ But that doesn't mean we shouldn't try to compile several ELisp files >> >> into a single ELN file, especially since the size of ELN files seems >> >> to be proportionally larger for small ELisp files than for large >> >> ones. ] >> > >> > Since I learned of the native compiler in 28.1, I decided to try it out >> and >> > also "throw the spaghetti at the wall" with a bunch of packages that >> > provide features similar to those found in more "modern" IDEs.
In terms >> of >> > startup time, the normal package system does not deal well with hundreds >> of >> > directories on the load path, regardless of AOT native compilation, so >> I'm >> > transforming the packages to install in the version-specific load path, >> and >> > compiling that ahead of time. At least for the ones amenable to such >> > treatment. >> >> There are two load-paths at play (`load-path` and >> `native-comp-eln-load-path`) and I'm not sure which one you're talking >> about. OT1H `native-comp-eln-load-path` should not grow with the number >> of packages so it typically contains exactly 2 entries, and definitely >> not hundreds. OTOH `load-path` is unrelated to native compilation. >> > > Not entirely - as I understand it, the load system first finds the source > file and computes a hash before determining if there is an ELN file > corresponding to it. `load-path` is used for native-compiled files, yes. But it's used in exactly the same way (and should hence cost the same) for: - No native compilation - AOT native compilation - lazy native compilation Which is what I meant by "unrelated to native compilation". > Although I do wonder if there is some optimization for ELN files in the > system directory as opposed to the user's cache. I have one build where I > native compiled (but not byte compiled) all the el files in the lisp > directory, IIUC current code only loads an ELN file if there is a corresponding ELC file, so natively compiling a file without also byte-compiling it is definitely not part of the expected situation. Buyer beware. >> I also don't understand what you mean by "version-specific load path". > In the usual unix installation, there will be a "site-lisp" one directory > above the version specific installation directory, and another site-lisp in > the version-specific installation directory. I'm referring to installing > the source (ultimately) in ..../emacs/28.1/site-lisp.
During the build > it's just in the site-lisp subdirectory of the source root path. I'm not following you. Are you talking about compiling third-party packages during the compilation of Emacs itself by placing them into a `site-lisp` subdirectory inside Emacs's own source code tree, and then moving the resulting `.el` and `.elc` files to the `../NN.MM/site-lisp` subdirectory in Emacs's installation target directory? And you're saying that whether you place them in `../NN.MM/site-lisp` rather than in `../site-lisp` makes a significant performance difference? >> Also, what kind of startup time are you talking about? >> E.g., are you using `package-quickstart`? > That was the first alternative I tried. With 1250 packages, it did not > work. Please `M-x report-emacs-bug` (and put me in `X-Debbugs-Cc`). > First, the file consisted of a series of "let" forms corresponding > to the package directories, and apparently the autoload forms are ignored > if they appear anywhere below top-level. At least I got a number of > warnings to that effect. > The other problem was that I got a "bytecode overflow error". I only got > the first error after chopping off the file approximately after the first > 10k lines. Oddly enough, when I put all the files in the site-lisp > directory, and collect all the autoloads for that directory in a single > file, it has no problem with the 80k line file that results. We need to fix those problems. Please try and give as much detail as possible in your bug report so we can try and reproduce it on our end (both for the warnings about non-top-level forms and for the bytecode overflow). > I'm pretty sure the load-path is an issue with 1250 packages, even if half > of them consist of single files. I'm afraid so, indeed. > One issue with this approach is that the package selection mechanism > doesn't recognize the modules as being installed, or provide any assistance > in selectively activating modules.
Indeed, since the selective activation relies crucially on the `load-path` for that. > Other places where there is a noticeable slowdown with large numbers of > packages: > * Browsing customization groups - just unfolding a single group can take > minutes (this is on fast server hardware with a lot of free memory) Hmm... can't think of why that would be. You might want to make a separate bug-report for that. > * Browsing custom themes with many theme packages installed > I haven't gotten to the point that I can test the same situation by > explicitly loading the same modules from the site-lisp directory that had > been activated as packages. Installing the themes in the system directory > does skip the "suspicious files" check that occurs when loading them from > the user configuration. Same here. I'm not very familiar with the custom-theme code, but it does seem "unrelated" in the sense that I don't think fixing some of the other problems you've encountered will fix this one. >> I think you're right here, but I'd expect the effect to be fairly small >> except when the .elc/.eln files are themselves small. > There are a lot of packages that have fairly small source files, just > because they've factored their code the same way it would be in languages > where the shared libraries are not in 1-1 correspondence with source files. Oh, indeed, small source files are quite common. > I would expect this would apply to most top-level defuns in elisp > packages/modules. From my cursory review, it looks like the ability to > redefine these defuns is mostly useful when developing the packages > themselves, and "sealing" them for use would be appropriate. Advice are not used very often, but it's very hard to predict on which function(s) they may end up being needed, and sealing would make advice ineffective. I would personally recommend to just stay away from the level 3 of the native compiler's optimization. Or at least, only use it in targeted ways, i.e. 
only at the very rare few spots where you've clearly found it to have a noticeable performance benefit. In lower levels of optimization, those same calls are still optimized but just less aggressively, which basically means they turn into: if (<symbol unchanged>) <call the C function directly>; else <use the old slow but correct code path>; Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
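[One way to follow Stefan's "targeted" recommendation is to leave the default speed alone and raise it only around the compilation of a definition you have measured as hot. A sketch — `my-hot-function` is a placeholder; `native-compile` and `native-comp-speed` are the Emacs 28 names:

```elisp
;; Everything else keeps the default, semantics-preserving speed 2;
;; only this one measured hot spot is compiled at the aggressive
;; level 3, where calls may be bound directly to the C function and
;; later redefinition or advice of callees may no longer be seen.
(defun my-hot-function (x) (* x x))   ; placeholder definition
(let ((native-comp-speed 3))
  (native-compile #'my-hot-function))
```
]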
* Re: native compilation units
  2022-06-05 14:20 ` Stefan Monnier
@ 2022-06-06  4:12 ` Lynn Winebarger
  2022-06-06  6:12 ` Stefan Monnier
  2022-06-14  4:19 ` Lynn Winebarger
  1 sibling, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-06 4:12 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 7535 bytes --]

On Sun, Jun 5, 2022, 10:20 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> >> >> [ But that doesn't mean we shouldn't try to compile several ELisp files
> >> >> into a single ELN file, especially since the size of ELN files seems
> >> >> to be proportionally larger for small ELisp files than for large
> >> >> ones. ]

Not sure if these general statistics are of much use, but of 4324 source
files successfully compiled (1557 from the lisp directory), with a total
size of 318MB, including 13 trampolines:

The smallest 450 are 17632 bytes or less, with the trampolines at 16744 bytes, total of 7.4M
The smallest 1000 are under 25700 bytes, totaling 20M
The smallest 2000 are under 38592 bytes, totaling 48M
The smallest 3000 are under 62832 bytes, totaling 95M
The smallest 4000 are under 188440 bytes, totaling 194M
There are only 58 over 500k in size, and only 13 over 1M (max is 3.1M)
Those last 58 total about 52M in size.

I am curious as to why the system doesn't just produce trampolines for all
the system calls AOT in a single module.

> `load-path` is used for native-compiled files, yes.  But it's used
> in exactly the same way (and should hence cost the same) for:
> - No native compilation
> - AOT native compilation
> - lazy native compilation
> Which is what I meant by "unrelated to native compilation".

True, but it does lead to a little more disappointment when that 2.5-5x
speedup is dominated by the load-path length while starting up.

> > Although I do wonder if there is some optimization for ELN files in the
> > system directory as opposed to the user's cache.
I have one build where > I > > native compiled (but not byte compiled) all the el files in the lisp > > directory, > > IIUC current code only loads an ELN file if there is a corresponding ELC > file, so natively compiling a file without also byte-compiling it is > definitely not part of the expected situation. Buyer beware. > That would explain the behavior I've seen. If that's the case, shouldn't batch-native-compile produce the byte-compiled file if it doesn't exist? I'm not following you. Are you talking about compiling third-party > packages during the compilation of Emacs itself by placing them into > a `site-lisp` subdirectory inside Emacs's own source code tree, and then > moving the resulting `.el` and `.elc` files to the `../NN.MM/site-lisp` > <http://NN.MM/site-lisp> > subdirectory in Emacs's installation target directory? > That's the way I'm doing it. Compatibility of these packages with Emacs versions varies too much for me to want to treat them as version-independent. I got burned in an early attempt where I didn't set the prefix, and emacs kept adding the /usr/share site-lisp paths even running from the build directory, and the version of auctex that is installed there is compatible with 24.3 but not 28.1, so I kept getting mysterious compile errors for the auctex packages until I realized what was going on. And you're saying that whether you place them in `../NN.MM/site-lisp` > <http://NN.MM/site-lisp> > rather than in `../site-lisp` makes a significant performance difference? > Sorry, no. I meant I'm curious if having them in the user's cache versus the system ELN cache would make any difference in start-up time, ignoring the initial async native compilation. In particular whether the checksum calculation is bypassed in one case but not the other (by keeping a permanent mapping from the system load-path to the system cache, say). other problem was that I got a "bytecode overflow error". 
I only got > > the first error after chopping off the file approximately after the first > > 10k lines. Oddly enough, when I put all the files in the site-lisp > > directory, and collect all the autoloads for that directory in a single > > file, it has no problem with the 80k line file that results. > > We need to fix those problems. Please try and give as much detail as > possible in your bug report so we can try and reproduce it on our end > (both for the warnings about non-top-level forms and for the bytecode > overflow). > > > I'm pretty sure the load-path is an issue with 1250 packages, even if > half > > of them consist of single files. > > I'm afraid so, indeed. > > > One issue with this approach is that the package selection mechanism > > doesn't recognize the modules as being installed, or provide any > assistance > > in selectively activating modules. > > Indeed, since the selective activation relies crucially on the > `load-path` for that. > > > Other places where there is a noticeable slowdown with large numbers of > > packages: > > * Browsing customization groups - just unfolding a single group can > take > > minutes (this is on fast server hardware with a lot of free memory) > > Hmm... can't think of why that would be. You might want to make > a separate bug-report for that. > > > * Browsing custom themes with many theme packages installed > > I haven't gotten to the point that I can test the same situation by > > explicitly loading the same modules from the site-lisp directory that had > > been activated as packages. Installing the themes in the system > directory > > does skip the "suspicious files" check that occurs when loading them from > > the user configuration. > > Same here. I'm not very familiar with the custom-theme code, but it > does seem "unrelated" in the sense that I don't think fixing some of the > other problems you've encountered will fix this one. 
>
I agree, but there was the possibility the compilation process (I'm
assuming the byte-compile stage would do this, if it were done at all)
would precompute things like customization groups for the compilation
unit.  Then aggregating the source of compilation units into larger
libraries might be expected to significantly decrease the amount of
dynamic computation currently required.  I know there's no inherent link
to native compilation; it's more a case of: if NC makes the
implementation fast enough to make these additional packages attractive,
you're more likely to see the consequences of design choices made
assuming the byte code interpreter would be the bottleneck, etc.

> > I would expect this would apply to most top-level defuns in elisp
> > packages/modules.  From my cursory review, it looks like the ability to
> > redefine these defuns is mostly useful when developing the packages
> > themselves, and "sealing" them for use would be appropriate.
>
> Advice is not used very often, but it's very hard to predict on which
> function(s) it may end up being needed, and sealing would make advice
> ineffective.  I would personally recommend to just stay away from the
> level 3 of the native compiler's optimization.  Or at least, only use it
> in targeted ways, i.e. only at the very rare few spots where you've
> clearly found it to have a noticeable performance benefit.
>
> In lower levels of optimization, those same calls are still optimized
> but just less aggressively, which basically means they turn into:
>
>     if (<symbol unchanged>)
>         <call the C function directly>;
>     else
>         <use the old slow but correct code path>;

I'm guessing the native compiled code is making the GC's performance a
more noticeable chunk of overhead.  I'd really love to see something
like Chromium's concurrent GC integrated into Emacs.
If I do any rigorous experiments to see if there's anything resembling a virtuous cycle in larger compilation units + higher intraprocedural optimizations, I'll report back. Lynn [-- Attachment #2: Type: text/html, Size: 10454 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
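Size statistics like the ones in the message above can be gathered with a short sketch in Emacs Lisp (an illustration only; it assumes the first entry of `native-comp-eln-load-path' points at the eln cache you care about, and that the cache is non-empty):

```elisp
;; Summarize the sizes of all .eln files under the eln cache.
;; `directory-files-recursively' and `file-attribute-size' are real
;; Emacs APIs; the choice of directory is an assumption.
(let* ((dir (car native-comp-eln-load-path))
       (files (directory-files-recursively dir "\\.eln\\'"))
       (sizes (sort (mapcar (lambda (f)
                              (file-attribute-size (file-attributes f)))
                            files)
                    #'<)))
  (message "%d files, %d bytes total, median %d, largest %d"
           (length sizes)
           (apply #'+ sizes)
           (nth (/ (length sizes) 2) sizes)
           (car (last sizes))))
```

Running this in `M-x ielm` or with `emacs --batch --eval` prints a one-line summary comparable to the distribution quoted above.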
* Re: native compilation units 2022-06-06 4:12 ` Lynn Winebarger @ 2022-06-06 6:12 ` Stefan Monnier 2022-06-06 10:39 ` Eli Zaretskii 2022-06-06 16:13 ` Lynn Winebarger 0 siblings, 2 replies; 46+ messages in thread From: Stefan Monnier @ 2022-06-06 6:12 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel > Not sure if these general statistics are of much use, but of 4324 source > files successfully compiled (1557 from the lisp directory), with a total > size of 318MB, including 13 trampolines, > The smallest 450 are 17632 bytes or less, with the trampolines at 16744 > bytes, total of 7.4M > The smallest 1000 are under 25700 bytes, totaling 20M > The smallest 2000 are under 38592 bytes, totaling 48M > The smallest 3000 are under 62832 bytes, totaling 95M > The smallest 4000 are under 188440 bytes, totaling 194M > There are only 58 over 500k in size, and only 13 over 1M (max is 3.1M) > Those last 58 total about 52M in size. The way I read this, the small files don't dominate, so bundling them may still be a good idea but it's probably not going to make a big difference. > I am curious as to why the system doesn't just produce trampolines for all > the system calls AOT in a single module. Trampolines are needed for any native-compiled function which gets redefined. We could try to build them eagerly when the native-compiled function is compiled, and there could be various other ways to handle this. There's room for improvement here, but the current system works well enough for a first version. > True, but it does lead to a little more disappointment when that 2.5-5x > speedup is dominated by the load-path length while starting up. I don't know where you got that 2.5-5x expectation, but native compilation will often result in "no speed up at all". > That would explain the behavior I've seen. If that's the case, shouldn't > batch-native-compile produce the byte-compiled file if it doesn't exist? 
Sounds about right, tho maybe there's a good reason for the current behavior, I don't know. Maybe you should `M-x report-emacs-bug`. > Sorry, no. I meant I'm curious if having them in the user's cache versus > the system ELN cache would make any difference in start-up time, ignoring > the initial async native compilation. In particular whether the checksum > calculation is bypassed in one case but not the other (by keeping a > permanent mapping from the system load-path to the system cache, say). No, I don't think it should make any difference in this respect. > I'm guessing the native compiled code is making the GC's performance a more > noticeable chunk of overhead. Indeed, the GC is the same and the native compiler does not make many efforts to reduce memory allocations, so fraction of time spent in GC tends to increase. > I'd really love to see something like Chromium's concurrent gc > integrated into Emacs. Our GC is in serious need of improvement, yes. Bolting some existing GC onto Emacs won't be easy, tho. > If I do any rigorous experiments to see if there's anything resembling a > virtuous cycle in larger compilation units + higher intraprocedural > optimizations, I'll report back. Looking forward to it, thanks, I'd be interested as well in seeing a `profile-report` output covering your minute-long startup. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-06 6:12 ` Stefan Monnier @ 2022-06-06 10:39 ` Eli Zaretskii 2022-06-06 16:23 ` Lynn Winebarger 2022-06-06 16:13 ` Lynn Winebarger 1 sibling, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-06-06 10:39 UTC (permalink / raw) To: Stefan Monnier; +Cc: owinebar, akrl, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org > Date: Mon, 06 Jun 2022 02:12:30 -0400 > > > That would explain the behavior I've seen. If that's the case, shouldn't > > batch-native-compile produce the byte-compiled file if it doesn't exist? > > Sounds about right, tho maybe there's a good reason for the current > behavior, I don't know. Of course, there is: that function is what is invoked when building a release tarball, where the *.elc files are already present. See lisp/Makefile.in. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-06 10:39 ` Eli Zaretskii @ 2022-06-06 16:23 ` Lynn Winebarger 2022-06-06 16:58 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-06 16:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Stefan Monnier, Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1273 bytes --] On Mon, Jun 6, 2022 at 6:39 AM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Stefan Monnier <monnier@iro.umontreal.ca> > > Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org > > Date: Mon, 06 Jun 2022 02:12:30 -0400 > > > > > That would explain the behavior I've seen. If that's the case, > shouldn't > > > batch-native-compile produce the byte-compiled file if it doesn't > exist? > > > > Sounds about right, tho maybe there's a good reason for the current > > behavior, I don't know. > > Of course, there is: that function is what is invoked when building a > release tarball, where the *.elc files are already present. See > lisp/Makefile.in. > That's what I expected was the case, but the question is whether it "should" check for those .elc files and create them only if they do not exist, as opposed to batch-byte+native-compile, which creates both unconditionally. Or perhaps just note the possible hiccup in the docstring for batch-native-compile? However, since the eln file can be generated without the elc file, it also begs the question of why the use of the eln file is conditioned on the existence of the elc file in the first place. Are there situations where the eln file would be incorrect to use without the byte-compiled file in place? Lynn [-- Attachment #2: Type: text/html, Size: 1979 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-06 16:23 ` Lynn Winebarger @ 2022-06-06 16:58 ` Eli Zaretskii 2022-06-07 2:14 ` Lynn Winebarger 0 siblings, 1 reply; 46+ messages in thread From: Eli Zaretskii @ 2022-06-06 16:58 UTC (permalink / raw) To: Lynn Winebarger; +Cc: monnier, akrl, emacs-devel > From: Lynn Winebarger <owinebar@gmail.com> > Date: Mon, 6 Jun 2022 12:23:49 -0400 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org > > Of course, there is: that function is what is invoked when building a > release tarball, where the *.elc files are already present. See > lisp/Makefile.in. > > That's what I expected was the case, but the question is whether it "should" > check for those .elc files and create them only if they do not exist, as opposed > to batch-byte+native-compile, which creates both unconditionally. Or perhaps > just note the possible hiccup in the docstring for batch-native-compile? You are describing a different function. batch-native-compile was explicitly written to support the build of a release tarball, where the *.elc files are always present, and regenerating them is just a waste of cycles, and also runs the risk of creating a .elc file that is not fully functional, due to some peculiarity of the platform or the build environment. > However, since the eln file can be generated without the elc file, it also begs the question > of why the use of the eln file is conditioned on the existence of the elc file in the > first place. Are there situations where the eln file would be incorrect to use > without the byte-compiled file in place? Andrea was asked this question several times and explained his design, you can find it in the archives. Basically, native compilation is driven by byte compilation, and is a kind of side effect of it. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-06 16:58 ` Eli Zaretskii @ 2022-06-07 2:14 ` Lynn Winebarger 2022-06-07 10:53 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-07 2:14 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Stefan Monnier, Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 3703 bytes --] On Mon, Jun 6, 2022 at 12:58 PM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Lynn Winebarger <owinebar@gmail.com> > > Date: Mon, 6 Jun 2022 12:23:49 -0400 > > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Andrea Corallo < > akrl@sdf.org>, emacs-devel@gnu.org > > > > Of course, there is: that function is what is invoked when building a > > release tarball, where the *.elc files are already present. See > > lisp/Makefile.in. > > > > That's what I expected was the case, but the question is whether it > "should" > > check for those .elc files and create them only if they do not exist, as > opposed > > to batch-byte+native-compile, which creates both unconditionally. Or > perhaps > > just note the possible hiccup in the docstring for batch-native-compile? > > You are describing a different function. batch-native-compile was > explicitly written to support the build of a release tarball, where > the *.elc files are always present, and regenerating them is just a > waste of cycles, and also runs the risk of creating a .elc file that > is not fully functional, due to some peculiarity of the platform or > the build environment. > Ok - I'm not sure why only generating the .elc in the case that it does not already exist is inconsistent with the restriction you describe. Ignoring that, according to https://github.com/emacs-mirror/emacs/blob/master/lisp/emacs-lisp/comp.el the signature and docstring are: (defun batch-native-compile (&optional for-tarball) "Perform batch native compilation of remaining command-line arguments. Native compilation equivalent of `batch-byte-compile'. 
Use this from the command line, with `-batch'; it won't work
in an interactive Emacs session.
Optional argument FOR-TARBALL non-nil means the file being compiled
as part of building the source tarball, in which case the .eln file
will be placed under the native-lisp/ directory (actually, in the
last directory in `native-comp-eln-load-path')."

If the restriction you describe is the intent, why not
(1) make "for-tarball" non-optional and remove that argument, and
(2) put that intent in the documentation so we would know not to use it

> > However, since the eln file can be generated without the elc file, it
> > also begs the question of why the use of the eln file is conditioned on
> > the existence of the elc file in the first place.  Are there situations
> > where the eln file would be incorrect to use without the byte-compiled
> > file in place?
>
> Andrea was asked this question several times and explained his design,
> you can find it in the archives.  Basically, native compilation is
> driven by byte compilation, and is a kind of side effect of it.

I understood that already - the question was why the .elc file, as an
artifact, was required to exist in addition to the .eln file.  I did
follow your (implied?) suggestion and went back through the archives for
2021 and 2020 and saw some relevant discussions.  The last relevant post
I saw was from Andrea indicating he thought it shouldn't be required,
but then it was just dropped:
https://lists.gnu.org/archive/html/emacs-devel/2020-08/msg00561.html

I have an experimental branch where the .elc are not produced at all by
make bootstrap.  The only complication is that for the Emacs build I had
to modify the process to depose files containing the doc so
make-docfile.c can eat those instead of the .elc files.  Other than that
we should re-add .eln to load-suffixes.  But as I'm not sure this is a
requirement I'd prefer first to converge with the current setup.
Unless I get some specific input on that I think I'll keep this idea and its branch aside for now :) I may have missed a relevant subsequent post. Lynn [-- Attachment #2: Type: text/html, Size: 13765 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
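The behavior Lynn is asking about could be sketched in a few lines of Emacs Lisp (a hypothetical helper, not an existing Emacs function; `byte-compile-dest-file', `byte-compile-file', and `native-compile' are the real APIs it builds on):

```elisp
;; Sketch of "create the .elc only if it is missing, then
;; native-compile" -- the conditional variant discussed above.
;; `my-batch-compile-ensuring-elc' is a made-up name.
(defun my-batch-compile-ensuring-elc ()
  (dolist (file command-line-args-left)
    (unless (file-exists-p (byte-compile-dest-file file))
      (byte-compile-file file))
    (native-compile file)))
```

It would be invoked the same way as the batch compilation entry points, e.g. `emacs -batch -l my-helper.el -f my-batch-compile-ensuring-elc foo.el`, leaving existing .elc files untouched.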
* Re: native compilation units 2022-06-07 2:14 ` Lynn Winebarger @ 2022-06-07 10:53 ` Eli Zaretskii 0 siblings, 0 replies; 46+ messages in thread From: Eli Zaretskii @ 2022-06-07 10:53 UTC (permalink / raw) To: Lynn Winebarger; +Cc: monnier, akrl, emacs-devel > From: Lynn Winebarger <owinebar@gmail.com> > Date: Mon, 6 Jun 2022 22:14:00 -0400 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org > > > Of course, there is: that function is what is invoked when building a > > release tarball, where the *.elc files are already present. See > > lisp/Makefile.in. > > > > That's what I expected was the case, but the question is whether it "should" > > check for those .elc files and create them only if they do not exist, as opposed > > to batch-byte+native-compile, which creates both unconditionally. Or perhaps > > just note the possible hiccup in the docstring for batch-native-compile? > > You are describing a different function. batch-native-compile was > explicitly written to support the build of a release tarball, where > the *.elc files are always present, and regenerating them is just a > waste of cycles, and also runs the risk of creating a .elc file that > is not fully functional, due to some peculiarity of the platform or > the build environment. > > Ok - I'm not sure why only generating the .elc in the case that it does not already exist is inconsistent with the > restriction you describe. Because this function is for the case where producing *.elc files is not wanted. > Ignoring that, according to https://github.com/emacs-mirror/emacs/blob/master/lisp/emacs-lisp/comp.el the > signature and docstring are: > > (defun batch-native-compile (&optional for-tarball) "Perform batch native compilation of remaining > command-line arguments. > > Native compilation equivalent of `batch-byte-compile'. > Use this from the command line, with `-batch'; it won't work > in an interactive Emacs session. 
> Optional argument FOR-TARBALL non-nil means the file being compiled > as part of building the source tarball, in which case the .eln file > will be placed under the native-lisp/ directory (actually, in the > last directory in `native-comp-eln-load-path')." > If the restriction you describe is the intent, why not > (1) make "for-tarball" non-optional and remove that argument, and > (2) put that intent in the documentation so we would know not to use it Because that function could be used in contexts other than building a release tarball, and I see no need to restrict it. And I don't think I understand the use case you want to support. When is it useful to produce *.eln files for all the *.el files, but *.elc files only for those *.el files that were modified or for which *.elc doesn't exist? > > However, since the eln file can be generated without the elc file, it also begs the question > > of why the use of the eln file is conditioned on the existence of the elc file in the > > first place. Are there situations where the eln file would be incorrect to use > > without the byte-compiled file in place? > > Andrea was asked this question several times and explained his design, > you can find it in the archives. Basically, native compilation is > driven by byte compilation, and is a kind of side effect of it. > > I understood that already - the question was why the .elc file, as an artifact, was required to exist in addition > to the .eln file. Where do you see that requirement? ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-06 6:12 ` Stefan Monnier 2022-06-06 10:39 ` Eli Zaretskii @ 2022-06-06 16:13 ` Lynn Winebarger 2022-06-07 2:39 ` Lynn Winebarger 1 sibling, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-06 16:13 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 3596 bytes --] On Mon, Jun 6, 2022 at 2:12 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote: > > Trampolines are needed for any native-compiled function which > gets redefined. We could try to build them eagerly when the > native-compiled function is compiled, and there could be various other > ways to handle this. There's room for improvement here, but the current > system works well enough for a first version. > > Yes, I agree. As I wrote in the initial email, my questions are primarily curiosity about how the new capability can be further exploited. When I'm not loading the build down with a ridiculous number of packages, it performs very well. > > True, but it does lead to a little more disappointment when that 2.5-5x > > speedup is dominated by the load-path length while starting up. > > I don't know where you got that 2.5-5x expectation, but native > compilation will often result in "no speed up at all". > That's a good question - it was one of the articles I read when I first learned about this new capability. It was in the context of overall emacs performance with the feature enabled, rather than any particular piece of code. > > Sorry, no. I meant I'm curious if having them in the user's cache versus > > the system ELN cache would make any difference in start-up time, ignoring > > the initial async native compilation. In particular whether the checksum > > calculation is bypassed in one case but not the other (by keeping a > > permanent mapping from the system load-path to the system cache, say). > > No, I don't think it should make any difference in this respect. 
> > I'm guessing the native compiled code is making the GC's performance a
> > more noticeable chunk of overhead.
>
> Indeed, the GC is the same and the native compiler does not make many
> efforts to reduce memory allocations, so fraction of time spent in GC
> tends to increase.
>
> > I'd really love to see something like Chromium's concurrent gc
> > integrated into Emacs.
>
> Our GC is in serious need of improvement, yes.  Bolting some existing GC
> onto Emacs won't be easy, tho.

Chromium came to mind primarily because I've been tracking V8's
refactoring of the "Oilpan" GC for use as a stand-alone collector for
other projects I'm interested in.  Though I believe V8 uses a
type-tagging system treated specially by the collector separately from
the C++ classes managed by the stand-alone collector.  That's the piece
I think would be adapted for lisp GC, with the added benefit of offering
integrated GC for types using the cppgc interface for additional
modules.

I did see a thread in the archives of emacs-devel where someone hacked
SpiderMonkey's collector into Emacs a few years ago (2017 I believe) as
a proof of concept.  My very cursory inspection of the memory allocation
bits of the Emacs core gives me the impression the abstraction
boundaries set by the simple interface are not rampantly violated.  I
would hope that at this point adapting the V8 (or similar) collector
would be more straightforward than that effort was.  I'm also not sure
whether code derived from V8 would be eligible for incorporation into
Emacs directly, given the legal requirements for explicit copyright
assignment.

Maybe the best bet would be to define a rigorous interface and allow
alternative GC implementations to be plugged in.  That would make it
easier to experiment with alternative garbage collectors more generally,
which would probably be a general positive if you were looking to
improve that part of the system while maintaining the current safe
implementation.
Lynn [-- Attachment #2: Type: text/html, Size: 4790 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-06 16:13 ` Lynn Winebarger @ 2022-06-07 2:39 ` Lynn Winebarger 2022-06-07 11:50 ` Stefan Monnier 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-07 2:39 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 979 bytes --] On Mon, Jun 6, 2022 at 2:12 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote: > > I am curious as to why the system doesn't just produce trampolines for > all > > the system calls AOT in a single module. > > Trampolines are needed for any native-compiled function which > gets redefined. We could try to build them eagerly when the > native-compiled function is compiled, and there could be various other > ways to handle this. There's room for improvement here, but the current > system works well enough for a first version. > While I was going over the archives for answers to my questions (following Eli's observation), I found these gems: https://lists.gnu.org/archive/html/emacs-devel/2021-02/msg00599.html https://lists.gnu.org/archive/html/emacs-devel/2021-02/msg00724.html I have the impression these ideas/concerns got lost in all the other work required to get the first release ready, but I could have missed a follow-up definitively knocking them down. Lynn [-- Attachment #2: Type: text/html, Size: 1901 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-07 2:39 ` Lynn Winebarger @ 2022-06-07 11:50 ` Stefan Monnier 2022-06-07 13:11 ` Eli Zaretskii 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier @ 2022-06-07 11:50 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel > I have the impression these ideas/concerns got lost in all the other > work required to get the first release ready, but I could have missed > a follow-up definitively knocking them down. I don't think they got lost. They have simply been put aside temporarily. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-07 11:50 ` Stefan Monnier @ 2022-06-07 13:11 ` Eli Zaretskii 0 siblings, 0 replies; 46+ messages in thread From: Eli Zaretskii @ 2022-06-07 13:11 UTC (permalink / raw) To: Stefan Monnier; +Cc: owinebar, akrl, emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Andrea Corallo <akrl@sdf.org>, emacs-devel@gnu.org > Date: Tue, 07 Jun 2022 07:50:56 -0400 > > > I have the impression these ideas/concerns got lost in all the other > > work required to get the first release ready, but I could have missed > > a follow-up definitively knocking them down. > > I don't think they got lost. They have simply been put > aside temporarily. More accurately, they are waiting for Someone(TM) to work on them. As always happens in Emacs with useful ideas that got "put aside". ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units
  2022-06-05 14:20 ` Stefan Monnier
  2022-06-06  4:12 ` Lynn Winebarger
@ 2022-06-14  4:19 ` Lynn Winebarger
  2022-06-14 12:23 ` Stefan Monnier
  1 sibling, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-14 4:19 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1738 bytes --]

On Sun, Jun 5, 2022 at 10:20 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> >> Also, what kind of startup time are you talking about?
> >> E.g., are you using `package-quickstart`?
> > That was the first alternative I tried.  With 1250 packages, it did not
> > work.
>
> Please `M-x report-emacs-bug` (and put me in `X-Debbugs-Cc`).

I was able to reproduce this at home on Cygwin with 940 packages.  I had
tried to install a few more than that (maybe 945 or something); I just
removed the corresponding sections of the last few until it did compile.
Then I verified that it didn't matter whether I removed the first or the
last package autoloads, I would get the overflow regardless.

After spending the weekend going over byte code examples, I looked at
the output, and it's literally just hitting 64k instructions.  Each
package uses 17 instructions just putting itself on the load-path, which
accounts for ~15000 instructions.  That means every package uses about
50 instructions on average, so (if that's representative) you wouldn't
expect to be able to do much more than 300 or so additional packages
just from putting those paths in an array and looping over them.

Most of the forms are just calls to a handful of operators with constant
arguments, so I would assume you could just create arrays for the most
common instruction types, put the argument lists in a giant vector, and
then just loop over those vectors performing the operator.  Then there'd
be a handful of oddball expressions to handle.
Or, you could just create a vector with one thunk for each package and loop through it invoking each one.  It wouldn't be as space efficient, but it would be trivially correct.

I'll put this in a bug report.

Lynn

^ permalink raw reply	[flat|nested] 46+ messages in thread
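Lynn's two alternatives above - packing the constant arguments into vectors and looping, or looping over a vector of thunks - could look roughly like this in a generated quickstart file.  This is only a sketch: `my--pkg-paths` and `my--pkg-thunks` are hypothetical names, not anything package-quickstart actually emits.

```elisp
;; Sketch of a generated quickstart file (hypothetical names).
;; One constant vector of directories plus a short loop, instead of
;; ~17 inline instructions per package, keeps the top-level bytecode
;; object well under the 64k limit:
(defconst my--pkg-paths
  ["/home/user/.emacs.d/elpa/pkg-a-1.0"
   "/home/user/.emacs.d/elpa/pkg-b-2.3"])

(dotimes (i (length my--pkg-paths))
  (add-to-list 'load-path (aref my--pkg-paths i)))

;; Or, trading space for simplicity: one thunk per package, each in
;; its own (small) bytecode object, invoked in a loop:
(defconst my--pkg-thunks
  (vector
   (lambda () (add-to-list 'load-path "/home/user/.emacs.d/elpa/pkg-a-1.0"))
   (lambda () (add-to-list 'load-path "/home/user/.emacs.d/elpa/pkg-b-2.3"))))

(mapc #'funcall my--pkg-thunks)
```

The thunk variant is the "trivially correct" one: each package's setup code compiles independently, so no single bytecode object grows with the number of packages.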
* Re: native compilation units 2022-06-14 4:19 ` Lynn Winebarger @ 2022-06-14 12:23 ` Stefan Monnier 2022-06-14 14:55 ` Lynn Winebarger 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier @ 2022-06-14 12:23 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel > Or, you could just create a vector with one thunk for each package and > loop through it invoking each one. It wouldn't be as space > efficient, but it would be trivially correct. IIRC the compiler has code to split a bytecode object into two to try and circumvent the 64k limit and it should definitely be applicable here (it's more problematic when it's inside a loop), which is why I think it's a plain bug. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-14 12:23 ` Stefan Monnier @ 2022-06-14 14:55 ` Lynn Winebarger 0 siblings, 0 replies; 46+ messages in thread From: Lynn Winebarger @ 2022-06-14 14:55 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

I think you may be remembering an intention to implement that.  The issue came up in 2009 (before the byte-compiler even caught the error), and there's only the initial patch you committed to signal the error:
https://git.savannah.gnu.org/cgit/emacs.git/commit/lisp/emacs-lisp/bytecomp.el?id=8476cfaf3dadf04379fde65cd7e24820151f78a9
and one more changing a variable name:
https://git.savannah.gnu.org/cgit/emacs.git/commit/lisp/emacs-lisp/bytecomp.el?id=d9bbf40098801a859f4625c4aa7a8cbe99949705
so lines 954-961 of bytecomp.el still read:

    (dolist (bytes-tail patchlist)
      (setq pc (caar bytes-tail))       ; Pick PC from goto's tag.
      ;; Splits PC's value into 2 bytes. The jump address is
      ;; "reconstructed" by the `FETCH2' macro in `bytecode.c'.
      (setcar (cdr bytes-tail) (logand pc 255))
      (setcar bytes-tail (ash pc -8))
      ;; FIXME: Replace this by some workaround.
      (or (<= 0 (car bytes-tail) 255)
          (error "Bytecode overflow")))

I mainly quote this to say: see what I mean about useful ideas that start off as "put aside temporarily" getting lost? :-)

I sent in the bug report.

On Tue, Jun 14, 2022 at 8:23 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> > Or, you could just create a vector with one thunk for each package and
> > loop through it invoking each one.  It wouldn't be as space
> > efficient, but it would be trivially correct.
>
> IIRC the compiler has code to split a bytecode object into two to try
> and circumvent the 64k limit and it should definitely be applicable here
> (it's more problematic when it's inside a loop), which is why I think
> it's a plain bug.
>
> Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-04 14:32 ` Stefan Monnier 2022-06-05 12:16 ` Lynn Winebarger @ 2022-06-08 6:56 ` Andrea Corallo 2022-06-11 16:13 ` Lynn Winebarger 1 sibling, 1 reply; 46+ messages in thread From: Andrea Corallo @ 2022-06-08 6:56 UTC (permalink / raw) To: Stefan Monnier; +Cc: Lynn Winebarger, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> Performance issues with read access to directories containing less than
>>> 10K files seems like something that was solved last century, so
>>> I wouldn't worry very much about it.
>> Per my response to Eli, I see (network) directories become almost unusable
>> somewhere around 1000 files,
>
> I don't doubt there are still (in the current century) cases where
> largish directories get slow, but what I meant is that it's now
> considered as a problem that should be solved by making those
> directories fast rather than by avoiding making them so large.
>
>>> [ But that doesn't mean we shouldn't try to compile several ELisp files
>>> into a single ELN file, especially since the size of ELN files seems
>>> to be proportionally larger for small ELisp files than for large
>>> ones. ]
>>
>> Since I learned of the native compiler in 28.1, I decided to try it out and
>> also "throw the spaghetti at the wall" with a bunch of packages that
>> provide features similar to those found in more "modern" IDEs.  In terms of
>> startup time, the normal package system does not deal well with hundreds of
>> directories on the load path, regardless of AOT native compilation, so I'm
>> transforming the packages to install in the version-specific load path, and
>> compiling that ahead of time.  At least for the ones amenable to such
>> treatment.
>
> There are two load-paths at play (`load-path` and
> `native-comp-eln-load-path`) and I'm not sure which one you're talking
> about.
> OT1H `native-comp-eln-load-path` should not grow with the number
> of packages so it typically contains exactly 2 entries, and definitely
> not hundreds.  OTOH `load-path` is unrelated to native compilation.
>
> I also don't understand what you mean by "version-specific load path".
>
> Also, what kind of startup time are you talking about?
> E.g., are you using `package-quickstart`?
>
>> Given I'm compiling all the files AOT for use in a common installation
>> (this is on Linux, not Windows), the natural question for me is whether
>> larger compilation units would be more efficient, particularly at startup.
>
> It all depends where the slowdown comes from :-)
>
> E.g. `package-quickstart` follows a similar idea to the one you propose
> by collecting all the `<pkg>-autoloads.el` into one big file, which
> saves us from having to load separately all those little files.  It also
> saves us from having to look for them through those hundreds
> of directories.
>
> I suspect a long `load-path` can itself be a source of slowdown,
> especially during startup, but I haven't bumped into that yet.
> There are ways we could speed it up, if needed:
>
> - Create "meta packages" (or just one containing all your packages),
>   which would bring together in a single directory the files of several
>   packages (and presumably also bring together their
>   `<pkg>-autoloads.el` into a larger combined one).  Under GNU/Linux we
>   could have this metapackage be made of symlinks, making it fairly
>   efficient and non-obtrusive (e.g. `C-h o` could still get you to the
>   actual file rather than its metapackage-copy).
> - Manage a cache of where our ELisp files are (i.e. a hash table
>   mapping relative ELisp file names to the absolute file name returned
>   by looking for them in `load-path`).  This way we can usually avoid
>   scanning those hundred directories to find the .elc file we need, and
>   go straight to it.
>
>> I posed the question to the list mostly to see if the approach (or similar)
>> had already been tested for viability or effectiveness, so I can avoid
>> unnecessary experimentation if the answer is already well-understood.
>
> I don't think it has been tried, no.
>
>> I don't know enough about modern library loading to know whether you'd
>> expect N distinct but interdependent dynamic libraries to be loaded in as
>> compact a memory region as a single dynamic library formed from the same
>> underlying object code.
>
> I think you're right here, but I'd expect the effect to be fairly small
> except when the .elc/.eln files are themselves small.
>
>> It's not clear to me whether those points are limited to call
>> sites or not.
>
> I believe it is: the optimization is to replace a call via `Ffuncall` to
> a "symbol" (which looks up the value stored in the `symbol-function`
> cell), with a direct call to the actual C function contained in the
> "subr" object itself (expected to be) contained in the
> `symbol-function` cell.
>
> Andrea would know if there are other semantic-non-preserving
> optimizations in level 3 of the optimizations, but IIUC this is very
> much the main one.

Correct, that's the main one: it does that for all calls to C primitives
and for all calls to lisp functions defined in the same compilation unit.

Other than that, speed 3 enables pure-function optimization and
self-tail-recursion optimization.

Andrea

^ permalink raw reply	[flat|nested] 46+ messages in thread
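Stefan's second suggestion - a cache mapping relative ELisp file names to the absolute names found on `load-path` - might be sketched like this.  All the `my-` names are hypothetical; `locate-file` and `get-load-suffixes` are the real lookup primitives, and a production version would also need to invalidate the cache when `load-path` changes.

```elisp
;; Minimal sketch of a load-path lookup cache (hypothetical names).
(defvar my--load-cache (make-hash-table :test #'equal)
  "Maps relative ELisp file names to absolute names found on `load-path'.")

(defun my-cached-locate (file)
  "Find FILE on `load-path' once; afterwards, answer from the cache."
  (or (gethash file my--load-cache)
      (let ((abs (locate-file file load-path (get-load-suffixes))))
        ;; Only cache hits; a miss might succeed later if a package
        ;; is installed.
        (when abs (puthash file my--load-cache abs))
        abs)))

;; e.g. (my-cached-locate "subr") scans `load-path' the first time
;; and returns the cached absolute name on every later call.
```

The point of the design is that the O(length of `load-path`) directory scan happens once per file instead of once per `load`/`require`.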
* Re: native compilation units 2022-06-08 6:56 ` Andrea Corallo @ 2022-06-11 16:13 ` Lynn Winebarger 2022-06-11 16:37 ` Stefan Monnier 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-11 16:13 UTC (permalink / raw) To: Andrea Corallo; +Cc: Stefan Monnier, emacs-devel

On Wed, Jun 8, 2022, 2:56 AM Andrea Corallo <akrl@sdf.org> wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
> >> It's not clear to me whether those points are limited to call
> >> sites or not.
> >
> > I believe it is: the optimization is to replace a call via `Ffuncall` to
> > a "symbol" (which looks up the value stored in the `symbol-function`
> > cell), with a direct call to the actual C function contained in the
> > "subr" object itself (expected to be) contained in the
> > `symbol-function` cell.
> >
> > Andrea would know if there are other semantic-non-preserving
> > optimizations in the level 3 of the optimizations, but IIUC this is very
> > much the main one.
>
> Correct, that's the main one: it does that for all calls to C primitives
> and for all calls to lisp functions defined in the same compilation unit.
>
> Other than that speed 3 enables pure function optimization and self tail
> recursion optimization.

Would it make sense to add a feature for declaring that a function symbol's value is constant and non-advisable, at least within some notion of explicitly named scope(s)?  That would allow developers to be more selective about which functions are "exported" to library users, and which are defined as global function symbols only because that's more convenient than wrapping everything in a package/module/namespace in a giant cl-flet and then explicitly "exporting" functions and macros via fset.  Then intraprocedural optimization within the named scopes would be consistent with the language.

I'm thinking of using semantic/wisent for a modern IDE for a proprietary language.
I am curious whether these optimizations are used or usable in that context.

Lynn

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-11 16:13 ` Lynn Winebarger @ 2022-06-11 16:37 ` Stefan Monnier 2022-06-11 17:49 ` Lynn Winebarger 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier @ 2022-06-11 16:37 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel

> Would it make sense to add a feature for declaring a function symbol value
> is constant and non-advisable, at least within some notion of explicitly
> named scope(s)?  That would allow developers to be more selective about
> which functions are "exported" to library users, and which are defined as
> global function symbols because it's more convenient than wrapping
> everything in a package/module/namespace in a giant cl-flet and then
> explicitly "exporting" functions and macros via fset.

In which sense would it be different from:

    (cl-flet
        ...
      (defun ...)
      (defun ...)
      ...)


-- Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread
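Concretely, the pattern Stefan is pointing at looks something like the sketch below (function names hypothetical).  The helper is bound lexically by `cl-flet`, so it lives in no symbol's function cell, is invisible to advice, and calls to it are eligible for direct-call optimization; the `defun`s remain the package's public entry points.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Private helper bound only lexically; not reachable, advisable,
;; or redefinable through any symbol's function cell.
(cl-flet ((my--square (n) (* n n)))
  (defun my-sum-of-squares (a b)
    "Public entry point; `my--square' is resolved lexically."
    (+ (my--square a) (my--square b))))

;; (my-sum-of-squares 2 3) ; → 13
```

For mutually recursive helpers, `cl-labels` plays the same role, since `cl-flet` bindings are not visible inside each other's bodies.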
* Re: native compilation units 2022-06-11 16:37 ` Stefan Monnier @ 2022-06-11 17:49 ` Lynn Winebarger 2022-06-11 20:34 ` Stefan Monnier 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-11 17:49 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Sat, Jun 11, 2022 at 12:37 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>> Would it make sense to add a feature for declaring a function symbol value
>> is constant and non-advisable, at least within some notion of explicitly
>> named scope(s)?  That would allow developers to be more selective about
>> which functions are "exported" to library users, and which are defined as
>> global function symbols because it's more convenient than wrapping
>> everything in a package/module/namespace in a giant cl-flet and then
>> explicitly "exporting" functions and macros via fset.
>
> In which sense would it be different from:
>
>     (cl-flet
>         ...
>       (defun ...)
>       (defun ...)
>       ...)

Good point - it's my scheme background confusing me.  I was thinking defun would operate with similar scoping rules as defvar and establish a local binding, where fset (like setq) would not create any new bindings.

(1) I don't know how much performance difference (if any) there is between

    (fsetq exported-fxn #'internal-implementation)

and

    (defun exported-fxn (x y ...) (internal-implementation x y ...))

(2) I'm also thinking of more aggressively forcing const-ness at run-time with something like:

    (eval-when-compile
      (cl-flet ((internal-implementation (x y ...) body ...))
        (fset exported-fxn #'internal-implementation)))
    (fset exported-fxn (eval-when-compile #'exported-fxn))

If that makes sense, is there a way to do the same thing with defun?  Or perhaps cl-labels instead of cl-flet, assuming they are both optimized the same way.
Lynn

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-11 17:49 ` Lynn Winebarger @ 2022-06-11 20:34 ` Stefan Monnier 2022-06-12 17:38 ` Lynn Winebarger 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier @ 2022-06-11 20:34 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel

>> In which sense would it be different from:
>>
>>     (cl-flet
>>         ...
>>       (defun ...)
>>       (defun ...)
>>       ...)
>>
> Good point - it's my scheme background confusing me.  I was thinking defun
> would operate with similar scoping rules as defvar and establish a local
> binding, where fset (like setq) would not create any new bindings.

I was not talking about performance but about semantics (under the
assumption that if the semantics is the same then it should be possible
to get the same performance somehow).

> (1) I don't know how much performance difference (if any) there is between
>     (fsetq exported-fxn #'internal-implementation)
> and
>     (defun exported-fxn (x y ...) (internal-implementation x y ...))

If you don't want the indirection, then use `defalias` (which is like
`fset` but registers the action as one that *defines* the function, for
the purpose of `C-h f` and the likes, and they also have slightly
different semantics w.r.t advice).

> (2) I'm also thinking of more aggressively forcing const-ness at run-time
> with something like:
>     (eval-when-compile
>       (cl-flet ((internal-implementation (x y ...) body ...))
>         (fset exported-fxn #'internal-implementation)))
>     (fset exported-fxn (eval-when-compile #'exported-fxn))
>
> If that makes sense, is there a way to do the same thing with defun?

I don't know what the above code snippet is intended to show/do, sorry :-(

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread
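The contrast Stefan describes - a wrapper `defun` versus `defalias` - can be sketched in a few lines (function names hypothetical):

```elisp
;;; -*- lexical-binding: t; -*-

(defun my--impl (x) (* 2 x))

;; Wrapper: an extra call frame, and `my--impl's function cell is
;; re-read on every call, so advice or redefinition of `my--impl'
;; is always seen.
(defun my-double (x) (my--impl x))

;; Alias: `my-double-alias' and `my--impl' share one definition; no
;; wrapper frame, and unlike a bare `fset', `C-h f' records this as
;; a definition of `my-double-alias'.
(defalias 'my-double-alias #'my--impl)

;; (my-double 21)       ; → 42
;; (my-double-alias 21) ; → 42
```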
* Re: native compilation units 2022-06-11 20:34 ` Stefan Monnier @ 2022-06-12 17:38 ` Lynn Winebarger 2022-06-12 18:47 ` Stefan Monnier 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-12 17:38 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Sat, Jun 11, 2022 at 4:34 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >> In which sense would it be different from:
> >>
> >>     (cl-flet
> >>         ...
> >>       (defun ...)
> >>       (defun ...)
> >>       ...)
> >>
> > Good point - it's my scheme background confusing me.  I was thinking defun
> > would operate with similar scoping rules as defvar and establish a local
> > binding, where fset (like setq) would not create any new bindings.
>
> I was not talking about performance but about semantics (under the
> assumption that if the semantics is the same then it should be possible
> to get the same performance somehow).

I'm trying to determine if there's a set of expressions for which it is semantically sound to perform the intraprocedural optimizations done at -O3 - that is, where it is correct to treat functions in operator position as constants rather than as references through a symbol's function cell.

> > (1) I don't know how much performance difference (if any) there is between
> > (fsetq exported-fxn #'internal-implementation)
> > and
> > (defun exported-fxn (x y ...) (internal-implementation x y ...))
>
> If you don't want the indirection, then use `defalias` (which is like
> `fset` but registers the action as one that *defines* the function, for
> the purpose of `C-h f` and the likes, and they also have slightly
> different semantics w.r.t advice).
>
What I'm looking for is a function as a first-class value, whether as a byte-code vector, a symbolic reference to a position in the .text section (or equivalent) of a shared object that may or may not have been loaded, or a pointer to a region that is allowed to be executed.

> > (2) I'm also thinking of more aggressively forcing const-ness at run-time
> > with something like:
> >     (eval-when-compile
> >       (cl-flet ((internal-implementation (x y ...) body ...))
> >         (fset exported-fxn #'internal-implementation)))
> >     (fset exported-fxn (eval-when-compile #'exported-fxn))
> >
> > If that makes sense, is there a way to do the same thing with defun?
>
> I don't know what the above code snippet is intended to show/do, sorry :-(

I'm trying to capture a function as a first-class value.  Better example - I put the following in ~/test1.el and byte-compiled it (with emacs 28.1 running on cygwin):

    (require 'cl-lib)
    (eval-when-compile
      (cl-labels ((my-evenp (n) (if (= n 0) t (my-oddp (1- n))))
                  (my-oddp (n) (if (= n 0) nil (my-evenp (1- n)))))
        (defun my-global-evenp (n) (my-evenp n))
        (defun my-global-oddp (n) (my-oddp n))))

I get the following (expected) error when running in batch (or interactively, if only loading the compiled file):

    $ emacs -batch --eval '(load "~/test1.elc")' --eval '(message "%s" (my-global-evenp 5))'
    Loading ~/test1.elc...
    Debugger entered--Lisp error: (void-function my-global-evenp)
      (my-global-evenp 5)
      (message "%s" (my-global-evenp 5))
      eval((message "%s" (my-global-evenp 5)) t)
      command-line-1(("--eval" "(load \"~/test1.elc\")" "--eval" "(message \"%s\" (my-global-evenp 5))"))
      command-line()
      normal-top-level()

The function symbol is only defined at compile time by the defun, so it is undefined when the byte-compiled file is loaded in a clean environment.  When I tried using (fset 'my-global-evenp (eval-when-compile #'my-ct-global-evenp)), it just produced a symbol indirection, which was disappointing.
So here, global compile-time variables are assigned trampolines to the local functions at compile time as values:

    (require 'cl-lib)
    (eval-when-compile
      (defvar my-ct-global-evenp nil)
      (defvar my-ct-global-oddp nil)
      (cl-labels ((my-evenp (n) (if (= n 0) t (my-oddp (1- n))))
                  (my-oddp (n) (if (= n 0) nil (my-evenp (1- n)))))
        (setq my-ct-global-evenp (lambda (n) (my-evenp n)))
        (setq my-ct-global-oddp (lambda (n) (my-oddp n)))))
    (fset 'my-global-evenp (eval-when-compile my-ct-global-evenp))
    (fset 'my-global-oddp (eval-when-compile my-ct-global-oddp))

Then I get:

    $ emacs -batch --eval '(load "~/test2.elc")' --eval '(message "%s" (my-global-evenp 5))'
    Loading ~/test2.elc...
    Debugger entered--Lisp error: (void-variable --cl-my-evenp--)
      my-global-evenp(5)
      (message "%s" (my-global-evenp 5))
      eval((message "%s" (my-global-evenp 5)) t)
      command-line-1(("--eval" "(load \"~/test2.elc\")" "--eval" "(message \"%s\" (my-global-evenp 5))"))
      command-line()
      normal-top-level()

This I did not expect.  Maybe the variable name is just an artifact of the way cl-labels is implemented and not a fundamental limitation.
Third attempt, to express a statically allocated closure with constant code (which is one way of viewing an ELF shared object):

    (require 'cl-lib)
    (eval-when-compile
      (defvar my-ct-global-evenp nil)
      (defvar my-ct-global-oddp nil)
      (let (my-evenp my-oddp)
        (setq my-evenp (lambda (n) (if (= n 0) t (funcall my-oddp (1- n)))))
        (setq my-oddp (lambda (n) (if (= n 0) nil (funcall my-evenp (1- n)))))
        (setq my-ct-global-evenp (lambda (n) (funcall my-evenp n)))
        (setq my-ct-global-oddp (lambda (n) (funcall my-oddp n)))))
    (fset 'my-global-evenp (eval-when-compile my-ct-global-evenp))
    (fset 'my-global-oddp (eval-when-compile my-ct-global-oddp))

And the result is worse:

    $ emacs -batch --eval '(load "~/test3.elc")' --eval '(message "%s" (my-global-evenp 5))'
    Loading ~/test3.elc...
    Debugger entered--Lisp error: (void-variable my-evenp)
      my-global-evenp(5)
      (message "%s" (my-global-evenp 5))
      eval((message "%s" (my-global-evenp 5)) t)
      command-line-1(("--eval" "(load \"~/test3.elc\")" "--eval" "(message \"%s\" (my-global-evenp 5))"))
      command-line()
      normal-top-level()

This was not expected with lexical scope.

    $ emacs -batch --eval '(load "~/test3.elc")' --eval "(message \"%s\" (symbol-function 'my-global-evenp))"
    Loading ~/test3.elc...
    #[(n) !\207 [my-evenp n] 2]

At least my-global-evenp has byte-code as a value, not a symbol, which was the intent.  I get the same result if I wrap the two lambdas stored in the my-ct-* variables with "byte-compile", which is what I intended (for the original to be equivalent to explicitly compiling the form).  However, what I expected would have been the byte-code equivalent of an ELF object with 2 symbols defined for relocation.

So why is the compiler producing code that would correspond to the "let" binding my-evenp and my-oddp being dynamically scoped?  That made me curious, so I found https://rocky.github.io/elisp-bytecode.pdf and reviewed it.  I believe I see the issue now.
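The `void-variable my-evenp` error above is exactly what dynamic binding produces: without lexical binding, the lambdas do not capture the `let`-bound `my-evenp`/`my-oddp`, and those dynamic bindings are gone by the time the trampolines run.  A minimal illustration of the difference, using `eval`'s LEXICAL argument and a hypothetical toy variable `k`:

```elisp
;; Under dynamic binding (LEXICAL = nil, the behavior of a file
;; without a `lexical-binding' cookie in Emacs 28), `k' in the lambda
;; body is a dynamic variable whose let-binding has been unwound by
;; the time the lambda is called:
(defvar my-dynamic-adder
  (eval '(let ((k 10)) (lambda (n) (+ n k))) nil))
;; (funcall my-dynamic-adder 1)  ; signals (void-variable k)

;; Under lexical binding (LEXICAL = t), `k' is captured in a closure
;; and survives:
(defvar my-lexical-adder
  (eval '(let ((k 10)) (lambda (n) (+ n k))) t))
;; (funcall my-lexical-adder 1)  ; → 11
```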
With the current byte-codes, there's just no way to express a call to an offset in the current byte-vector.  There's not even a way to reference the address of the current byte-vector to use as an argument to funcall.  There's no way to reference symbols that were resolved at compile time at all, which would require the equivalent of dl symbols embedded in a code vector that would be patched at load time.  That forces the compiler to emit a call to a symbol.  And when the manual talks about lexical scope, it's only for "variables", not function symbols.

That explains a lot - the reason Andrea had to use LAP as the starting point for optimizations, for example.  I can't find a spec for Emacs's version of LAP, but I'm guessing it can still express symbolic names for local function expressions in a way byte-code simply cannot.  I don't see how the language progresses without resolving the inconsistency between what's expressible in ELF and what's expressible in a byte-code object.

One possible set of changes to make the two compatible (I'd use the relative goto byte codes, if they haven't been produced by Emacs since v19, and I'd also add a few special registers; there's already one used to enable GOTO, i.e. the program counter):

- Byte codes for calls/returns directly into/from byte-code objects:
  - CALL-RELATIVE - execute a function call into the current byte-vector object with the pc set to pc+operand0 - basically PIC code.  If a return is required, the byte compiler should arrange for the return address to be pushed before the other operands of the function being called.  No additional manipulation of the stack is required, since funcall would just pop the arguments and then immediately push them again.
Alternatively, you could have a byte code that explicitly allocates a stack frame (if needed), pushes the return offset, then does a goto.

  - CALL-ABSOLUTE - execute a function call to a specified byte-vector object + pc, given as the first 2 operands.  This is useless until the byte-code object supports a notion of relocation symbols, i.e. named compile-time constants that get patched on load in one way or another - e.g. directly, by modifying the byte string with the value at run-time (assuming eager loading), or indirectly, by adding a "linkage table" of external symbols that will be filled in at load time and specifying an index into that table.
  - RETURN-RELATIVE - the operand is the number of items that have to be popped from the stack to get the return address, which is an offset into the current byte-vector object.  Alternatively, could be implemented as "discardN <n>; goto".
  - RETURN-ABSOLUTE - same as RETURN-RELATIVE, but the return address is given by two operands, a byte-vector and an offset into that byte-vector.

- Alternate formulation:
  - RESERVE-STACK - the operand is a byte-vector object (reference) that will be used to determine how much total stack space will be required for safety, and to ensure enough space is allocated.
  - GOTO-ABSOLUTE - the operands are a byte-vector object and an offset; immediate control transfer to the specified context.
  - These two are adequate to implement the above.

- Additional registers and related instructions:
  - PC - register already exists.
  - PUSH-PC - the opposite of goto, which pops the stack into the PC register.
  - GOT - a table of byte-vectors + offsets corresponding to a PLT section of the byte-vector, specifying the compile-time symbols that have to be resolved.  The byte-vector reference + offset in the "absolute" instructions above would be specified as an index into this table; otherwise the byte-vector could not be saved and directly loaded for later execution.
  - STATIC - a table for the lexical variables allocated and accessible to the closures at compile time.
The compiler should treat all sexps as occurring at the top level with regard to the run-time lexical environment.  A form like

    (let ((x 5))
      (byte-compile (lambda (n) (+ n (eval-when-compile x)))))

should produce byte-code containing the constant 5, while

    (let ((x 5))
      (byte-compile (lambda (n) (+ n x))))

should produce byte-code adding the argument n to the value of the global variable x at run-time.

  - PUSH-STATIC
  - POP-STATIC
  - ENV - the environment register.
  - ENV-PUSH-FRAME - the operand is the number of stack items to capture as a (freshly allocated) frame, which is then added as a rib to a new environment pointed to by the ENV register.
  - PUSH-ENV - push the value of ENV onto the stack.
  - POP-ENV - pop the top of the stack into ENV, discarding any value there.

- Changes to the byte-code object:
  - IMPORTS - a table of symbols defined at compile time requiring resolution to constants at load time, particularly for references to compilation units (byte-vector or native code) and exported symbols bound to constants (really immutable).  Note: the "relative" versions of call and return above could be eliminated if IMPORTS includes self-references into the byte-vector object itself.
  - EXPORTS - a table of symbols available to be called or referenced externally.
  - A static table, with values initialized from the values in the closure at compile time.
  - The constant table and byte string remain.

- Changes to the byte-code loader:
  - Read the new format.
  - Resolve symbols - these should link to specific compilation units rather than "features", as compilation units will define specific exported symbols, while features do not support that detail.
Source could still use "require", but the symbols referenced from the compile-time environment would have to be traced back to the compilation unit supplying them (unless they are recorded as constants by an expression like (eval-when-compile (setq v (eval-when-compile some-imported-symbol)))).

  - Allocate and initialize the static segment.
  - Create a "static closure" for the compilation unit = loaded object + GOT + static frame, recorded as a singleton entry mapping compilation units to closures (hence "static").

- Changes to funcall:
  - Invoking a function from a compilation unit would require setting GOT and STATIC, and setting the ENV register to point to STATIC as the first rib (directly or indirectly).
  - Invoking a closure whose "code" element points to an "exported" symbol from a compilation unit + an environment pointer:
    - Set GOT and STATIC according to the byte-vector's static closure.
    - Dispatch according to whether the compilation unit is native or byte-compiled; both have the above elements.

- Changes to the byte-compiler:
  - Correct the issues with compile-time evaluation + lexical scope of function names above.
  - Emit the additional sections of the byte-code object.
  - It should then be able to implement the output of the native-compiler pass (pre-libgccjit) with "-O3" flags in byte-code correctly.

Lynn

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-12 17:38 ` Lynn Winebarger @ 2022-06-12 18:47 ` Stefan Monnier 2022-06-13 16:33 ` Lynn Winebarger 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier @ 2022-06-12 18:47 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel

>>>> In which sense would it be different from:
>>>>
>>>>     (cl-flet
>>>>         ...
>>>>       (defun ...)
>>>>       (defun ...)
>>>>       ...)
>>>>
>>> Good point - it's my scheme background confusing me.  I was thinking defun
>>> would operate with similar scoping rules as defvar and establish a local
>>> binding, where fset (like setq) would not create any new bindings.
>>
>> I was not talking about performance but about semantics (under the
>> assumption that if the semantics is the same then it should be possible
>> to get the same performance somehow).
>
> I'm trying to determine if there's a set of expressions for which it
> is semantically sound to perform the intraprocedural optimizations

The cl-flet above is such an example, AFAIK.  Or maybe I don't
understand what you mean.

> I'm trying to capture a function as a first class value.

Functions are first-class values and they can be trivially captured via
things like (setq foo (lambda ...)), (defalias 'foo (lambda ...)) and
a lot more, so I assume there's some additional constraint you're expecting but
I don't know what that is.

> This was not expected with lexical scope.

You explicitly write `(require 'cl-lib)` but I don't see any

    -*- lexical-binding:t -*-

anywhere, so I suspect you forgot to add those cookies that are needed
to get proper lexical scoping.

> With the current byte-codes, there's just no way to express a call to
> an offset in the current byte-vector.

Indeed, but you can call a byte-code object instead.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread
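Both of Stefan's points can be checked in a few lines.  The sketch below reuses the names from the earlier test files; with the `lexical-binding` cookie in place, the `cl-labels` closures are captured as intended, and the compiled function is a byte-code object that can be funcalled directly, without going through any symbol's function cell:

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Mutually recursive lexical functions, with a trampoline installed
;; under a global name via `defalias':
(cl-labels ((my-evenp (n) (if (= n 0) t (my-oddp (1- n))))
            (my-oddp (n) (if (= n 0) nil (my-evenp (1- n)))))
  (defalias 'my-global-evenp (lambda (n) (my-evenp n))))

;; (my-global-evenp 5) ; → nil

;; A byte-code object is itself a first-class, directly callable value:
(let ((f (byte-compile (lambda (x) (* x x)))))
  (funcall f 7))  ; → 49
```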
* Re: native compilation units 2022-06-12 18:47 ` Stefan Monnier @ 2022-06-13 16:33 ` Lynn Winebarger 2022-06-13 17:15 ` Stefan Monnier 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-13 16:33 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Sun, Jun 12, 2022 at 2:47 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >> >> In which sense would it be different from:
> >> >>
> >> >>     (cl-flet
> >> >>         ...
> >> >>       (defun ...)
> >> >>       (defun ...)
> >> >>       ...)
> >> >>
> > I'm trying to determine if there's a set of expressions for which it
> > is semantically sound to perform the intraprocedural optimizations
>
> The cl-flet above is such an example, AFAIK.  Or maybe I don't
> understand what you mean.

To be clear, I'm trying to first understand what Andrea means by "safe".  I'm assuming it means the result agrees with whatever the byte compiler and VM would produce for the same code.  I doubt I'm bringing up topics or ideas that are new to you.  But if I do make use of semantic/wisent, I'd like to know the result can be fast (modulo garbage collection, anyway).

I've been operating under the assumption that:

- Compiled code objects should be first class in the sense that they can be serialized just by using print and read.  That seems to have been important historically, and was true for byte-code vectors for dynamically scoped functions.  It's still true for byte-code vectors of top-level functions, but is not true for byte-code vectors of closures (and hasn't been for at least a decade, apparently).
- It's still worthwhile to have a class of code objects that are immutable in the VM semantics, but now because there are compiler passes implemented that can make use of that as an invariant.
- cl-flet doesn't allow mutual recursion, and there is no shared state above, so there's nothing to optimize intraprocedurally.
- cl-labels is implemented with closures, so (as I understand it) the
  native compiler would not be able to produce code if you asked it to
  compile the closure returned by a form like
  (cl-labels ((f ..) (g ...) ...) f)

I also mistakenly thought byte-code vectors of the sort saved in ".elc"
files would not be able to represent closures without being consed, as
the components (at least the first 4) are nominally constant.  But I see
that closures are being implemented by calling an ordinary function that
side-effects the "constants" vector.  That's unfortunate because it
means the optimizer cannot assume byte-vectors are constants that can be
freely propagated.  OTOH, prior to commit
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=d0c47652e527397cae96444c881bf60455c763c1
it looks like the closures were constructed at compile time rather than
by side-effect, which would mean the VM would be expected to treat them
as immutable, at least.

Wedging closures into the byte-code format that works for dynamic
scoping could be made to work with shared structures, but you'd need to
modify print to always capture shared structure (at least for byte-code
vectors), not just when there's a cycle.  The approach that's been
implemented only works at run-time when there's shared state between
closures, at least as far as I can tell.  However, it's a hack that will
never really correspond closely to the semantics of shared objects with
explicit tracking and load-time linking of compile-time symbols, because
the relocations are already performed and there's no way to back out
where they occurred from the value itself.

If a goal is to have a semantics in which you can

1. unambiguously specify that at load/run time a function or variable
   name is resolved in the compile-time environment provided by a
   separate compilation unit as an immutable constant at run-time
2. serialize compiled closures as compilation units that provide a
   well-defined compile-time environment for linking
3.
reduce the headaches of the compiler writer by making it easy to produce code that is eligible for their optimizations Then I think the current approach is suboptimal. The current byte-code representation is analogous to the a.out format. Because the .elc files run code on load you can put an arbitrary amount of infrastructure in there to support an implementation of compilation units with exported compile-time symbols, but it puts a lot more burden on the compiler and linker/loader writers than just being explicit would. And I'm not sure what the payoff is. When there wasn't a native compiler (and associated optimization passes), I suppose there was no pressing reason to upend backward compatibility. Then again, I've never been responsible for maintaining a 3-4 decade old application with I don't have any idea how large an installed user base ranging in size from chips running "smart" electric switches to (I assume) the biggest of "big iron", whatever that means these days. > > I'm trying to capture a function as a first class value. > > Functions are first class values and they can be trivially captured via > things like (setq foo (lambda ...)), (defalias 'foo (lambda ...)) and > a lot more, so I there's some additional constraint you're expecting but > I don't know what that is. > Yes, I thought byte-code would be treated as constant. I still think it makes a lot of sense to make it so. > > > This was not expected with lexical scope. > > You explicitly write `(require 'cl-lib)` but I don't see any > > -*- lexical-binding:t -*- > > anywhere, so I suspect you forgot to add those cookies that are needed > to get proper lexical scoping. > > Ok, wow, I really misread the NEWS for 28.1 where it said The 'lexical-binding' local variable is always enabled. As meaning "always set". My fault. > With the current byte-codes, there's just no way to express a call to > > an offset in the current byte-vector. > > Indeed, but you can call a byte-code object instead. 
Creating the byte code with shared structure was what I meant by one of
the solutions being to "patch compile-time constants" at load, i.e.
perform the relocations directly.  The current implementation
effectively inlines copies of the constants (byte-code objects), which
is fine for shared code but not for shared variables.  That is, the
values that are assigned to my-global-oddp and my-global-evenp (for
test2 after correcting the lexical-binding setting) do not reference
each other.  Each is created with an independent copy of the other.

[-- Attachment #2: Type: text/html, Size: 8574 bytes --]

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-13 16:33 ` Lynn Winebarger @ 2022-06-13 17:15 ` Stefan Monnier 2022-06-15 3:03 ` Lynn Winebarger 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier @ 2022-06-13 17:15 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel > To be clear, I'm trying to first understand what Andrea means by "safe". > I'm assuming it means the result agrees with whatever the byte > compiler and VM would produce for the same code. Not directly. It means that it agrees with the intended semantics. That semantics is sometimes accidentally defined by the actual implementation in the Lisp interpreter or the bytecode compiler, but that's secondary. The semantic issue is that if you call (foo bar baz) it normally (when `foo` is a global function) means you're calling the function contained in the `symbol-function` of the `foo` symbol *at the time of the function call*. So compiling this to jump directly to the code that happens to be contained there during compilation (or the code which the compiler expects to be there at that point) is unsafe in the sense that you don't know whether that symbol's `symbol-function` will really have that value when we get to executing that function call. The use of `cl-flet` (or `cl-labels`) circumvents this problem since the call to `foo` is now to a lexically-scoped function `foo`, so the compiler knows that the code that is called is always that same one (there is no way to modify it between the compilation time and the runtime). > I doubt I'm bringing up topics or ideas that are new to you. But if > I do make use of semantic/wisent, I'd like to know the result can be > fast (modulo garbage collection, anyway). It's also "modulo enough work on the compiler (and potentially some primitive functions) to make the code fast". 
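[Editor's sketch, not part of the original message: the semantic distinction Stefan describes — a global call resolved through `symbol-function` at call time versus a lexically-bound `cl-flet` function the compiler can treat as fixed — can be demonstrated directly. The names `helper`, `caller`, and `caller-2` are hypothetical.]

```elisp
;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; A call through a symbol is looked up at the time of the call, so
;; redefining the symbol changes the behavior of every caller:
(defun helper () 1)
(defun caller () (helper))
(caller)                       ; => 1
(fset 'helper (lambda () 2))   ; redefinition is visible to `caller'
(caller)                       ; => 2

;; A `cl-flet' binding is lexical: the compiler sees every use site,
;; knows it cannot be rebound between compile time and run time, and
;; may therefore safely optimize the call.
(defun caller-2 ()
  (cl-flet ((helper () 1))
    (helper)))
(caller-2)                     ; => 1, regardless of the global `helper'
```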
> I've been operating under the assumption that > > - Compiled code objects should be first class in the sense that > they can be serialized just by using print and read. That seems to > have been important historically, and was true for byte-code > vectors for dynamically scoped functions. It's still true for > byte-code vectors of top-level functions, but is not true for > byte-code vectors for closures (and hasn't been for at least > a decade, apparently). It's also true for byte-compiled closures, although, inevitably, this holds only for closures that capture only serializable values. > But I see that closures are being implemented by calling an ordinary > function that side-effects the "constants" vector. I don't think that's the case. Where do you see that? The constants vector is implemented as a normal vector, so strictly speaking it is mutable, but the compiler will never generate code that mutates it, AFAIK, so you'd have to write ad-hoc code that digs inside a byte-code closure and mutates the constants vector for that to happen (and I don't know of such code out in the wild). > OTOH, prior to commit > https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=d0c47652e527397cae96444c881bf60455c763c1 > it looks like the closures were constructed at compile time rather than by > side-effect, No, this commit only changes the *way* they're constructed but not the when and both the before and the after result in constant vectors which are not side-effected (every byte-code closure gets its own fresh constants-vector). > Wedging closures into the byte-code format that works for dynamic scoping > could be made to work with shared structures, but you'd need to modify > print to always capture shared structure (at least for byte-code vectors), > not just when there's a cycle. It already does. > The approach that's been implemented only works at run-time when > there's shared state between closures, at least as far asI can tell. 
There can be problems if two *toplevel* definitions are serialized and they share common objects, indeed. The byte-compiler may fail to preserve the shared structure in that case, IIRC. I have some vague recollection of someone bumping into that limitation at some point, but it should be easy to circumvent. > Then I think the current approach is suboptimal. The current > byte-code representation is analogous to the a.out format. > Because the .elc files run code on load you can put an arbitrary > amount of infrastructure in there to support an implementation of > compilation units with exported compile-time symbols, but it puts > a lot more burden on the compiler and linker/loader writers than just > being explicit would. I think the practical performance issues with ELisp code are very far removed from these problems. Maybe some day we'll have to face them, but we still have a long way to go. >> You explicitly write `(require 'cl-lib)` but I don't see any >> >> -*- lexical-binding:t -*- >> >> anywhere, so I suspect you forgot to add those cookies that are needed >> to get proper lexical scoping. >> Ok, wow, I really misread the NEWS for 28.1 where it said > The 'lexical-binding' local variable is always enabled. Are you sure? How do you do that? Some of the errors you showed seem to point very squarely towards the code being compiled as dyn-bound ELisp. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-13 17:15 ` Stefan Monnier @ 2022-06-15 3:03 ` Lynn Winebarger 2022-06-15 12:23 ` Stefan Monnier 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-15 3:03 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 10140 bytes --] On Mon, Jun 13, 2022 at 1:15 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote: > > To be clear, I'm trying to first understand what Andrea means by "safe". > > I'm assuming it means the result agrees with whatever the byte > > compiler and VM would produce for the same code. > > Not directly. It means that it agrees with the intended semantics. > That semantics is sometimes accidentally defined by the actual > implementation in the Lisp interpreter or the bytecode compiler, but > that's secondary. > What I mean is, there's not really a spec defining the semantics to judge against. But every emacs has a working byte compiler, and only some have a native compiler. If the users with the byte compiler get a different result than the users that have the native compiler, my guess is that the code would be expected to be rewritten so that it produces the expected result from the byte compiler (at least until the byte compiler is revised). To the extent the byte compiler is judged to produce an incorrect result, it's probably an area of the language that was not considered well-defined enough (or useful enough) to have been used previously. Or it was known that the byte compiler's semantics weren't very useful for a particular family of expressions. > The semantic issue is that if you call > > (foo bar baz) > > it normally (when `foo` is a global function) means you're calling the > function contained in the `symbol-function` of the `foo` symbol *at the > time of the function call*. 
So compiling this to jump directly to the > code that happens to be contained there during compilation (or the code > which the compiler expects to be there at that point) is unsafe in > the sense that you don't know whether that symbol's `symbol-function` > will really have that value when we get to executing that function call. > > The use of `cl-flet` (or `cl-labels`) circumvents this problem since the > call to `foo` is now to a lexically-scoped function `foo`, so the > compiler knows that the code that is called is always that same one > (there is no way to modify it between the compilation time and the > runtime). > The fact that cl-flet (and cl-labels) are defined to provide immutable bindings is really a surprise to me. However, what I was trying to do originally was figure out if there was any situation where Andrea's statement (in another reply): the compiler can't take advantage of interprocedural optimizations (such > as inline etc) as every function in Lisp can be redefined in every > moment. Remember, I was asking whether concatenating a bunch of files together as a library would have the same meaning as compiling and linking the object files. There is one kind of expression where Andrea isn't quite correct, and that is with respect to (eval-when-compile ...). Those *can* be treated as constants, even without actually compiling them first. 
If I understand the CL-Hyperspec/Emacs Lisp manual, the following expression: ------------------------------------ (let () (eval-when-compile (defvar a (lambda (f) (lambda (x) (f (+ x 5)))))) (eval-when-compile (defvar b (lambda (y) (* y 3)))) (let ((f (eval-when-compile (a b)))) (lambda (z) (pow z (f 6))))) ------------------------------------ can be rewritten (using a new form "define-eval-time-constant") as ------------------------------------ (eval-when-compile (define-eval-time-constant ct-r1 (defvar a (lambda (f) (lambda (x) (f (+ x 5)))))) (define-eval-time-constant ct-r2 (defvar b (lambda (y) (* y 3)))) (define-eval-time-constant ct-r3 (a b))) (let () ct-r1 ct-r2 (let ((f ct-r3)) (lambda (z) (pow z (f 6))))) ------------------------------------ Now the optimizer can treat ct-r1,ct-r2, and ct-r3 as constants for the purpose of propagation, *without actually determining their value*. So this could be rewritten as ------------------------------------------- (eval-when-compile (define-eval-time-constant ct-r1 (defvar a (lambda (f) (lambda (x) (f (+ x 5)))))) (define-eval-time-constant ct-r2 (defvar b (lambda (y) (* y 3)))) (define-eval-time-constant ct-r3 (a b))) (let () (lambda (z) (pow z (ct-r3 6)))) ------------------------------------------------ If I wanted to "link" files A, B, and C together, with A exporting symbols a1,..., and b exporting symbols b1,...., I could do the following: (eval-when-compile (eval-when-compile <text of A> ) <text of B with a1,...,and replaced by (eval-when-compile a1), ....> ) <text of C with a1,... replaced by (eval-when-compile (eval-when-compile a1))... and b1,... replaced by (eval-when-compile b1),... And now the (eval-when-compile) expressions can be freely propagated within the code of each file, as they are constant expressions. I don't know how the native compiler is handling "eval-when-compile" expressions now, but this should give that optimizer pass a class of expressions where "-O3" is in fact safe to apply. 
Then it's just a matter of creating the macros to make producing those expressions in appropriate contexts convenient to do in practice. > I doubt I'm bringing up topics or ideas that are new to you. But if > > I do make use of semantic/wisent, I'd like to know the result can be > > fast (modulo garbage collection, anyway). > > It's also "modulo enough work on the compiler (and potentially some > primitive functions) to make the code fast". > Absolutely, it just doesn't look to me like a very big lift compared to, say, what Andrea did. > > I've been operating under the assumption that > > > > - Compiled code objects should be first class in the sense that > > they can be serialized just by using print and read. That seems to > > have been important historically, and was true for byte-code > > vectors for dynamically scoped functions. It's still true for > > byte-code vectors of top-level functions, but is not true for > > byte-code vectors for closures (and hasn't been for at least > > a decade, apparently). > > It's also true for byte-compiled closures, although, inevitably, this > holds only for closures that capture only serializable values. > > > But I see that closures are being implemented by calling an ordinary > > function that side-effects the "constants" vector. > > I don't think that's the case. Where do you see that? > My misreading, unfortunately. That does seem like a lot of copying for anyone relying on efficient closures. Does this mean the native compiled code can only produce closures in byte-code form? Assuming dlopen loads the shared object into read-only memory for execution. > > Wedging closures into the byte-code format that works for dynamic scoping > > could be made to work with shared structures, but you'd need to modify > > print to always capture shared structure (at least for byte-code > vectors), > > not just when there's a cycle. > > It already does. > > Ok, I must be missing it. 
I know eval_byte_code *creates* the result shown below with shared structure (the '(5)], but I don't see anything in the printed text to indicate it if read back in. (defvar z (byte-compile-sexp '(let ((lx 5)) (let ((f (lambda () lx)) (g (lambda (ly) (setq lx ly)))) `(,f ,g))))) (ppcb z) (byte-code "\300C\301\302 \"\301\303 \" D\207" [5 make-closure #[0 "\300\242\207" [V0] 1] #[257 "\300 \240\207" [V0] 3 "\n\n(fn LY)"]] 5) (defvar zv (eval z)) (ppcb zv) (#[0 "\300\242\207" [(5)] 1] #[257 "\300 \240\207" [(5)] 3 "\n\n(fn LY)"]) (defvar zvs (prin1-to-string zv)) (ppcb zvs) "(#[0 \"\\300\\242\\207\" [(5)] 1] #[257 \"\\300 \\240\\207\" [(5)] 3 \"\n\n(fn LY)\"])" (defvar zz (car (read-from-string zvs))) (ppcb zz) (#[0 "\300\242\207" [(5)] 1] #[257 "\300 \240\207" [(5)] 3 "\n\n(fn LY)"]) (let ((f (car zz)) (g (cadr zz))) (print (eq (aref (aref f 2) 0) (aref (aref g 2) 0)) (current-buffer))) nil Of course, those last bindings of f and g were just vectors, not byte-code vectors, but the (5) is no longer shared state. > > Then I think the current approach is suboptimal. The current > > byte-code representation is analogous to the a.out format. > > Because the .elc files run code on load you can put an arbitrary > > amount of infrastructure in there to support an implementation of > > compilation units with exported compile-time symbols, but it puts > > a lot more burden on the compiler and linker/loader writers than just > > being explicit would. > > I think the practical performance issues with ELisp code are very far > removed from these problems. Maybe some day we'll have to face them, > but we still have a long way to go. > I'm sure you're correct in terms of the current code base. But isn't the history of these kinds of improvements in compilers for functional languages that coding styles that had been avoided in the past can be adopted and produce faster code than the original? 
In this case, it would be enabling the pervasive use of recursion and less reliance on side-effects. Improvements in the gc wouldn't hurt, either. > >> You explicitly write `(require 'cl-lib)` but I don't see any > >> > >> -*- lexical-binding:t -*- > >> > >> anywhere, so I suspect you forgot to add those cookies that are needed > >> to get proper lexical scoping. > >> Ok, wow, I really misread the NEWS for 28.1 where it said > > The 'lexical-binding' local variable is always enabled. > > Are you sure? How do you do that? > Some of the errors you showed seem to point very squarely towards the > code being compiled as dyn-bound ELisp. > > My quoting wasn't very effective. That last line was actually line 2902 of NEWS.28: "** The 'lexical-binding' local variable is always enabled. Previously, if 'enable-local-variables' was nil, a 'lexical-binding' local variable would not be heeded. This has now changed, and a file with a 'lexical-binding' cookie is always heeded. To revert to the old behavior, set 'permanently-enabled-local-variables' to nil." I feel a little less silly about my optimistic misreading of the first line, at least. Lynn [-- Attachment #2: Type: text/html, Size: 15237 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units 2022-06-15 3:03 ` Lynn Winebarger @ 2022-06-15 12:23 ` Stefan Monnier 2022-06-19 17:52 ` Lynn Winebarger 0 siblings, 1 reply; 46+ messages in thread From: Stefan Monnier @ 2022-06-15 12:23 UTC (permalink / raw) To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel > The fact that cl-flet (and cl-labels) are defined to provide immutable > bindings is really a surprise to me. Whether they are mutable or not is not directly relevant, tho: the import part is that being lexically scoped, the compiler gets to see all the places where it's used and can thus determine that it's ever mutated. > There is one kind of expression where Andrea isn't quite correct, and that > is with respect to (eval-when-compile ...). You don't need `eval-when-compile`. It's already "not quite correct" for lambda expressions. What he meant is that the function associated with a symbol can be changed in every moment. But if you call a function without going through such a globally-mutable indirection the problem vanishes. > Now the optimizer can treat ct-r1,ct-r2, and ct-r3 as constants for the > purpose of propagation, Same holds for (let* ((a (lambda (f) (lambda (x) (f (+ x 5))))) (b (lambda (y) (* y 3))) (f (funcall a b))) (lambda (z) (pow z (funcall f 6)))) >> It's also "modulo enough work on the compiler (and potentially some >> primitive functions) to make the code fast". > Absolutely, it just doesn't look to me like a very big lift compared to, > say, what Andrea did. It very depends on the specifics, but it's definitely not obviously true. ELisp like Python has grown around a "slow language" so its code is structured in such a way that most of the time the majority of the code that's executed is actually not ELisp but C, over which the native compiler has no impact. > Does this mean the native compiled code can only produce closures in > byte-code form? Not directly, no. But currently that's the case, yes. 
> below with shared structure (the '(5)], but I don't see anything in > the printed text to indicate it if read back in. You need to print with `print-circle` bound to t, like the compiler does when writing to a `.elc` file. > I'm sure you're correct in terms of the current code base. But isn't > the history of these kinds of improvements in compilers for functional > languages that coding styles that had been avoided in the past can be > adopted and produce faster code than the original? Right, but it's usually a slow co-evolution. > In this case, it would be enabling the pervasive use of recursion and > less reliance on side-effects. Not everyone would agree that "pervasive use of recursion" is an improvement. > Improvements in the gc wouldn't hurt, either. Actually, nowadays lots of benchmarks are already bumping into the GC as the main bottleneck. > ** The 'lexical-binding' local variable is always enabled. Indeed, that's misleading. Not sure how none of us noticed it before. Stefan ^ permalink raw reply [flat|nested] 46+ messages in thread
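[Editor's sketch, not part of the original message: the `print-circle` behavior Stefan refers to — sharing is only recorded in printed output when `print-circle` is bound to t, as the byte compiler does when writing `.elc` files — can be checked directly:]

```elisp
;; -*- lexical-binding: t; -*-
;; Two references to the same cons cell:
(setq my-cell (list 5)
      my-pair (cons my-cell my-cell))   ; illustrative names

;; With `print-circle' nil, the sharing is silently lost:
(prin1-to-string my-pair)               ; => "((5) 5)"

;; With `print-circle' t, the #N= / #N# read syntax records it:
(let ((print-circle t))
  (prin1-to-string my-pair))            ; => "(#1=(5) . #1#)"

;; Reading that form back restores the eq-ness:
(let ((back (read "(#1=(5) . #1#)")))
  (eq (car back) (cdr back)))           ; => t
```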
* Re: native compilation units 2022-06-15 12:23 ` Stefan Monnier @ 2022-06-19 17:52 ` Lynn Winebarger 2022-06-19 23:02 ` Stefan Monnier 0 siblings, 1 reply; 46+ messages in thread From: Lynn Winebarger @ 2022-06-19 17:52 UTC (permalink / raw) To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel [-- Attachment #1: Type: text/plain, Size: 7169 bytes --] On Wed, Jun 15, 2022 at 8:23 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote: > > There is one kind of expression where Andrea isn't quite correct, and > that > > is with respect to (eval-when-compile ...). > > You don't need `eval-when-compile`. It's already "not quite correct" > for lambda expressions. What he meant is that the function associated > with a symbol can be changed in every moment. But if you call > a function without going through such a globally-mutable indirection the > problem vanishes. > > I'm not sure what the point here is. If all programs were written with every variable and function name lexically bound, then there wouldn't be an issue. After Andrea's response to my original question, I was curious if the kind of semantic object that an ELF shared-object file *is* can be captured (directly) in the semantic model of emacs lisp, including the fact that some symbols in ELF are bound to truly immutable constants at runtime by the loader. Also, if someone were to rewrite some of the primitives now in C in Lisp and rely on the compiler for their use, would there be a way to write them with the same semantics they have now (not referencing the run-time bindings of other primitives). Based on what I've observed in this thread, I think the answer is either yes or almost yes. The one sticking point is that there is no construct for retaining the compile-time environment. If I "link" files by concatenating the source together, it's not an issue, but I can't replicate that with the results the byte-compiler currently produces. 
What would also be useful is some analogue to Common Lisp's package construct, but extended so that symbols could be imported from compile-time environments as immutable bindings. Now, that would be a change in the current semantics of symbols, unquestionably, but not one that would break the existing code base. It would only come into play compiling a file as a library, with semantics along the lines of: (eval-when-compile (namespace <name of library obstack>) <library code> ... (export <symbol> ...) ) Currently compiling a top-level expression wrapped in eval-when-compile by itself leaves no residue in the compiled output, but I would want to make the above evaluate to an object at run-time where the exported symbols in the obstack are immutable. Since no existing code uses the above constructs - because they are not currently defined - it would only be an extension. I don't want to restart the namespace debates - I'm not suggesting anything to do with the reader parsing symbol names spaces from prefixes in the symbol name. > >> It's also "modulo enough work on the compiler (and potentially some > >> primitive functions) to make the code fast". > > Absolutely, it just doesn't look to me like a very big lift compared to, > > say, what Andrea did. > > It very depends on the specifics, but it's definitely not obviously true. > ELisp like Python has grown around a "slow language" so its code is > structured in such a way that most of the time the majority of the code > that's executed is actually not ELisp but C, over which the native > compiler has no impact. > > That's why I said "look[s] to me", and inquired here before proceeding. Having looked more closely, it appears the most obvious safe approach, that doesn't require any ability to manipulate the C call stack, is to introduce another manually managed call stack as is done for the specpdl stack, but malloced (I haven't reviewed that implementation closely enough to tell if it is stack or heap allocated). 
That does complicate matters. That part would be for allowing calls to (and returns from) arbitrary points in byte-code (or native-code) instruction arrays. This would in turn enable implementing proper tail recursion as "goto with arguments". These changes would be one way to address the items in the TODO file for 28.1, starting at line 173: > * Important features > ** Speed up Elisp execution [...] > *** Speed up function calls [..] > ** Add an "indirect goto" byte-code [...] > *** Compile efficiently local recursive functions [...] As for the other elements - introducing additional registers to facilitate efficient lexical closures and namespaces - it still doesn't look like a huge lift to introduce them into the bytecode interpreter, although there is still the work to make effective use of them in the output of the compilers. I have been thinking that some additional reader syntax for what might be called "meta-evaluation quasiquotation" (better name welcome) could be useful. I haven't worked out the details yet, though. I would make #, and #,@ effectively be shorthand for eval-when-compile. Using #` inside eval-when-compile should produce an expression that, after compilation, would provide the meta-quoted expression with the semantics it would have outside an eval-when-compile form. > Does this mean the native compiled code can only produce closures in > > byte-code form? > > Not directly, no. But currently that's the case, yes. > > > below with shared structure (the '(5)], but I don't see anything in > > the printed text to indicate it if read back in. > > You need to print with `print-circle` bound to t, like the compiler does > when writing to a `.elc` file. > I feel silly again. I've *used* emacs for years, but have (mostly) avoided using emacs lisp for programming because of the default dynamic scoping and the implications that has for the efficiency of lexical closures. > > > I'm sure you're correct in terms of the current code base. 
But isn't > > the history of these kinds of improvements in compilers for functional > > languages that coding styles that had been avoided in the past can be > > adopted and produce faster code than the original? > > Right, but it's usually a slow co-evolution. > I don't think I've suggested anything else. I don't think my proposed changes to the byte-code VM would change the semantics of emacs LISP, just the semantics of the byte-code VM. Which you've already stated do not dictate the semantics of emacs LISP. > > In this case, it would be enabling the pervasive use of recursion and > > less reliance on side-effects. > > Not everyone would agree that "pervasive use of recursion" is an > improvement. > True, but it's still a lisp - no one is required to write code in any particular style. It would be peculiar (these days, anyway) to expect a lisp compiler to optimize imperative-style code more effectively than code employing recursion. > > Improvements in the gc wouldn't hurt, either. > > Actually, nowadays lots of benchmarks are already bumping into the GC as > the main bottleneck. > I'm not familiar with emacs's profiling facilities. Is it possible to tell how much of the allocated space/time spent in gc is due to the constant vectors of lexical closures? In particular, how much of the constant vectors are copied elements independent of the lexical environment? That would provide some measure of any gc-related benefit that *might* be gained from using an explicit environment register for closures, instead of embedding it in the byte-code vector. Lynn [-- Attachment #2: Type: text/html, Size: 9581 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: native compilation units
2022-06-19 17:52 ` Lynn Winebarger
@ 2022-06-19 23:02 ` Stefan Monnier
2022-06-20 1:39 ` Lynn Winebarger
0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier @ 2022-06-19 23:02 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Andrea Corallo, emacs-devel

> Currently compiling a top-level expression wrapped in
> eval-when-compile by itself leaves no residue in the compiled output,

`eval-when-compile` has 2 effects:

1- Run the code within the compiler's process.
   E.g. (eval-when-compile (require 'cl-lib)).
   This is somewhat comparable to loading a gcc plugin during
   a compilation: it affects the GCC process itself, rather than the
   code it emits.

2- It replaces the (eval-when-compile ...) thingy with the value
   returned by the evaluation of this code.  So you can do
   (defvar my-str (eval-when-compile (concat "foo" "bar")))
   and you know that the concatenation will be done during compilation.

> but I would want to make the above evaluate to an object at run-time
> where the exported symbols in the obstack are immutable.

Then it wouldn't be called `eval-when-compile` because it would do
something quite different from what `eval-when-compile` does :-)

> byte-code (or native-code) instruction arrays. This would in turn enable
> implementing proper tail recursion as "goto with arguments".

Proper tail recursion elimination would require changing the *normal*
function call protocol.  I suspect you're thinking of a smaller-scale
version of it specifically tailored to self-recursion, kind of like what
`named-let` provides.  Note that such ad-hoc TCO tends to hit the same
semantic issues as the -O3 optimization of the native compiler.
E.g. in code like the following:

    (defun vc-foo-register (file)
      (when (some-hint-is-true)
        (load "vc-foo")
        (vc-foo-register file)))

the final call to `vc-foo-register` is in tail position but is not
a self call because loading `vc-foo` is expected to redefine
`vc-foo-register` with the real implementation.
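[Editor's sketch, not part of the original message: `named-let` (in subr-x, Emacs 28+) is the self-recursion-only construct Stefan mentions. Its macro expansion rewrites self tail calls so they do not consume stack, which is exactly the "goto with arguments" shape for the self-call case. The name `my-sum` is illustrative.]

```elisp
;; -*- lexical-binding: t; -*-
(require 'subr-x)   ; provides `named-let'

;; The call to `loop' in tail position is rewritten into iteration by
;; the `named-let' macro, so deep recursion depth is not a problem:
(defun my-sum (n)
  (named-let loop ((i n) (acc 0))
    (if (zerop i)
        acc
      (loop (1- i) (+ acc i)))))

(my-sum 10)   ; => 55
```

This only covers self calls bound by the `named-let`; as Stefan's `vc-foo-register` example shows, a tail call through a global symbol cannot be rewritten this way without changing the language's semantics.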
> I'm not familiar with emacs's profiling facilities.  Is it possible to
> tell how much of the allocated space/time spent in gc is due to the
> constant vectors of lexical closures?  In particular, how much of the
> constant vectors are copied elements independent of the lexical
> environment?  That would provide some measure of any gc-related
> benefit that *might* be gained from using an explicit environment
> register for closures, instead of embedding it in the
> byte-code vector.

No, I can't think of any profiling tool we currently have that can help
with that, sorry :-(

Note that when support for native closures is added to the native
compiler, it will hopefully not be using this clunky representation
where capture vars are mixed in with the vector of constants, so that
might be a more promising direction (may be able to skip the step where
we need to change the bytecode).

        Stefan
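[Editorial sketch: the two effects of `eval-when-compile` that Stefan describes above, written out as a tiny self-contained file.  The variable name `my-str` follows his example; everything else is illustrative only.]

```elisp
;;; sketch.el --- illustrative only  -*- lexical-binding: t; -*-

;; Effect 1: evaluated inside the byte-compiler's own process; nothing
;; from this form appears in the compiled .elc output.
(eval-when-compile (require 'cl-lib))

;; Effect 2: the whole (eval-when-compile ...) form is replaced by its
;; compile-time value, so the .elc contains the literal constant
;; "foobar" and no call to `concat' at load time.
(defvar my-str (eval-when-compile (concat "foo" "bar")))
```

Byte-compiling such a file and inspecting the resulting .elc should show the literal string with no trace of the `concat` call or the `cl-lib` load.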
* Re: native compilation units
  2022-06-19 23:02           ` Stefan Monnier
@ 2022-06-20  1:39           ` Lynn Winebarger
  2022-06-20 12:14           ` Lynn Winebarger
  ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-20 1:39 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Sun, Jun 19, 2022 at 7:02 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> > Currently compiling a top-level expression wrapped in
> > eval-when-compile by itself leaves no residue in the compiled output,
>
> `eval-when-compile` has 2 effects:
>
> 1- Run the code within the compiler's process.
>    E.g. (eval-when-compile (require 'cl-lib)).
>    This is somewhat comparable to loading a gcc plugin during
>    a compilation: it affects the GCC process itself, rather than the
>    code it emits.
>
> 2- It replaces the (eval-when-compile ...) thingy with the value
>    returned by the evaluation of this code.  So you can do (defvar
>    my-str (eval-when-compile (concat "foo" "bar"))) and you know that
>    the concatenation will be done during compilation.
>
> > but I would want to make the above evaluate to an object at run-time
> > where the exported symbols in the obstack are immutable.
>
> Then it wouldn't be called `eval-when-compile` because it would do
> something quite different from what `eval-when-compile` does :-)

The informal semantics of "eval-when-compile" from the elisp info file
are:

    This form marks BODY to be evaluated at compile time but not when
    the compiled program is loaded.  The result of evaluation by the
    compiler becomes a constant which appears in the compiled program.
    If you load the source file, rather than compiling it, BODY is
    evaluated normally.

I'm not sure what I have proposed that would be inconsistent with "the
result of evaluation by the compiler becomes a constant which appears in
the compiled program".
The exact form of that appearance in the compiled program is not
specified.  For example, the byte-compile of
(eval-when-compile (cl-labels ((f ...) (g ...)) ...))
currently produces a byte-code vector in which f and g are byte-code
vectors with shared structure.  However, that representation is only
one choice.

It is inconsistent with the semantics of *symbols* as they currently
stand, as I have already admitted.  Even there, you could advance a
model where it is not inconsistent.  For example, if you view the
binding of symbol to value as having two components - the binding and
the cell holding the mutable value during the extent of the symbol as a
global/dynamically scoped variable - then having the binding of the
symbol to the final value of the cell before the dynamic extent of the
variable terminates would be consistent.  That's not how it's currently
implemented, because there is no way to express the final compile-time
environment as a value after compilation has completed with the current
semantics.

The part that's incompatible with current semantics of symbols is
importing that symbol as an immutable symbolic reference.  Not really a
"variable" reference, but as a binding of a symbol to a value in the
run-time namespace (or package in CL terminology, although CL did not
allow any way to specify what I'm suggesting either, as far as I know).

However, that would capture the semantics of ELF shared objects, with
the text and ro_data segments loaded into memory that is in fact
immutable for a userspace program.

> > byte-code (or native-code) instruction arrays.  This would in turn
> > enable implementing proper tail recursion as "goto with arguments".
>
> Proper tail recursion elimination would require changing the *normal*
> function call protocol.  I suspect you're thinking of a smaller-scale
> version of it specifically tailored to self-recursion, kind of like
> what `named-let` provides.
> Note that such ad-hoc TCO tends to hit the same
> semantic issues as the -O3 optimization of the native compiler.
> E.g. in code like the following:
>
>     (defun vc-foo-register (file)
>       (when (some-hint-is-true)
>         (load "vc-foo")
>         (vc-foo-register file)))
>
> the final call to `vc-foo-register` is in tail position but is not
> a self call because loading `vc-foo` is expected to redefine
> `vc-foo-register` with the real implementation.

I'm only talking about the steps that are required to allow the
compiler to produce code that implements proper tail recursion.  With
the abstract machine currently implemented by the byte-code VM, the
"call[n]" instructions will always be needed to call out according to
the C calling conventions.  The call[-absolute/relative] or
[goto-absolute] instructions I suggested *would be* used in the
"normal" function-call protocol in place of the current funcall
dispatch, at least to functions defined in lisp.  This is necessary but
not sufficient for proper tail recursion.

To actually get proper tail recursion requires the compiler to use the
instructions for implementing the appropriate function call protocol,
especially if "goto-absolute" is the instruction provided for changing
the PC register.  Other instructions would have to be issued to manage
the stack frame explicitly if that were the route taken.  Or, a more
CISCish call-absolute type of instruction could be used that would
perform that stack frame management implicitly.  Either way, it's the
compiler that has to determine whether a return instruction following a
control transfer can be safely eliminated or not.  If the
"goto-absolute" instruction were used, the compiler would have to
decide whether the address following the "goto-absolute" should be
pushed in a new frame, or if it can be "pre-emptively garbage
collected" at compile time because it's a tail call.

> > I'm not familiar with emacs's profiling facilities.  Is it possible
> > to tell how much of the allocated space/time spent in gc is due to
> > the constant vectors of lexical closures?  In particular, how much
> > of the constant vectors are copied elements independent of the
> > lexical environment?  That would provide some measure of any
> > gc-related benefit that *might* be gained from using an explicit
> > environment register for closures, instead of embedding it in the
> > byte-code vector.
>
> No, I can't think of any profiling tool we currently have that can
> help with that, sorry :-(
>
> Note that when support for native closures is added to the native
> compiler, it will hopefully not be using this clunky representation
> where capture vars are mixed in with the vector of constants, so that
> might be a more promising direction (may be able to skip the step
> where we need to change the bytecode).

The trick is to make the implementation of the abstract machine by each
of the compilers have enough in common to support calling one from the
other.  The extensions I've suggested for the byte-code VM and lisp
semantics are intended to support that interoperation, so the semantics
of the byte-code implementation won't unnecessarily constrain the
semantics of the native-code implementation.

Lynn
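[Editorial sketch: the kind of compile-time residue discussed in this message - mutually recursive helpers compiled into shared-structure constants via `eval-when-compile` and `cl-labels`.  All names (`my-evenp`, `evenp`, `oddp`) are invented for illustration.]

```elisp
;;; residue-sketch.el --- illustrative only  -*- lexical-binding: t; -*-
(eval-when-compile (require 'cl-lib))

;; Under byte compilation, the closure for `evenp' is computed at
;; compile time and embedded in the .elc as a constant; `oddp' is then
;; reachable only through that closure's constant vector, giving the
;; shared structure described above.
(defconst my-evenp
  (eval-when-compile
    (cl-labels ((evenp (n) (if (= n 0) t (oddp (1- n))))
                (oddp  (n) (if (= n 0) nil (evenp (1- n)))))
      #'evenp)))

;; (funcall my-evenp 4) ; expected to return t
```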
* Re: native compilation units
  2022-06-20  1:39           ` Lynn Winebarger
@ 2022-06-20 12:14           ` Lynn Winebarger
  2022-06-20 12:34           ` Lynn Winebarger
  2022-06-25 18:12           ` Lynn Winebarger
  2 siblings, 0 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-20 12:14 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Sun, Jun 19, 2022, 9:39 PM Lynn Winebarger <owinebar@gmail.com> wrote:
> The part that's incompatible with current semantics of symbols is
> importing that symbol as an immutable symbolic reference.  Not really
> a "variable" reference, but as a binding of a symbol to a value in the
> run-time namespace (or package in CL terminology, although CL did not
> allow any way to specify what I'm suggesting either, as far as I
> know).

An alternative would be to extend the semantics of symbols with two
additional immutable bindings - one for constant values and another for
constant functions.  These would be shadowed by the mutable bindings
during evaluation, then (if unset) be bound to the final value assigned
to the mutable bindings when the namespace is finalized.  Then, when a
symbol is imported from a compile-time environment, the import would be
to the constant (value or function) binding, which could be shadowed by
an evaluation-time variable/function.  That should qualify as a
consistent extension of the current semantics rather than a
modification.  It would be a lisp-4 instead of a lisp-2.

Personally, I'd also like to have a way to define a global variable
that does not modify the lexical scoping of let for that variable.
Say, "defstatic" - corresponding to a variable with static global
storage.  I kind of hate that the semantics of "let" (or lambda
parameters) are determined by the global state at evaluation time.

Lynn
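[Editorial sketch: the complaint in the last paragraph of this message can be shown directly - whether a given `let` form binds lexically or dynamically depends on whether some earlier evaluation declared the symbol special.  The names `my-x` and `my-probe` are invented.]

```elisp
;;; scoping-sketch.el --- illustrative only  -*- lexical-binding: t; -*-

;; A probe that can only see dynamic (global) bindings of `my-x'.
(defun my-probe () (if (boundp 'my-x) (symbol-value 'my-x) 'unbound))

;; No defvar yet: `let' binds my-x lexically, invisible to the probe.
(let ((my-x 1)) (my-probe))   ; => unbound

(defvar my-x 0)

;; After the defvar, the textually identical `let' form now binds
;; my-x dynamically, so the probe sees it.
(let ((my-x 1)) (my-probe))   ; => 1
```

The same source form changes meaning based on global state at evaluation time, which is exactly what a `defstatic`-style declaration would avoid.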
* Re: native compilation units
  2022-06-20  1:39           ` Lynn Winebarger
  2022-06-20 12:14           ` Lynn Winebarger
@ 2022-06-20 12:34           ` Lynn Winebarger
  2022-06-25 18:12           ` Lynn Winebarger
  2 siblings, 0 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-20 12:34 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Sun, Jun 19, 2022 at 9:39 PM Lynn Winebarger <owinebar@gmail.com> wrote:
> On Sun, Jun 19, 2022 at 7:02 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>
>> Proper tail recursion elimination would require changing the *normal*
>> function call protocol.  I suspect you're thinking of a smaller-scale
>> version of it specifically tailored to self-recursion, kind of like
>> what `named-let` provides.  Note that such ad-hoc TCO tends to hit
>> the same semantic issues as the -O3 optimization of the native
>> compiler.  E.g. in code like the following:
>>
>>     (defun vc-foo-register (file)
>>       (when (some-hint-is-true)
>>         (load "vc-foo")
>>         (vc-foo-register file)))
>>
>> the final call to `vc-foo-register` is in tail position but is not
>> a self call because loading `vc-foo` is expected to redefine
>> `vc-foo-register` with the real implementation.
>
> I'm only talking about the steps that are required to allow the
> compiler to produce code that implements proper tail recursion.
> With the abstract machine currently implemented by the byte-code VM,
> the "call[n]" instructions will always be needed to call out according
> to the C calling conventions.
> The call[-absolute/relative] or [goto-absolute] instructions I
> suggested *would be* used in the "normal" function-call protocol in
> place of the current funcall dispatch, at least to functions defined
> in lisp.
> This is necessary but not sufficient for proper tail recursion.
> To actually get proper tail recursion requires the compiler to use
> the instructions for implementing the appropriate function call
> protocol, especially if "goto-absolute" is the instruction provided
> for changing the PC register.  Other instructions would have to be
> issued to manage the stack frame explicitly if that were the route
> taken.  Or, a more CISCish call-absolute type of instruction could be
> used that would perform that stack frame management implicitly.
> Either way, it's the compiler that has to determine whether a return
> instruction following a control transfer can be safely eliminated or
> not.  If the "goto-absolute" instruction were used, the compiler
> would have to decide whether the address following the
> "goto-absolute" should be pushed in a new frame, or if it can be
> "pre-emptively garbage collected" at compile time because it's a tail
> call.

For the record, my point of reference for a classic implementation of
efficient lexical closures and proper tail recursion is Clinger's
TwoBit compiler for Larceny Scheme, and the associated "MacScheme"
abstract machine: https://www.larcenists.org/twobit.html.  That system
is implemented in several variants.  Each has a well-defined mapping of
the MacScheme machine state to the actual machine state for compiled
code.  That system does not have the constraint of having a byte-code
interpreter and native-code implementation co-existing, but if they do
coexist and are expected to be able to call each other with the
"normal" (lisp, not C) calling conventions, defining the abstract
machine state that has to be maintained between calls would be a key
step.  If calling between byte-code and native-code is expected to have
the same overhead as calling between lisp and C, then I suppose that's
not necessary.

Lynn
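[Editorial sketch: the self-recursion-only form of TCO that Stefan contrasts with general proper tail recursion is already available via `named-let` (Emacs 28+), which rewrites self tail calls into a loop at macro-expansion time.  The names `my-sum-to` and `sum` are invented.]

```elisp
;;; named-let-sketch.el --- illustrative only  -*- lexical-binding: t; -*-
(require 'subr-x)  ; `named-let' lives here as of Emacs 28

;; The call to `sum' is in tail position, so the macro turns it into a
;; jump rather than a recursive funcall; the loop runs in constant
;; stack space regardless of N.
(defun my-sum-to (n)
  (named-let sum ((i 0) (acc 0))
    (if (> i n) acc (sum (1+ i) (+ acc i)))))

;; (my-sum-to 3) ; expected to return 6
```

Note this covers only self calls; the `vc-foo-register` redefinition hazard quoted above is exactly the case such ad-hoc TCO cannot handle safely.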
* Re: native compilation units
  2022-06-20  1:39           ` Lynn Winebarger
  2022-06-20 12:14           ` Lynn Winebarger
  2022-06-20 12:34           ` Lynn Winebarger
@ 2022-06-25 18:12           ` Lynn Winebarger
  2022-06-26 14:14           ` Lynn Winebarger
  2 siblings, 1 reply; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-25 18:12 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Sun, Jun 19, 2022, 9:39 PM Lynn Winebarger <owinebar@gmail.com> wrote:
> On Sun, Jun 19, 2022 at 7:02 PM Stefan Monnier <monnier@iro.umontreal.ca>
> wrote:
>
>> > Currently compiling a top-level expression wrapped in
>> > eval-when-compile by itself leaves no residue in the compiled output,
>>
>> `eval-when-compile` has 2 effects:
>>
>> 1- Run the code within the compiler's process.
>>    E.g. (eval-when-compile (require 'cl-lib)).
>>    This is somewhat comparable to loading a gcc plugin during
>>    a compilation: it affects the GCC process itself, rather than the
>>    code it emits.
>>
>> 2- It replaces the (eval-when-compile ...) thingy with the value
>>    returned by the evaluation of this code.  So you can do (defvar
>>    my-str (eval-when-compile (concat "foo" "bar"))) and you know that
>>    the concatenation will be done during compilation.
>>
>> > but I would want to make the above evaluate to an object at run-time
>> > where the exported symbols in the obstack are immutable.
>>
>> Then it wouldn't be called `eval-when-compile` because it would do
>> something quite different from what `eval-when-compile` does :-)
>
> The informal semantics of "eval-when-compile" from the elisp info file
> are:
>
>     This form marks BODY to be evaluated at compile time but not when
>     the compiled program is loaded.  The result of evaluation by the
>     compiler becomes a constant which appears in the compiled program.
>     If you load the source file, rather than compiling it, BODY is
>     evaluated normally.
> I'm not sure what I have proposed that would be inconsistent with
> "the result of evaluation by the compiler becomes a constant which
> appears in the compiled program".  The exact form of that appearance
> in the compiled program is not specified.  For example, the
> byte-compile of (eval-when-compile (cl-labels ((f ...) (g ...)) ...))
> currently produces a byte-code vector in which f and g are byte-code
> vectors with shared structure.  However, that representation is only
> one choice.
>
> It is inconsistent with the semantics of *symbols* as they currently
> stand, as I have already admitted.  Even there, you could advance a
> model where it is not inconsistent.  For example, if you view the
> binding of symbol to value as having two components - the binding and
> the cell holding the mutable value during the extent of the symbol as
> a global/dynamically scoped variable - then having the binding of the
> symbol to the final value of the cell before the dynamic extent of
> the variable terminates would be consistent.  That's not how it's
> currently implemented, because there is no way to express the final
> compile-time environment as a value after compilation has completed
> with the current semantics.
>
> The part that's incompatible with current semantics of symbols is
> importing that symbol as an immutable symbolic reference.  Not really
> a "variable" reference, but as a binding of a symbol to a value in
> the run-time namespace (or package in CL terminology, although CL did
> not allow any way to specify what I'm suggesting either, as far as I
> know).
>
> However, that would capture the semantics of ELF shared objects, with
> the text and ro_data segments loaded into memory that is in fact
> immutable for a userspace program.

It looks to me like the portable dump code/format could be adapted to
serve the purpose I have in mind here.
What needs to be added is a way to limit the scope of the dump so only
the appropriate set of objects are captured.  There would probably also
need to be a separate load-path for these libraries, similar to the
approach employed for native-compiled files.

It could be neat, if all LISP code and constants eventually lived in
some larger associated compilation unit (a scope-limited pdmp file), to
have a residual dump at any time of the remaining live objects, most
corresponding to the space of global/dynamic variables.  That could in
turn be used for local debugging or in actual bug reporting.

Lynn
* Re: native compilation units
  2022-06-25 18:12           ` Lynn Winebarger
@ 2022-06-26 14:14           ` Lynn Winebarger
  0 siblings, 0 replies; 46+ messages in thread
From: Lynn Winebarger @ 2022-06-26 14:14 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Andrea Corallo, emacs-devel

On Sat, Jun 25, 2022, 2:12 PM Lynn Winebarger <owinebar@gmail.com> wrote:
>> The part that's incompatible with current semantics of symbols is
>> importing that symbol as an immutable symbolic reference.  Not really
>> a "variable" reference, but as a binding of a symbol to a value in
>> the run-time namespace (or package in CL terminology, although CL
>> did not allow any way to specify what I'm suggesting either, as far
>> as I know).
>>
>> However, that would capture the semantics of ELF shared objects, with
>> the text and ro_data segments loaded into memory that is in fact
>> immutable for a userspace program.
>
> It looks to me like the portable dump code/format could be adapted to
> serve the purpose I have in mind here.  What needs to be added is a
> way to limit the scope of the dump so only the appropriate set of
> objects are captured.

I'm going to start with a copy of pdumper.c and pdumper.h renamed to
ndumper (n for namespace).

The pdmp format conceptually organizes the emacs executable space into
a graph with three nodes - an "Emacs executable" node (or the temacs
text and ro sections), an "Emacs static" node (sections of the
executable loaded into writeable memory), and a "dump" node,
corresponding to heap-allocated objects that were live at the time of
the dump.  The dump node has relocations that can point into itself or
to the emacs executable, and "discardable" relocations for values
instantiated into the "Emacs static" node.  While the data structure
doesn't require it, the only values saved from the Emacs static data
are symbols, primitive subrs (not native compiled), and the thread
structure for the main thread.
There can be cycles between these nodes in the memory graph, but
cutting the edge[s] between the emacs executable and the Emacs static
nodes yields a DAG.  Note, pdumper does not make the partition I'm
describing explicitly; I'm inferring that there must be such a
partition.  The discardable relocations should be ones that instantiate
into static data of the temacs executable.

My plan is to refine the structure of the Emacs process introduced by
pdumper to yield a namespace graph structure with the same property -
cutting the edge from executable to runtime state yields a DAG whose
only root is the emacs executable.

Each ndmp namespace (or module, or cl-package) would have its own
symbol table and a unique namespace identifier, with a runtime mapping
to the file backing it (if loaded from a file).

Interned symbols will be extended with three additional properties:
static value, constant value, and constant function.  For variables,
scope resolution will be done at compile time:
* Value, if not void (undefined), else
* Static value

A constant symbol is referenced by importing a constant symbol, either
from another namespace or a variable in the current namespace's
compile-time environment.  An attempt at run-time to rebind a symbol
bound by an import form will signal an error.  Multiple imports binding
a particular symbol at run-time will effectively cause the shadowing of
an earlier binding by the later binding.  Any sequence of imports and
other forms that would result in ambiguity of the resolution of a
particular variable at compile time will signal an error.  That is, a
given symbol will have only one associated binding in the namespace
scope during a particular evaluation time (eval, compile,
compile-compile, etc.).

A static value binding will be global but not dynamic.  A constant
value binding will result from an export form in an eval-when-compile
form encountered while compiling the source of the ndmp module.
Since static bindings capture the "global" aspect of the current
semantics of special variable bindings, dynamic scope can be safely
restricted to provide thread-local semantics.

Instantiation of a compiled ndmp object will initialize the bindings to
be consistent with the current semantics of defvar and setq in global
scope, as well as the separation of compile-time and eval-time variable
bindings.  [I am not certain yet what the exact approach to ensuring
that will be.]

Note constant bindings are only created by "importing" from the
compile-time environment through eval-when-compile under the current
semantics model.  This approach simply avoids the beta substitution of
compile-time variable references performed in the current
implementation of eval-when-compile semantics.  Macro expansion is
still available to insert such values directly in forms from the
compile-time environment.

A function symbol will resolve to the function property if not void,
and the constant function property otherwise.

Each ndmp module will explicitly identify the symbols it exports, and
those it imports.  The storage of variable bindings for unexported
symbols will not be directly referenceable from any other namespace.

Constant bindings may be enforced by loading into a read-only page of
memory, by a write barrier implemented by the system, or unenforced.
In other words, attempting to set a constant binding is an error with
unspecified effect.  Additional declarations may be provided to require
the signaling of an error, the enforcement of constancy (without an
error), both, or neither.

The storage of static and constant variables may or may not be
incorporated directly in the symbol object.  For example, such storage
may be allocated using separate hash tables for static and constant
symbol tables, to reduce the allocation of space for variables without
a static or constant binding.
When compiling a form that imports a symbol from an ndmp module,
importing in an eval-when-compile context will resolve to the constant
value binding of the symbol, as though the source forms were
concatenated during compilation to have a single compile-time
environment.  Otherwise, the resolution will proceed as described
above.

There will be a distinguished ndmp object that contains relocations
instantiated into the Emacs static nodes, serving the baseline function
of pdmp.  There will also be a distinguished ndmp object "ELISP" that
exports all the primitives of Emacs lisp.  The symbols of this
namespace will be implicitly imported into every ndmp unless overridden
by a special form to be specified.  In this way, a namespace may use an
alternative lisp semantic model, e.g. CL.  Additional forms for
importing symbols from other namespaces remain to be specified.

Ideally the byte-code VM would be able to treat an ndmp object as an
extended byte-code vector, but the restriction of the byte-codes to
16-bit addressing is problematic.  For 64-bit machines, the ndmp format
will restrict the (stored) addresses to 32 bits, and use the remaining
bits of relocs not already used for administrative purposes as an index
into a vector of imported namespaces in the ndmp file itself, where the
0 value corresponds to an "un-interned" namespace that is not backed by
a (permanent) file.  I don't know what the split should be on 32-bit
systems (without the wide-int option).  The interpretation of the bits
is specific to file-backed compiled namespaces, so it may restrict the
number of namespace imports in a compiled object without restricting
the number of namespaces imported in the runtime namespace.

Once implemented, this functionality should significantly reduce the
need for a monolithic dump or "redumping" functionality.  Or rather,
"dumping" will be done incrementally.
My ultimate goal is to introduce a clean way to express a compiled
object that has multiple code labels, and a mechanism to call or jump
to them directly, so that the expressible control-flow structure of
native and byte compiled code will be equivalent (I believe the
technical term is that there will be a bisimulation between their
operational semantics, but it's been a while).  An initial version
might move in this direction by encoding the namespaces using a
byte-code vector to trampoline to the code-entry points, but this would
not provide a bisimulation.  Eventually, the byte-code VM and compiler
will have to be modified to make full use of ndmp objects as primary
semantic objects, without intermediation through byte-code vectors as
currently implemented.

If there's an error in my interpretation of the current implementation
(particularly pdumper), I'd be happy to find out about it now.

As a practical matter, I've been working with the 28.1 source.  Am I
better off continuing with that, or starting from a more recent commit
to the main branch?

Lynn
* Re: native compilation units
  2022-06-04  2:43           ` Lynn Winebarger
  2022-06-04 14:32           ` Stefan Monnier
@ 2022-06-08  6:46           ` Andrea Corallo
  1 sibling, 0 replies; 46+ messages in thread
From: Andrea Corallo @ 2022-06-08 6:46 UTC (permalink / raw)
To: Lynn Winebarger; +Cc: Stefan Monnier, emacs-devel

Lynn Winebarger <owinebar@gmail.com> writes:

[...]

> From Andrea's description, this would be the primary "unsafe" aspect
> of intraprocedural optimizations applied to one of these aggregated
> compilation units.  That is, that the semantics of redefining function
> symbols would not apply to points in the code at which the compiler
> had made optimizations based on assuming the function definitions were
> constants.  It's not clear to me whether those points are limited to
> call sites or not.

Yes, they are limited to the call site.

  Andrea
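[Editorial sketch of the call-site hazard Andrea confirms here, under the assumption that the functions are compiled together at `native-comp-speed` 3; the functions `my-helper` and `my-caller` are invented, and whether folding actually occurs depends on the compiler's decisions.]

```elisp
;;; speed3-sketch.el --- illustrative only  -*- lexical-binding: t; -*-
;;; -*- native-comp-speed: 3; -*-

;; At speed 3 the compiler may treat `my-helper' as a constant and
;; fold it into the call site inside `my-caller'.
(defun my-helper () 1)
(defun my-caller () (1+ (my-helper)))

;; After native-compiling this file, redefining the callee:
;;   (defun my-helper () 10)
;; does not affect the already optimized call site, so (my-caller)
;; may still return 2 rather than 11 - exactly the "unsafe" semantics
;; of redefinition discussed in this thread.
```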
end of thread, other threads:[~2022-06-26 14:14 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-31  1:02 native compilation units Lynn Winebarger
2022-06-01 13:50 ` Andrea Corallo
2022-06-03 14:17   ` Lynn Winebarger
2022-06-03 16:05     ` Eli Zaretskii
     [not found]       ` <CAM=F=bDxxyHurxM_xdbb7XJtP8rdK16Cwp30ti52Ox4nv19J_w@mail.gmail.com>
2022-06-04  5:57         ` Eli Zaretskii
2022-06-05 13:53           ` Lynn Winebarger
2022-06-03 18:15     ` Stefan Monnier
2022-06-04  2:43       ` Lynn Winebarger
2022-06-04 14:32         ` Stefan Monnier
2022-06-05 12:16           ` Lynn Winebarger
2022-06-05 14:08             ` Lynn Winebarger
2022-06-05 14:46               ` Stefan Monnier
2022-06-05 14:20             ` Stefan Monnier
2022-06-06  4:12               ` Lynn Winebarger
2022-06-06  6:12                 ` Stefan Monnier
2022-06-06 10:39                   ` Eli Zaretskii
2022-06-06 16:23                     ` Lynn Winebarger
2022-06-06 16:58                       ` Eli Zaretskii
2022-06-07  2:14                         ` Lynn Winebarger
2022-06-07 10:53                           ` Eli Zaretskii
2022-06-06 16:13                   ` Lynn Winebarger
2022-06-07  2:39                     ` Lynn Winebarger
2022-06-07 11:50                       ` Stefan Monnier
2022-06-07 13:11                         ` Eli Zaretskii
2022-06-14  4:19                         ` Lynn Winebarger
2022-06-14 12:23                           ` Stefan Monnier
2022-06-14 14:55                             ` Lynn Winebarger
2022-06-08  6:56                 ` Andrea Corallo
2022-06-11 16:13                   ` Lynn Winebarger
2022-06-11 16:37                     ` Stefan Monnier
2022-06-11 17:49                       ` Lynn Winebarger
2022-06-11 20:34                         ` Stefan Monnier
2022-06-12 17:38                           ` Lynn Winebarger
2022-06-12 18:47                             ` Stefan Monnier
2022-06-13 16:33                               ` Lynn Winebarger
2022-06-13 17:15                                 ` Stefan Monnier
2022-06-15  3:03                                   ` Lynn Winebarger
2022-06-15 12:23                                     ` Stefan Monnier
2022-06-19 17:52                                       ` Lynn Winebarger
2022-06-19 23:02                                         ` Stefan Monnier
2022-06-20  1:39                                           ` Lynn Winebarger
2022-06-20 12:14                                             ` Lynn Winebarger
2022-06-20 12:34                                             ` Lynn Winebarger
2022-06-25 18:12                                             ` Lynn Winebarger
2022-06-26 14:14                                               ` Lynn Winebarger
2022-06-08  6:46       ` Andrea Corallo