Ludovic Courtès schreef op zo 05-09-2021 om 00:04 [+0200]: > Maxime Devos skribis: > > > +(define-syntax search-patch > > + (lambda (s) > > + "Search the patch FILE-NAME and compute its hash at expansion time > > +if possible. Return #f if not found." > > + (syntax-case s () > > + ((_ file-name) > > + (string? (syntax->datum #'file-name)) > > + ;; FILE-NAME is a constant string, so the hash can be computed > > + ;; in advance. > > + (let ((patch (try-search-patch (syntax->datum #'file-name)))) > > + (if patch > > + #`(%local-patch-file file-name #,(file-hash* patch #:select? true)) > > + (begin > > + (warning (source-properties->location > > + (syntax-source #'file-name)) > > + (G_ "~a: patch not found at expansion time") > > + (syntax->datum #'ile-name)) > > + #'(%search-patch file-name))))) > > + ;; FILE-NAME is variable, so the hash cannot be pre-computed. > > + ((_ file-name) #'(%search-patch file-name)) > > + ;; search-patch is being used used in a construct like > > + ;; (map search-patch ...). > > + (id (identifier? #'id) #'%search-patch)))) > > It’s clever… but also a bit evil, in that it changes the semantics of > package files in a surprising way. Modifying foo.patch without > recompiling foo.scm would lead you to still use the old foo.patch, which > can be rather off-putting and error-prone IMO. I added two patches adding (limited) dependency tracking to compile-all.scm. If a patch file is now modified or deleted, the corresponding package modules will be recompiled. This should remove the ‘evilness’ I think. > To address this, ‘local-file’ could store the inode/mtime + computed > store file name (rather than the SHA256). ‘local-file-compiler’ would > check whether the actual file has matching inode/mtime before returning > the computed store file name. Problem is that inode/mtime are > guaranteed to differ once you’ve run “make install”. :-/ An additional problem is that 'local-file-compiler' would have to 'stat' the file even if it is already in the store, undoing the (fairly limited?) performance gains of this patch series. The dependency tracking avoids this. > Intuitively, I’d have imagined a cache populated at run time; it would > map, say, file name/inode/mtime to a store file name. ‘add-to-store’ > (or some wrapper above it) would check the cache and return the store > file name directly, unless ‘valid-path?’ says it no longer exists. > Downside is that this would be a per-user cache and you’d still pay the > cost until it’s warm. Advantage is that you could easily tell whether > it’s stale. > > Thoughts? Intuitively, I'd have imagined doing as much as possible at compilation time. The cost at compilation is only paid once (or, more correctly, at every "guix pull"), while if you delay things until runtime, you need to check the caches. With this patch series (+ the two patches mentioned previously), the ‘cache’ is always fresh (though possibly not warm: the patch might not yet be in the store). Ludovic Courtès schreef op za 04-09-2021 om 23:47 [+0200]: > Hi! > > Some initial comments… > > Maxime Devos skribis: > > > +++ b/guix/gexp.scm > > @@ -531,13 +531,37 @@ appears." > > (define-gexp-compiler (local-file-compiler (file ) system target) > > [...] > > + (if sha256 > > + (let ((path (fixed-output-path name sha256 #:recursive? recursive?))) > > + ;; If the hash is known in advance and the store already has the > > + ;; item, there is no need to intern the file. > > + (if (file-exists? path) > > + (mbegin %store-monad > > + ;; Tell the GC that PATH will be used, such that it won't > > + ;; be deleted. > > + ((store-lift add-temp-root) path) > > + ;; The GC could have deleted the item before add-temp-root > > + ;; completed, so check again if PATH exists. > > + (if (file-exists? path) > > + (return path) > > + ;; If it has been removed, fall-back interning. > > + (intern))) > > + ;; If PATH does not yet exist, fall back to interning. > > + (intern))) > > + (intern)))))) > > ‘file-exists?’ won’t work when talking to a remote store (e.g., > GUIX_DAEMON_SOCKET=ssh://…). > > ‘add-temp-root’ doesn’t throw if the given store item does not exist. > So it could be written like this: > > (if sha256 > (mbegin %store-monad > (add-temp-root* item) > (if (valid-path?* item) > (return item) > (intern))) > (intern)) Done in the v2. > But then, we’d add one RPC for every ‘add-to-store’ RPC corresponding to > a patch (you can set “GUIX_PROFILING=rpc” to see the numbers), which is > not great. > > Ludo’. Note that 'intern' is only called if the patch isn't yet in the store. In practice, the patch is almost always already in the store. For example, suppose I have a few packages in my profile. As the packages are in my profile, they had to have their derivation computed at some point, so the corresponding patches had to be interned. If I now run "guix pull && guix package -u", when computing the derivation of the updated profile, most required patches are already in the store, because patches don't change often. Likewise, if I run "guix environment guix" in one terminal, then in another, then in yet another ... possibly for the first invocation, some patches need to be interned, but for the other invocations, it's already in the store. Because fixed-output-path is now called more often, I've added a patch optimising (guix base32). Let's compare the numbers again! This time, I've run echo powersave |sudo tee /sys/devices/system/cpu/cpufreq/policy{0,1,2,3}/scaling_governor to make sure the CPU frequency doesn't change. On a hot (disk) cache: # After the patch series time GUIX_PROFILING="rpc gc" ./the-optimised-guix/bin/guix build -d --no-grafts pigx Remote procedure call summary: 5949 RPCs built-in-builders ... 1 add-to-store ... 3 add-to-store/tree ... 26 add-temp-root ... 195 valid-path? ... 195 add-text-to-store ... 5529 Garbage collection statistics: heap size: 93.85 MiB allocated: 312.04 MiB GC times: 17 time spent in GC: 3.34 seconds (25% of user time) # averaged over three runs real 0m14,035s user 0m14,138s sys 0m0,650s # Before the patch series time GUIX_PROFILING="rpc gc" ./the-unoptimised-guix/bin/guix build -d --no-grafts pigx /gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv Remote procedure call summary: 5749 RPCs built-in-builders ... 1 add-to-store/tree ... 26 add-to-store ... 193 add-text-to-store ... 5529 Garbage collection statistics: heap size: 93.85 MiB allocated: 325.24 MiB GC times: 18 time spent in GC: 3.66 seconds (26% of user time) real 0m13,700s user 0m14,051s sys 0m0,658s So on a hot disk cache, there doesn't appear to be any improvement (except for ‘time spent in GC’ -- presumably that's due to the optimisations to guix/base32.scm). What about a cold cache? # After the patch series sync && echo 3 | sudo tee /proc/sys/vm/drop_caches ./the-optimised-guix/bin/guix --help time GUIX_PROFILING="rpc gc" ./the-optimised-guix/bin/guix build -d --no-grafts pigx /gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv Remote procedure call summary: 5949 RPCs built-in-builders ... 1 add-to-store ... 3 add-to-store/tree ... 26 add-temp-root ... 195 valid-path? ... 195 add-text-to-store ... 5529 Garbage collection statistics: heap size: 93.85 MiB allocated: 312.03 MiB GC times: 17 time spent in GC: 3.37 seconds (23% of user time) real 1m39,178s user 0m14,557s sys 0m0,990s # Before the patch series sync && echo 3 | sudo tee /proc/sys/vm/drop_caches ./the-unoptimised-guix/bin/guix --help time GUIX_PROFILING="rpc gc" ./the-unoptimised-guix/bin/guix build -d --no-grafts pigx Remote procedure call summary: 5749 RPCs built-in-builders ... 1 add-to-store/tree ... 26 add-to-store ... 193 add-text-to-store ... 5529 Garbage collection statistics: heap size: 93.85 MiB allocated: 325.25 MiB GC times: 18 time spent in GC: 3.63 seconds (25% of user time) real 1m42,100s user 0m14,690s sys 0m1,127s It seems that if the disk cache is cold, the time-to-derivation decreases a little by this patch series. Much less than I had hoped for though; I'll have to look into other areas for interesting performance gains ... Greetings, Maxime.