unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#55499: excessively large manifests due to propagation
@ 2022-05-18 13:53 Ricardo Wurmus
  2022-05-24 15:30 ` Ludovic Courtès
  2022-05-31 16:09 ` bug#55499: [PATCH 0/3] Make 'manifest' files more compact Ludovic Courtès
  0 siblings, 2 replies; 10+ messages in thread
From: Ricardo Wurmus @ 2022-05-18 13:53 UTC (permalink / raw)
  To: 55499; +Cc: Ludovic Courtès

Packages of some languages rely heavily on propagation.  R is one
example.  Since the generated “manifest” file of a Guix profile records
entries for all propagated packages, this can get really big really
quickly.

A profile consisting only of four R packages (r-seurat, r-cistopic,
r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
weighs 7.1MB.  At the MDC I repeatedly encountered manifest files that
are exceeding 24MB.

Simply reading that big a file with

    (call-with-input-file "huge-manifest" read)

takes several seconds.  On the MDC cluster I observed a delay of about
27 seconds.

Disabling read positions with (read-disable 'positions) significantly
speeds this up (18s vs 27s), but it’s still very slow.

We may be able to speed things up by supporting definitions or
references in manifest files, so that we don’t need to repeat a sub-tree
when generating the file.

-- 
Ricardo




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: excessively large manifests due to propagation
  2022-05-18 13:53 bug#55499: excessively large manifests due to propagation Ricardo Wurmus
@ 2022-05-24 15:30 ` Ludovic Courtès
  2022-05-25  4:35   ` Ricardo Wurmus
  2022-05-31 16:09 ` bug#55499: [PATCH 0/3] Make 'manifest' files more compact Ludovic Courtès
  1 sibling, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2022-05-24 15:30 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 55499

Hi!

Ricardo Wurmus <rekado@elephly.net> skribis:

> A profile consisting only of four R packages (r-seurat, r-cistopic,
> r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
> weighs 7.1MB.  At the MDC I repeatedly encountered manifest files that
> are exceeding 24MB.

Commit 93f601d97ca2d9b82c41afeb86879ee37eae39e6 provides a 12% size
reduction on this example, and it’s backward-compatible and cheap.

I’ll try and follow up with changes along the lines you describe.

Ludo’.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: excessively large manifests due to propagation
  2022-05-24 15:30 ` Ludovic Courtès
@ 2022-05-25  4:35   ` Ricardo Wurmus
  0 siblings, 0 replies; 10+ messages in thread
From: Ricardo Wurmus @ 2022-05-25  4:35 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 55499


Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> Ricardo Wurmus <rekado@elephly.net> skribis:
>
>> A profile consisting only of four R packages (r-seurat, r-cistopic,
>> r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
>> weighs 7.1MB.  At the MDC I repeatedly encountered manifest files that
>> are exceeding 24MB.
>
> Commit 93f601d97ca2d9b82c41afeb86879ee37eae39e6 provides a 12% size
> reduction on this example, and it’s backward-compatible and cheap.

Excellent!  This is a great first step.

> I’ll try and follow up with changes along the lines you describe.

Thank you!

-- 
Ricardo




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: [PATCH 0/3] Make 'manifest' files more compact
  2022-05-18 13:53 bug#55499: excessively large manifests due to propagation Ricardo Wurmus
  2022-05-24 15:30 ` Ludovic Courtès
@ 2022-05-31 16:09 ` Ludovic Courtès
  2022-05-31 16:09   ` bug#55499: [PATCH 1/3] tests: Augment profile collision test Ludovic Courtès
                     ` (3 more replies)
  1 sibling, 4 replies; 10+ messages in thread
From: Ludovic Courtès @ 2022-05-31 16:09 UTC (permalink / raw)
  To: 55499; +Cc: Ludovic Courtès, Ricardo Wurmus

Hello,

These patches implement what you suggested on IRC: not repeating
entire manifest entries and their propagated inputs.  This has a
dramatic impact on the size of the ‘manifest’ file and on the memory
and processing time to read it for the the use case you gave.

The second patch goes a tiny bit further by making the ‘search-paths’
and ‘propagated-inputs’ fields optional, shaving another ~10% on the
size of ‘manifest’ in this example.

The second patch should be squashed with the first one (so we don’t
bump version formats a second time and duplicate code).  It’s kinda
optional because it doesn’t bring much compared to the first patch and
causes a bit of extra complexity, but maybe it’s still worth keeping?

Could you try this on your larger use cases and tell me how it goes?

Thanks,
Ludo’.

Ludovic Courtès (3):
  tests: Augment profile collision test.
  profiles: Do not repeat entries in 'manifest' file.
  squash! profiles: Make all entry fields optional.

 guix/build/profiles.scm |  32 ++++++++--
 guix/profiles.scm       | 137 ++++++++++++++++++++++++++++++++--------
 tests/profiles.scm      |  52 ++++++++++++++-
 3 files changed, 187 insertions(+), 34 deletions(-)


base-commit: fed51b26141548a5bae349a5e1d8d6f681320f4f
-- 
2.36.1





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: [PATCH 1/3] tests: Augment profile collision test.
  2022-05-31 16:09 ` bug#55499: [PATCH 0/3] Make 'manifest' files more compact Ludovic Courtès
@ 2022-05-31 16:09   ` Ludovic Courtès
  2022-05-31 16:09   ` bug#55499: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file Ludovic Courtès
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2022-05-31 16:09 UTC (permalink / raw)
  To: 55499; +Cc: Ludovic Courtès

* tests/profiles.scm ("collision of propagated inputs"): Check the
parents of ENTRY1 and ENTRY2.
---
 tests/profiles.scm | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/profiles.scm b/tests/profiles.scm
index d59d75985f..7e51d37ab9 100644
--- a/tests/profiles.scm
+++ b/tests/profiles.scm
@@ -556,14 +556,20 @@ (define (entry->sexp entry)
         (return #f)))))
 
 (test-equal "collision of propagated inputs"
-  '(("guile-bootstrap" "2.0") ("guile-bootstrap" "42"))
+  '(("guile-bootstrap" "2.0") "p1"
+    <> ("guile-bootstrap" "42") "p2")
   (guard (c ((profile-collision-error? c)
              (let ((entry1 (profile-collision-error-entry c))
                    (entry2 (profile-collision-error-conflict c)))
                (list (list (manifest-entry-name entry1)
                            (manifest-entry-version entry1))
+                     (manifest-entry-name
+                      (force (manifest-entry-parent entry1)))
+                     '<>
                      (list (manifest-entry-name entry2)
-                           (manifest-entry-version entry2))))))
+                           (manifest-entry-version entry2))
+                     (manifest-entry-name
+                      (force (manifest-entry-parent entry2)))))))
     (run-with-store %store
       (mlet* %store-monad ((p0 -> (package
                                     (inherit %bootstrap-guile)
-- 
2.36.1





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file.
  2022-05-31 16:09 ` bug#55499: [PATCH 0/3] Make 'manifest' files more compact Ludovic Courtès
  2022-05-31 16:09   ` bug#55499: [PATCH 1/3] tests: Augment profile collision test Ludovic Courtès
@ 2022-05-31 16:09   ` Ludovic Courtès
  2022-05-31 17:35     ` Maxime Devos
  2022-05-31 16:09   ` bug#55499: [PATCH 3/3] squash! profiles: Make all entry fields optional Ludovic Courtès
  2022-06-14  7:06   ` bug#55499: excessively large manifests due to propagation Ludovic Courtès
  3 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2022-05-31 16:09 UTC (permalink / raw)
  To: 55499; +Cc: Ludovic Courtès

Fixes <https://issues.guix.gnu.org/55499>.
Reported by Ricardo Wurmus <rekado@elephly.net>.

With this change, the manifest file created for:

  guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat

goes from 5.6M to 192K.  Likewise, on this profile, wall-clock time of:

  GUIX_PROFILING=gc guix package -I

goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.

* guix/profiles.scm (manifest->gexp)[entry->gexp]: Turn into a monadic
procedure.  Return a 'repeated' sexp if ENTRY was already visited
before.
Adjust caller accordingly.  Bump manifest version.
(sexp->manifest)[sexp->manifest-entry]: Turn into a monadic procedure.
Add case for 'repeated' nodes.  Add each entry to the current state
vhash.
Add clause for version 4 manifests.
* tests/profiles.scm ("deduplication of repeated entries"): New test.
* guix/build/profiles.scm (manifest-sexp->inputs+search-paths): Expect
version 4.  Add clause for 'repeated' nodes.
---
 guix/build/profiles.scm |   4 +-
 guix/profiles.scm       | 127 ++++++++++++++++++++++++++++------------
 tests/profiles.scm      |  42 +++++++++++++
 3 files changed, 134 insertions(+), 39 deletions(-)

diff --git a/guix/build/profiles.scm b/guix/build/profiles.scm
index f9875ca92e..c4460f624b 100644
--- a/guix/build/profiles.scm
+++ b/guix/build/profiles.scm
@@ -150,7 +150,7 @@ (define (manifest-sexp->inputs+search-paths manifest)
 values: the list of store items of its manifest entries, and the list of
 search path specifications."
   (match manifest                            ;this must match 'manifest->gexp'
-    (('manifest ('version 3)
+    (('manifest ('version 4)
                 ('packages (entries ...)))
      (let loop ((entries entries)
                 (inputs '())
@@ -162,6 +162,8 @@ (define (manifest-sexp->inputs+search-paths manifest)
           (loop (append rest deps)                ;breadth-first traversal
                 (cons item inputs)
                 (append paths search-paths)))
+         ((('repeated name version item) . rest)
+          (loop rest inputs search-paths))
          (()
           (values (reverse inputs)
                   (delete-duplicates
diff --git a/guix/profiles.scm b/guix/profiles.scm
index bf50c00a1e..44ff37e75b 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -455,31 +455,53 @@ (define (inferior->entry)
 (define (manifest->gexp manifest)
   "Return a representation of MANIFEST as a gexp."
   (define (entry->gexp entry)
-    (match entry
-      (($ <manifest-entry> name version output (? string? path)
-                           (deps ...) (search-paths ...) _ (properties ...))
-       #~(#$name #$version #$output #$path
-                 (propagated-inputs #$(map entry->gexp deps))
-                 (search-paths #$(map search-path-specification->sexp
-                                      search-paths))
-                 #$@(if (null? properties)
-                        #~()
-                        #~((properties . #$properties)))))
-      (($ <manifest-entry> name version output package
-                           (deps ...) (search-paths ...) _ (properties ...))
-       #~(#$name #$version #$output
-                 (ungexp package (or output "out"))
-                 (propagated-inputs #$(map entry->gexp deps))
-                 (search-paths #$(map search-path-specification->sexp
-                                      search-paths))
-                 #$@(if (null? properties)
-                        #~()
-                        #~((properties . #$properties)))))))
+    ;; Maintain in state monad a vhash of visited entries, indexed by their
+    ;; item, usually package objects (we cannot use the entry itself as an
+    ;; index since identical entries are usually not 'eq?').  Use that vhash
+    ;; to avoid repeating duplicate entries.  This is particularly useful in
+    ;; the presence of propagated inputs, where we could otherwise end up
+    ;; repeating large trees.
+    (mlet %state-monad ((visited (current-state)))
+      (if (match (vhash-assq (manifest-entry-item entry) visited)
+            ((_ . previous-entry)
+             (manifest-entry=? previous-entry entry))
+            (#f #f))
+          (return #~(repeated #$(manifest-entry-name entry)
+                              #$(manifest-entry-version entry)
+                              #$(manifest-entry-item entry)))
+          (mbegin %state-monad
+            (set-current-state (vhash-consq (manifest-entry-item entry)
+                                            entry visited))
+            (mlet %state-monad ((deps (mapm %state-monad entry->gexp
+                                            (manifest-entry-dependencies entry))))
+              (return
+               (match entry
+                 (($ <manifest-entry> name version output (? string? path)
+                                      (_deps ...) (search-paths ...) _ (properties ...))
+                  #~(#$name #$version #$output #$path
+                            (propagated-inputs #$deps)
+                            (search-paths #$(map search-path-specification->sexp
+                                                 search-paths))
+                            #$@(if (null? properties)
+                                   #~()
+                                   #~((properties . #$properties)))))
+                 (($ <manifest-entry> name version output package
+                                      (_deps ...) (search-paths ...) _ (properties ...))
+                  #~(#$name #$version #$output
+                            (ungexp package (or output "out"))
+                            (propagated-inputs #$deps)
+                            (search-paths #$(map search-path-specification->sexp
+                                                 search-paths))
+                            #$@(if (null? properties)
+                                   #~()
+                                   #~((properties . #$properties))))))))))))
 
   (match manifest
     (($ <manifest> (entries ...))
-     #~(manifest (version 3)
-                 (packages #$(map entry->gexp entries))))))
+     #~(manifest (version 4)
+                 (packages #$(run-with-state
+                                 (mapm %state-monad entry->gexp entries)
+                               vlist-null))))))
 
 (define (find-package name version)
   "Return a package from the distro matching NAME and possibly VERSION.  This
@@ -522,25 +544,44 @@ (define (infer-dependency item parent)
 
   (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     (match sexp
+      (('repeated name version path)
+       ;; This entry is the same as another one encountered earlier; look it
+       ;; up and return it.
+       (mlet %state-monad ((visited (current-state))
+                           (key -> (list name version path)))
+         (match (vhash-assoc key visited)
+           (#f
+            (raise (formatted-message
+                    (G_ "invalid repeated entry in profile: ~s")
+                    sexp)))
+           ((_ . entry)
+            (return entry)))))
       ((name version output path
              ('propagated-inputs deps)
              ('search-paths search-paths)
              extra-stuff ...)
-       ;; For each of DEPS, keep a promise pointing to ENTRY.
-       (letrec* ((deps* (map (cut sexp->manifest-entry <> (delay entry))
-                             deps))
-                 (entry (manifest-entry
-                          (name name)
-                          (version version)
-                          (output output)
-                          (item path)
-                          (dependencies deps*)
-                          (search-paths (map sexp->search-path-specification
-                                             search-paths))
-                          (parent parent)
-                          (properties (or (assoc-ref extra-stuff 'properties)
-                                          '())))))
-         entry))))
+       (mlet* %state-monad
+           ((entry -> #f)
+            (deps*    (mapm %state-monad
+                            (cut sexp->manifest-entry <> (delay entry))
+                            deps))
+            (visited  (current-state))
+            (key ->   (list name version path)))
+         (set! entry                              ;XXX: emulate 'letrec*'
+               (manifest-entry
+                 (name name)
+                 (version version)
+                 (output output)
+                 (item path)
+                 (dependencies deps*)
+                 (search-paths (map sexp->search-path-specification
+                                    search-paths))
+                 (parent parent)
+                 (properties (or (assoc-ref extra-stuff 'properties)
+                                 '()))))
+         (mbegin %state-monad
+           (set-current-state (vhash-cons key entry visited))
+           (return entry))))))
 
   (match sexp
     (('manifest ('version 0)
@@ -608,7 +649,17 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     ;; Version 3 represents DEPS as full-blown manifest entries.
     (('manifest ('version 3 minor-version ...)
                 ('packages (entries ...)))
-     (manifest (map sexp->manifest-entry entries)))
+     (manifest (run-with-state
+                   (mapm %state-monad sexp->manifest-entry entries)
+                 vlist-null)))
+
+    ;; Version 4 deduplicates repeated entries, as can happen with deep
+    ;; propagated input trees.
+    (('manifest ('version 4 minor-version ...)
+                ('packages (entries ...)))
+     (manifest (run-with-state
+                   (mapm %state-monad sexp->manifest-entry entries)
+                 vlist-null)))
     (_
      (raise (condition
              (&message (message "unsupported manifest format")))))))
diff --git a/tests/profiles.scm b/tests/profiles.scm
index 7e51d37ab9..3838d971c9 100644
--- a/tests/profiles.scm
+++ b/tests/profiles.scm
@@ -586,6 +586,48 @@ (define (entry->sexp entry)
                                                     #:locales? #f)))
         (return #f)))))
 
+(test-assertm "deduplication of repeated entries"
+  ;; Make sure the 'manifest' file does not duplicate identical entries.
+  ;; See <https://issues.guix.gnu.org/55499>.
+  (mlet* %store-monad ((p0 -> (dummy-package "p0"
+                                (build-system trivial-build-system)
+                                (arguments
+                                 `(#:guile ,%bootstrap-guile
+                                   #:builder (mkdir (assoc-ref %outputs "out"))))
+                                (propagated-inputs
+                                 `(("guile" ,%bootstrap-guile)))))
+                       (p1 -> (package
+                                (inherit p0)
+                                (name "p1")))
+                       (drv (profile-derivation (packages->manifest
+                                                 (list p0 p1))
+                                                #:hooks '()
+                                                #:locales? #f)))
+    (mbegin %store-monad
+      (built-derivations (list drv))
+      (let ((file     (string-append (derivation->output-path drv)
+                                     "/manifest"))
+            (manifest (profile-manifest (derivation->output-path drv))))
+        (define (contains-repeated? sexp)
+          (match sexp
+            (('repeated _ ...) #t)
+            ((lst ...) (any contains-repeated? sexp))
+            (_ #f)))
+
+        (return (and (contains-repeated? (call-with-input-file file read))
+
+                     ;; MANIFEST has two entries for %BOOTSTRAP-GUILE since
+                     ;; it's propagated both from P0 and from P1.  When
+                     ;; reading a 'repeated' node, 'read-manifest' should
+                     ;; reuse the previously-read entry so the two
+                     ;; %BOOTSTRAP-GUILE entries must be 'eq?'.
+                     (match (manifest-entries manifest)
+                       (((= manifest-entry-dependencies (dep0))
+                         (= manifest-entry-dependencies (dep1)))
+                        (and (string=? (manifest-entry-name dep0)
+                                       (package-name %bootstrap-guile))
+                             (eq? dep0 dep1))))))))))
+
 (test-assertm "no collision"
   ;; Here we have an entry that is "lowered" (its 'item' field is a store file
   ;; name) and another entry (its 'item' field is a package) that is
-- 
2.36.1





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: [PATCH 3/3] squash! profiles: Make all entry fields optional.
  2022-05-31 16:09 ` bug#55499: [PATCH 0/3] Make 'manifest' files more compact Ludovic Courtès
  2022-05-31 16:09   ` bug#55499: [PATCH 1/3] tests: Augment profile collision test Ludovic Courtès
  2022-05-31 16:09   ` bug#55499: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file Ludovic Courtès
@ 2022-05-31 16:09   ` Ludovic Courtès
  2022-06-14  7:06   ` bug#55499: excessively large manifests due to propagation Ludovic Courtès
  3 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2022-05-31 16:09 UTC (permalink / raw)
  To: 55499; +Cc: Ludovic Courtès

This is meant to be squashed with the previous patch.

This makes the 'search-paths' and 'propagated-inputs' fields of each
entry optional, shaving a bit more space and reading time, down to 180K
instead of 192K.

* guix/build/profiles.scm (manifest-sexp->inputs+search-paths)[let-fields]:
New macro.
Use it.
* guix/profiles.scm (manifest->gexp)[optional]: New procedure.  Use it.
[sexp->manifest-entry]: Rename to...
[sexp->manifest-entry/v3]: ... this.
---
 guix/build/profiles.scm |  28 ++++++++--
 guix/profiles.scm       | 120 ++++++++++++++++++++++++++--------------
 2 files changed, 100 insertions(+), 48 deletions(-)

diff --git a/guix/build/profiles.scm b/guix/build/profiles.scm
index c4460f624b..2ab76bde74 100644
--- a/guix/build/profiles.scm
+++ b/guix/build/profiles.scm
@@ -149,6 +149,18 @@ (define (manifest-sexp->inputs+search-paths manifest)
   "Parse MANIFEST, an sexp as produced by 'manifest->gexp', and return two
 values: the list of store items of its manifest entries, and the list of
 search path specifications."
+  (define-syntax let-fields
+    (syntax-rules ()
+      ;; Bind the fields NAME of LST to same-named variables in the lexical
+      ;; scope of BODY.
+      ((_ lst (name rest ...) body ...)
+       (let ((name (match (assq 'name lst)
+                     ((_ value) value)
+                     (#f '()))))
+         (let-fields lst (rest ...) body ...)))
+      ((_ lst () body ...)
+       (begin body ...))))
+
   (match manifest                            ;this must match 'manifest->gexp'
     (('manifest ('version 4)
                 ('packages (entries ...)))
@@ -156,12 +168,12 @@ (define (manifest-sexp->inputs+search-paths manifest)
                 (inputs '())
                 (search-paths '()))
        (match entries
-         (((name version output item
-                 ('propagated-inputs deps)
-                 ('search-paths paths) _ ...) . rest)
-          (loop (append rest deps)                ;breadth-first traversal
-                (cons item inputs)
-                (append paths search-paths)))
+         (((name version output item fields ...) . rest)
+          (let ((paths search-paths))
+            (let-fields fields (propagated-inputs search-paths properties)
+              (loop (append rest propagated-inputs) ;breadth-first traversal
+                    (cons item inputs)
+                    (append search-paths paths)))))
          ((('repeated name version item) . rest)
           (loop rest inputs search-paths))
          (()
@@ -214,4 +226,8 @@ (define manifest-file
     ;; Write 'OUTPUT/etc/profile'.
     (build-etc/profile output search-paths)))
 
+;;; Local Variables:
+;;; eval: (put 'let-fields 'scheme-indent-function 2)
+;;; End:
+
 ;;; profile.scm ends here
diff --git a/guix/profiles.scm b/guix/profiles.scm
index 44ff37e75b..d694ac07da 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -454,6 +454,11 @@ (define (inferior->entry)
 
 (define (manifest->gexp manifest)
   "Return a representation of MANIFEST as a gexp."
+  (define (optional name value)
+    (if (null? value)
+        #~()
+        #~((#$name #$value))))
+
   (define (entry->gexp entry)
     ;; Maintain in state monad a vhash of visited entries, indexed by their
     ;; item, usually package objects (we cannot use the entry itself as an
@@ -477,24 +482,22 @@ (define (entry->gexp entry)
               (return
                (match entry
                  (($ <manifest-entry> name version output (? string? path)
-                                      (_deps ...) (search-paths ...) _ (properties ...))
+                                      (_ ...) (search-paths ...) _ (properties ...))
                   #~(#$name #$version #$output #$path
-                            (propagated-inputs #$deps)
-                            (search-paths #$(map search-path-specification->sexp
-                                                 search-paths))
-                            #$@(if (null? properties)
-                                   #~()
-                                   #~((properties . #$properties)))))
+                            #$@(optional 'propagated-inputs deps)
+                            #$@(optional 'search-paths
+                                         (map search-path-specification->sexp
+                                              search-paths))
+                            #$@(optional 'properties properties)))
                  (($ <manifest-entry> name version output package
                                       (_deps ...) (search-paths ...) _ (properties ...))
                   #~(#$name #$version #$output
                             (ungexp package (or output "out"))
-                            (propagated-inputs #$deps)
-                            (search-paths #$(map search-path-specification->sexp
-                                                 search-paths))
-                            #$@(if (null? properties)
-                                   #~()
-                                   #~((properties . #$properties))))))))))))
+                            #$@(optional 'propagated-inputs deps)
+                            #$@(optional 'search-paths
+                                         (map search-path-specification->sexp
+                                              search-paths))
+                            #$@(optional 'properties properties))))))))))
 
   (match manifest
     (($ <manifest> (entries ...))
@@ -542,6 +545,40 @@ (define (infer-dependency item parent)
         (item item)
         (parent parent))))
 
+  (define* (sexp->manifest-entry/v3 sexp #:optional (parent (delay #f)))
+    (match sexp
+      ((name version output path
+             ('propagated-inputs deps)
+             ('search-paths search-paths)
+             extra-stuff ...)
+       ;; For each of DEPS, keep a promise pointing to ENTRY.
+       (letrec* ((deps* (map (cut sexp->manifest-entry/v3 <> (delay entry))
+                             deps))
+                 (entry (manifest-entry
+                          (name name)
+                          (version version)
+                          (output output)
+                          (item path)
+                          (dependencies deps*)
+                          (search-paths (map sexp->search-path-specification
+                                             search-paths))
+                          (parent parent)
+                          (properties (or (assoc-ref extra-stuff 'properties)
+                                          '())))))
+         entry))))
+
+  (define-syntax let-fields
+    (syntax-rules ()
+      ;; Bind the fields NAME of LST to same-named variables in the lexical
+      ;; scope of BODY.
+      ((_ lst (name rest ...) body ...)
+       (let ((name (match (assq 'name lst)
+                     ((_ value) value)
+                     (#f '()))))
+         (let-fields lst (rest ...) body ...)))
+      ((_ lst () body ...)
+       (begin body ...))))
+
   (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     (match sexp
       (('repeated name version path)
@@ -556,32 +593,29 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
                     sexp)))
            ((_ . entry)
             (return entry)))))
-      ((name version output path
-             ('propagated-inputs deps)
-             ('search-paths search-paths)
-             extra-stuff ...)
-       (mlet* %state-monad
-           ((entry -> #f)
-            (deps*    (mapm %state-monad
-                            (cut sexp->manifest-entry <> (delay entry))
-                            deps))
-            (visited  (current-state))
-            (key ->   (list name version path)))
-         (set! entry                              ;XXX: emulate 'letrec*'
-               (manifest-entry
-                 (name name)
-                 (version version)
-                 (output output)
-                 (item path)
-                 (dependencies deps*)
-                 (search-paths (map sexp->search-path-specification
-                                    search-paths))
-                 (parent parent)
-                 (properties (or (assoc-ref extra-stuff 'properties)
-                                 '()))))
-         (mbegin %state-monad
-           (set-current-state (vhash-cons key entry visited))
-           (return entry))))))
+      ((name version output path fields ...)
+       (let-fields fields (propagated-inputs search-paths properties)
+         (mlet* %state-monad
+             ((entry -> #f)
+              (deps     (mapm %state-monad
+                              (cut sexp->manifest-entry <> (delay entry))
+                              propagated-inputs))
+              (visited  (current-state))
+              (key ->   (list name version path)))
+           (set! entry                             ;XXX: emulate 'letrec*'
+                 (manifest-entry
+                   (name name)
+                   (version version)
+                   (output output)
+                   (item path)
+                   (dependencies deps)
+                   (search-paths (map sexp->search-path-specification
+                                      search-paths))
+                   (parent parent)
+                   (properties properties)))
+           (mbegin %state-monad
+             (set-current-state (vhash-cons key entry visited))
+             (return entry)))))))
 
   (match sexp
     (('manifest ('version 0)
@@ -649,9 +683,7 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     ;; Version 3 represents DEPS as full-blown manifest entries.
     (('manifest ('version 3 minor-version ...)
                 ('packages (entries ...)))
-     (manifest (run-with-state
-                   (mapm %state-monad sexp->manifest-entry entries)
-                 vlist-null)))
+     (manifest (map sexp->manifest-entry/v3 entries)))
 
     ;; Version 4 deduplicates repeated entries, as can happen with deep
     ;; propagated input trees.
@@ -2368,4 +2400,8 @@ (define (user-friendly-profile profile)
             %known-shorthand-profiles)
       profile))
 
+;;; Local Variables:
+;;; eval: (put 'let-fields 'scheme-indent-function 2)
+;;; End:
+
 ;;; profiles.scm ends here
-- 
2.36.1





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file.
  2022-05-31 16:09   ` bug#55499: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file Ludovic Courtès
@ 2022-05-31 17:35     ` Maxime Devos
  2022-06-01  9:38       ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Maxime Devos @ 2022-05-31 17:35 UTC (permalink / raw)
  To: Ludovic Courtès, 55499

[-- Attachment #1: Type: text/plain, Size: 557 bytes --]

Ludovic Courtès schreef op di 31-05-2022 om 18:09 [+0200]:
> With this change, the manifest file created for:
> 
>   guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat
> 
> goes from 5.6M to 192K.  Likewise, on this profile, wall-clock time of:
> 
>   GUIX_PROFILING=gc guix package -I
> 
> goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.

I'm not familiar enough with this part of Guix to evaluate the patches,
but the time, disk memory and heap memory decreases sound great!

Greetings,
Maxime.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file.
  2022-05-31 17:35     ` Maxime Devos
@ 2022-06-01  9:38       ` Ludovic Courtès
  0 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2022-06-01  9:38 UTC (permalink / raw)
  To: Maxime Devos; +Cc: 55499

Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

> Ludovic Courtès schreef op di 31-05-2022 om 18:09 [+0200]:
>> With this change, the manifest file created for:
>> 
>>   guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat
>> 
>> goes from 5.6M to 192K.  Likewise, on this profile, wall-clock time of:
>> 
>>   GUIX_PROFILING=gc guix package -I
>> 
>> goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.
>
> I'm not familiar enough with this part of Guix to evaluate the patches,
> but the time, disk memory and heap memory decreases sound great!

Yup!  The difference is significant primarily for profiles with lots of
propagated inputs, so typically profiles with R or Python packages.

For my home profile (300+ packages but no R and no Python), it goes from
316K to 112K, heap usage for ‘guix package -I’ goes from 12M to 9M and
wall-clock time is almost unchanged.

Ludo’.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55499: excessively large manifests due to propagation
  2022-05-31 16:09 ` bug#55499: [PATCH 0/3] Make 'manifest' files more compact Ludovic Courtès
                     ` (2 preceding siblings ...)
  2022-05-31 16:09   ` bug#55499: [PATCH 3/3] squash! profiles: Make all entry fields optional Ludovic Courtès
@ 2022-06-14  7:06   ` Ludovic Courtès
  3 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2022-06-14  7:06 UTC (permalink / raw)
  To: 55499; +Cc: Ricardo Wurmus

Hey Ricardo,

Ludovic Courtès <ludo@gnu.org> skribis:

> These patches implement what you suggested on IRC: not repeating
> entire manifest entries and their propagated inputs.  This has a
> dramatic impact on the size of the ‘manifest’ file and on the memory
> and processing time to read it for the the use case you gave.
>
> The second patch goes a tiny bit further by making the ‘search-paths’
> and ‘propagated-inputs’ fields optional, shaving another ~10% on the
> size of ‘manifest’ in this example.
>
> The second patch should be squashed with the first one (so we don’t
> bump version formats a second time and duplicate code).  It’s kinda
> optional because it doesn’t bring much compared to the first patch and
> causes a bit of extra complexity, but maybe it’s still worth keeping?
>
> Could you try this on your larger use cases and tell me how it goes?

Did you have a chance to give it a try?

Maybe I can double-check that everything’s alright and go ahead.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-06-14  7:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-18 13:53 bug#55499: excessively large manifests due to propagation Ricardo Wurmus
2022-05-24 15:30 ` Ludovic Courtès
2022-05-25  4:35   ` Ricardo Wurmus
2022-05-31 16:09 ` bug#55499: [PATCH 0/3] Make 'manifest' files more compact Ludovic Courtès
2022-05-31 16:09   ` bug#55499: [PATCH 1/3] tests: Augment profile collision test Ludovic Courtès
2022-05-31 16:09   ` bug#55499: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file Ludovic Courtès
2022-05-31 17:35     ` Maxime Devos
2022-06-01  9:38       ` Ludovic Courtès
2022-05-31 16:09   ` bug#55499: [PATCH 3/3] squash! profiles: Make all entry fields optional Ludovic Courtès
2022-06-14  7:06   ` bug#55499: excessively large manifests due to propagation Ludovic Courtès

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).