unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#49439: grafts cause “guix environment” to get killed with OOM
@ 2021-07-06 14:38 Ricardo Wurmus
  2021-07-23  4:59 ` Sarah Morgensen
  0 siblings, 1 reply; 10+ messages in thread
From: Ricardo Wurmus @ 2021-07-06 14:38 UTC (permalink / raw)
  To: 49439

With a recent version of Guix, “guix environment” will not 
terminate on its own, keeps the CPU busy, and gets killed when the 
system eventually runs out of memory.

$ guix describe -f channels

--8<---------------cut here---------------start------------->8---
(list (channel
        (name 'guix)
        (url "/home/rekado/dev/gx/branches/master")
        (commit
          "685cfdec94e5e48c4ad28de53466a28dfc258edb")))
--8<---------------cut here---------------end--------------->8---


$ guix environment pigx-scrnaseq
[wait until it gets killed]

The problem disappears when grafts are disabled:

$ guix environment --no-grafts pigx-scrnaseq
$ [env] yay!

-- 
Ricardo




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-07-06 14:38 bug#49439: grafts cause “guix environment” to get killed with OOM Ricardo Wurmus
@ 2021-07-23  4:59 ` Sarah Morgensen
  2021-07-27 14:52   ` Julien Lepiller
  2021-07-27 16:28   ` Ludovic Courtès
  0 siblings, 2 replies; 10+ messages in thread
From: Sarah Morgensen @ 2021-07-23  4:59 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 49439

Hello,

Ricardo Wurmus <rekado@elephly.net> writes:

> With a recent version of Guix, “guix environment” will not 
> terminate on its own, keeps the CPU busy, and gets killed when the 
> system eventually runs out of memory.
>
> $ guix describe -f channels
>
> (list (channel
>         (name 'guix)
>         (url "/home/rekado/dev/gx/branches/master")
>         (commit
>           "685cfdec94e5e48c4ad28de53466a28dfc258edb")))
>
>
> $ guix environment pigx-scrnaseq
> [wait until it gets killed]

I can reproduce this with pigx-scrnaseq as well a number of other
packages (listed below).

$ ./pre-inst-env guix describe -f channels
(list (channel
        (name 'guix)
        (url "/home/sarah/guix")
        (commit
          "3217a04b0352c2dd13323257b369604eeabfccc3")))

Does not complete within 5 minutes:
package            # inputs  # transitive inputs
                                 (from package-transitive-inputs)
pigx-chipseq       48        338
pigx-scrnaseq      41        321
r-cellchat         34        110
pigx-rnaseq        34        343
pigx-bsseq         32        358
pigx-sars-cov2-ww  25        261
r-circus           16        134

Does complete:
r-chipseq          6         37     completes in >2m
r-shortread        17        36     completes in >1m
python-scanpy      25        113    completes in <15s

I suspect it has something to do with the number of transitive inputs,
because it is so prevalent with these R packages, which all use
propagated inputs. However... python-scanpy succeeds in under 15
seconds, and it has more transitive inputs than r-chipseq.

Can we reproduce this with a large number of low-transitivity packages
directly on the command line?

>
> The problem disappears when grafts are disabled:
>
> $ guix environment --no-grafts pigx-scrnaseq
> $ [env] yay!

--
Sarah




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-07-23  4:59 ` Sarah Morgensen
@ 2021-07-27 14:52   ` Julien Lepiller
  2021-07-27 16:28   ` Ludovic Courtès
  1 sibling, 0 replies; 10+ messages in thread
From: Julien Lepiller @ 2021-07-27 14:52 UTC (permalink / raw)
  To: 49439, iskarian, rekado

[-- Attachment #1: Type: text/plain, Size: 3171 bytes --]

I have a similar issue with an ocaml package I use at work. It's not free software, but all its dependencies are. The dependencies are not all yeet in guix, so to reproduce you might have to import them first with "guix import opam -r foo" for ocaml-foo.

The package depends on ocaml-ounit, ocaml-lp, ocaml-apronext, menhir, ocaml-async, ocaml-core, ocaml-graph, ocaml-libsvm, ocaml-minisat, ocaml-ppx-deriving-yojson, ocaml-yojson, ocaml-z3, ocaml-zarith and z3.

In total, that's 118 transitive inputs. Building the profile takes 30 minutes for me, on an SSD. The builder takes 1.5g resident.

Other than that, I measured time and memory for creating the environment when the profile was already built (no more derivation to build):

`which time` -v ~/guix/pre-inst-env guix environment ocaml-dummy-package -- echo done

User time: 121.43s
System time: 2.28s
Maximum resident: 1803028kB (1.8 GB)

With a warning from GC:

Repeated allocation of very large block (approx. size 35606528)

Note that I get the same numbers with --no-grafts, so it might be a different issue.

"guix build" terminates quickly.

Le 23 juillet 2021 00:59:21 GMT-04:00, Sarah Morgensen <iskarian@mgsn.dev> a écrit :
>Hello,
>
>Ricardo Wurmus <rekado@elephly.net> writes:
>
>> With a recent version of Guix, “guix environment” will not 
>> terminate on its own, keeps the CPU busy, and gets killed when the 
>> system eventually runs out of memory.
>>
>> $ guix describe -f channels
>>
>> (list (channel
>>         (name 'guix)
>>         (url "/home/rekado/dev/gx/branches/master")
>>         (commit
>>           "685cfdec94e5e48c4ad28de53466a28dfc258edb")))
>>
>>
>> $ guix environment pigx-scrnaseq
>> [wait until it gets killed]
>
>I can reproduce this with pigx-scrnaseq as well a number of other
>packages (listed below).
>
>$ ./pre-inst-env guix describe -f channels
>(list (channel
>        (name 'guix)
>        (url "/home/sarah/guix")
>        (commit
>          "3217a04b0352c2dd13323257b369604eeabfccc3")))
>
>Does not complete within 5 minutes:
>package            # inputs  # transitive inputs
>                                 (from package-transitive-inputs)
>pigx-chipseq       48        338
>pigx-scrnaseq      41        321
>r-cellchat         34        110
>pigx-rnaseq        34        343
>pigx-bsseq         32        358
>pigx-sars-cov2-ww  25        261
>r-circus           16        134
>
>Does complete:
>r-chipseq          6         37     completes in >2m
>r-shortread        17        36     completes in >1m
>python-scanpy      25        113    completes in <15s
>
>I suspect it has something to do with the number of transitive inputs,
>because it is so prevalent with these R packages, which all use
>propagated inputs. However... python-scanpy succeeds in under 15
>seconds, and it has more transitive inputs than r-chipseq.
>
>Can we reproduce this with a large number of low-transitivity packages
>directly on the command line?
>
>>
>> The problem disappears when grafts are disabled:
>>
>> $ guix environment --no-grafts pigx-scrnaseq
>> $ [env] yay!
>
>--
>Sarah

[-- Attachment #2: Type: text/html, Size: 3790 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-07-23  4:59 ` Sarah Morgensen
  2021-07-27 14:52   ` Julien Lepiller
@ 2021-07-27 16:28   ` Ludovic Courtès
  2021-07-27 23:35     ` Sarah Morgensen
  1 sibling, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2021-07-27 16:28 UTC (permalink / raw)
  To: Sarah Morgensen; +Cc: Ricardo Wurmus, 49439

Hi,

Sarah Morgensen <iskarian@mgsn.dev> skribis:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> With a recent version of Guix, “guix environment” will not 
>> terminate on its own, keeps the CPU busy, and gets killed when the 
>> system eventually runs out of memory.
>>
>> $ guix describe -f channels
>>
>> (list (channel
>>         (name 'guix)
>>         (url "/home/rekado/dev/gx/branches/master")
>>         (commit
>>           "685cfdec94e5e48c4ad28de53466a28dfc258edb")))
>>
>>
>> $ guix environment pigx-scrnaseq
>> [wait until it gets killed]
>
> I can reproduce this with pigx-scrnaseq as well a number of other
> packages (listed below).
>
> $ ./pre-inst-env guix describe -f channels
> (list (channel
>         (name 'guix)
>         (url "/home/sarah/guix")
>         (commit
>           "3217a04b0352c2dd13323257b369604eeabfccc3")))
>
> Does not complete within 5 minutes:

What hardware are you using?

Here’s what I observe (with everything already in store and on a hot
cache, with an i7 CPU):

--8<---------------cut here---------------start------------->8---
$ guix describe
Generacio 188	Jul 25 2021 12:47:29	(nuna)
  guix a92dfbc
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: a92dfbce30777de6ca05031e275410cf9f56c84c
$ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths --no-grafts >/dev/null
Garbage collection statistics:
  heap size:        160.31 MiB
  allocated:        1440.70 MiB
  GC times:         39
  time spent in GC: 4.51 seconds (46% of user time)

real	0m7.534s
user	0m9.747s
sys	0m0.167s
$ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths  >/dev/null
Garbage collection statistics:
  heap size:        168.31 MiB
  allocated:        2111.71 MiB
  GC times:         53
  time spent in GC: 6.92 seconds (45% of user time)

real	0m12.100s
user	0m15.517s
sys	0m0.198s
--8<---------------cut here---------------end--------------->8---

Commit 78daf9e02e5bc51f91488d8237cab2050cc060cf optimizes
‘coalesce-duplicate-inputs’, which was quadratic in the number of inputs (!).
After that change, I get:

--8<---------------cut here---------------start------------->8---
$ time GUIX_PROFILING=gc ./pre-inst-env guix environment pigx-scrnaseq --search-paths --no-grafts  >/dev/null
Garbage collection statistics:
  heap size:        168.31 MiB
  allocated:        716.58 MiB
  GC times:         24
  time spent in GC: 2.65 seconds (40% of user time)

real	0m5.720s
user	0m6.708s
sys	0m0.149s
$ time GUIX_PROFILING=gc ./pre-inst-env guix environment pigx-scrnaseq --search-paths  >/dev/null
Garbage collection statistics:
  heap size:        160.31 MiB
  allocated:        1387.96 MiB
  GC times:         42
  time spent in GC: 5.87 seconds (43% of user time)

real	0m10.821s
user	0m13.513s
sys	0m0.198s
--8<---------------cut here---------------end--------------->8---

Could you tell me if it improves the situation for you?

It’s not the end of the road, but it should be an improvement.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-07-27 16:28   ` Ludovic Courtès
@ 2021-07-27 23:35     ` Sarah Morgensen
  2021-07-28 10:00       ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Sarah Morgensen @ 2021-07-27 23:35 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Ricardo Wurmus, 49439

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> I can reproduce this with pigx-scrnaseq as well a number of other
>> packages (listed below).
>>
>> $ ./pre-inst-env guix describe -f channels
>> (list (channel
>>         (name 'guix)
>>         (url "/home/sarah/guix")
>>         (commit
>>           "3217a04b0352c2dd13323257b369604eeabfccc3")))
>>
>> Does not complete within 5 minutes:

Okay, so all of a sudden I can't reproduce this; even with the same
commit as above, it completes in ~20s.

  guix time-machine --commit=3217a04 -- environment pigx-scrnaseq --search-paths >/dev/null

> What hardware are you using?

Virtualbox VM with VT-x etc. on a host i7-6700. The VM has 6GB of memory.

>
> Here’s what I observe (with everything already in store and on a hot
> cache, with an i7 CPU):
>
> $ guix describe
> Generacio 188	Jul 25 2021 12:47:29	(nuna)
>   guix a92dfbc
>     repository URL: https://git.savannah.gnu.org/git/guix.git
>     branch: master
>     commit: a92dfbce30777de6ca05031e275410cf9f56c84c
> $ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths --no-grafts >/dev/null
> Garbage collection statistics:
>   heap size:        160.31 MiB
>   allocated:        1440.70 MiB
>   GC times:         39
>   time spent in GC: 4.51 seconds (46% of user time)
>
> real	0m7.534s
> user	0m9.747s
> sys	0m0.167s
> $ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths  >/dev/null
> Garbage collection statistics:
>   heap size:        168.31 MiB
>   allocated:        2111.71 MiB
>   GC times:         53
>   time spent in GC: 6.92 seconds (45% of user time)
>
> real	0m12.100s
> user	0m15.517s
> sys	0m0.198s
>
>
> Commit 78daf9e02e5bc51f91488d8237cab2050cc060cf optimizes
> ‘coalesce-duplicate-inputs’, which was quadratic in the number of inputs (!).
> After that change, I get:
>
> $ time GUIX_PROFILING=gc ./pre-inst-env guix environment pigx-scrnaseq --search-paths --no-grafts  >/dev/null
> Garbage collection statistics:
>   heap size:        168.31 MiB
>   allocated:        716.58 MiB
>   GC times:         24
>   time spent in GC: 2.65 seconds (40% of user time)
>
> real	0m5.720s
> user	0m6.708s
> sys	0m0.149s
> $ time GUIX_PROFILING=gc ./pre-inst-env guix environment pigx-scrnaseq --search-paths  >/dev/null
> Garbage collection statistics:
>   heap size:        160.31 MiB
>   allocated:        1387.96 MiB
>   GC times:         42
>   time spent in GC: 5.87 seconds (43% of user time)
>
> real	0m10.821s
> user	0m13.513s
> sys	0m0.198s
>
> Could you tell me if it improves the situation for you?

*Now* my experience is like yours:

--8<---------------cut here---------------start------------->8---
$ guix describe
Generation 9	Jul 27 2021 12:35:05	(current)
  guix d0ec739
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: d0ec73907c2995034a34339f4a7c2c72c2e21fea

time GUIX_PROFILING=gc guix time-machine --commit=3217a04 -- environment pigx-scrnaseq --search-paths >/dev/null
Garbage collection statistics:
  heap size:        176.31 MiB
  allocated:        2107.82 MiB
  GC times:         52
  time spent in GC: 5.26 seconds (23% of user time)

real	0m20.471s
user	0m22.605s
sys	0m0.372s

$ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths >/dev/null
Garbage collection statistics:
  heap size:        152.31 MiB
  allocated:        1367.16 MiB
  GC times:         40
  time spent in GC: 3.25 seconds (21% of user time)

real	0m14.701s
user	0m15.698s
sys	0m0.361s
--8<---------------cut here---------------end--------------->8---

But why was it occurring before? The only I thing I can think of is that
I didn't have everything in the store first. Is there a way I can prune
just the relevant items from the store to test this?

>
> It’s not the end of the road, but it should be an improvement.
>
> Thanks,
> Ludo’.

--
Sarah




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-07-27 23:35     ` Sarah Morgensen
@ 2021-07-28 10:00       ` Ludovic Courtès
  2021-07-28 22:03         ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2021-07-28 10:00 UTC (permalink / raw)
  To: Sarah Morgensen; +Cc: Ricardo Wurmus, 49439

Hi,

Sarah Morgensen <iskarian@mgsn.dev> skribis:

> But why was it occurring before? The only I thing I can think of is that
> I didn't have everything in the store first. Is there a way I can prune
> just the relevant items from the store to test this?

You could try something like:

  guix gc -D $(guix gc --references $(guix build pigx-scrnaseq)) \
    $(guix gc --references $(guix build pigx-scrnaseq --no-grafts))

Thinking about it, the grafts code depends on what’s in the store: when
nothing is in the store, it bounces to the “build handler”, which
accumulates the list of missing store items, until it starts building
them.

Let’s see if I can reproduce that behavior, too…

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-07-28 10:00       ` Ludovic Courtès
@ 2021-07-28 22:03         ` Ludovic Courtès
  2021-07-29  3:20           ` Sarah Morgensen
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2021-07-28 22:03 UTC (permalink / raw)
  To: Sarah Morgensen; +Cc: Ricardo Wurmus, 49439

Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

> Thinking about it, the grafts code depends on what’s in the store: when
> nothing is in the store, it bounces to the “build handler”, which
> accumulates the list of missing store items, until it starts building
> them.

So I can reproduce the problem Ricardo and you initially reported by
running:

  ./pre-inst-env guix environment pigx-scrnaseq --search-paths

after removing some of the ungrafted store items with:

  guix gc -D $(guix build r-rlang --no-grafts) \
    $(guix gc --references $(guix build pigx-scrnaseq --no-grafts))

The seemingly endless CPU usage and unbound memory use comes from the
‘build-accumulator’ build handler.  Here, the graph of ‘pigx-scrnaseq’
has many nodes, and many of them depend on, say, ‘r-rlang’.  Thus, when
‘r-rlang’ is not in the store, the grafting code keeps asking for it by
calling ‘build-derivations’, which aborts to the build handler; the
build handler saves the .drv file name and the continuation and keeps
going.  But since the next package also depends on ‘r-langr’, we abort
again to the build handler, and so on.

The end result is a very long list of <unresolved> nodes, probably of
this order in this case:

  $ guix graph -t reverse-package r-rlang |grep 'label = "'|wc -l
  594

Presumably, the captured continuations occupy quite a bit of memory,
hence the quick growth.

I suppose one solution is to fire suspended builds when the build
handler sees a build request for a given derivation for the second time.
It needs more thought and testing…

Ludo’.

PS: Did you know ‘pigx-scrnaseq’ has twice as many nodes as
    ‘libreoffice’?

      $ guix graph -t bag pigx-scrnaseq |grep 'label = "'|wc -l
      1359
      $ guix graph -t bag libreoffice |grep 'label = "'|wc -l
      699

    That makes it a great example to study and fix scalability issues!




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-07-28 22:03         ` Ludovic Courtès
@ 2021-07-29  3:20           ` Sarah Morgensen
  2021-08-09 21:26             ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Sarah Morgensen @ 2021-07-29  3:20 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Ricardo Wurmus, 49439

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> Thinking about it, the grafts code depends on what’s in the store: when
>> nothing is in the store, it bounces to the “build handler”, which
>> accumulates the list of missing store items, until it starts building
>> them.
>
> So I can reproduce the problem Ricardo and you initially reported by
> running:
>
>   ./pre-inst-env guix environment pigx-scrnaseq --search-paths
>
> after removing some of the ungrafted store items with:
>
>   guix gc -D $(guix build r-rlang --no-grafts) \
>     $(guix gc --references $(guix build pigx-scrnaseq --no-grafts))

Same here. I'm glad we were able to pinpoint this!

>
> The seemingly endless CPU usage and unbound memory use comes from the
> ‘build-accumulator’ build handler.  Here, the graph of ‘pigx-scrnaseq’
> has many nodes, and many of them depend on, say, ‘r-rlang’.  Thus, when
> ‘r-rlang’ is not in the store, the grafting code keeps asking for it by
> calling ‘build-derivations’, which aborts to the build handler; the
> build handler saves the .drv file name and the continuation and keeps
> going.  But since the next package also depends on ‘r-langr’, we abort
> again to the build handler, and so on.
>
> The end result is a very long list of <unresolved> nodes, probably of
> this order in this case:
>
>   $ guix graph -t reverse-package r-rlang |grep 'label = "'|wc -l
>   594
>
> Presumably, the captured continuations occupy quite a bit of memory,
> hence the quick growth.
>
> I suppose one solution is to fire suspended builds when the build
> handler sees a build request for a given derivation for the second time.
> It needs more thought and testing…

I wonder if instead it's possible to reduce the memory taken by the
continuations? As someone who has absolutely no experience with the
build/derivation system, it seems like all we *should* need to save is
the derivation file name.

>
> Ludo’.
>
> PS: Did you know ‘pigx-scrnaseq’ has twice as many nodes as
>     ‘libreoffice’?
>
>       $ guix graph -t bag pigx-scrnaseq |grep 'label = "'|wc -l
>       1359
>       $ guix graph -t bag libreoffice |grep 'label = "'|wc -l
>       699
>
>     That makes it a great example to study and fix scalability issues!

For those interested, I've compiled a list of the top 60
highest-dependency packages, as reported by package-closure:

pigx             	1630
nextcloud-client 	1539
akregator        	1486
kmail            	1484
korganizer       	1481
kdepim-runtime   	1480
kincidenceeditor 	1478
keventviews      	1477
kmailcommon      	1476
kcalendarsupport 	1475
kmessagelib      	1474
knotes           	1472
kaddressbook     	1469
libksieve        	1467
kdepim-apps-libs 	1465
libgravatar      	1463
kpimcommon       	1462
akonadi-calendar 	1453
pigx-bsseq       	1448
elisa            	1446
kaffeine         	1432
kdenlive         	1431
kmailtransport   	1431
dolphin-plugins  	1426
k3b              	1424
libkgapi         	1422
dolphin          	1421
kopete           	1403
pigx-sars-cov2-ww	1401
krdc             	1398
baloo-widgets    	1397
baloo            	1396
pigx-chipseq     	1396
krfb             	1389
ffmpegthumbs     	1388
kget             	1382
kmplayer         	1381
kdelibs4support  	1375
pigx-scrnaseq    	1342
kdevelop         	1340
kmailimporter    	1326
libkdepim        	1325
pigx-rnaseq      	1324
labplot          	1316
smb4k            	1315
kleopatra        	1311
kalarmcal        	1311
choqok           	1311
okular           	1310
ktnef            	1310
ktorrent         	1310
kate             	1308
akonadi-search   	1308
kcalutils        	1307
yakuake          	1306
khelpcenter      	1305
libksysguard     	1305
kdeconnect       	1304
konsole          	1304
libkleo          	1304

There seem to be a lot of kde packages in there, so perhaps this issue
isn't as rare as we might otherwise expect?

--
Sarah




^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-07-29  3:20           ` Sarah Morgensen
@ 2021-08-09 21:26             ` Ludovic Courtès
  2021-08-10 15:36               ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2021-08-09 21:26 UTC (permalink / raw)
  To: Sarah Morgensen; +Cc: Ricardo Wurmus, 49439

[-- Attachment #1: Type: text/plain, Size: 5328 bytes --]

Hi!

Sarah Morgensen <iskarian@mgsn.dev> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> The seemingly endless CPU usage and unbound memory use comes from the
>> ‘build-accumulator’ build handler.  Here, the graph of ‘pigx-scrnaseq’
>> has many nodes, and many of them depend on, say, ‘r-rlang’.  Thus, when
>> ‘r-rlang’ is not in the store, the grafting code keeps asking for it by
>> calling ‘build-derivations’, which aborts to the build handler; the
>> build handler saves the .drv file name and the continuation and keeps
>> going.  But since the next package also depends on ‘r-langr’, we abort
>> again to the build handler, and so on.
>>
>> The end result is a very long list of <unresolved> nodes, probably of
>> this order in this case:
>>
>>   $ guix graph -t reverse-package r-rlang |grep 'label = "'|wc -l
>>   594
>>
>> Presumably, the captured continuations occupy quite a bit of memory,
>> hence the quick growth.
>>
>> I suppose one solution is to fire suspended builds when the build
>> handler sees a build request for a given derivation for the second time.
>> It needs more thought and testing…
>
> I wonder if instead it's possible to reduce the memory taken by the
> continuations? As someone who has absolutely no experience with the
> build/derivation system, it seems like all we *should* need to save is
> the derivation file name.

Continuations may take a bit of room but the main problem is that we’re
storing too many of them.  :-)

Roughly, I think we keep one <unresolved> node per incoming edge into a
package not in the store (‘r-rlang’ in the example above).  In a case
like ‘pigx-scrnaseq’, that’s a lot of items.

I added counters and ‘pk’ calls in ‘map/accumulate-builds’ & co. to see
what happens.  AFAICS, a cutoff as in the attached patch does the job.
It’s a basic heuristic to avoid unbounded growth, at the expense of
possibly reduced accumulation.  For example, here’s the effect on my
user profile with 300+ packages:

--8<---------------cut here---------------start------------->8---
$ # with cutoff
$ time GUIX_PROFILING=gc ./pre-inst-env guix upgrade -n

[...]


2,926.7 MB would be downloaded
Garbage collection statistics:
  heap size:        119.37 MiB
  allocated:        849.56 MiB
  GC times:         30
  time spent in GC: 2.88 seconds (30% of user time)

real	0m8.987s
user	0m9.482s
sys	0m0.186s
$ 
$ # without cutoff
$ time GUIX_PROFILING=gc ./pre-inst-env guix upgrade -n

[...]

3,402.5 MB would be downloaded
Garbage collection statistics:
  heap size:        127.37 MiB
  allocated:        1504.59 MiB
  GC times:         46
  time spent in GC: 5.31 seconds (29% of user time)

real	0m21.616s
user	0m18.082s
sys	0m0.255s
--8<---------------cut here---------------end--------------->8---

You can tell that, without the cutoff (2nd run), the command more
accurately finds out what it’s going to have to download since the
“would be downloaded” figure is higher; with the cutoff (1st run), it
“sees less” of what it’s going to end up downloading.  This could be
annoying from a UI viewpoint in “extreme” cases like this one, but most
likely the difference would be invisible in many cases.

More importantly, having this cutoff halves the execution time, which is
nice.

The command initially given in this bug report now behaves like this:

--8<---------------cut here---------------start------------->8---
$ time GUIX_PROFILING=gc  ./pre-inst-env guix environment pigx-scrnaseq --search-paths -n -v2
41.8 MB would be downloaded:
   /gnu/store/difgj9gf8l7y5bsilcl42x2vzgh493c6-r-utf8-1.2.2
   /gnu/store/z4rpk1sn3jhy8brsr30zccln8mira3z5-r-purrr-0.3.4
   /gnu/store/nkiga9rfd8a9gdfsp2rjcqz39s8p2hg3-r-magrittr-2.0.1
   /gnu/store/c5y0xlw1fbcwa5p5qnk4xji286hd7cqk-r-vctrs-0.3.8
   /gnu/store/86vz257dqy4vm6s7axq7zl2jqxacgngf-r-ellipsis-0.3.2
   /gnu/store/m2m7h497g5n6vdrp5pxsnr2v8hsriq5f-r-glue-1.4.2
   /gnu/store/0zv1sl91fz3ddq8namdfvf6yr0j1p2vx-texlive-bin-20190410
   /gnu/store/7ks38sv5cpg7x76g2fdp9cra3x7dpbyq-texlive-union-51265
   /gnu/store/vkdjhkl5s025jsjy7jrr9q7kxlsi2sc4-r-minimal-4.1.0
   /gnu/store/pysc6ixb5fj1m2n50dac6aq5lgd5bnv4-r-rlang-0.4.11
Garbage collection statistics:
  heap size:        127.31 MiB
  allocated:        621.97 MiB
  GC times:         24
  time spent in GC: 2.82 seconds (37% of user time)

real	0m6.493s
user	0m7.584s
sys	0m0.117s
--8<---------------cut here---------------end--------------->8---

This time it completes, which is already an improvement ;-).  The
41.8 MB download reported is underestimated, but again that’s the
tradeoff we’re making.

Attached is the patch.  I’ll go ahead with that if there are no
objections.

> For those interested, I've compiled a list of the top 60
> highest-dependency packages, as reported by package-closure:
>
> pigx             	1630
> nextcloud-client 	1539
> akregator        	1486
> kmail            	1484
> korganizer       	1481
> kdepim-runtime   	1480
> kincidenceeditor 	1478
> keventviews      	1477
> kmailcommon      	1476
> kcalendarsupport 	1475

Nice!  I really need to stop taking ‘libreoffice’ as an example.

Thanks,
Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 2026 bytes --]

diff --git a/guix/store.scm b/guix/store.scm
index 1ab2b08b47..340ad8bc10 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -1358,11 +1358,27 @@ on the build output of a previous derivation."
 (define (map/accumulate-builds store proc lst)
   "Apply PROC over each element of LST, accumulating 'build-things' calls and
 coalescing them into a single call."
-  (define result
-    (map (lambda (obj)
-           (with-build-handler build-accumulator
-             (proc obj)))
-         lst))
+  (define accumulation-cutoff
+    ;; Threshold above which we stop accumulating unresolved nodes to avoid
+    ;; pessimal behavior.  See <https://bugs.gnu.org/49439>.
+    30)
+
+  (define-values (result rest)
+    (let loop ((lst lst)
+               (result '())
+               (unresolved 0))
+      (match lst
+        ((head . tail)
+         (match (with-build-handler build-accumulator
+                  (proc head))
+           ((? unresolved? obj)
+            (if (> unresolved accumulation-cutoff)
+                (values (reverse (cons obj result)) tail)
+                (loop tail (cons obj result) (+ 1 unresolved))))
+           (obj
+            (loop tail (cons obj result) unresolved))))
+        (()
+         (values (reverse result) lst)))))
 
   (match (append-map (lambda (obj)
                        (if (unresolved? obj)
@@ -1370,6 +1386,7 @@ coalescing them into a single call."
                            '()))
                      result)
     (()
+     ;; REST is necessarily empty.
      result)
     (to-build
      ;; We've accumulated things TO-BUILD.  Actually build them and resume the
@@ -1382,7 +1399,7 @@ coalescing them into a single call."
                                   ;; unnecessary.
                                   ((unresolved-continuation obj) #f)
                                   obj))
-                            result))))
+                            (append result rest)))))
 
 (define build-things
   (let ((build (operation (build-things (string-list things)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* bug#49439: grafts cause “guix environment” to get killed with OOM
  2021-08-09 21:26             ` Ludovic Courtès
@ 2021-08-10 15:36               ` Ludovic Courtès
  0 siblings, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2021-08-10 15:36 UTC (permalink / raw)
  To: Sarah Morgensen; +Cc: Ricardo Wurmus, 49439-done

Hello!

Ludovic Courtès <ludo@gnu.org> skribis:

> I added counters and ‘pk’ calls in ‘map/accumulate-builds’ & co. to see
> what happens.  AFAICS, a cutoff as in the attached patch does the job.
> It’s a basic heuristic to avoid unbounded growth, at the expense of
> possibly reduced accumulation.  For example, here’s the effect on my
> user profile with 300+ packages:

Pushed as fa81971cbae85b39183ccf8f51e8d96ac88fb4ac.

I saw your message on IRC, Sarah; note that because grafts are “dynamic
dependencies” (they depend on previous build results), we cannot know in
advance which grafts we’re going to build, so there’s necessarily at
least a second phase during which grafts get built.  See
<https://guix.gnu.org/en/blog/2020/grafts-continued/> for background.

Thanks!

Ludo’.




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-08-10 15:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-06 14:38 bug#49439: grafts cause “guix environment” to get killed with OOM Ricardo Wurmus
2021-07-23  4:59 ` Sarah Morgensen
2021-07-27 14:52   ` Julien Lepiller
2021-07-27 16:28   ` Ludovic Courtès
2021-07-27 23:35     ` Sarah Morgensen
2021-07-28 10:00       ` Ludovic Courtès
2021-07-28 22:03         ` Ludovic Courtès
2021-07-29  3:20           ` Sarah Morgensen
2021-08-09 21:26             ` Ludovic Courtès
2021-08-10 15:36               ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).