Speeding up “guix pull”: splitting modules

unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Speeding up “guix pull”: splitting modules
@ 2020-01-05 20:37 Ricardo Wurmus
  2020-01-06  9:11 ` Andy Wingo
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Ricardo Wurmus @ 2020-01-05 20:37 UTC (permalink / raw)
  To: guix-devel

Hi Guix,

thanks to Ludo’s excellent work on “guix pull”, updating Guix has become
a lot faster than in the old days.  With build farm support one usually
only needs to wait a little while to be able to download all parts of
an up-to-date Guix.

Unfortunately, changes to the repository invalidate expensive
derivations regularly, and waiting for the build farm to get around to
building these derivations gets old quickly.  Running “guix pull”
repeatedly is a drag because the chances of getting a fully remotely
built Guix are pretty slim.

Part of the problem is that there’s just one derivation for the
compilation of all package modules.  This one derivation’s output is an
input to another expensive derivation: the one for the Guix System
(services, etc).

We should aim to split up the guix-packages derivation, so that the
guix-system derivation will not have to be rebuilt when its packages
aren’t affected.  Unfortunately, it’s difficult to build modules
independently from one another because of the many inter-module
dependency cycles.

Gábor once suggested an iterative approach of identifying the most
important nodes in the package graph that should be moved to their own
modules, so that we would end up with a package graph that looks less
like a hair ball.  I think it would useful to get a better view on the
relationships between modules.

On the other hand: this would need to be an ongoing effort.  Newly
introduced packages or even new features might create complex module
cycles.  It sounds tedious to keep track of this and to enforce
boundaries.

Is there an alternative?  Could we, for example, make all lookups of
cross-module package references lazy?  Or could we improve compilation
times by disabling optimizations?  Or by marking the modules as
declarative and thus unlock more optimizations (with Guile 3)?

--
Ricardo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-05 20:37 Speeding up “guix pull”: splitting modules Ricardo Wurmus
@ 2020-01-06  9:11 ` Andy Wingo
  2020-01-07 18:37 ` zimoun
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Andy Wingo @ 2020-01-06  9:11 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hi,

Just a couple notes here regarding the concrete questions.  I'm sure
there are good solutions but I don't know what they are yet!

On Sun 05 Jan 2020 21:37, Ricardo Wurmus <rekado@elephly.net> writes:

> Or could we improve compilation
> times by disabling optimizations?

I think Guix does this already; see %lightweight-optimizations in (guix
build compile).

> Or by marking the modules as declarative and thus unlock more
> optimizations (with Guile 3)?

Not really; the declarative modules optimization only really helps cases
where inlining would help.  For package definitions, there's not much to
inline, I don't think.  Given that the consideration is compile-time
rather than run-time I would not think that declarative modules would
help.

Andy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-05 20:37 Speeding up “guix pull”: splitting modules Ricardo Wurmus
  2020-01-06  9:11 ` Andy Wingo
@ 2020-01-07 18:37 ` zimoun
  2020-01-07 20:14   ` Gábor Boskovits
  2020-01-08 21:50 ` Ludovic Courtès
  2020-01-08 21:57 ` Ludovic Courtès
  3 siblings, 1 reply; 14+ messages in thread
From: zimoun @ 2020-01-07 18:37 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: Guix Devel

Hi Ricardo,

On Sun, 5 Jan 2020 at 21:38, Ricardo Wurmus <rekado@elephly.net> wrote:

> Gábor once suggested an iterative approach of identifying the most
> important nodes in the package graph that should be moved to their own
> modules, so that we would end up with a package graph that looks less
> like a hair ball.  I think it would useful to get a better view on the
> relationships between modules.

I am not sure to correctly understand. What does it mean "the most
important nodes in the package graph"?
Does "important node" mean a package which appears as inputs in a lot
of other packages?


Cheers,
simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-07 18:37 ` zimoun
@ 2020-01-07 20:14   ` Gábor Boskovits
  2020-01-10 12:09     ` zimoun
  0 siblings, 1 reply; 14+ messages in thread
From: Gábor Boskovits @ 2020-01-07 20:14 UTC (permalink / raw)
  To: zimoun; +Cc: Guix Devel

Hello,

zimoun <zimon.toutoune@gmail.com> ezt írta (időpont: 2020. jan. 7., K, 19:38):
>
> Hi Ricardo,
>
> On Sun, 5 Jan 2020 at 21:38, Ricardo Wurmus <rekado@elephly.net> wrote:
>
> > Gábor once suggested an iterative approach of identifying the most
> > important nodes in the package graph that should be moved to their own
> > modules, so that we would end up with a package graph that looks less
> > like a hair ball.  I think it would useful to get a better view on the
> > relationships between modules.
>
> I am not sure to correctly understand. What does it mean "the most
> important nodes in the package graph"?
> Does "important node" mean a package which appears as inputs in a lot
> of other packages?

I was suggesting the following idea:
The underlying package graph is actually a DAG, so by splitting modules it is
possible to achieve a state where the module graph also becomes a DAG.
The exact heuristic or algorithm of splitting was not defined.

One idea which is quite cloes to my heart is to make a one package per module
rewrite, with possibly some grouping modules for convenience on top, but that
was determined not to be feasible performacewise at the time.

>
>
> Cheers,
> simon
>

Best regards,
g_bor
-- 
OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-05 20:37 Speeding up “guix pull”: splitting modules Ricardo Wurmus
  2020-01-06  9:11 ` Andy Wingo
  2020-01-07 18:37 ` zimoun
@ 2020-01-08 21:50 ` Ludovic Courtès
  2020-01-10 12:13   ` zimoun
  2020-01-08 21:57 ` Ludovic Courtès
  3 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2020-01-08 21:50 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hello!

Ricardo Wurmus <rekado@elephly.net> skribis:

> On the other hand: this would need to be an ongoing effort.  Newly
> introduced packages or even new features might create complex module
> cycles.  It sounds tedious to keep track of this and to enforce
> boundaries.

Yes, I think this is a dead end: glibc could well end up become on
Haskell (hi, Pandoc!), and then the whole module split effort collapses.

Ultimately, I think we need to look into optimizing the compiler.  The
profiling I did a while back¹ suggests pessimal behavior of some of the
compiler phases when given large inputs.

Ludo’.

¹ https://lists.gnu.org/archive/html/guile-devel/2017-10/msg00035.html
  (I’ve been meaning to resume that investigation for a long time, but
  I’d really need to do nothing but that for a couple of weeks…)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-05 20:37 Speeding up “guix pull”: splitting modules Ricardo Wurmus
                   ` (2 preceding siblings ...)
  2020-01-08 21:50 ` Ludovic Courtès
@ 2020-01-08 21:57 ` Ludovic Courtès
       [not found]   ` <87zhenp01v.fsf@cbaines.net>
  3 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2020-01-08 21:57 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

BTW, there’s also the “Compute Guix derivation” phase that could be sped
up, and that’s mostly independent of Guile.

Things that could help include: “recursive Nix”¹ (the Guix derivation
would itself be the result of a derivation, thus subject to
substitutes), or something equivalent without special daemon support
where we’d simply embed (guix derivations) & co. to compute the
derivation in memory.

Another less elegant option would be to resort to a service (Data
Service or Cuirass) that would map a Guix commit to its .drv, and then
fetch that .drv from substitute servers and build it.

Finally, we could/should also profile that phase and see what can be
done.

Lastly :-), it would be great to profile this phase over time to see how
it evolves.  (Does the Guix Data Service already stores timings for such
things?)

Ludo’.

¹ https://github.com/NixOS/nix/pull/3205

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-07 20:14   ` Gábor Boskovits
@ 2020-01-10 12:09     ` zimoun
  2020-01-10 12:42       ` Gábor Boskovits
  0 siblings, 1 reply; 14+ messages in thread
From: zimoun @ 2020-01-10 12:09 UTC (permalink / raw)
  To: Gábor Boskovits; +Cc: Guix Devel

Hi Gábor,

Thank you for the explanations.

Below, I am thinking loudly. :-)

On Tue, 7 Jan 2020 at 21:14, Gábor Boskovits <boskovits@gmail.com> wrote:

> > > Gábor once suggested an iterative approach of identifying the most
> > > important nodes in the package graph that should be moved to their own
> > > modules, so that we would end up with a package graph that looks less
> > > like a hair ball.  I think it would useful to get a better view on the
> > > relationships between modules.

[...]

> I was suggesting the following idea:
> The underlying package graph is actually a DAG, so by splitting modules it is
> possible to achieve a state where the module graph also becomes a DAG.
> The exact heuristic or algorithm of splitting was not defined.

The modules graph (DAG) is already available. :-)
Modulo some glue code / tweaks, the tools are already in
guix/graph.scm and guix/scripts/graph.scm, if I understand correctly.
And note that the commit  ddd59159004ca73c9449a27945116ff5069c3743
introduces topological sort -- for another topic: recursive import --
but could be reused /adapted to detect cycles, IMHO.

Would a weighted DAG help to detect which modules are in "bad shape"?
Other said, mix somehow the packages DAG and the modules DAG?

For example, the edge between the module m1 and the module m2 should
be weighted by the ratio between the number of packages defined in the
module m2 required in the module m1.
So then, sorting this edges by weight value, it would give a better
view of the relationship between modules.

What do you think?

We need to detect which "big" modules are imported for "few" packages, right?

Aside, a naive question: does '#:select' improve the situation?

All the best,
simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-08 21:50 ` Ludovic Courtès
@ 2020-01-10 12:13   ` zimoun
  2020-01-10 19:02     ` Ricardo Wurmus
  0 siblings, 1 reply; 14+ messages in thread
From: zimoun @ 2020-01-10 12:13 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix Devel

Hi Ludo,

On Wed, 8 Jan 2020 at 22:50, Ludovic Courtès <ludo@gnu.org> wrote:
> Ricardo Wurmus <rekado@elephly.net> skribis:

> > On the other hand: this would need to be an ongoing effort.  Newly
> > introduced packages or even new features might create complex module
> > cycles.  It sounds tedious to keep track of this and to enforce
> > boundaries.
>
> Yes, I think this is a dead end: glibc could well end up become on
> Haskell (hi, Pandoc!), and then the whole module split effort collapses.

What kind of metrics could help to detect which modules are going to
the wrong way?
For example, would some DAG post-processings help?


All the best,
simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-10 12:09     ` zimoun
@ 2020-01-10 12:42       ` Gábor Boskovits
  2020-01-10 12:53         ` zimoun
  0 siblings, 1 reply; 14+ messages in thread
From: Gábor Boskovits @ 2020-01-10 12:42 UTC (permalink / raw)
  To: zimoun; +Cc: Guix Devel

Hello zimoun,

zimoun <zimon.toutoune@gmail.com> ezt írta (időpont: 2020. jan. 10., P, 13:10):
>
> Hi Gábor,
>
> Thank you for the explanations.
>
> Below, I am thinking loudly. :-)
>
> On Tue, 7 Jan 2020 at 21:14, Gábor Boskovits <boskovits@gmail.com> wrote:
>
> > > > Gábor once suggested an iterative approach of identifying the most
> > > > important nodes in the package graph that should be moved to their own
> > > > modules, so that we would end up with a package graph that looks less
> > > > like a hair ball.  I think it would useful to get a better view on the
> > > > relationships between modules.
>
> [...]
>
> > I was suggesting the following idea:
> > The underlying package graph is actually a DAG, so by splitting modules it is
> > possible to achieve a state where the module graph also becomes a DAG.
> > The exact heuristic or algorithm of splitting was not defined.
>
> The modules graph (DAG) is already available. :-)

The main problem here is that the modules do not form a DAG.
There are circular dependencies between the modules.
If those were not, then modular build would be possible, but because of
the spaghetti we are forced to build these together.

> Modulo some glue code / tweaks, the tools are already in
> guix/graph.scm and guix/scripts/graph.scm, if I understand correctly.
> And note that the commit  ddd59159004ca73c9449a27945116ff5069c3743
> introduces topological sort -- for another topic: recursive import --
> but could be reused /adapted to detect cycles, IMHO.
>
> Would a weighted DAG help to detect which modules are in "bad shape"?
> Other said, mix somehow the packages DAG and the modules DAG?
>
> For example, the edge between the module m1 and the module m2 should
> be weighted by the ratio between the number of packages defined in the
> module m2 required in the module m1.
> So then, sorting this edges by weight value, it would give a better
> view of the relationship between modules.
>
> What do you think?
>
> We need to detect which "big" modules are imported for "few" packages, right?
>
>
> Aside, a naive question: does '#:select' improve the situation?

I don't know, but it is an interesting question.

>
>
>
> All the best,
> simon


Best regards,
g_bor
-- 
OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-10 12:42       ` Gábor Boskovits
@ 2020-01-10 12:53         ` zimoun
  2020-01-11 23:29           ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: zimoun @ 2020-01-10 12:53 UTC (permalink / raw)
  To: Gábor Boskovits; +Cc: Guix Devel

Hi Gábor,

On Fri, 10 Jan 2020 at 13:42, Gábor Boskovits <boskovits@gmail.com> wrote:

> > The modules graph (DAG) is already available. :-)
>
> The main problem here is that the modules do not form a DAG.
> There are circular dependencies between the modules.
> If those were not, then modular build would be possible, but because of
> the spaghetti we are forced to build these together.

Maybe we have a naming problem. :-)

I agree that it is not an Acyclic graph and if I understand you
correctly it is because there are cycles that the mess starts.

Using the Directed properties (but not required in fact, whatever :-),
traversing the graph detects the cycle. It is more or less what it is
done in the function `guix/import/utils.scm (topological-sort)`.

So knowing where the cycles are could help to transform the DaG (not
fully acyclic yet) to a DAG. :-)

All the best,
simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-10 12:13   ` zimoun
@ 2020-01-10 19:02     ` Ricardo Wurmus
  0 siblings, 0 replies; 14+ messages in thread
From: Ricardo Wurmus @ 2020-01-10 19:02 UTC (permalink / raw)
  To: zimoun; +Cc: Guix Devel


zimoun <zimon.toutoune@gmail.com> writes:

> On Wed, 8 Jan 2020 at 22:50, Ludovic Courtès <ludo@gnu.org> wrote:
>> Ricardo Wurmus <rekado@elephly.net> skribis:
>
>> > On the other hand: this would need to be an ongoing effort.  Newly
>> > introduced packages or even new features might create complex module
>> > cycles.  It sounds tedious to keep track of this and to enforce
>> > boundaries.
>>
>> Yes, I think this is a dead end: glibc could well end up become on
>> Haskell (hi, Pandoc!), and then the whole module split effort collapses.
>
> What kind of metrics could help to detect which modules are going to
> the wrong way?
> For example, would some DAG post-processings help?

I don’t think it’s feasible to clean up the module graph in a way that
can be kept clean long enough.  Ideally we wouldn’t need to think about
Guile modules at all, so the best screw to turn appears to be in the
Guile compiler and not in Guix.

--
Ricardo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-10 12:53         ` zimoun
@ 2020-01-11 23:29           ` Ludovic Courtès
  2020-01-20 17:44             ` zimoun
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2020-01-11 23:29 UTC (permalink / raw)
  To: zimoun; +Cc: Guix Devel

Hello,

zimoun <zimon.toutoune@gmail.com> skribis:

> So knowing where the cycles are could help to transform the DaG (not
> fully acyclic yet) to a DAG. :-)

Unfortunately, the module graph necessarily contains cycles.  The only
way to avoid them would be to have exactly one module per package, given
that the package graph is acyclic.

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
       [not found]   ` <87zhenp01v.fsf@cbaines.net>
@ 2020-01-19 21:07     ` Ludovic Courtès
  0 siblings, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2020-01-19 21:07 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hi!

Christopher Baines <mail@cbaines.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> Another less elegant option would be to resort to a service (Data
>> Service or Cuirass) that would map a Guix commit to its .drv, and then
>> fetch that .drv from substitute servers and build it.
>
> I haven't got around to storing the derivations for `guix pull` in the
> Guix Data Service, but it does provide substitutes for derivations now,
> so support for this is only one step away now.

Sweet!

>> Finally, we could/should also profile that phase and see what can be
>> done.
>>
>> Lastly :-), it would be great to profile this phase over time to see how
>> it evolves.  (Does the Guix Data Service already stores timings for such
>> things?)
>
> There are timings output in the logs for jobs ([1] for example), it's
> not really feasible to compare that across different revisions.

Right, and the timing after “Finished building the channel derivation”
is not directly useful because it depends on the state of the store (the
number of Guix dependencies that had already been built).

> I do want to improve the process of loading data in to the Guix Data
> Service though, and that might involve storing this data in a more
> structured way, so that it can be analysed more easily.

Cool.

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Speeding up “guix pull”: splitting modules
  2020-01-11 23:29           ` Ludovic Courtès
@ 2020-01-20 17:44             ` zimoun
  0 siblings, 0 replies; 14+ messages in thread
From: zimoun @ 2020-01-20 17:44 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix Devel

Hi Ludo,

On Sun, 12 Jan 2020 at 00:29, Ludovic Courtès <ludo@gnu.org> wrote:

> zimoun <zimon.toutoune@gmail.com> skribis:
>
> > So knowing where the cycles are could help to transform the DaG (not
> > fully acyclic yet) to a DAG. :-)
>
> Unfortunately, the module graph necessarily contains cycles.  The only
> way to avoid them would be to have exactly one module per package, given
> that the package graph is acyclic.

My mind is not clear and I miss a point.

I understand that module graph contains cycles. I am not convinced it
is *necessary* and I am not convinced neither that the only solution
is "one module per package".

Today, the structure of modules is per language and/or topic.
The union of all the package graphs which is big DAG can be
arbitrarily splited into sub-DAGs and these sub-DAGs becomes modules.
And I am not convinced that the size of these sub-DAG is only one
package.

Well, that's another story. :-)
And it will not help for speeding up.

However, working on the graph could help to identify the "hot"
packages and/or modules, IMHO.

Cheers,
simon

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-01-20 17:45 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-05 20:37 Speeding up “guix pull”: splitting modules Ricardo Wurmus
2020-01-06  9:11 ` Andy Wingo
2020-01-07 18:37 ` zimoun
2020-01-07 20:14   ` Gábor Boskovits
2020-01-10 12:09     ` zimoun
2020-01-10 12:42       ` Gábor Boskovits
2020-01-10 12:53         ` zimoun
2020-01-11 23:29           ` Ludovic Courtès
2020-01-20 17:44             ` zimoun
2020-01-08 21:50 ` Ludovic Courtès
2020-01-10 12:13   ` zimoun
2020-01-10 19:02     ` Ricardo Wurmus
2020-01-08 21:57 ` Ludovic Courtès
     [not found]   ` <87zhenp01v.fsf@cbaines.net>
2020-01-19 21:07     ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).