unofficial mirror of bug-guix@gnu.org 
* bug#44254: Performance of package input rewriting
@ 2020-10-27 13:26 Lars-Dominik Braun
  2020-10-27 14:14 ` zimoun
  2020-10-27 19:58 ` Ricardo Wurmus
  0 siblings, 2 replies; 8+ messages in thread
From: Lars-Dominik Braun @ 2020-10-27 13:26 UTC (permalink / raw)
  To: 44254

Hi,

this issue is similar to https://issues.guix.gnu.org/41702, but I’m not sure
it’s exactly the same. For guix-science I’m trying to provide some packages
like python-jupyterlab, which depend on a mix of packages from guix proper and
newer versions of packages already included in guix proper. Thus I need to
rewrite inputs of the former to the latter. (Because Python only propagates
dependencies and thus collisions would occur.)

Previously I have been doing this using package-input-rewriting, but starting
an environment containing python-jupyterlab alone took about 20s (warm caches,
all derivations in the store). Manually rewriting inputs by inheriting and
alist-delete’ing brings this down to 3s, which is pretty significant.
--no-grafts does not have much of an impact (15s vs 2s) here. See
https://github.com/guix-science/guix-science/commit/972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
for the exact changes.

My expectation would be that package-input-rewriting is the preferred, because
easier, solution to this problem and thus should have minimal impact on
performance.

Cheers,
Lars



* bug#44254: Performance of package input rewriting
  2020-10-27 13:26 bug#44254: Performance of package input rewriting Lars-Dominik Braun
@ 2020-10-27 14:14 ` zimoun
  2020-10-28 14:19   ` Ludovic Courtès
  2020-10-27 19:58 ` Ricardo Wurmus
  1 sibling, 1 reply; 8+ messages in thread
From: zimoun @ 2020-10-27 14:14 UTC (permalink / raw)
  To: Lars-Dominik Braun, 44254

Hi Lars,

On Tue, 27 Oct 2020 at 14:26, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:

> Previously I have been doing this using package-input-rewriting, but starting
> an environment containing python-jupyterlab alone took about 20s (warm caches,
> all derivations in the store). Manually rewriting inputs by inheriting and
> alist-delete’ing brings this down to 3s, which is pretty significant.
> --no-grafts does not have much of an impact (15s vs 2s) here. See
> https://github.com/guix-science/guix-science/commit/972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
> for the exact changes.

Could this be related to “#:deep? #t” now being the default?  Previously
the default was #f.

Well, using ‘inherit’ only rewrites the direct explicit dependencies.
However, ‘package-input-rewriting’ traverses the whole dependency graph
and replaces inputs accordingly.  Maybe I misunderstand.


All the best,
simon







* bug#44254: Performance of package input rewriting
  2020-10-27 13:26 bug#44254: Performance of package input rewriting Lars-Dominik Braun
  2020-10-27 14:14 ` zimoun
@ 2020-10-27 19:58 ` Ricardo Wurmus
  2020-10-30  8:42   ` Lars-Dominik Braun
  1 sibling, 1 reply; 8+ messages in thread
From: Ricardo Wurmus @ 2020-10-27 19:58 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 44254


Lars-Dominik Braun <ldb@leibniz-psychology.org> writes:

> this issue is similar to https://issues.guix.gnu.org/41702, but I’m not sure
> it’s exactly the same. For guix-science I’m trying to provide some packages
> like python-jupyterlab, which depend on a mix of packages from guix proper and
> newer versions of packages already included in guix proper. Thus I need to
> rewrite inputs of the former to the latter. (Because Python only propagates
> dependencies and thus collisions would occur.)
>
> Previously I have been doing this using package-input-rewriting, but starting
> an environment containing python-jupyterlab alone took about 20s (warm caches,
> all derivations in the store). Manually rewriting inputs by inheriting and
> alist-delete’ing brings this down to 3s, which is pretty significant.

Could you show us a concrete example?  Input rewriting is recursive and
will traverse the whole package graph by default, even if you *know*
that, say, GCC doesn’t need to be rewritten.

For the more generic “package-mapping” you can provide a “cut?”
procedure to determine when to stop recursion.  Perhaps this would make
things faster in your case?
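
A minimal sketch of what such a “cut?” procedure could look like, assuming
‘package-mapping’ from (guix packages) is called as (package-mapping proc
cut?).  The name ‘make-python-only-rewriting’ and the ‘replacements’ alist
of (old . new) package objects are hypothetical, and stopping at every
non-Python package is only one possible policy, not necessarily the right
one for Guix-Science:

```scheme
(use-modules (guix packages)
             (guix build-system python))

;; Hypothetical helper: REPLACEMENTS is an alist of (OLD . NEW) package
;; objects.  Return a procedure that rewrites a package graph but does
;; not descend into packages that are not built with python-build-system.
(define (make-python-only-rewriting replacements)
  (package-mapping
   ;; PROC: substitute a package when a replacement is listed,
   ;; otherwise keep it as is.
   (lambda (p)
     (or (assq-ref replacements p) p))
   ;; CUT?: stop recursing at replaced packages and at anything that is
   ;; not a Python package (GCC, glibc, and so on).
   (lambda (p)
     (or (assq-ref replacements p)
         (not (eq? (package-build-system p) python-build-system))))))
```

The trade-off is that a replacement reachable only through a non-Python
package would be missed, so this only helps when the packages to swap
are reached through Python packages alone.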

-- 
Ricardo





* bug#44254: Performance of package input rewriting
  2020-10-27 14:14 ` zimoun
@ 2020-10-28 14:19   ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2020-10-28 14:19 UTC (permalink / raw)
  To: zimoun; +Cc: 44254, Lars-Dominik Braun

Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> On Tue, 27 Oct 2020 at 14:26, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:
>
>> Previously I have been doing this using package-input-rewriting, but starting
>> an environment containing python-jupyterlab alone took about 20s (warm caches,
>> all derivations in the store). Manually rewriting inputs by inheriting and
>> alist-delete’ing brings this down to 3s, which is pretty significant.
>> --no-grafts does not have much of an impact (15s vs 2s) here. See
>> https://github.com/guix-science/guix-science/commit/972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
>> for the exact changes.
>
> Could this be related to “#:deep? #t” now being the default?  Previously
> the default was #f.

Yes, that’s a possible culprit.  Try passing #:deep? #f if it works for
your use case.
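
For reference, a minimal sketch of passing #:deep? #f, adapted from the
openssl/libressl example in the Guix manual.  The package variables and
the modules they come from are assumptions that depend on the Guix
revision in use; substitute the packages being rewritten:

```scheme
(use-modules (guix packages)
             (gnu packages tls)
             (gnu packages version-control))

;; Replace OPENSSL by LIBRESSL in explicit inputs only.  With
;; #:deep? #f the implicit inputs of the build system are left alone.
(define libressl-instead-of-openssl
  (package-input-rewriting `((,openssl . ,libressl))
                           #:deep? #f))

;; Apply the same procedure to every package that needs the rewrite.
(define git-with-libressl
  (libressl-instead-of-openssl git))
```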

Another thing to look at is the <package> object graph (as shown by ‘guix
graph’).  Input rewriting can duplicate parts of the graph, which in
turn defeats package->derivation memoization.  Just looking at the
number of nodes in the graph can give hints.

Like Ricardo wrote, it’d be great if you could share a short reproducer.

Thanks,
Ludo’.





* bug#44254: Performance of package input rewriting
  2020-10-27 19:58 ` Ricardo Wurmus
@ 2020-10-30  8:42   ` Lars-Dominik Braun
  2020-10-31 10:27     ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Lars-Dominik Braun @ 2020-10-30  8:42 UTC (permalink / raw)
  To: Ricardo Wurmus, Ludovic Courtès; +Cc: 44254

Hi,

> Yes, that’s a possible culprit.  Try passing #:deep? #f if it works for
> your use case.
Yeah, that brings it down to ~8s, which is still a lot.

> Another thing to look at is the <package> object graph (as shown by ‘guix
> graph’).  Input rewriting can duplicate parts of the graph, which in
> turn defeats package->derivation memoization.  Just looking at the
> number of nodes in the graph can give hints.
Aha, it’s 913 nodes without rewriting, 13916 with rewriting (#:deep? #t) and
4286 with rewriting (#:deep? #f) as determined by a rather ad-hoc `guix graph
-L . -t package python-jupyterlab | grep 'shape = box' | wc -l`. That seems way
too much. Does that mean I’m using package rewriting in the wrong way or is
that a bug?

Unfortunately I don’t have a short reproducer right now. I’ll look at the graph
more closely to figure out which parts are actually duplicated. Maybe I can
create a reproducing testcase with more information.

Cheers,
Lars



* bug#44254: Performance of package input rewriting
  2020-10-30  8:42   ` Lars-Dominik Braun
@ 2020-10-31 10:27     ` Ludovic Courtès
  2020-11-03  8:23       ` Lars-Dominik Braun
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2020-10-31 10:27 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 44254

Hi Lars,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> Another thing to look at is the <package> object graph (as shown by ‘guix
>> graph’).  Input rewriting can duplicate parts of the graph, which in
>> turn defeats package->derivation memoization.  Just looking at the
>> number of nodes in the graph can give hints.
> Aha, it’s 913 nodes without rewriting, 13916 with rewriting (#:deep? #t) and
> 4286 with rewriting (#:deep? #f) as determined by a rather ad-hoc `guix graph
> -L . -t package python-jupyterlab | grep 'shape = box' | wc -l`. That seems way
> too much. Does that mean I’m using package rewriting in the wrong way or is
> that a bug?

It could be a mixture thereof.  :-)

I guess it’s easy to end up creating huge object graphs.  Here’s an
example of an anti-pattern:

  (define a
    ((package-input-rewriting x) ((package-input-rewriting y) p1))) 

  (define b
    ((package-input-rewriting x) ((package-input-rewriting y) p2)))

The correct use is:

  (define transform
    (package-input-rewriting (append x y)))

  (define a (transform p1))
  (define b (transform p2))

That guarantees that ‘a’ and ‘b’ share most of the nodes of their object
graph.
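
The effect can be illustrated with a toy model in plain Guile, with no
Guix modules involved.  It assumes, as a simplification, that each
rewriting procedure keeps its own eq?-keyed cache of the nodes it has
already rewritten; ‘<node>’, ‘make-rewriter’ and ‘first-input’ are
made-up names for this sketch, not Guix APIs:

```scheme
(use-modules (srfi srfi-9))

;; Toy stand-in for a <package> record: a name and a list of inputs.
(define-record-type <node>
  (make-node name inputs)
  node?
  (name node-name)
  (inputs node-inputs))

;; Return a rewriting procedure that owns a private eq?-keyed cache.
(define (make-rewriter)
  (define cache (make-hash-table))
  (define (rewrite node)
    (or (hashq-ref cache node)
        (let ((new (make-node (node-name node)
                              (map rewrite (node-inputs node)))))
          (hashq-set! cache node new)
          new)))
  rewrite)

(define (first-input node)
  (car (node-inputs node)))

;; Two packages that depend on the same node.
(define shared (make-node "shared" '()))
(define p1 (make-node "p1" (list shared)))
(define p2 (make-node "p2" (list shared)))

;; Anti-pattern: a fresh rewriter, hence a fresh cache, per package.
(define a1 ((make-rewriter) p1))
(define b1 ((make-rewriter) p2))

;; Recommended: one rewriter applied to both packages.
(define transform (make-rewriter))
(define a2 (transform p1))
(define b2 (transform p2))

(display (eq? (first-input a1) (first-input b1))) ; prints #f
(newline)
(display (eq? (first-input a2) (first-input b2))) ; prints #t
(newline)
```

With separate rewriters the shared dependency is copied once per
package, so any cache keyed on object identity sees two distinct nodes;
with a single rewriter both results point at the same rewritten node.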

From a quick look, the code in Guix-Science seemed to be following the
anti-pattern above, with a separate ‘package-input-rewriting’ call for
each package.

For example, there’s:

--8<---------------cut here---------------start------------->8---
(define python-ipykernel-5.3-bootstrap
  (let ((rewritten ((package-input-rewriting
                     `((,python-jupyter-client . ,python-jupyter-client-6.1-bootstrap)
                       ;; Indirect through IPython.
                       (,python-testpath . ,python-testpath-0.4)
                       (,python-nbformat . ,python-nbformat-5.0)))
                    python-ipykernel-5.3-proper)))
    (package
      (inherit rewritten)
      (name "python-ipykernel-bootstrap"))))

(define-public python-jupyter-client-6.1
  ((package-input-rewriting
   `((,python-ipykernel . ,python-ipykernel-5.3-bootstrap)
     (,python-jupyter-core . ,python-jupyter-core-4.6)
     ;; Indirect through IPython.
     (,python-testpath . ,python-testpath-0.4)
     (,python-nbformat . ,python-nbformat-5.0)))
   python-jupyter-client-6.1-proper))

(define-public python-ipykernel-5.3
  ((package-input-rewriting
   `((,python-jupyter-client . ,python-jupyter-client-6.1)
     ;; Indirect through IPython.
     (,python-testpath . ,python-testpath-0.4)
     (,python-nbformat . ,python-nbformat-5.0)))
   python-ipykernel-5.3-proper))
--8<---------------cut here---------------end--------------->8---

It seems to me that you’re redefining a dependency graph, node by node.
Thus, you probably don’t need ‘package-input-rewriting’ here.  What you
did in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
looks more appropriate to me, in terms of style and semantics.

Does that make sense?

Thanks,
Ludo’.





* bug#44254: Performance of package input rewriting
  2020-10-31 10:27     ` Ludovic Courtès
@ 2020-11-03  8:23       ` Lars-Dominik Braun
  2020-11-03  9:32         ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Lars-Dominik Braun @ 2020-11-03  8:23 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 44254

Hi Ludo,

> I guess it’s easy to end up creating huge object graphs.  Here’s an
> example of an anti-pattern:
> 
>   (define a
>     ((package-input-rewriting x) ((package-input-rewriting y) p1))) 
> 
>   (define b
>     ((package-input-rewriting x) ((package-input-rewriting y) p2)))
> 
> The correct use is:
> 
>   (define transform
>     (package-input-rewriting (append x y)))
> 
>   (define a (transform p1))
>   (define b (transform p2))
that sounds like a section for the cookbook :)

> It seems to me that you’re redefining a dependency graph, node by node.
> Thus, you probably don’t need ‘package-input-rewriting’ here.  What you
> did in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
> looks more appropriate to me, in terms of style and semantics.
Okay, got it. My initial concern was that rewriting the graph “by hand” (i.e.
alist-delete) would be tedious and error-prone.

Thank you very much,
Lars

-- 
Lars-Dominik Braun
Wissenschaftlicher Mitarbeiter/Research Associate
www.leibniz-psychology.org
ZPID - Leibniz-Institut für Psychologie /
ZPID - Leibniz Institute for Psychology
Universitätsring 15
D-54296 Trier - Germany
Tel.: +49–651–201-4964


* bug#44254: Performance of package input rewriting
  2020-11-03  8:23       ` Lars-Dominik Braun
@ 2020-11-03  9:32         ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2020-11-03  9:32 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 44254

Hi,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> I guess it’s easy to end up creating huge object graphs.  Here’s an
>> example of an anti-pattern:
>> 
>>   (define a
>>     ((package-input-rewriting x) ((package-input-rewriting y) p1))) 
>> 
>>   (define b
>>     ((package-input-rewriting x) ((package-input-rewriting y) p2)))
>> 
>> The correct use is:
>> 
>>   (define transform
>>     (package-input-rewriting (append x y)))
>> 
>>   (define a (transform p1))
>>   (define b (transform p2))
> that sounds like a section for the cookbook :)

Note that there’s a new section in the manual on this topic:

  https://guix.gnu.org/manual/devel/en/html_node/Defining-Package-Variants.html

>> It seems to me that you’re redefining a dependency graph, node by node.
>> Thus, you probably don’t need ‘package-input-rewriting’ here.  What you
>> did in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
>> looks more appropriate to me, in terms of style and semantics.
> Okay, got it. My initial concern was that rewriting the graph “by hand” (i.e.
> alist-delete) would be tedious and error-prone.

I haven’t looked closely enough.  If you can define a single procedure
that rewrites the graph, that’s of course better than rewriting nodes
one by one.  Maybe that’s possible, but you need to be careful about
factorizing the transformation procedure as I showed above.

Thanks,
Ludo’.



