From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw@netris.org>
Cc: guix-devel@gnu.org
Subject: Re: Optimizing union.scm
Date: Tue, 25 Mar 2014 18:18:07 +0100 [thread overview]
Message-ID: <87bnwugptc.fsf@gnu.org> (raw)
In-Reply-To: <87txamkbde.fsf@yeeloong.lan> (Mark H. Weaver's message of "Tue, 25 Mar 2014 03:04:13 -0400")
Mark H Weaver <mhw@netris.org> skribis:
> Here's a first draft of a new union.scm.
Thanks for being so fast!
> On my YeeLoong 8101B, when building my 155-package profile, it uses
> about 1/14 as much CPU time (25 seconds vs 6 minutes), and about 2/7 as
> much real time (2 minutes vs about 7.17 minutes). It also handles
> libffi in core-updates properly. A few notes:
That’s good news, and definitely an incentive to switch to this
implementation.
> * For now, I'm using a hash table, because it turned out to be rather
> awkward to use a vhash for this. I can make it purely functional
> later if it's deemed important.
I would prefer it, but it must not be blocking.
> * I've not yet updated tests/union.scm, which tests internal procedures
> in union.scm that no longer exist.
Right, the ‘tree-union’ and ‘delete-duplicate-leaves’ tests will have to
be removed.
However, could you add similar tests? They will have to use the real
‘union-build’, though. Maybe the ‘with-file-tree’ can be useful to
write the tests.
> * Although I can now run union-build in 2 minutes, another full minute
> is spent in 'display-search-paths', plus another 30-40 seconds in
> 'guix package' before the profile build begins.
Ouch, you know where to look afterwards. :-)
> In the end, the total time for "guix package -i" goes from around 8.5
> minutes to around 3.5 minutes: a significant improvement, but there's
> still more code for me to look at trying to optimize.
Excellent!
Some stylistic comments:
> (define (names-in-directory dirname)
> (let ((dir (opendir dirname)))
> (let loop ((filenames '()))
> (match (readdir dir)
> ((or "." "..")
> (loop filenames))
> ((? eof-object?)
> (closedir dir)
> filenames)
> (name
> (loop (cons name filenames)))))))
Rather use something like:
(scandir directory (lambda (file)
(not (member file '("." "..")))))
‘scandir’ also has the advantage of being deterministic (it sorts
entries.)
> (define (directory? name)
> (eq? 'directory (stat:type (stat name))))
Rather ‘file-is-directory?’.
> (define* (union-build output inputs
> #:key (log-port (current-error-port)))
Docstring.
> (define (make-link input output)
Maybe ‘symlink*’? (‘make-link’ sounds like it returns a newly-allocated
object.)
> (format log-port "`~a' ~~> `~a'~%" input output)
> (symlink input output))
>
> (define (union output inputs)
> (match inputs
> ((input)
> ;; There's only one input, so just make a link.
> (make-link input output))
> (_
> (receive (dirs files) (partition directory? inputs)
Rather SRFI-11 let-values.
> (cond
> ((null? files)
>
> ;; All inputs are directories. Create a new directory
> ;; where we will merge the input directories.
> (mkdir output)
>
> ;; Build a hash table mapping each name to a list of input
> ;; directories containing that name.
> (let ((table (make-hash-table)))
>
> (define (add-to-table! name dir)
> (let ((handle (hash-create-handle! table name '())))
> (set-cdr! handle (cons dir (cdr handle)))))
Rather ‘hash-set!’ (‘set-cdr!’ is doomed, you know ;-)).
> (format (current-error-port)
> "warning: collision encountered: ~{~a ~}~%"
> files)
>
> ;; TODO: Implement smarter strategies.
> (format (current-error-port)
> "warning: arbitrarily choosing ~a~%"
> file)
Can we keep that as a separate, possibly inner ‘resolve-collisions’
procedure? That makes it clear what to extend, when we want to
implement that (the original idea was that ‘union-build’ could be passed
a collision-resolution procedure.)
> (make-link file output))))
>
> (else
> ;; The inputs are a mixture of files and directories
> (error "union-build: collision between file and directories"
> `((files ,files) (dirs ,dirs)))))))))
Same for this one.
I like that it’s smaller that the current union.scm. :-)
Could you please update it and tests/union.scm, and then I think we’ll
be all set?
Thank you!
Ludo’.
next prev parent reply other threads:[~2014-03-25 17:19 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-12 7:12 Problems with handicapped 'bash' from glibc package Mark H Weaver
2014-02-12 13:14 ` Ludovic Courtès
2014-02-12 17:39 ` Mark H Weaver
2014-02-12 19:31 ` Andreas Enge
2014-02-12 20:33 ` Ludovic Courtès
2014-02-12 21:33 ` Mark H Weaver
2014-02-13 9:14 ` Andreas Enge
2014-03-23 16:19 ` Ludovic Courtès
2014-03-23 20:19 ` Mark H Weaver
2014-03-23 20:27 ` Ludovic Courtès
2014-03-24 3:31 ` Mark H Weaver
2014-03-28 13:48 ` Ludovic Courtès
2014-03-24 3:55 ` Optimizing union.scm Mark H Weaver
2014-03-24 13:45 ` Ludovic Courtès
2014-03-25 7:04 ` Mark H Weaver
2014-03-25 17:18 ` Ludovic Courtès [this message]
2014-03-25 22:30 ` Mark H Weaver
2014-03-25 22:58 ` Ludovic Courtès
2014-03-27 7:09 ` Mark H Weaver
2014-03-27 9:57 ` Ludovic Courtès
2014-04-02 14:14 ` Optimizing ‘guix package’ Ludovic Courtès
2014-04-02 16:58 ` Mark H Weaver
2014-03-26 23:29 ` Problems with handicapped 'bash' from glibc package Ludovic Courtès
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bnwugptc.fsf@gnu.org \
--to=ludo@gnu.org \
--cc=guix-devel@gnu.org \
--cc=mhw@netris.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.