unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Mark H Weaver <mhw@netris.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: Optimizing union.scm
Date: Tue, 25 Mar 2014 18:30:42 -0400	[thread overview]
Message-ID: <87ob0tkj1p.fsf@yeeloong.lan> (raw)
In-Reply-To: <87bnwugptc.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Tue, 25 Mar 2014 18:18:07 +0100")

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> Here's a first draft of a new union.scm.
>
> Thanks for being so fast!
>
>> On my YeeLoong 8101B, when building my 155-package profile, it uses
>> about 1/14 as much CPU time (25 seconds vs 6 minutes), and about 2/7 as
>> much real time (2 minutes vs about 7.17 minutes).  It also handles
>> libffi in core-updates properly.  A few notes:
>
> That’s good news, and definitely an incentive to switch to this
> implementation.
>
>> * For now, I'm using a hash table, because it turned out to be rather
>>   awkward to use a vhash for this.  I can make it purely functional
>>   later if it's deemed important.
>
> I would prefer it, but it must not be blocking.

Okay, I'll work on it.

>> * I've not yet updated tests/union.scm, which tests internal procedures
>>   in union.scm that no longer exist.
>
> Right, the ‘tree-union’ and ‘delete-duplicate-leaves’ tests will have to
> be removed.
>
> However, could you add similar tests?  They will have to use the real
> ‘union-build’, though.  Maybe the ‘with-file-tree’ can be useful to
> write the tests.

Could you help with this?  I'm a bit overloaded right now.

[...]

> Some stylistic comments:
>
>> (define (names-in-directory dirname)
>>   (let ((dir (opendir dirname)))
>>     (let loop ((filenames '()))
>>       (match (readdir dir)
>>         ((or "." "..")
>>          (loop filenames))
>>         ((? eof-object?)
>>          (closedir dir)
>>          filenames)
>>         (name
>>          (loop (cons name filenames)))))))
>
> Rather use something like:
>
>   (scandir directory (lambda (file)
>                        (not (member file '("." "..")))))
>
> ‘scandir’ also has the advantage of being deterministic (it sorts
> entries.)

I looked at 'scandir', but it's very wasteful.  For one thing, it
unconditionally calls 'lstat' on every directory entry, which entails
over ten thousand unnecessary 'lstat' system calls, a lot of unnecessary
I/O to read the inodes, the use of a vhash to keep track of where it has
been to detect cycles (since it's based on 'file-system-fold'), etc.

>> (define (directory? name)
>>   (eq? 'directory (stat:type (stat name))))
>
> Rather ‘file-is-directory?’.

Okay.

>> (define* (union-build output inputs
>>                       #:key (log-port (current-error-port)))
>
> Docstring.

Oops, I removed yours by mistake.

>>   (define (make-link input output)
>
> Maybe ‘symlink*’?  (‘make-link’ sounds like it returns a newly-allocated
> object.)

Okay.

>>     (format log-port "`~a' ~~> `~a'~%" input output)
>>     (symlink input output))
>>
>>   (define (union output inputs)
>>     (match inputs
>>       ((input)
>>        ;; There's only one input, so just make a link.
>>        (make-link input output))
>>       (_
>>        (receive (dirs files) (partition directory? inputs)
>
> Rather SRFI-11 let-values.

Hmm.  In simple cases like this, I find 'receive' much more attractive
than 'let-values', since the latter would involve triple-nested parens,
which is a bit much even for me.

Can you explain why you prefer 'let-values' to 'receive' in this case?

Also, how do you feel about 'call-with-values'?  In an earlier draft of
this code, I used 'call-with-values' with 'match-lambda*' as the
consumer, which eliminated the need for the 'cond'.  Would you prefer
that?

>>          (cond
>>           ((null? files)
>>
>>            ;; All inputs are directories.  Create a new directory
>>            ;; where we will merge the input directories.
>>            (mkdir output)
>>
>>            ;; Build a hash table mapping each name to a list of input
>>            ;; directories containing that name.
>>            (let ((table (make-hash-table)))
>>
>>              (define (add-to-table! name dir)
>>                (let ((handle (hash-create-handle! table name '())))
>>                  (set-cdr! handle (cons dir (cdr handle)))))
>
> Rather ‘hash-set!’ (‘set-cdr!’ is doomed, you know ;-)).

That would double the number of hash table lookups.  Anyway, I plan to
eliminate the use of the hash table altogether, so the point is moot :)

>>               (format (current-error-port)
>>                       "warning: collision encountered: ~{~a ~}~%"
>>                       files)
>>
>>               ;; TODO: Implement smarter strategies.
>>               (format (current-error-port)
>>                       "warning: arbitrarily choosing ~a~%"
>>                       file)
>
> Can we keep that as a separate, possibly inner ‘resolve-collisions’
> procedure?  That makes it clear what to extend, when we want to
> implement that (the original idea was that ‘union-build’ could be passed
> a collision-resolution procedure.)
>
>>               (make-link file output))))
>>
>>           (else
>>            ;; The inputs are a mixture of files and directories
>>            (error "union-build: collision between file and directories"
>>                   `((files ,files) (dirs ,dirs)))))))))
>
> Same for this one.

Okay.

Thanks for the quick review :)

      Mark

  reply	other threads:[~2014-03-25 22:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-12  7:12 Problems with handicapped 'bash' from glibc package Mark H Weaver
2014-02-12 13:14 ` Ludovic Courtès
2014-02-12 17:39   ` Mark H Weaver
2014-02-12 19:31     ` Andreas Enge
2014-02-12 20:33       ` Ludovic Courtès
2014-02-12 21:33         ` Mark H Weaver
2014-02-13  9:14           ` Andreas Enge
2014-03-23 16:19 ` Ludovic Courtès
2014-03-23 20:19   ` Mark H Weaver
2014-03-23 20:27     ` Ludovic Courtès
2014-03-24  3:31       ` Mark H Weaver
2014-03-28 13:48         ` Ludovic Courtès
2014-03-24  3:55       ` Optimizing union.scm Mark H Weaver
2014-03-24 13:45         ` Ludovic Courtès
2014-03-25  7:04           ` Mark H Weaver
2014-03-25 17:18             ` Ludovic Courtès
2014-03-25 22:30               ` Mark H Weaver [this message]
2014-03-25 22:58                 ` Ludovic Courtès
2014-03-27  7:09                   ` Mark H Weaver
2014-03-27  9:57                     ` Ludovic Courtès
2014-04-02 14:14             ` Optimizing ‘guix package’ Ludovic Courtès
2014-04-02 16:58               ` Mark H Weaver
2014-03-26 23:29   ` Problems with handicapped 'bash' from glibc package Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ob0tkj1p.fsf@yeeloong.lan \
    --to=mhw@netris.org \
    --cc=guix-devel@gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).