From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark H Weaver Subject: Re: Optimizing union.scm Date: Tue, 25 Mar 2014 18:30:42 -0400 Message-ID: <87ob0tkj1p.fsf@yeeloong.lan> References: <871tz8oldk.fsf@netris.org> <874n2oubuq.fsf@gnu.org> <8761n4mzvu.fsf@yeeloong.lan> <87siq8r77u.fsf@gnu.org> <87r45sl074.fsf_-_@yeeloong.lan> <87ha6nzp58.fsf@gnu.org> <87txamkbde.fsf@yeeloong.lan> <87bnwugptc.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:42802) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WSZsf-0000jH-MH for guix-devel@gnu.org; Tue, 25 Mar 2014 18:31:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WSZsa-0005SF-Bi for guix-devel@gnu.org; Tue, 25 Mar 2014 18:31:37 -0400 In-Reply-To: <87bnwugptc.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Tue, 25 Mar 2014 18:18:07 +0100") List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org To: Ludovic =?utf-8?Q?Court=C3=A8s?= Cc: guix-devel@gnu.org ludo@gnu.org (Ludovic Court=C3=A8s) writes: > Mark H Weaver skribis: > >> Here's a first draft of a new union.scm. > > Thanks for being so fast! > >> On my YeeLoong 8101B, when building my 155-package profile, it uses >> about 1/14 as much CPU time (25 seconds vs 6 minutes), and about 2/7 as >> much real time (2 minutes vs about 7.17 minutes). It also handles >> libffi in core-updates properly. A few notes: > > That=E2=80=99s good news, and definitely an incentive to switch to this > implementation. > >> * For now, I'm using a hash table, because it turned out to be rather >> awkward to use a vhash for this. I can make it purely functional >> later if it's deemed important. > > I would prefer it, but it must not be blocking. Okay, I'll work on it. >> * I've not yet updated tests/union.scm, which tests internal procedures >> in union.scm that no longer exist. > > Right, the =E2=80=98tree-union=E2=80=99 and =E2=80=98delete-duplicate-lea= ves=E2=80=99 tests will have to > be removed. > > However, could you add similar tests? They will have to use the real > =E2=80=98union-build=E2=80=99, though. Maybe the =E2=80=98with-file-tree= =E2=80=99 can be useful to > write the tests. Could you help with this? I'm a bit overloaded right now. [...] > Some stylistic comments: > >> (define (names-in-directory dirname) >> (let ((dir (opendir dirname))) >> (let loop ((filenames '())) >> (match (readdir dir) >> ((or "." "..") >> (loop filenames)) >> ((? eof-object?) >> (closedir dir) >> filenames) >> (name >> (loop (cons name filenames))))))) > > Rather use something like: > > (scandir directory (lambda (file) > (not (member file '("." ".."))))) > > =E2=80=98scandir=E2=80=99 also has the advantage of being deterministic (= it sorts > entries.) I looked at 'scandir', but it's very wasteful. For one thing, it unconditionally calls 'lstat' on every directory entry, which entails over ten thousand unnecessary 'lstat' system calls, a lot of unnecessary I/O to read the inodes, the use of a vhash to keep track of where it has been to detect cycles (since it's based on 'file-system-fold'), etc. >> (define (directory? name) >> (eq? 'directory (stat:type (stat name)))) > > Rather =E2=80=98file-is-directory?=E2=80=99. Okay. >> (define* (union-build output inputs >> #:key (log-port (current-error-port))) > > Docstring. Oops, I removed yours by mistake. >> (define (make-link input output) > > Maybe =E2=80=98symlink*=E2=80=99? (=E2=80=98make-link=E2=80=99 sounds li= ke it returns a newly-allocated > object.) Okay. >> (format log-port "`~a' ~~> `~a'~%" input output) >> (symlink input output)) >> >> (define (union output inputs) >> (match inputs >> ((input) >> ;; There's only one input, so just make a link. >> (make-link input output)) >> (_ >> (receive (dirs files) (partition directory? inputs) > > Rather SRFI-11 let-values. Hmm. In simple cases like this, I find 'receive' much more attractive than 'let-values', since the latter would involve triple-nested parens, which is a bit much even for me. Can you explain why you prefer 'let-values' to 'receive' in this case? Also, how do you feel about 'call-with-values'? In an earlier draft of this code, I used 'call-with-values' with 'match-lambda*' as the consumer, which eliminated the need for the 'cond'. Would you prefer that? >> (cond >> ((null? files) >> >> ;; All inputs are directories. Create a new directory >> ;; where we will merge the input directories. >> (mkdir output) >> >> ;; Build a hash table mapping each name to a list of input >> ;; directories containing that name. >> (let ((table (make-hash-table))) >> >> (define (add-to-table! name dir) >> (let ((handle (hash-create-handle! table name '()))) >> (set-cdr! handle (cons dir (cdr handle))))) > > Rather =E2=80=98hash-set!=E2=80=99 (=E2=80=98set-cdr!=E2=80=99 is doomed,= you know ;-)). That would double the number of hash table lookups. Anyway, I plan to eliminate the use of the hash table altogether, so the point is moot :) >> (format (current-error-port) >> "warning: collision encountered: ~{~a ~}~%" >> files) >> >> ;; TODO: Implement smarter strategies. >> (format (current-error-port) >> "warning: arbitrarily choosing ~a~%" >> file) > > Can we keep that as a separate, possibly inner =E2=80=98resolve-collision= s=E2=80=99 > procedure? That makes it clear what to extend, when we want to > implement that (the original idea was that =E2=80=98union-build=E2=80=99 = could be passed > a collision-resolution procedure.) > >> (make-link file output)))) >> >> (else >> ;; The inputs are a mixture of files and directories >> (error "union-build: collision between file and directories" >> `((files ,files) (dirs ,dirs))))))))) > > Same for this one. Okay. Thanks for the quick review :) Mark