From mboxrd@z Thu Jan 1 00:00:00 1970 From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) Subject: Re: Optimizing union.scm Date: Tue, 25 Mar 2014 18:18:07 +0100 Message-ID: <87bnwugptc.fsf@gnu.org> References: <871tz8oldk.fsf@netris.org> <874n2oubuq.fsf@gnu.org> <8761n4mzvu.fsf@yeeloong.lan> <87siq8r77u.fsf@gnu.org> <87r45sl074.fsf_-_@yeeloong.lan> <87ha6nzp58.fsf@gnu.org> <87txamkbde.fsf@yeeloong.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:48569) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WSUzq-0007v8-LW for guix-devel@gnu.org; Tue, 25 Mar 2014 13:19:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WSUzJ-0000bL-IY for guix-devel@gnu.org; Tue, 25 Mar 2014 13:18:42 -0400 Received: from hera.aquilenet.fr ([2a01:474::1]:47583) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WSUzJ-0000ar-4g for guix-devel@gnu.org; Tue, 25 Mar 2014 13:18:09 -0400 In-Reply-To: <87txamkbde.fsf@yeeloong.lan> (Mark H. Weaver's message of "Tue, 25 Mar 2014 03:04:13 -0400") List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org To: Mark H Weaver Cc: guix-devel@gnu.org Mark H Weaver skribis: > Here's a first draft of a new union.scm. Thanks for being so fast! > On my YeeLoong 8101B, when building my 155-package profile, it uses > about 1/14 as much CPU time (25 seconds vs 6 minutes), and about 2/7 as > much real time (2 minutes vs about 7.17 minutes). It also handles > libffi in core-updates properly. A few notes: That=E2=80=99s good news, and definitely an incentive to switch to this implementation. > * For now, I'm using a hash table, because it turned out to be rather > awkward to use a vhash for this. I can make it purely functional > later if it's deemed important. I would prefer it, but it must not be blocking. > * I've not yet updated tests/union.scm, which tests internal procedures > in union.scm that no longer exist. Right, the =E2=80=98tree-union=E2=80=99 and =E2=80=98delete-duplicate-leave= s=E2=80=99 tests will have to be removed. However, could you add similar tests? They will have to use the real =E2=80=98union-build=E2=80=99, though. Maybe the =E2=80=98with-file-tree= =E2=80=99 can be useful to write the tests. > * Although I can now run union-build in 2 minutes, another full minute > is spent in 'display-search-paths', plus another 30-40 seconds in > 'guix package' before the profile build begins. Ouch, you know where to look afterwards. :-) > In the end, the total time for "guix package -i" goes from around 8.5 > minutes to around 3.5 minutes: a significant improvement, but there's > still more code for me to look at trying to optimize. Excellent! Some stylistic comments: > (define (names-in-directory dirname) > (let ((dir (opendir dirname))) > (let loop ((filenames '())) > (match (readdir dir) > ((or "." "..") > (loop filenames)) > ((? eof-object?) > (closedir dir) > filenames) > (name > (loop (cons name filenames))))))) Rather use something like: (scandir directory (lambda (file) (not (member file '("." ".."))))) =E2=80=98scandir=E2=80=99 also has the advantage of being deterministic (it= sorts entries.) > (define (directory? name) > (eq? 'directory (stat:type (stat name)))) Rather =E2=80=98file-is-directory?=E2=80=99. > (define* (union-build output inputs > #:key (log-port (current-error-port))) Docstring. > (define (make-link input output) Maybe =E2=80=98symlink*=E2=80=99? (=E2=80=98make-link=E2=80=99 sounds like= it returns a newly-allocated object.) > (format log-port "`~a' ~~> `~a'~%" input output) > (symlink input output)) > > (define (union output inputs) > (match inputs > ((input) > ;; There's only one input, so just make a link. > (make-link input output)) > (_ > (receive (dirs files) (partition directory? inputs) Rather SRFI-11 let-values. > (cond > ((null? files) > > ;; All inputs are directories. Create a new directory > ;; where we will merge the input directories. > (mkdir output) > > ;; Build a hash table mapping each name to a list of input > ;; directories containing that name. > (let ((table (make-hash-table))) > > (define (add-to-table! name dir) > (let ((handle (hash-create-handle! table name '()))) > (set-cdr! handle (cons dir (cdr handle))))) Rather =E2=80=98hash-set!=E2=80=99 (=E2=80=98set-cdr!=E2=80=99 is doomed, y= ou know ;-)). > (format (current-error-port) > "warning: collision encountered: ~{~a ~}~%" > files) > > ;; TODO: Implement smarter strategies. > (format (current-error-port) > "warning: arbitrarily choosing ~a~%" > file) Can we keep that as a separate, possibly inner =E2=80=98resolve-collisions= =E2=80=99 procedure? That makes it clear what to extend, when we want to implement that (the original idea was that =E2=80=98union-build=E2=80=99 co= uld be passed a collision-resolution procedure.) > (make-link file output)))) > > (else > ;; The inputs are a mixture of files and directories > (error "union-build: collision between file and directories" > `((files ,files) (dirs ,dirs))))))))) Same for this one. I like that it=E2=80=99s smaller that the current union.scm. :-) Could you please update it and tests/union.scm, and then I think we=E2=80= =99ll be all set? Thank you! Ludo=E2=80=99.