From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark H Weaver Subject: Re: Optimizing union.scm Date: Tue, 25 Mar 2014 03:04:13 -0400 Message-ID: <87txamkbde.fsf@yeeloong.lan> References: <871tz8oldk.fsf@netris.org> <874n2oubuq.fsf@gnu.org> <8761n4mzvu.fsf@yeeloong.lan> <87siq8r77u.fsf@gnu.org> <87r45sl074.fsf_-_@yeeloong.lan> <87ha6nzp58.fsf@gnu.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:60766) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WSLQC-0000x0-Km for guix-devel@gnu.org; Tue, 25 Mar 2014 03:05:21 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WSLQ7-0000l1-7o for guix-devel@gnu.org; Tue, 25 Mar 2014 03:05:16 -0400 In-Reply-To: <87ha6nzp58.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Mon, 24 Mar 2014 14:45:23 +0100") List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org To: Ludovic =?utf-8?Q?Court=C3=A8s?= Cc: guix-devel@gnu.org --=-=-= Content-Type: text/plain Here's a first draft of a new union.scm. On my YeeLoong 8101B, when building my 155-package profile, it uses about 1/14 as much CPU time (25 seconds vs 6 minutes), and about 2/7 as much real time (2 minutes vs about 7.17 minutes). It also handles libffi in core-updates properly. A few notes: * For now, I'm using a hash table, because it turned out to be rather awkward to use a vhash for this. I can make it purely functional later if it's deemed important. * I've not yet updated tests/union.scm, which tests internal procedures in union.scm that no longer exist. * Although I can now run union-build in 2 minutes, another full minute is spent in 'display-search-paths', plus another 30-40 seconds in 'guix package' before the profile build begins. In the end, the total time for "guix package -i" goes from around 8.5 minutes to around 3.5 minutes: a significant improvement, but there's still more code for me to look at trying to optimize. Instead of a patch, I've simply attached the new union.scm. The only code they have in common is 'file=?', which is unchanged. Comments and suggestions welcome, Mark --=-=-= Content-Type: text/plain; charset=utf-8 Content-Disposition: inline; filename=union.scm Content-Transfer-Encoding: quoted-printable Content-Description: New optimized union.scm ;;; GNU Guix --- Functional package management for GNU ;;; Copyright =C2=A9 2012, 2013, 2014 Ludovic Court=C3=A8s ;;; Copyright =C2=A9 2014 Mark H Weaver ;;; ;;; This file is part of GNU Guix. ;;; ;;; GNU Guix is free software; you can redistribute it and/or modify it ;;; under the terms of the GNU General Public License as published by ;;; the Free Software Foundation; either version 3 of the License, or (at ;;; your option) any later version. ;;; ;;; GNU Guix is distributed in the hope that it will be useful, but ;;; WITHOUT ANY WARRANTY; without even the implied warranty of ;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;;; GNU General Public License for more details. ;;; ;;; You should have received a copy of the GNU General Public License ;;; along with GNU Guix. If not, see . (define-module (guix build union) #:use-module (ice-9 match) #:use-module (ice-9 format) #:use-module (ice-9 receive) #:use-module (srfi srfi-1) #:use-module (srfi srfi-26) #:use-module (rnrs bytevectors) #:use-module (rnrs io ports) #:export (union-build)) ;;; Commentary: ;;; ;;; Build a directory that is the union of a set of directories, using ;;; symbolic links. ;;; ;;; Code: (define (names-in-directory dirname) (let ((dir (opendir dirname))) (let loop ((filenames '())) (match (readdir dir) ((or "." "..") (loop filenames)) ((? eof-object?) (closedir dir) filenames) (name (loop (cons name filenames))))))) (define (directory? name) (eq? 'directory (stat:type (stat name)))) (define (file=3D? file1 file2) "Return #t if FILE1 and FILE2 are regular files and their contents are identical, #f otherwise." (let ((st1 (stat file1)) (st2 (stat file2))) (and (eq? (stat:type st1) 'regular) (eq? (stat:type st2) 'regular) (=3D (stat:size st1) (stat:size st2)) (call-with-input-file file1 (lambda (port1) (call-with-input-file file2 (lambda (port2) (define len 8192) (define buf1 (make-bytevector len)) (define buf2 (make-bytevector len)) (let loop () (let ((n1 (get-bytevector-n! port1 buf1 0 len)) (n2 (get-bytevector-n! port2 buf2 0 len))) (and (equal? n1 n2) (or (eof-object? n1) (loop)))))))))))) (define* (union-build output inputs #:key (log-port (current-error-port))) (define (make-link input output) (format log-port "`~a' ~~> `~a'~%" input output) (symlink input output)) (define (union output inputs) (match inputs ((input) ;; There's only one input, so just make a link. (make-link input output)) (_ (receive (dirs files) (partition directory? inputs) (cond ((null? files) ;; All inputs are directories. Create a new directory ;; where we will merge the input directories. (mkdir output) ;; Build a hash table mapping each name to a list of input ;; directories containing that name. (let ((table (make-hash-table))) (define (add-to-table! name dir) (let ((handle (hash-create-handle! table name '()))) (set-cdr! handle (cons dir (cdr handle))))) ;; Populate the table. (for-each (lambda (dir) (for-each (cut add-to-table! <> dir) (names-in-directory dir))) dirs) ;; Now iterate over the table and recursively ;; perform a union for each entry. (hash-for-each (lambda (name dirs-with-name) (union (string-append output "/" name) (map (cut string-append <> "/" name) (reverse dirs-with-name)))) table))) ((null? dirs) ;; All inputs are non-directories. We assume they are files. (match files ((file (? (cut file=3D? <> file)) ...) ;; All files have the same contents, so there's no conflict. (make-link file output)) ((file other-files ...) (format (current-error-port) "warning: collision encountered: ~{~a ~}~%" files) ;; TODO: Implement smarter strategies. (format (current-error-port) "warning: arbitrarily choosing ~a~%" file) (make-link file output)))) (else ;; The inputs are a mixture of files and directories (error "union-build: collision between file and directories" `((files ,files) (dirs ,dirs))))))))) (setvbuf (current-output-port) _IOLBF) (setvbuf (current-error-port) _IOLBF) (when (file-port? log-port) (setvbuf log-port _IOLBF)) (union output (delete-duplicates inputs))) ;;; union.scm ends here --=-=-=--