all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Mark H Weaver <mhw@netris.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: Optimizing union.scm
Date: Tue, 25 Mar 2014 03:04:13 -0400	[thread overview]
Message-ID: <87txamkbde.fsf@yeeloong.lan> (raw)
In-Reply-To: <87ha6nzp58.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Mon, 24 Mar 2014 14:45:23 +0100")

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

Here's a first draft of a new union.scm.

On my YeeLoong 8101B, when building my 155-package profile, it uses
about 1/14 as much CPU time (25 seconds vs 6 minutes), and about 2/7 as
much real time (2 minutes vs about 7.17 minutes).  It also handles
libffi in core-updates properly.  A few notes:

* For now, I'm using a hash table, because it turned out to be rather
  awkward to use a vhash for this.  I can make it purely functional
  later if it's deemed important.

* I've not yet updated tests/union.scm, which tests internal procedures
  in union.scm that no longer exist.

* Although I can now run union-build in 2 minutes, another full minute
  is spent in 'display-search-paths', plus another 30-40 seconds in
  'guix package' before the profile build begins.

In the end, the total time for "guix package -i" goes from around 8.5
minutes to around 3.5 minutes: a significant improvement, but there's
still more code for me to look at trying to optimize.

Instead of a patch, I've simply attached the new union.scm.  The only
code they have in common is 'file=?', which is unchanged.

Comments and suggestions welcome,

      Mark


[-- Attachment #2: New optimized union.scm --]
[-- Type: text/plain, Size: 5396 bytes --]

;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2012, 2013, 2014 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2014 Mark H Weaver <mhw@netris.org>
;;;
;;; This file is part of GNU Guix.
;;;
;;; GNU Guix is free software; you can redistribute it and/or modify it
;;; under the terms of the GNU General Public License as published by
;;; the Free Software Foundation; either version 3 of the License, or (at
;;; your option) any later version.
;;;
;;; GNU Guix is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with GNU Guix.  If not, see <http://www.gnu.org/licenses/>.

(define-module (guix build union)
  #:use-module (ice-9 match)
  #:use-module (ice-9 format)
  #:use-module (ice-9 receive)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-26)
  #:use-module (rnrs bytevectors)
  #:use-module (rnrs io ports)
  #:export (union-build))

;;; Commentary:
;;;
;;; Build a directory that is the union of a set of directories, using
;;; symbolic links.
;;;
;;; Code:

(define (names-in-directory dirname)
  (let ((dir (opendir dirname)))
    (let loop ((filenames '()))
      (match (readdir dir)
        ((or "." "..")
         (loop filenames))
        ((? eof-object?)
         (closedir dir)
         filenames)
        (name
         (loop (cons name filenames)))))))

(define (directory? name)
  (eq? 'directory (stat:type (stat name))))

(define (file=? file1 file2)
  "Return #t if FILE1 and FILE2 are regular files and their contents are
identical, #f otherwise."
  (let ((st1 (stat file1))
        (st2 (stat file2)))
    (and (eq? (stat:type st1) 'regular)
         (eq? (stat:type st2) 'regular)
         (= (stat:size st1) (stat:size st2))
         (call-with-input-file file1
           (lambda (port1)
             (call-with-input-file file2
               (lambda (port2)
                 (define len 8192)
                 (define buf1 (make-bytevector len))
                 (define buf2 (make-bytevector len))
                 (let loop ()
                   (let ((n1 (get-bytevector-n! port1 buf1 0 len))
                         (n2 (get-bytevector-n! port2 buf2 0 len)))
                     (and (equal? n1 n2)
                          (or (eof-object? n1)
                              (loop))))))))))))

(define* (union-build output inputs
                      #:key (log-port (current-error-port)))

  (define (make-link input output)
    (format log-port "`~a' ~~> `~a'~%" input output)
    (symlink input output))

  (define (union output inputs)
    (match inputs
      ((input)
       ;; There's only one input, so just make a link.
       (make-link input output))
      (_
       (receive (dirs files) (partition directory? inputs)
         (cond
          ((null? files)

           ;; All inputs are directories.  Create a new directory
           ;; where we will merge the input directories.
           (mkdir output)

           ;; Build a hash table mapping each name to a list of input
           ;; directories containing that name.
           (let ((table (make-hash-table)))

             (define (add-to-table! name dir)
               (let ((handle (hash-create-handle! table name '())))
                 (set-cdr! handle (cons dir (cdr handle)))))

             ;; Populate the table.
             (for-each (lambda (dir)
                         (for-each (cut add-to-table! <> dir)
                                   (names-in-directory dir)))
                       dirs)

             ;; Now iterate over the table and recursively
             ;; perform a union for each entry.
             (hash-for-each (lambda (name dirs-with-name)
                              (union (string-append output "/" name)
                                     (map (cut string-append <> "/" name)
                                          (reverse dirs-with-name))))
                            table)))

          ((null? dirs)
           ;; All inputs are non-directories.  We assume they are files.
           (match files
             ((file (? (cut file=? <> file)) ...)
              ;; All files have the same contents, so there's no conflict.
              (make-link file output))

             ((file other-files ...)
              (format (current-error-port)
                      "warning: collision encountered: ~{~a ~}~%"
                      files)

              ;; TODO: Implement smarter strategies.
              (format (current-error-port)
                      "warning: arbitrarily choosing ~a~%"
                      file)

              (make-link file output))))

          (else
           ;; The inputs are a mixture of files and directories
           (error "union-build: collision between file and directories"
                  `((files ,files) (dirs ,dirs)))))))))

  (setvbuf (current-output-port) _IOLBF)
  (setvbuf (current-error-port) _IOLBF)
  (when (file-port? log-port)
    (setvbuf log-port _IOLBF))

  (union output (delete-duplicates inputs)))

;;; union.scm ends here

  reply	other threads:[~2014-03-25  7:05 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-12  7:12 Problems with handicapped 'bash' from glibc package Mark H Weaver
2014-02-12 13:14 ` Ludovic Courtès
2014-02-12 17:39   ` Mark H Weaver
2014-02-12 19:31     ` Andreas Enge
2014-02-12 20:33       ` Ludovic Courtès
2014-02-12 21:33         ` Mark H Weaver
2014-02-13  9:14           ` Andreas Enge
2014-03-23 16:19 ` Ludovic Courtès
2014-03-23 20:19   ` Mark H Weaver
2014-03-23 20:27     ` Ludovic Courtès
2014-03-24  3:31       ` Mark H Weaver
2014-03-28 13:48         ` Ludovic Courtès
2014-03-24  3:55       ` Optimizing union.scm Mark H Weaver
2014-03-24 13:45         ` Ludovic Courtès
2014-03-25  7:04           ` Mark H Weaver [this message]
2014-03-25 17:18             ` Ludovic Courtès
2014-03-25 22:30               ` Mark H Weaver
2014-03-25 22:58                 ` Ludovic Courtès
2014-03-27  7:09                   ` Mark H Weaver
2014-03-27  9:57                     ` Ludovic Courtès
2014-04-02 14:14             ` Optimizing ‘guix package’ Ludovic Courtès
2014-04-02 16:58               ` Mark H Weaver
2014-03-26 23:29   ` Problems with handicapped 'bash' from glibc package Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87txamkbde.fsf@yeeloong.lan \
    --to=mhw@netris.org \
    --cc=guix-devel@gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.