unofficial mirror of bug-guix@gnu.org 
From: raingloom <raingloom@riseup.net>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 44760@debbugs.gnu.org
Subject: bug#44760: Closure copy in ‘guix system init’ is inefficient
Date: Sun, 22 Nov 2020 20:46:34 +0100
Message-ID: <20201122204634.2730df12@riseup.net>
In-Reply-To: <87h7pkffzy.fsf@inria.fr>

On Fri, 20 Nov 2020 12:02:25 +0100
Ludovic Courtès <ludo@gnu.org> wrote:

> ‘guix system init’ ends by copying the system’s closure from the
> “host” store to the target store; it also initializes the database of
> that target store.
> 
> That copy is inefficient for several reasons.  Let’s pick one file,
> shred.1.gz, that ends up being copied, and let’s look at its
> occurrences in the strace log of ‘guix system init config.scm
> /tmp/os’:
> 
> --8<---------------cut here---------------start------------->8---
> $ grep -A2 '/shred.1.gz' ,,s
> lstat("/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0
> openat(AT_FDCWD, "/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", O_RDONLY) = 15
> fstat(15, {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0
> openat(AT_FDCWD, "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", O_WRONLY|O_CREAT|O_TRUNC, 0444) = 16
> read(15, "\37\213\10\0\0\0\0\0\2\3\215VMs\3336\20\275\363Wluh\354\251L%vg\322:M"..., 8192) = 1490
> write(16, "\37\213\10\0\0\0\0\0\2\3\215VMs\3336\20\275\363Wluh\354\251L%vg\322:M"..., 1490) = 1490
> --
> utimensat(AT_FDCWD, "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", [{tv_sec=1605721025, tv_nsec=616985411} /* 2020-11-18T18:37:05.616985411+0100 */, {tv_sec=1, tv_nsec=0} /* 1970-01-01T01:00:01+0100 */], 0) = 0
> lstat("/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/sleep.1.gz", {st_mode=S_IFREG|0444, st_size=813, ...}) = 0
> openat(AT_FDCWD, "/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/sleep.1.gz", O_RDONLY) = 15
> --
> lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0
> lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shuf.1.gz", {st_mode=S_IFREG|0444, st_size=972, ...}) = 0
> lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/sleep.1.gz", {st_mode=S_IFREG|0444, st_size=813, ...}) = 0
> --
> lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0
> openat(AT_FDCWD, "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", O_RDONLY) = 17
> lseek(17, 0, SEEK_CUR)                  = 0
> read(17, "\37\213\10\0\0\0\0\0\2\3\215VMs\3336\20\275\363Wluh\354\251L%vg\322:M"..., 1490) = 1490
> --
> lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0
> openat(AT_FDCWD, "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", O_RDONLY) = 17
> lseek(17, 0, SEEK_CUR)                  = 0
> read(17, "\37\213\10\0\0\0\0\0\2\3\215VMs\3336\20\275\363Wluh\354\251L%vg\322:M"..., 1490) = 1490
> --
> link("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", "/tmp/os/gnu/store/.links/0w0qcs5lp36i89yry91r2ixlghihzf0vc56bpd9yylj342gv82xl") = 0
> lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shuf.1.gz", {st_mode=S_IFREG|0444, st_size=972, ...}) = 0
> openat(AT_FDCWD, "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shuf.1.gz", O_RDONLY) = 17
> --8<---------------cut here---------------end--------------->8---
> 
> First, /tmp/os/…/shred.1.gz is read entirely twice: once in
> ‘register-items’ (in the ‘nar-sha256’ call) to compute its hash, and a
> second time for deduplication (the ‘deduplicate’ call in there.)
> 
> The ‘nar-sha256’ call could be avoided because the database of
> /gnu/store contains that value.  As for deduplication, we could
> perhaps create those ‘.links’ entries as we copy files instead of
> re-traversing the whole thing afterwards.
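
To make the "create '.links' entries as we copy" idea concrete, here is
a minimal sketch in Python (not Guix code). It hashes the data while it
streams from source to target, so the target never has to be read back.
The function name, the hex SHA-256 entry names and the ".tmp" suffix are
assumptions made for illustration; the real '.links' entries are named
differently, as the link() call in the trace shows.

```python
import hashlib
import os


def copy_and_deduplicate(source, target, links_dir):
    """Copy SOURCE to TARGET in one pass, then deduplicate via LINKS_DIR.

    Hypothetical helper: entry names are hex SHA-256 digests of the raw
    file contents, which is a simplification of what Guix does.
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(target, "wb") as dst:
        while chunk := src.read(8192):
            digest.update(chunk)   # hash while copying: no second read
            dst.write(chunk)

    os.makedirs(links_dir, exist_ok=True)
    canonical = os.path.join(links_dir, digest.hexdigest())
    try:
        # First file with this content: it becomes the canonical entry.
        os.link(target, canonical)
    except FileExistsError:
        # Content already known: atomically replace TARGET with a hard
        # link to the canonical entry.
        temporary = target + ".tmp"
        os.link(canonical, temporary)
        os.replace(temporary, target)
    return canonical
```

With this shape each file is read once (from the source) and written
once, instead of being read again for hashing and again for
deduplication.
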
> 
> Second, all of /tmp/os is traversed to reset timestamps, although we
> could have cleared those timestamps when we created those files in the
> first place (<https://issues.guix.gnu.org/44741> prevents that though,
> unless we keep a bug-fixed copy of ‘copy-recursively’ in there.)
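
As a rough illustration of resetting the timestamp at creation time
rather than in a later traversal, here is a small Python sketch (not
Guix code). The function name is hypothetical; the mtime of 1 second
mirrors the utimensat() call in the trace, and the access time is left
as it is.

```python
import os
import shutil

CANONICAL_MTIME = 1  # seconds since the epoch, as in the trace


def copy_with_canonical_mtime(source, target):
    """Copy SOURCE to TARGET and set TARGET's mtime right away.

    Hypothetical helper: doing this per file at copy time removes the
    need for a separate walk over the target tree afterwards.
    """
    shutil.copyfile(source, target)
    atime_ns = os.stat(target).st_atime_ns   # keep the access time
    os.utime(target, ns=(atime_ns, CANONICAL_MTIME * 10**9))
```
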
> 
> Third, in the case of the installer, we’re really copying from
> /mnt/guix-inst/store to /mnt/gnu/store, which is likely the same
> device.  In this case we could create hard links instead of actually
> copying files.
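
The same-device case could look roughly like the Python sketch below
(not Guix code). It tries a hard link first and falls back to a real
copy only when the kernel reports that source and target are on
different devices (EXDEV). The function name and return values are
assumptions for illustration; sharing an inode like this only makes
sense because store files are not modified in place.

```python
import errno
import os
import shutil


def link_or_copy(source, target):
    """Hard-link SOURCE to TARGET, copying only across devices.

    Hypothetical helper; returns "linked" or "copied" so callers can
    tell which path was taken.
    """
    try:
        os.link(source, target)
        return "linked"
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise                  # unrelated failure: do not hide it
    shutil.copyfile(source, target)
    return "copied"
```
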
> 
> Fourth, we’re adding items one by one in the target store database,
> but it may be more efficient to more or less dump the subset of the
> source database in bulk.
> 
> Surely we can do better.
> 
> Ludo’.
> 
> 
> 

Also, if a store is already present (e.g. because of a partial
install), it could make sense to (optionally) keep its contents. AFAIK
this is still not possible. It was one of the bigger time sinks while I
was working on the F2FS support.
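
Roughly what I have in mind, as a Python sketch (not Guix code): copy
only the store items that the target does not have yet. The function
and parameter names are hypothetical, and the sketch assumes that an
item already present in the target is complete and valid, which a
partial install does not guarantee; a real implementation would have
to verify that.

```python
import os
import shutil


def copy_missing_items(items, source_store, target_store):
    """Copy each store item in ITEMS unless TARGET_STORE already has it.

    Hypothetical helper. ASSUMPTION: an existing item is taken to be
    intact; nothing here checks its contents.
    """
    copied = []
    os.makedirs(target_store, exist_ok=True)
    for name in items:
        source = os.path.join(source_store, name)
        destination = os.path.join(target_store, name)
        if os.path.lexists(destination):
            continue               # keep what is already there
        if os.path.isdir(source) and not os.path.islink(source):
            shutil.copytree(source, destination, symlinks=True)
        else:
            shutil.copy2(source, destination, follow_symlinks=False)
        copied.append(name)
    return copied
```
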




