From: ludo@gnu.org (Ludovic Courtès)
Subject: Heads-up: daemon update
Date: Sat, 18 Jan 2014 22:38:37 +0100
Message-ID: <87mwitnf6q.fsf@gnu.org>
To: guix-devel@gnu.org
List-Id: "Development of GNU Guix and the GNU System distribution."

Commit d43eb49 updates the Nix sub-module that provides the daemon’s
C++ code.  To update, run:

  ( cd nix ; ./sync-with-upstream )
  autoreconf -vfi
  ./configure ... && make && make check

Please report any test suite failure.

The changes compared to the daemon we had so far include interesting bug
fixes and improvements (see git log below for the libstore part).

Thanks to the Nix hackers for that!  :-)

Ludo’.

commit 11cb4bfb257f18c906ef1d6f14ed450be8fa49fe
Author: Eelco Dolstra
Date:   Wed Jan 8 17:32:40 2014 +0100

    Fix checking of NAR hashes

    *headdesk*
    *headdesk*
    *headdesk*

    So since commit 22144afa8d9f8968da351618a1347072a93bd8aa, Nix hasn't
    actually checked whether the content of a downloaded NAR matches the
    hash specified in the manifest / NAR info file.  Urghhh...

commit 485f4740ee3ba4228ba3345873eb530466a8f42d
Author: Domen Kožar
Date:   Wed Jan 1 18:10:48 2014 +0100

    wording

commit a6add93d734279db8503951ac6466c275b3c8e4e
Author: Eelco Dolstra
Date:   Tue Dec 10 13:13:59 2013 +0100

    Garbage collector: Release locks on temporary root files

    This allows processes waiting for such locks to proceed during the
    trash deletion phase of the garbage collector.

commit c5b8fe315162440c1d808bc0a684a412d78bfa76
Author: Eelco Dolstra
Date:   Thu Dec 5 14:31:57 2013 -0500

    Print a trace message if a build fails due to the platform being unknown

commit 709cbe4e76e7b0f1b8abddbeb7714e194f6f8a02
Author: Eelco Dolstra
Date:   Fri Nov 22 10:00:43 2013 +0000

    Include for memset

    This should fix building on Illumos.

commit a478e8a7bb8c24da0ac91b7100bd0e422035c62f
Author: Eelco Dolstra
Date:   Thu Nov 14 11:57:37 2013 +0100

    Remove nix-setuid-helper

    AFAIK, nobody uses it, it's not maintained, and it has no tests.

commit 89e6781cc5885cbf6284a51c0403dded62ce8bc0
Author: Eelco Dolstra
Date:   Tue Nov 12 12:51:59 2013 +0100

    Make function calls show up in stack traces again

    Note that adding --show-trace prevents function calls from being
    tail-recursive, so an expression that evaluates without --show-trace
    may fail with a stack overflow if --show-trace is given.

commit c0861838432ea9d8e5ea5750aadfbc59ebd6f3b4
Author: Eelco Dolstra
Date:   Tue Oct 22 11:39:10 2013 +0200

    For auto roots, show the intermediate link

    I.e. "nix-store -q --roots" will now show (for example)

      /home/eelco/Dev/nixpkgs/result

    rather than

      /nix/var/nix/gcroots/auto/53222qsppi12s2hkap8dm2lg8xhhyk6v

commit a737f51fd96be2866a33eb67732e034bcc65a659
Author: Eelco Dolstra
Date:   Wed Oct 16 15:58:20 2013 +0200

    Retry all SQLite operations

    To deal with SQLITE_PROTOCOL, we also need to retry read-only
    operations.

commit ff02f5336cd0cff0e97fbcf3c54b5b23827702d6
Author: Eelco Dolstra
Date:   Wed Oct 16 14:51:20 2013 +0200

    Fix a race in registerFailedPath()

    Registering the path as failed can fail if another process does the
    same thing after the call to hasPathFailed().  This is extremely
    unlikely though.

commit 4bd52825734face53df2ab00052d2457d31c3c68
Author: Eelco Dolstra
Date:   Wed Oct 16 14:46:35 2013 +0200

    Convenience macros for retrying a SQLite transaction

commit bce14d0f61801f0f1c3080970619e2ca11683a4e
Author: Eelco Dolstra
Date:   Wed Oct 16 14:36:53 2013 +0200

    Don't wrap read-only queries in a transaction

    There is no risk of getting an inconsistent result here: if the ID
    returned by queryValidPathId() is deleted from the database
    concurrently, subsequent queries involving that ID will simply fail
    (since IDs are never reused).

commit 7cdefdbe732c209e13f234eb71022791909a5518
Author: Eelco Dolstra
Date:   Wed Oct 16 14:27:36 2013 +0200

    Print a distinct warning for SQLITE_PROTOCOL

commit d05bf044441dbf8e000036d545df9689bdec0b72
Author: Eelco Dolstra
Date:   Wed Oct 16 14:03:22 2013 +0200

    Treat SQLITE_PROTOCOL as SQLITE_BUSY

    In the Hydra build farm we fairly regularly get SQLITE_PROTOCOL errors
    (e.g., "querying path in database: locking protocol").
    The docs for this error code say that it "is returned if some other
    process is messing with file locks and has violated the file locking
    protocol that SQLite uses on its rollback journal files."  However,
    the SQLite source code reveals that this error can also occur under
    high load:

      if( cnt>5 ){
        int nDelay = 1;                      /* Pause time in microseconds */
        if( cnt>100 ){
          VVA_ONLY( pWal->lockError = 1; )
          return SQLITE_PROTOCOL;
        }
        if( cnt>=10 ) nDelay = (cnt-9)*238;  /* Max delay 21ms. Total delay 996ms */
        sqlite3OsSleep(pWal->pVfs, nDelay);
      }

    i.e. if certain locks cannot be acquired, SQLite will retry a number
    of times before giving up and returning SQLITE_PROTOCOL.  The comments
    say:

      Circumstances that cause a RETRY should only last for the briefest
      instances of time.  No I/O or other system calls are done while the
      locks are held, so the locks should not be held for very long.  But
      if we are unlucky, another process that is holding a lock might get
      paged out or take a page-fault that is time-consuming to resolve,
      during the few nanoseconds that it is holding the lock.  In that
      case, it might take longer than normal for the lock to free.
      ...  The total delay time before giving up is less than 1 second.

    On a heavily loaded machine like lucifer (the main Hydra server),
    which often has dozens of processes waiting for I/O, it seems to me
    that a page fault could easily take more than a second to resolve.
    So, let's treat SQLITE_PROTOCOL as SQLITE_BUSY and retry the
    transaction.

    Issue NixOS/hydra#14.

commit 936f9d45baf474358346666ed9ad7f56960bb455
Author: Eelco Dolstra
Date:   Fri Sep 6 16:36:56 2013 +0200

    Don't apply the CPU affinity hack to nix-shell (and other Perl programs)

    As discovered by Todd Veldhuizen, the shell started by nix-shell has
    its affinity set to a single CPU.  This is because nix-shell connects
    to the Nix daemon, which causes the affinity hack to be applied.
    So we turn this off for Perl programs.

commit b29d3f4aee9fa91f4ea1019d09bf63bc81b9f830
Author: Eelco Dolstra
Date:   Mon Sep 2 12:01:04 2013 +0200

    Only show trace messages when tracing is enabled

commit efe428946431c6c670151c949884fa8c1fa31794
Author: Eelco Dolstra
Date:   Mon Sep 2 11:58:18 2013 +0200

    Add an option to limit the log output of builders

    This is mostly useful for Hydra to deal with builders that get stuck
    in an infinite loop writing data to stdout/stderr.

commit 34bb806f747d5edc16119e38e2ac44e6b236ac12
Author: Ivan Kozik
Date:   Sat Aug 10 21:36:16 2013 +0000

    Fix typos, especially those that end up in the Nix manual

commit c6c024ca6f587dab991589ad6fdf010b9f0e6d62
Author: Gergely Risko
Date:   Thu Aug 22 17:57:39 2013 +0200

    Fix personality switching from x86_64 to i686

    On Linux, Nix can build i686 packages even on x86_64 systems.  It's
    not enough to recognize this situation by settings.thisSystem, we
    also have to consult uname().  E.g. we can be running on an i686
    Debian with an amd64 kernel.  In that situation settings.thisSystem
    is i686-linux, but we still need to change personality to i686 to
    make builds consistent.

commit a583a2bc59a4ee2b067e5520f6c5bc0c61852c32
Author: Eelco Dolstra
Date:   Wed Aug 7 11:51:55 2013 +0000

    Run the daemon worker on the same CPU as the client

    On a system with multiple CPUs, running Nix operations through the
    daemon is significantly slower than "direct" mode:

    $ NIX_REMOTE= nix-instantiate '' -A system
    real    0m0.974s
    user    0m0.875s
    sys     0m0.088s

    $ NIX_REMOTE=daemon nix-instantiate '' -A system
    real    0m2.118s
    user    0m1.463s
    sys     0m0.218s

    The main reason seems to be that the client and the worker get moved
    to a different CPU after every call to the worker.  This patch adds
    a hack to lock them to the same CPU.
    With this, the overhead of going through the daemon is very small:

    $ NIX_REMOTE=daemon nix-instantiate '' -A system
    real    0m1.074s
    user    0m0.809s
    sys     0m0.098s

commit a4921b8ceb5bde3fbd1ae25ea4b367199796eded
Author: Eelco Dolstra
Date:   Thu Jul 18 12:52:16 2013 +0200

    Revert "build-remote.pl: Enforce timeouts locally"

    This reverts commit 69b8f9980f39c14a59365a188b300a34d625a2cd.

    The timeout should be enforced remotely.  Otherwise, if the garbage
    collector is running either locally or remotely, it will block the
    build or closure copying for some time.  If the garbage collector
    takes too long, the build may time out, which is not what we want.
    Also, on heavily loaded systems, copying large paths to and from the
    remote machine can take a long time, also potentially resulting in a
    timeout.

commit 16591eb3cccf86da8cd3f20c56e2dd847576ff5e
Author: Shea Levy
Date:   Fri Jul 12 09:35:33 2013 -0400

    Allow bind-mounting regular files into the chroot

    mount(2) with MS_BIND allows mounting a regular file on top of a
    regular file, so there's no reason to only bind directories.  This
    allows finer control over just which files are and aren't included
    in the chroot without having to build symlink trees or the like.

    Signed-off-by: Shea Levy

commit aeb810b01e17d040f9592681ee271f15874dce50
Author: Eelco Dolstra
Date:   Fri Jul 12 14:01:25 2013 +0200

    Garbage collector: Don't follow symlinks arbitrarily

    Only indirect roots (symlinks to symlinks to the Nix store) are now
    supported.

commit 7ccd9464077180f633e65c15906bdda707077e8c
Author: Eelco Dolstra
Date:   Thu Jun 20 20:05:03 2013 +0200

    Don't set $preferLocalBuild and $requiredSystemFeatures in builders

    With C++ std::map, doing a comparison like ‘map["foo"] == ...’ has
    the side-effect of adding a mapping from "foo" to the empty string
    if "foo" doesn't exist in the map.  So we ended up setting some
    environment variables by accident.
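
The pitfall described in the last commit above (7ccd946) is easy to
reproduce outside Nix.  The following is a minimal C++ sketch, not the
daemon's actual code: the `Env` alias and all function names are made up
for illustration.  It contrasts a comparison through `operator[]`, which
inserts an empty value for a missing key, with a lookup through `find()`,
which leaves the map untouched.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a builder environment (name -> value).
using Env = std::map<std::string, std::string>;

// Pitfall: operator[] default-constructs and inserts a value when the key
// is missing, so this "read-only" comparison can grow the map.
bool isSetToOneWithBrackets(Env & env, const std::string & key)
{
    return env[key] == "1";
}

// Side-effect-free variant: find() never inserts, so the map can be const.
bool isSetToOne(const Env & env, const std::string & key)
{
    auto it = env.find(key);
    return it != env.end() && it->second == "1";
}

// Small self-check showing the difference between the two lookups.
void demonstrateLookupSideEffect()
{
    Env env;

    assert(!isSetToOne(env, "preferLocalBuild"));
    assert(env.empty());                          // nothing was added

    assert(!isSetToOneWithBrackets(env, "preferLocalBuild"));
    assert(env.size() == 1);                      // key now exists ...
    assert(env.at("preferLocalBuild").empty());   // ... mapped to ""
}
```

If such a map is later exported as the builder's environment, the stray
empty entry shows up as a variable set to the empty string, which matches
the symptom the commit message describes.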

commit 5558652709f27e8a887580b77b93c705659d7a4b
Author: Eelco Dolstra
Date:   Thu Jun 20 19:26:31 2013 +0200

    Don't substitute derivations that have preferLocalBuild set

    In particular this means that "trivial" derivations such as writeText
    are not substituted, reducing the number of GET requests to the
    binary cache by about 200 on a typical NixOS configuration.

commit 1906cce6fcea88d07b55c0b9734da39675e17a4d
Author: Eelco Dolstra
Date:   Thu Jun 20 14:01:33 2013 +0000

    Increase SQLite's auto-checkpoint interval

    Common operations like instantiating a NixOS system config no longer
    fitted in 8192 pages, leading to more fsyncs.  So increase this limit.

commit 9b11165aec8639b021527978603423826b6b9cc3
Author: Eelco Dolstra
Date:   Thu Jun 20 12:01:33 2013 +0200

    Disable the copy-from-other-stores substituter

    This substituter basically cannot work reliably since we switched to
    SQLite, since SQLite databases may need write access to open them
    even just for reading (and in WAL mode they always do).

commit 22144afa8d9f8968da351618a1347072a93bd8aa
Author: Eelco Dolstra
Date:   Thu Jun 20 11:55:15 2013 +0200

    Don't keep "disabled" substituters running

    For instance, it's pointless to keep copy-from-other-stores running
    if there are no other stores, or download-using-manifests if there
    are no manifests.  This also speeds things up because we don't send
    queries to those substituters.
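
As a sanity check on the figures quoted in the SQLITE_PROTOCOL commit
(d05bf04) earlier in this log, here is a small C++ sketch that replays
the back-off schedule from the quoted SQLite snippet.  It is only an
illustration: the function and struct names are hypothetical, and it
assumes the retry counter goes up by one per attempt.  Only the constants
(5, 10, 100, 238) come from the quoted code.

```cpp
#include <cassert>

// Hypothetical helper (not part of SQLite or Nix): the sleep, in
// microseconds, that the quoted snippet performs for retry count `cnt`.
// Returns -1 where the snippet gives up with SQLITE_PROTOCOL.
long retryDelayMicros(int cnt)
{
    if (cnt <= 5) return 0;      // early attempts retry without sleeping
    if (cnt > 100) return -1;    // give up: SQLITE_PROTOCOL
    long nDelay = 1;             // attempts 6..9 sleep 1 microsecond
    if (cnt >= 10) nDelay = (cnt - 9) * 238L;
    return nDelay;
}

struct RetryBudget {
    long maxDelayMicros;    // longest single sleep
    long totalDelayMicros;  // sum of all sleeps before giving up
};

// Assumption: cnt increases by one per attempt until the snippet gives up.
RetryBudget computeRetryBudget()
{
    RetryBudget budget = {0, 0};
    for (int cnt = 0; ; ++cnt) {
        long delay = retryDelayMicros(cnt);
        if (delay < 0) break;
        if (delay > budget.maxDelayMicros) budget.maxDelayMicros = delay;
        budget.totalDelayMicros += delay;
    }
    return budget;
}

// Self-check against the figures in the quoted SQLite comment.
void checkRetryBudget()
{
    RetryBudget budget = computeRetryBudget();
    assert(budget.maxDelayMicros == 21658);     // about 21 ms
    assert(budget.totalDelayMicros == 996272);  // about 996 ms
}
```

Under these assumptions the longest single sleep is 21658 microseconds
and the sleeps add up to 996272 microseconds, consistent with the "Max
delay 21ms. Total delay 996ms" comment.  That is the sub-second budget
the commit message argues a heavily loaded machine can exceed.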