unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: 31785@debbugs.gnu.org
Subject: bug#31785: Multiple client 'build-paths' RPCs can lead to daemon deadlock
Date: Mon, 11 Jun 2018 16:06:16 +0200	[thread overview]
Message-ID: <87602ph0yv.fsf@gnu.org> (raw)

Hello,

I tried running this:

  guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape --no-grafts) | sort) & \
  guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape --no-grafts) | sort -r)

… also in parallel with this (for good measure):

  guix build --max-jobs=200 $(guix gc -R $(guix build -d inkscape --no-grafts) | sort -R)

Since we have 3 clients, that leads to 3 guix-daemon processes, and
those are stuck in a deadlock:

--8<---------------cut here---------------start------------->8---
$ ps aux | grep guix-daemon
[…]
root     20501  0.6  0.0  39292 12012 ?        Ss   15:51   0:01 guix-daemon 20455
root     20503  4.3  0.0  39420 12096 ?        Ss   15:51   0:12 guix-daemon 20491
root     22154  0.0  0.0  39028 11016 ?        Ss   15:52   0:00 guix-daemon 22148
$ sudo strace -p 22154 -t
strace: Process 22154 attached
15:57:03 select(0, 0x7ffdca30b610, NULL, NULL, {tv_sec=1, tv_usec=314189}) = 0 (Timeout)
15:57:04 openat(AT_FDCWD, "/gnu/store/ghc9vcc8ydv305jz0spd73qrrf30gxik-binutils-bootstrap-0.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/5hdpffkgkfszpsin2ir50zscz06x0mb5-coreutils-8.29-debug.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/aasfcliv85frq3aawpa4rw18nvlvl735-gettext-minimal-0.19.8.1.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/j1kpzcpi1m1ihzwyqb60zd4s74yp85g8-glibc-bootstrap-0.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/4llnxsgr6mxxj6pk21d9vbwaxnkbn99k-gnutls-3.5.18-doc.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/7rs9qx86f2w4hjbf9wjm10k4m2vvlbyv-groff-1.22.3-doc.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/d645d919gp1zk8v65qa2r4ia5x1c5n6k-libstdc++-5.5.0.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/amcs7l0ynj1qg6fp9ll3asiamd4zsq75-m4-1.4.18.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/p262qk7zsd2wnlal6x1jax2xbqvdddqq-m4-1.4.18.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/zvvgr301hz95ac15rp0wsjy5j4cdgkg4-module-import-compiled.lock", O_RDWR|O_CREAT, 0600) = 8
15:57:04 fcntl(8, F_GETFD)              = 0
15:57:04 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:57:04 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:57:04 close(8)                       = 0
15:57:04 openat(AT_FDCWD, "/gnu/store/760kj698c9x1vcmlzk1qr8i9hqw961n3-openssl-1.0.2o-doc.lock", O_RDWR|O_CREAT, 0600) = 8
^C
$ sudo strace -p 20503 -t
strace: Process 20503 attached
15:58:23 select(445, [15 54 159 177 321 411 414 444], NULL, NULL, {tv_sec=1, tv_usec=595302}) = 0 (Timeout)
15:58:25 openat(AT_FDCWD, "/gnu/store/amcs7l0ynj1qg6fp9ll3asiamd4zsq75-m4-1.4.18.lock", O_RDWR|O_CREAT, 0600) = 14
15:58:25 fcntl(14, F_GETFD)             = 0
15:58:25 fcntl(14, F_SETFD, FD_CLOEXEC) = 0
15:58:25 fcntl(14, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:58:25 close(14)                      = 0
15:58:25 openat(AT_FDCWD, "/gnu/store/zvvgr301hz95ac15rp0wsjy5j4cdgkg4-module-import-compiled.lock", O_RDWR|O_CREAT, 0600) = 14
15:58:25 fcntl(14, F_GETFD)             = 0
15:58:25 fcntl(14, F_SETFD, FD_CLOEXEC) = 0
15:58:25 fcntl(14, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:58:25 close(14)                      = 0
15:58:25 openat(AT_FDCWD, "/gnu/store/760kj698c9x1vcmlzk1qr8i9hqw961n3-openssl-1.0.2o-doc.lock", O_RDWR|O_CREAT, 0600) = 14
15:58:25 fcntl(14, F_GETFD)             = 0
15:58:25 fcntl(14, F_SETFD, FD_CLOEXEC) = 0
15:58:25 fcntl(14, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:58:25 close(14)                      = 0
15:58:25 select(445, [15 54 159 177 321 411 414 444], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
15:58:30 openat(AT_FDCWD, "/gnu/store/amcs7l0ynj1qg6fp9ll3asiamd4zsq75-m4-1.4.18.lock", O_RDWR|O_CREAT, 0600) = 14
15:58:30 fcntl(14, F_GETFD)             = 0
15:58:30 fcntl(14, F_SETFD, FD_CLOEXEC) = 0
15:58:30 fcntl(14, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:58:30 close(14)                      = 0
15:58:30 openat(AT_FDCWD, "/gnu/store/zvvgr301hz95ac15rp0wsjy5j4cdgkg4-module-import-compiled.lock", O_RDWR|O_CREAT, 0600) = 14
15:58:30 fcntl(14, F_GETFD)             = 0
15:58:30 fcntl(14, F_SETFD, FD_CLOEXEC) = 0
15:58:30 fcntl(14, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:58:30 close(14)                      = 0
15:58:30 openat(AT_FDCWD, "/gnu/store/760kj698c9x1vcmlzk1qr8i9hqw961n3-openssl-1.0.2o-doc.lock", O_RDWR|O_CREAT, 0600) = 14
15:58:30 fcntl(14, F_GETFD)             = 0
15:58:30 fcntl(14, F_SETFD, FD_CLOEXEC) = 0
15:58:30 fcntl(14, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:58:30 close(14)                      = 0
15:58:30 select(445, [15 54 159 177 321 411 414 444], NULL, NULL, {tv_sec=5, tv_usec=0}^Cstrace: Process 20503 detached
$ sudo strace -p 20501 -t
strace: Process 20501 attached
15:59:09 select(121, [20 30 120], NULL, NULL, {tv_sec=0, tv_usec=535316}) = 0 (Timeout)
15:59:10 openat(AT_FDCWD, "/gnu/store/ghc9vcc8ydv305jz0spd73qrrf30gxik-binutils-bootstrap-0.lock", O_RDWR|O_CREAT, 0600) = 8
15:59:10 fcntl(8, F_GETFD)              = 0
15:59:10 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:59:10 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:59:10 close(8)                       = 0
15:59:10 openat(AT_FDCWD, "/gnu/store/5hdpffkgkfszpsin2ir50zscz06x0mb5-coreutils-8.29-debug.lock", O_RDWR|O_CREAT, 0600) = 8
15:59:10 fcntl(8, F_GETFD)              = 0
15:59:10 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:59:10 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:59:10 close(8)                       = 0
15:59:10 openat(AT_FDCWD, "/gnu/store/aasfcliv85frq3aawpa4rw18nvlvl735-gettext-minimal-0.19.8.1.lock", O_RDWR|O_CREAT, 0600) = 8
15:59:10 fcntl(8, F_GETFD)              = 0
15:59:10 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:59:10 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:59:10 close(8)                       = 0
15:59:10 openat(AT_FDCWD, "/gnu/store/j1kpzcpi1m1ihzwyqb60zd4s74yp85g8-glibc-bootstrap-0.lock", O_RDWR|O_CREAT, 0600) = 8
15:59:10 fcntl(8, F_GETFD)              = 0
15:59:10 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:59:10 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:59:10 close(8)                       = 0
15:59:10 openat(AT_FDCWD, "/gnu/store/4llnxsgr6mxxj6pk21d9vbwaxnkbn99k-gnutls-3.5.18-doc.lock", O_RDWR|O_CREAT, 0600) = 8
15:59:10 fcntl(8, F_GETFD)              = 0
15:59:10 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:59:10 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:59:10 close(8)                       = 0
15:59:10 openat(AT_FDCWD, "/gnu/store/7rs9qx86f2w4hjbf9wjm10k4m2vvlbyv-groff-1.22.3-doc.lock", O_RDWR|O_CREAT, 0600) = 8
15:59:10 fcntl(8, F_GETFD)              = 0
15:59:10 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:59:10 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:59:10 close(8)                       = 0
15:59:10 openat(AT_FDCWD, "/gnu/store/d645d919gp1zk8v65qa2r4ia5x1c5n6k-libstdc++-5.5.0.lock", O_RDWR|O_CREAT, 0600) = 8
15:59:10 fcntl(8, F_GETFD)              = 0
15:59:10 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:59:10 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:59:10 close(8)                       = 0
15:59:10 openat(AT_FDCWD, "/gnu/store/p262qk7zsd2wnlal6x1jax2xbqvdddqq-m4-1.4.18.lock", O_RDWR|O_CREAT, 0600) = 8
15:59:10 fcntl(8, F_GETFD)              = 0
15:59:10 fcntl(8, F_SETFD, FD_CLOEXEC)  = 0
15:59:10 fcntl(8, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = -1 EAGAIN (Risurco dumtempe ne disponeblas)
15:59:10 close(8)                       = 0
15:59:10 select(121, [20 30 120], NULL, NULL, {tv_sec=5, tv_usec=0}^Cstrace: Process 20501 detached
$ sudo lsof | grep '/gnu/store/.*\.lock'
guix-daem 20501             root   14uW     REG              253,0          0    3150879 /gnu/store/zvvgr301hz95ac15rp0wsjy5j4cdgkg4-module-import-compiled.lock
guix-daem 20501             root   26uW     REG              253,0          0    3212618 /gnu/store/760kj698c9x1vcmlzk1qr8i9hqw961n3-openssl-1.0.2o-doc.lock
guix-daem 20501             root  116uW     REG              253,0          0    3212684 /gnu/store/amcs7l0ynj1qg6fp9ll3asiamd4zsq75-m4-1.4.18.lock
guix-daem 20503             root    8uW     REG              253,0          0    3212520 /gnu/store/7rs9qx86f2w4hjbf9wjm10k4m2vvlbyv-groff-1.22.3-doc.lock
guix-daem 20503             root   50uW     REG              253,0          0    3212514 /gnu/store/ghc9vcc8ydv305jz0spd73qrrf30gxik-binutils-bootstrap-0.lock
guix-daem 20503             root  155uW     REG              253,0          0    3212547 /gnu/store/j1kpzcpi1m1ihzwyqb60zd4s74yp85g8-glibc-bootstrap-0.lock
guix-daem 20503             root  173uW     REG              253,0          0    3212552 /gnu/store/4llnxsgr6mxxj6pk21d9vbwaxnkbn99k-gnutls-3.5.18-doc.lock
guix-daem 20503             root  317uW     REG              253,0          0    3212602 /gnu/store/d645d919gp1zk8v65qa2r4ia5x1c5n6k-libstdc++-5.5.0.lock
guix-daem 20503             root  407uW     REG              253,0          0    3212655 /gnu/store/5hdpffkgkfszpsin2ir50zscz06x0mb5-coreutils-8.29-debug.lock
guix-daem 20503             root  410uW     REG              253,0          0    3212659 /gnu/store/aasfcliv85frq3aawpa4rw18nvlvl735-gettext-minimal-0.19.8.1.lock
guix-daem 20503             root  440uW     REG              253,0          0    3212686 /gnu/store/p262qk7zsd2wnlal6x1jax2xbqvdddqq-m4-1.4.18.lock
--8<---------------cut here---------------end--------------->8---

So for example:

  • 20501 tries to acquire
    aasfcliv85frq3aawpa4rw18nvlvl735-gettext-minimal-0.19.8.1.lock,
    which is held by 20503;

  • 20503 tries to acquire
    amcs7l0ynj1qg6fp9ll3asiamd4zsq75-m4-1.4.18.lock, which is held by
    20501.

This comes from the fact that ‘LocalStore::buildPaths’ takes the
user-supplied derivation list as is, without sorting it, and then
acquires locks in that order in ‘Worker::run’.

A topological sort (or maybe an alphanumeric sort?) should allow us to
guarantee that guix-daemon processes take locks in the same order, and
then don’t end up in a deadlock.

I discovered this bug while monitoring Cuirass on berlin: several
sessions submit batches of 200 derivations in ‘build-paths’ RPCs, and
sometimes most of the corresponding guix-daemon processes would end up
being stuck in a lock-acquiring loop.

Ludo’.

             reply	other threads:[~2018-06-11 14:07 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-11 14:06 Ludovic Courtès [this message]
2019-09-06  9:04 ` bug#31785: Multiple client 'build-paths' RPCs can lead to daemon deadlock Ludovic Courtès
2020-11-04 15:21 ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87602ph0yv.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=31785@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).