unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#42001: “SQLite database is busy”: contention on the store database
@ 2020-06-22 11:49 Ludovic Courtès
  2020-06-22 16:01 ` Mathieu Othacehe
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Ludovic Courtès @ 2020-06-22 11:49 UTC (permalink / raw)
  To: 42001

Hi,

On berlin.guix.gnu.org, we’ve seen a lot of contention on the store
database for a month or so, with messages like:

  warning: SQLite database is busy

More often than not, everything slows down to a halt, and builds don’t
proceed.

We initially discussed it here:

  https://issues.guix.gnu.org/41119#4

When that happens we typically see 30+ processes that have opened the
database:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo fuser -v /var/guix/db/db.sqlite-shm 
                     USER        PID ACCESS COMMAND
/var/guix/db/db.sqlite-shm:
                     root       1211 F...m guix offload
                     root       1845 F...m guix-daemon
                     root       6070 F...m guix-daemon
                     root      31378 F...m guix-daemon
                     root      31380 F...m guix-daemon
                     root      47381 F...m guix-daemon
                     root      47382 F...m guix-daemon
                     root      47383 F...m guix-daemon
                     root      47385 F...m guix-daemon
                     root      47681 F...m guix-daemon
                     root      48506 F...m guix offload
                     root      48542 F...m guix offload
                     root      49417 F...m guix-daemon
                     root      49472 F...m guix-daemon
                     root      49971 F...m guix-daemon
                     root      50027 F...m guix-daemon
                     root      50847 F...m guix offload
                     root      54625 F...m guix-daemon
                     root      54723 F...m guix-daemon
                     root      55470 F...m guix-daemon
                     root      55614 F...m guix-daemon
                     root      55617 F...m guix-daemon
                     root      56225 F...m guix-daemon
                     root      57732 F...m guix-daemon
                     root      57733 F...m guix-daemon
                     root      57738 F...m guix-daemon
                     root      57739 F...m guix-daemon
                     root      57740 F...m guix-daemon
                     root      58772 F...m guix-daemon
                     root      59056 F...m guix-daemon
                     root      99055 F...m guix-daemon
                     root      127690 F...m guix offload
--8<---------------cut here---------------end--------------->8---

In this case, the first process seems to be in the deduplication phase
in ‘finalize-store-file’:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo strace -p 1211
strace: Process 1211 attached
utimensat(AT_FDCWD, "/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter", [{tv_sec=1592814617, tv_nsec=396083395} /* 2020-06-22T10:30:17.396083395+0200 */, {tv_sec=1, tv_nsec=0} /* 1970-01-01T01:00:01+0100 */], 0) = 0
chmod("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter", 040555) = 0
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=1306, tv_usec=334062}, ru_stime={tv_sec=143, tv_usec=819675}, ...}) = 0
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1450, tv_nsec=153748295}) = 0 
lstat("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", {st_mode=S_IFLNK|0777, st_size=125, ...}) = 0
readlink("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", "/gnu/store/x8d1hmv17jawp4a7nwwb4"..., 100) = 100
readlink("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", "/gnu/store/x8d1hmv17jawp4a7nwwb4"..., 200) = 125
mkdir("/gnu", 0777)                     = -1 EEXIST (File exists)
mkdir("/gnu/store", 0777)               = -1 EEXIST (File exists)
mkdir("/gnu/store/.links", 0777)        = -1 EEXIST (File exists)
lstat("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", {st_mode=S_IFLNK|0777, st_size=125, ...}) = 0
stat("/gnu/store/.links/05p6zpj56f00g07c3621747kz3bjc6zbr8bgg8xysdajq18qgvlv", {st_mode=S_IFREG|0444, st_size=6880, ...}) = 0
--8<---------------cut here---------------end--------------->8---

At that point it has the database opened 3 times: once from
‘with-temporary-store-file’, once ‘with-database’ in
‘finalize-store-file’, and once from ‘register-items’ in
‘finalize-store-file’.  Commit d7fb5538013710288e91657499f0e04207115776
removed the last one.

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo ls -l /proc/1211/fd
total 0
lr-x------ 1 root root 64 Jun 22 11:49 0 -> 'pipe:[10239344]'
l-wx------ 1 root root 64 Jun 22 11:49 1 -> 'pipe:[10239343]'
lr-x------ 1 root root 64 Jun 22 11:49 10 -> 'pipe:[10129708]'
l-wx------ 1 root root 64 Jun 22 11:49 11 -> 'pipe:[10129708]'
lr-x------ 1 root root 64 Jun 22 11:49 12 -> 'pipe:[10214031]'
l-wx------ 1 root root 64 Jun 22 11:49 13 -> 'pipe:[10214031]'
l-wx------ 1 root root 64 Jun 22 11:49 14 -> /var/guix/offload/141.80.167.168:22/1
lrwx------ 1 root root 64 Jun 22 11:49 15 -> 'socket:[8056368]'
lrwx------ 1 root root 64 Jun 22 11:49 16 -> 'socket:[10433466]'
lrwx------ 1 root root 64 Jun 22 11:49 17 -> /var/guix/db/db.sqlite
lrwx------ 1 root root 64 Jun 22 11:49 18 -> /var/guix/db/db.sqlite-wal
lrwx------ 1 root root 64 Jun 22 11:49 19 -> /var/guix/db/db.sqlite-shm
l-wx------ 1 root root 64 Jun 22 11:49 2 -> 'pipe:[10239343]'
lrwx------ 1 root root 64 Jun 22 11:49 20 -> /var/guix/db/db.sqlite
lrwx------ 1 root root 64 Jun 22 11:49 21 -> /var/guix/db/db.sqlite-wal
lr-x------ 1 root root 64 Jun 22 11:49 3 -> 'pipe:[10021329]'
l-wx------ 1 root root 64 Jun 22 11:49 4 -> 'pipe:[10239345]'
lr-x------ 1 root root 64 Jun 22 11:49 40 -> /dev/urandom
l-wx------ 1 root root 64 Jun 22 11:49 5 -> 'pipe:[10021329]'
lr-x------ 1 root root 64 Jun 22 11:49 6 -> 'pipe:[10222125]'
l-wx------ 1 root root 64 Jun 22 11:49 7 -> 'pipe:[10222125]'
lr-x------ 1 root root 64 Jun 22 11:49 8 -> 'pipe:[10021332]'
l-wx------ 1 root root 64 Jun 22 11:49 9 -> 'pipe:[10021332]'
--8<---------------cut here---------------end--------------->8---

The are several possible causes of contention: (1) database opened too
many times, (2) database kept opened for too long, and (3) sqlite
mishandling contention somehow.

I think we can work on #1 and #2.  As for #3, I wonder if there’s a
change in sqlite that could have made contention more contentious (ah
ha!), given that all this happened around the time of the last
‘core-updates’ merge, roughly corresponding to an upgrade from 3.24.0 to
3.31.0:

  http://data.guix.gnu.org/repository/1/branch/master/package/sqlite

Maybe the most likely change that caused contention is that we increased
the number of machines (or rather the sum of build jobs that can be
handled by the build machines), and thus the number of parallel build
processes.

Ludo’.




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-03-30 15:08 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-22 11:49 bug#42001: “SQLite database is busy”: contention on the store database Ludovic Courtès
2020-06-22 16:01 ` Mathieu Othacehe
2020-06-23  7:42   ` Ludovic Courtès
2020-06-23  9:33 ` Ludovic Courtès
2021-03-30 15:07 ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).