* bug#42001: “SQLite database is busy”: contention on the store database
@ 2020-06-22 11:49 Ludovic Courtès
2020-06-22 16:01 ` Mathieu Othacehe
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Ludovic Courtès @ 2020-06-22 11:49 UTC (permalink / raw)
To: 42001
Hi,
On berlin.guix.gnu.org, we’ve seen a lot of contention on the store
database for a month or so, with messages like:
warning: SQLite database is busy
More often than not, everything slows down to a halt, and builds don’t
proceed.
We initially discussed it here:
https://issues.guix.gnu.org/41119#4
When that happens we typically see 30+ processes that have opened the
database:
--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo fuser -v /var/guix/db/db.sqlite-shm
USER PID ACCESS COMMAND
/var/guix/db/db.sqlite-shm:
root 1211 F...m guix offload
root 1845 F...m guix-daemon
root 6070 F...m guix-daemon
root 31378 F...m guix-daemon
root 31380 F...m guix-daemon
root 47381 F...m guix-daemon
root 47382 F...m guix-daemon
root 47383 F...m guix-daemon
root 47385 F...m guix-daemon
root 47681 F...m guix-daemon
root 48506 F...m guix offload
root 48542 F...m guix offload
root 49417 F...m guix-daemon
root 49472 F...m guix-daemon
root 49971 F...m guix-daemon
root 50027 F...m guix-daemon
root 50847 F...m guix offload
root 54625 F...m guix-daemon
root 54723 F...m guix-daemon
root 55470 F...m guix-daemon
root 55614 F...m guix-daemon
root 55617 F...m guix-daemon
root 56225 F...m guix-daemon
root 57732 F...m guix-daemon
root 57733 F...m guix-daemon
root 57738 F...m guix-daemon
root 57739 F...m guix-daemon
root 57740 F...m guix-daemon
root 58772 F...m guix-daemon
root 59056 F...m guix-daemon
root 99055 F...m guix-daemon
root 127690 F...m guix offload
--8<---------------cut here---------------end--------------->8---
In this case, the first process seems to be in the deduplication phase
in ‘finalize-store-file’:
--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo strace -p 1211
strace: Process 1211 attached
utimensat(AT_FDCWD, "/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter", [{tv_sec=1592814617, tv_nsec=396083395} /* 2020-06-22T10:30:17.396083395+0200 */, {tv_sec=1, tv_nsec=0} /* 1970-01-01T01:00:01+0100 */], 0) = 0
chmod("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter", 040555) = 0
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=1306, tv_usec=334062}, ru_stime={tv_sec=143, tv_usec=819675}, ...}) = 0
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1450, tv_nsec=153748295}) = 0
lstat("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", {st_mode=S_IFLNK|0777, st_size=125, ...}) = 0
readlink("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", "/gnu/store/x8d1hmv17jawp4a7nwwb4"..., 100) = 100
readlink("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", "/gnu/store/x8d1hmv17jawp4a7nwwb4"..., 200) = 125
mkdir("/gnu", 0777) = -1 EEXIST (File exists)
mkdir("/gnu/store", 0777) = -1 EEXIST (File exists)
mkdir("/gnu/store/.links", 0777) = -1 EEXIST (File exists)
lstat("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", {st_mode=S_IFLNK|0777, st_size=125, ...}) = 0
stat("/gnu/store/.links/05p6zpj56f00g07c3621747kz3bjc6zbr8bgg8xysdajq18qgvlv", {st_mode=S_IFREG|0444, st_size=6880, ...}) = 0
--8<---------------cut here---------------end--------------->8---
At that point it has the database opened 3 times: once from
‘with-temporary-store-file’, once ‘with-database’ in
‘finalize-store-file’, and once from ‘register-items’ in
‘finalize-store-file’. Commit d7fb5538013710288e91657499f0e04207115776
removed the last one.
--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ sudo ls -l /proc/1211/fd
total 0
lr-x------ 1 root root 64 Jun 22 11:49 0 -> 'pipe:[10239344]'
l-wx------ 1 root root 64 Jun 22 11:49 1 -> 'pipe:[10239343]'
lr-x------ 1 root root 64 Jun 22 11:49 10 -> 'pipe:[10129708]'
l-wx------ 1 root root 64 Jun 22 11:49 11 -> 'pipe:[10129708]'
lr-x------ 1 root root 64 Jun 22 11:49 12 -> 'pipe:[10214031]'
l-wx------ 1 root root 64 Jun 22 11:49 13 -> 'pipe:[10214031]'
l-wx------ 1 root root 64 Jun 22 11:49 14 -> /var/guix/offload/141.80.167.168:22/1
lrwx------ 1 root root 64 Jun 22 11:49 15 -> 'socket:[8056368]'
lrwx------ 1 root root 64 Jun 22 11:49 16 -> 'socket:[10433466]'
lrwx------ 1 root root 64 Jun 22 11:49 17 -> /var/guix/db/db.sqlite
lrwx------ 1 root root 64 Jun 22 11:49 18 -> /var/guix/db/db.sqlite-wal
lrwx------ 1 root root 64 Jun 22 11:49 19 -> /var/guix/db/db.sqlite-shm
l-wx------ 1 root root 64 Jun 22 11:49 2 -> 'pipe:[10239343]'
lrwx------ 1 root root 64 Jun 22 11:49 20 -> /var/guix/db/db.sqlite
lrwx------ 1 root root 64 Jun 22 11:49 21 -> /var/guix/db/db.sqlite-wal
lr-x------ 1 root root 64 Jun 22 11:49 3 -> 'pipe:[10021329]'
l-wx------ 1 root root 64 Jun 22 11:49 4 -> 'pipe:[10239345]'
lr-x------ 1 root root 64 Jun 22 11:49 40 -> /dev/urandom
l-wx------ 1 root root 64 Jun 22 11:49 5 -> 'pipe:[10021329]'
lr-x------ 1 root root 64 Jun 22 11:49 6 -> 'pipe:[10222125]'
l-wx------ 1 root root 64 Jun 22 11:49 7 -> 'pipe:[10222125]'
lr-x------ 1 root root 64 Jun 22 11:49 8 -> 'pipe:[10021332]'
l-wx------ 1 root root 64 Jun 22 11:49 9 -> 'pipe:[10021332]'
--8<---------------cut here---------------end--------------->8---
The are several possible causes of contention: (1) database opened too
many times, (2) database kept opened for too long, and (3) sqlite
mishandling contention somehow.
I think we can work on #1 and #2. As for #3, I wonder if there’s a
change in sqlite that could have made contention more contentious (ah
ha!), given that all this happened around the time of the last
‘core-updates’ merge, roughly corresponding to an upgrade from 3.24.0 to
3.31.0:
http://data.guix.gnu.org/repository/1/branch/master/package/sqlite
Maybe the most likely change that caused contention is that we increased
the number of machines (or rather the sum of build jobs that can be
handled by the build machines), and thus the number of parallel build
processes.
Ludo’.
^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#42001: “SQLite database is busy”: contention on the store database
2020-06-22 11:49 bug#42001: “SQLite database is busy”: contention on the store database Ludovic Courtès
@ 2020-06-22 16:01 ` Mathieu Othacehe
2020-06-23 7:42 ` Ludovic Courtès
2020-06-23 9:33 ` Ludovic Courtès
2021-03-30 15:07 ` Ludovic Courtès
2 siblings, 1 reply; 5+ messages in thread
From: Mathieu Othacehe @ 2020-06-22 16:01 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: 42001
Hey Ludo,
> utimensat(AT_FDCWD, "/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter", [{tv_sec=1592814617, tv_nsec=396083395} /* 2020-06-22T10:30:17.396083395+0200 */, {tv_sec=1, tv_nsec=0} /* 1970-01-01T01:00:01+0100 */], 0) = 0
> chmod("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter", 040555) = 0
> getrusage(RUSAGE_SELF, {ru_utime={tv_sec=1306, tv_usec=334062}, ru_stime={tv_sec=143, tv_usec=819675}, ...}) = 0
> clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1450, tv_nsec=153748295}) = 0
> lstat("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", {st_mode=S_IFLNK|0777, st_size=125, ...}) = 0
As discussed on IRC, those "image-root" derivations do exercise store
deduplication a lot, are could participate to the observed database
contention.
Turns out I used those derivations as "staging" directories that are
passed to GNU Xorriso when producing ISO images and "mke2fs" and
"mkdosfs" when producing raw disk-images. Using the store was also
convenient to ensure root ownership of the image files.
With 1cb9effc3e875ec7bae31bb06069d16ac89f7e1d and
7f75a7ec08975eb6d6e01db61bd6b91f447f655e, I remove those "image-root"
derivations, and proceed to image creation in a single derivation.
The later commit is a bit more tricky because "mke2fs" is not able to
override file permissions. I used "fakeroot" to make sure that the raw
disk-image files are owned by "root" and not "guixbuilder".
Hoping it will help,
Mathieu
^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#42001: “SQLite database is busy”: contention on the store database
2020-06-22 16:01 ` Mathieu Othacehe
@ 2020-06-23 7:42 ` Ludovic Courtès
0 siblings, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2020-06-23 7:42 UTC (permalink / raw)
To: Mathieu Othacehe; +Cc: 42001
Hi,
Mathieu Othacehe <othacehe@gnu.org> skribis:
>> utimensat(AT_FDCWD, "/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter", [{tv_sec=1592814617, tv_nsec=396083395} /* 2020-06-22T10:30:17.396083395+0200 */, {tv_sec=1, tv_nsec=0} /* 1970-01-01T01:00:01+0100 */], 0) = 0
>> chmod("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter", 040555) = 0
>> getrusage(RUSAGE_SELF, {ru_utime={tv_sec=1306, tv_usec=334062}, ru_stime={tv_sec=143, tv_usec=819675}, ...}) = 0
>> clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1450, tv_nsec=153748295}) = 0
>> lstat("/gnu/store/whhc80cx1zpi9cj3bil3sp8wwzcf8p9p-image-root/gnu/store/n6l40ygclxfysnc2nxwrslnn5ab6wv3v-profile/lib/modules/5.4.47-gnu/kernel/net/netfilter/xt_connlimit.ko", {st_mode=S_IFLNK|0777, st_size=125, ...}) = 0
>
> As discussed on IRC, those "image-root" derivations do exercise store
> deduplication a lot, are could participate to the observed database
> contention.
>
> Turns out I used those derivations as "staging" directories that are
> passed to GNU Xorriso when producing ISO images and "mke2fs" and
> "mkdosfs" when producing raw disk-images. Using the store was also
> convenient to ensure root ownership of the image files.
>
> With 1cb9effc3e875ec7bae31bb06069d16ac89f7e1d and
> 7f75a7ec08975eb6d6e01db61bd6b91f447f655e, I remove those "image-root"
> derivations, and proceed to image creation in a single derivation.
Nice, thanks for the prompt change!
I’ve been monitoring berlin a bit and I’ve become quite convinced that
these big directories are what triggered contention. The good thing is
that this has allowed us to find weaknesses in (guix store database) and
(guix store deduplicate).
> The later commit is a bit more tricky because "mke2fs" is not able to
> override file permissions. I used "fakeroot" to make sure that the raw
> disk-image files are owned by "root" and not "guixbuilder".
OK, makes sense.
Maybe at some point we’ll have to use libext2fs directly or patch mke2fs
to get the desired flexibility.
Thank you!
Ludo’.
^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#42001: “SQLite database is busy”: contention on the store database
2020-06-22 11:49 bug#42001: “SQLite database is busy”: contention on the store database Ludovic Courtès
2020-06-22 16:01 ` Mathieu Othacehe
@ 2020-06-23 9:33 ` Ludovic Courtès
2021-03-30 15:07 ` Ludovic Courtès
2 siblings, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2020-06-23 9:33 UTC (permalink / raw)
To: 42001
Ludovic Courtès <ludo@gnu.org> skribis:
> The are several possible causes of contention: (1) database opened too
> many times, (2) database kept opened for too long, and (3) sqlite
> mishandling contention somehow.
Commit fe5de925aa0f2854a679cebdea02b03cac561c8f should make
deduplication more efficient (it was an embarrassing bug…), partly
addressing #2.
Ludo’.
^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#42001: “SQLite database is busy”: contention on the store database
2020-06-22 11:49 bug#42001: “SQLite database is busy”: contention on the store database Ludovic Courtès
2020-06-22 16:01 ` Mathieu Othacehe
2020-06-23 9:33 ` Ludovic Courtès
@ 2021-03-30 15:07 ` Ludovic Courtès
2 siblings, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2021-03-30 15:07 UTC (permalink / raw)
To: 42001-done
Hi,
Ludovic Courtès <ludo@gnu.org> skribis:
> On berlin.guix.gnu.org, we’ve seen a lot of contention on the store
> database for a month or so, with messages like:
>
> warning: SQLite database is busy
>
> More often than not, everything slows down to a halt, and builds don’t
> proceed.
I think we can now close this bug: several issues around ‘guix offload’
were fixed, as mentioned in this issue, and Cuirass no longer uses ‘guix
offload’ anyway.
Ludo’.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-03-30 15:08 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-06-22 11:49 bug#42001: “SQLite database is busy”: contention on the store database Ludovic Courtès
2020-06-22 16:01 ` Mathieu Othacehe
2020-06-23 7:42 ` Ludovic Courtès
2020-06-23 9:33 ` Ludovic Courtès
2021-03-30 15:07 ` Ludovic Courtès
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.