* bug#51787: GC takes more than 9 hours on berlin
From: Mathieu Othacehe @ 2021-11-12 11:49 UTC
To: 51787
Hello,
On berlin, the daily GC command is still running even though it was
started 9 hours ago.
--8<---------------cut here---------------start------------->8---
guix processes
[...]
SessionPID: 37231
ClientPID: 37195
ClientCommand: /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/libexec/guix/guile \ /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/bin/guix gc -F10995116277760
--8<---------------cut here---------------end--------------->8---
and
--8<---------------cut here---------------start------------->8---
mathieu@berlin ~$ ps auxww|grep 37195
root 37195 0.0 0.0 183260 33440 ? Sl 03:59 0:00 /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/libexec/guix/guile \ /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/bin/guix gc -F10995116277760
--8<---------------cut here---------------end--------------->8---
That's really problematic as it is blocking some other berlin services
such as Cuirass, which has 4564 packages in its fetch queue:
--8<---------------cut here---------------start------------->8---
mathieu@berlin ~$ less /var/log/cuirass-remote-server.log
[...]
2021-11-12T12:47:01 period update: 0 resumable, 0 failed builds.
2021-11-12T12:47:01 period update: 4564 items in the fetch queue.
--8<---------------cut here---------------end--------------->8---
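For the record, a quick way to check how long the client has been
running, and which process holds the GC lock, might be the following
(PID taken from the output above; the lock path is an assumption based
on the daemon's state directory):
--8<---------------cut here---------------start------------->8---
# Elapsed time and command line of the client shown above.
ps -o etime=,cmd= -p 37195

# Which process holds the GC lock, if any.
sudo fuser -v /var/guix/gc.lock
--8<---------------cut here---------------end--------------->8---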
Thanks,
Mathieu
* bug#51787: GC takes more than 9 hours on berlin
From: Mathieu Othacehe @ 2021-11-12 19:17 UTC
To: 51787
> mathieu@berlin ~$ ps auxww|grep 37195
> root 37195 0.0 0.0 183260 33440 ? Sl 03:59 0:00 /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/libexec/guix/guile \ /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/bin/guix gc -F10995116277760
I just killed this process; it had been running for more than 16 hours.
Mathieu
* bug#51787: GC takes more than 9 hours on berlin
From: zimoun @ 2021-11-22 9:16 UTC
To: Mathieu Othacehe; +Cc: 51787
Hi,
On Fri, 12 Nov 2021 at 12:51, Mathieu Othacehe <othacehe@gnu.org> wrote:
> On berlin, the daily GC command is still running even though it was
> started 9 hours ago.
>
> --8<---------------cut here---------------start------------->8---
> guix processes
> [...]
> SessionPID: 37231
> ClientPID: 37195
> ClientCommand: /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/libexec/guix/guile \ /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/bin/guix gc -F10995116277760
> --8<---------------cut here---------------end--------------->8---
>
> and
>
> --8<---------------cut here---------------start------------->8---
> mathieu@berlin ~$ ps auxww|grep 37195
> root 37195 0.0 0.0 183260 33440 ? Sl 03:59 0:00 /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/libexec/guix/guile \ /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/bin/guix gc -F10995116277760
> --8<---------------cut here---------------end--------------->8---
>
> That's really problematic as it is blocking some other berlin services
> such as Cuirass, which has 4564 packages in its fetch queue:
>
> --8<---------------cut here---------------start------------->8---
> mathieu@berlin ~$ less /var/log/cuirass-remote-server.log
> [...]
> 2021-11-12T12:47:01 period update: 0 resumable, 0 failed builds.
> 2021-11-12T12:47:01 period update: 4564 items in the fetch queue.
> --8<---------------cut here---------------end--------------->8---
How can this be investigated?
Cheers,
simon
* bug#51787: GC takes more than 9 hours on berlin
From: Ludovic Courtès @ 2021-11-23 17:48 UTC
To: Mathieu Othacehe; +Cc: 51787
Hello!
Mathieu Othacehe <othacehe@gnu.org> writes:
> On berlin, the daily GC command is still running even though it was
> started 9 hours ago.
Some data points:
• I deployed on berlin the new daemon featuring the faster “deleting
unused links” phase from <https://issues.guix.gnu.org/24937> on
Nov. 20.
However, that part runs after the GC lock has been released, so it’s
not really relevant here (though it is relevant to I/O load and GC
efficiency).
• While discussing with Chris Baines on IRC yesterday, we found that
the sqlite WAL file was 8 GiB. I later ran:
PRAGMA wal_checkpoint(TRUNCATE);
which emptied it immediately. However, GC time wasn’t particularly
different today.
• ‘db.sqlite’ weighs in at 19 GiB (!) so perhaps there’s something to
do, like the “VACUUM” thing maybe (see the sketch after this list).
Chris?
• Stracing the session’s guix-daemon process during GC suggests that
most of the time goes into I/O from ‘db.sqlite’. It’s not
surprising because that GC phase is basically about browsing the
database, but it does seem to take a little too long for each store
item.
• I haven’t checked recently but I recall that ‘guix gc --list-roots’
(or its C++ counterpart, ‘findRoots’) would take ages on berlin
because of all the GC roots Cuirass registers. It may be that an
hour or so goes into enumerating GC roots.
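For the record, checkpointing and vacuuming the store database would
look roughly like this; a sketch, assuming the database lives at
/var/guix/db/db.sqlite and that guix-daemon is stopped while VACUUM
runs, since VACUUM rewrites the whole file:
--8<---------------cut here---------------start------------->8---
# Flush and truncate the WAL; safe to run while the daemon is up.
sqlite3 /var/guix/db/db.sqlite "PRAGMA wal_checkpoint(TRUNCATE);"

# Defragment the database; stop the daemon first and expect the
# operation to need free space on the order of the database size.
herd stop guix-daemon
sqlite3 /var/guix/db/db.sqlite "VACUUM;"
herd start guix-daemon
--8<---------------cut here---------------end--------------->8---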
Collecting garbage,
Ludo’.
* bug#51787: GC takes more than 9 hours on berlin
From: Christopher Baines @ 2021-11-25 13:24 UTC
To: Ludovic Courtès; +Cc: Mathieu Othacehe, 51787
Ludovic Courtès <ludo@gnu.org> writes:
> • When discussing together with Chris Baines yesterday on IRC, we
> found that the sqlite WAL file was 8 GiB. I later ran:
>
> PRAGMA wal_checkpoint(TRUNCATE);
>
> which emptied it immediately. However, GC time wasn’t particularly
> different today.
So, as I understand it, the WAL is made up of pages, and checking for
this db, I think they're the current default size of 4096 bytes.
sqlite> PRAGMA page_size;
4096
From looking at the code, the wal_autocheckpoint value is set to 40000:
/* Increase the auto-checkpoint interval to 40000 pages. This
seems enough to ensure that instantiating the NixOS system
derivation is done in a single fsync(). */
if (mode == "wal" && sqlite3_exec(db, "pragma wal_autocheckpoint = 40000;", 0, 0, 0) != SQLITE_OK)
throwSQLiteError(db, "setting autocheckpoint interval");
https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/local-store.cc#n253
This means you'd expect the WAL to be in the region of 40000*4096 bytes,
or ~160MB. Assuming the autocheckpointing is keeping up... it doesn't
look to be, since the file is now much larger than this.
As described here [1], the automatic checkpoints are PASSIVE ones, which
has the advantage of not interrupting any readers or writers, but can
also do nothing if it's being blocked by readers or writers.
1: https://www.sqlite.org/wal.html#application_initiated_checkpoints
What I've found while writing the Guix Build Coordinator is that when
the service is busy (usually new builds being submitted, plus lots of
builds happening), the PASSIVE checkpoints aren't sufficient. To
supplement them, there's a regular check that looks at the size of the
WAL file and runs a TRUNCATE checkpoint: a FULL checkpoint (which
blocks new writers) followed by truncation of the WAL file once the
entire WAL has been checkpointed (a shell sketch follows below). The
truncation is mostly so that it's easier to monitor the size of the
WAL, by checking the size of the file.
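Transposed to the daemon's database, such a periodic check could look
like the sketch below. The paths and the 1 GiB threshold are assumptions
for illustration, not what the Coordinator actually does (it does this
from Scheme):
--8<---------------cut here---------------start------------->8---
#!/bin/sh
# Force a TRUNCATE checkpoint whenever the WAL grows too large.
db=/var/guix/db/db.sqlite
limit=$((1024 * 1024 * 1024))   # 1 GiB; arbitrary threshold

while sleep 600; do
    size=$(stat -c %s "$db-wal" 2>/dev/null || echo 0)
    if [ "$size" -gt "$limit" ]; then
        sqlite3 "$db" "PRAGMA wal_checkpoint(TRUNCATE);"
    fi
done
--8<---------------cut here---------------end--------------->8---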
I feel like I need to defend SQLite at this point. Tuning the
configuration of a database to get acceptable performance is the norm;
I had to tune the PostgreSQL configuration for data.guix.gnu.org to
improve its performance. It's easier to get into trouble with SQLite
because it's lower level, too, and requires you to actually initiate
things like checkpoints or periodic optimisation if you want them to
happen.
Unfortunately, I don't know enough about the internals of the daemon to
say anything specific though.
> • ‘db.sqlite’ weighs in at 19 GiB (!) so perhaps there’s something to
> do, like the “VACUUM” thing maybe. Chris?
Doing a VACUUM might address some fragmentation and improve performance;
it's probably worth trying.
> • Stracing the session’s guix-daemon process during GC suggests that
> most of the time goes into I/O from ‘db.sqlite’. It’s not
> surprising because that GC phase is basically about browsing the
> database, but it does seem to take a little too long for each store
> item.
At least the way I've approached finding and fixing the poor
performance issues in the Guix Build Coordinator is by adding
instrumentation: recording the time particular procedure calls take,
and logging when it exceeds some threshold.
Since this issue is about Cuirass, there's also the possibility of
avoiding the problems of a large store, by avoiding having a large
store. That's what bordeaux.guix.gnu.org does, and I thought it was part
of the plan for ci.guix.gnu.org (at least around a year ago)?
* bug#51787: GC takes more than 9 hours on berlin
From: Ludovic Courtès @ 2021-11-27 11:11 UTC
To: Mathieu Othacehe; +Cc: 51787
Ludovic Courtès <ludo@gnu.org> writes:
> • Stracing the session’s guix-daemon process during GC suggests that
> most of the time goes into I/O from ‘db.sqlite’. It’s not
> surprising because that GC phase is basically about browsing the
> database, but it does seem to take a little too long for each store
> item.
Stracing the client shows that the daemon occasionally spends several
seconds on a single store item:
--8<---------------cut here---------------start------------->8---
read(27, "gmlo\0\0\0\0", 8) = 8 <0.363064>
read(27, "c\0\0\0\0\0\0\0", 8) = 8 <0.000013>
read(27, "[95%] deleting '/gnu/store/p6r2jjy6frp682z3x94nvnmdh71p1p58-ecl-quicksearch-0.01"..., 104) = 104 <0.000010>
write(2, "[95%] deleting '/gnu/store/p6r2jjy6frp682z3x94nvnmdh71p1p58-ecl-quicksearch-0.01"..., 99) = 99 <0.000019>
read(27, "gmlo\0\0\0\0", 8) = 8 <0.017863>
read(27, "^\0\0\0\0\0\0\0", 8) = 8 <0.000019>
read(27, "[95%] deleting '/gnu/store/v6zd510kfmqd8j4w7q3zy9bid1fj96dk-shepherd-guix-daemon"..., 96) = 96 <0.000007>
write(2, "[95%] deleting '/gnu/store/v6zd510kfmqd8j4w7q3zy9bid1fj96dk-shepherd-guix-daemon"..., 94) = 94 <0.000012>
read(27, "gmlo\0\0\0\0", 8) = 8 <5.861071>
read(27, "T\0\0\0\0\0\0\0", 8) = 8 <0.000061>
read(27, "[95%] deleting '/gnu/store/0hpwig8cwdnzygjjzs9zjbxicvhif2vv-rust-bitvec-0.19.4.d"..., 88) = 88 <0.000087>
write(2, "[95%] deleting '/gnu/store/0hpwig8cwdnzygjjzs9zjbxicvhif2vv-rust-bitvec-0.19.4.d"..., 84) = 84 <0.000033>
--8<---------------cut here---------------end--------------->8---
(Notice ‘read’ taking 5.9s above.)
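For reproducibility, output like the above can be obtained with an
invocation along these lines (the PID is whatever ‘guix processes’
reports for the client):
--8<---------------cut here---------------start------------->8---
sudo strace -f -T -e trace=read,write -p 37195
--8<---------------cut here---------------end--------------->8---
The -T flag prints the time spent in each syscall between angle
brackets, matching the figures above.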
Ludo’.
* bug#51787: GC takes more than 9 hours on berlin
From: Ludovic Courtès @ 2021-11-27 11:23 UTC
To: Christopher Baines; +Cc: Mathieu Othacehe, 51787
Hi Chris,
Christopher Baines <mail@cbaines.net> writes:
> So, as I understand it, the WAL is made up of pages, and checking for
> this db, I think they're the current default size of 4096 bytes.
>
> sqlite> PRAGMA page_size;
> 4096
>
> From looking at the code, the wal_autocheckpoint value is set to 40000:
>
> /* Increase the auto-checkpoint interval to 40000 pages. This
> seems enough to ensure that instantiating the NixOS system
> derivation is done in a single fsync(). */
> if (mode == "wal" && sqlite3_exec(db, "pragma wal_autocheckpoint = 40000;", 0, 0, 0) != SQLITE_OK)
> throwSQLiteError(db, "setting autocheckpoint interval");
>
> https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/local-store.cc#n253
>
> This means you'd expect the WAL to be in the region of 40000*4096 bytes,
> or ~160MB. Assuming the autocheckpointing is keeping up... it doesn't
> look to be, since the file is now much larger than this.
>
> As described here [1], the automatic checkpoints are PASSIVE ones, which
> has the advantage of not interrupting any readers or writers, but can
> also do nothing if it's being blocked by readers or writers.
>
> 1: https://www.sqlite.org/wal.html#application_initiated_checkpoints
>
> What I've found while writing the Guix Build Coordinator is that when
> the service is busy (usually new builds being submitted, plus lots of
> builds happening), the PASSIVE checkpoints aren't sufficient. To
> supplement them, there's a regular check that looks at the size of the
> WAL file and runs a TRUNCATE checkpoint: a FULL checkpoint (which
> blocks new writers) followed by truncation of the WAL file once the
> entire WAL has been checkpointed. The truncation is mostly so that
> it's easier to monitor the size of the WAL, by checking the size of
> the file.
OK. That may well be what happens on berlin these days: the database is
kept busy all day long, so presumably checkpoints don’t happen and the
WAL file grows.
> I feel like I need to defend SQLite at this point. Tuning the
> configuration of a database to get acceptable performance is the norm;
> I had to tune the PostgreSQL configuration for data.guix.gnu.org to
> improve its performance. It's easier to get into trouble with SQLite
> because it's lower level, too, and requires you to actually initiate
> things like checkpoints or periodic optimisation if you want them to
> happen.
Understood. It’s really not about defending software X against Y, but
rather about finding ways to address the issues we experience.
>> • ‘db.sqlite’ weighs in at 19 GiB (!) so perhaps there’s something to
>> do, like the “VACUUM” thing maybe. Chris?
>
> Doing a VACUUM might address some fragmentation and improve performance;
> it's probably worth trying.
Alright, let’s give it a try.
>> • Stracing the session’s guix-daemon process during GC suggests that
>> most of the time goes into I/O from ‘db.sqlite’. It’s not
>> surprising because that GC phase is basically about browsing the
>> database, but it does seem to take a little too long for each store
>> item.
>
> At least the way I've approached finding and fixing the poor
> performance issues in the Guix Build Coordinator is by adding
> instrumentation: recording the time particular procedure calls take,
> and logging when it exceeds some threshold.
Yeah, that makes sense. I think we always took these bits of the daemon
for granted because they’d been used on large stores even before Guix
existed.
> Since this issue is about Cuirass, there's also the possibility of
> avoiding the problems of a large store, by avoiding having a large
> store. That's what bordeaux.guix.gnu.org does, and I thought it was part
> of the plan for ci.guix.gnu.org (at least around a year ago)?
That’s indeed the case: the store is smaller than it used to be (but
still 27 TiB), it’s GC’d more aggressively than before, and instead we
rely on /var/cache/guix/publish for long-term storage.
Perhaps we should go further and keep the store smaller though.
Thanks for your feedback!
Ludo’.
* bug#51787: GC takes more than 9 hours on berlin
From: Mathieu Othacehe @ 2021-12-03 9:45 UTC
To: Ludovic Courtès; +Cc: 51787
Hello,
> That’s indeed the case: the store is smaller than it used to be (but
> still 27 TiB), it’s GC’d more aggressively than before, and instead we
> rely on /var/cache/guix/publish for long-term storage.
>
> Perhaps we should go further and keep the store smaller though.
That's what I did with 93adf7aaa693d234ee13240e9f4ff22a2dfef599 on
maintenance: it increases the GC threshold to 15 TiB. Let's see if it
brings some improvement.
Thanks,
Mathieu
* bug#51787: GC takes more than 9 hours on berlin
From: Mathieu Othacehe @ 2021-12-10 6:24 UTC
To: Ludovic Courtès; +Cc: 51787
Hey,
New GC recap. The process that was started yesterday at 04:00 is still
running. I killed the GC that was started today at 04:00 to keep
things clear.
From yesterday 11:00, when I started monitoring it, to today when I'm
writing this email, 20 hours have elapsed and the GC is still in the
same phase: recursively removing the contents of the /gnu/store/trash
directory.
This corresponds to the following snippet, for those of you who would
like to have a look at the code:
--8<---------------cut here---------------start------------->8---
if (state.shouldDelete) {
    if (pathExists(state.trashDir)) deleteGarbage(state, state.trashDir); // > 20 hours
    try {
        createDirs(state.trashDir);
    } catch (SysError & e) {
        if (e.errNo == ENOSPC) {
            printMsg(lvlInfo, format("note: can't create trash directory: %1%") % e.msg());
            state.moveToTrash = false;
        }
    }
}
--8<---------------cut here---------------end--------------->8---
This is an early phase of the garbage collection, where store items
that were moved to the trash directory by previous GC runs are
effectively removed.
Stracing the guix-daemon process associated with the GC process clearly
shows what's going on:
--8<---------------cut here---------------start------------->8---
chmod("/gnu/store/trash/272ibwb38i0kcbcl3n9v0ka1rsmd1104-guix-web-site/de/packages/rust-syntex-0.58.1", 040755) = 0 <0.000012>
openat(AT_FDCWD, "/gnu/store/trash/272ibwb38i0kcbcl3n9v0ka1rsmd1104-guix-web-site/de/packages/rust-syntex-0.58.1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 13 <0.000011>
fstat(13, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000007>
getdents64(13, 0x397a510 /* 3 entries */, 32768) = 80 <0.005059>
getdents64(13, 0x397a510 /* 0 entries */, 32768) = 0 <0.000007>
close(13) = 0 <0.000008>
statx(AT_FDCWD, "/gnu/store/trash/272ibwb38i0kcbcl3n9v0ka1rsmd1104-guix-web-site/de/packages/rust-syntex-0.58.1/index.html", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_MODE|STATX_NLINK|STATX_SIZE, {stx_mask=STATX_BASIC_STATS|0x1000, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=10265, ...}) = 0 <0.000023>
unlink("/gnu/store/trash/272ibwb38i0kcbcl3n9v0ka1rsmd1104-guix-web-site/de/packages/rust-syntex-0.58.1/index.html") = 0 <0.000013>
rmdir("/gnu/store/trash/272ibwb38i0kcbcl3n9v0ka1rsmd1104-guix-web-site/de/packages/rust-syntex-0.58.1") = 0 <0.000028>
statx(AT_FDCWD, "/gnu/store/trash/272ibwb38i0kcbcl3n9v0ka1rsmd1104-guix-web-site/de/packages/lofreq-2.1.5", AT_STATX_
--8<---------------cut here---------------end--------------->8---
Several syscalls are involved in cleaning the trash directory: chmod,
openat, statx, unlink and rmdir. Nothing here looks particularly wrong.
What is problematic, though, is that in 20 hours the free space on the
store partition has only gone from 9.6T to 9.7T. Since the GC lock
prevents most of berlin's services from running, almost all of the
machine's I/O is dedicated to removing this directory, as iotop shows.
I'm not sure I understand why this removal is so slow; if someone has
an idea, I'm all ears. In the meantime, I plan to let the GC run and
keep monitoring it.
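For monitoring progress, something along these lines does the job
(counting the entries of a directory this large is itself slow):
--8<---------------cut here---------------start------------->8---
watch -n 60 'df -h /gnu; ls /gnu/store/trash | wc -l'
--8<---------------cut here---------------end--------------->8---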
Thanks,
Mathieu
* bug#51787: GC takes more than 9 hours on berlin
From: Ludovic Courtès @ 2021-12-10 10:21 UTC
To: Mathieu Othacehe; +Cc: 51787
Hi,
Mathieu Othacehe <othacehe@gnu.org> writes:
> What is problematic, though, is that in 20 hours the free space on the
> store partition has only gone from 9.6T to 9.7T. Since the GC lock
> prevents most of berlin's services from running, almost all of the
> machine's I/O is dedicated to removing this directory, as iotop shows.
>
> I'm not sure I understand why this removal is so slow; if someone has
> an idea, I'm all ears. In the meantime, I plan to let the GC run and
> keep monitoring it.
This is the first time GC runs since we’ve increased the threshold from
10 TiB to 15 TiB free. There are at least 5 TiB more to delete than
usual, so it’s expected to take more time.
Still, I’m surprised a mere ‘rm -rf’ can take this long. ‘strace -T’ on
the child guix-daemon process doesn’t reveal anything obviously wrong:
no long pauses or similar.
Ludo’.
* bug#51787: GC takes more than 9 hours on berlin
From: Mathieu Othacehe @ 2021-12-10 17:11 UTC
To: Ludovic Courtès; +Cc: 51787
Hey,
> Still, I’m surprised a mere ‘rm -rf’ can take this long. ‘strace -T’ on
> the child guix-daemon process doesn’t reveal anything obviously wrong:
> no long pauses or similar.
I noticed that the process isn't around anymore. Did anyone kill it,
or did it just crash? I also noticed some trash-removal commands
running. Regardless of the root cause of the problem, getting rid of
the trash directory before the next evaluation seems like a good idea.
Thanks,
Mathieu
* bug#51787: GC takes more than 9 hours on berlin
From: Ricardo Wurmus @ 2021-12-10 19:38 UTC
To: 51787
> I noticed that the process isn't around anymore. Did anyone kill it,
> or did it just crash?
I have not touched it.
--
Ricardo
* bug#51787: GC takes more than 9 hours on berlin
From: Ludovic Courtès @ 2021-12-10 21:54 UTC
To: Mathieu Othacehe; +Cc: 51787
Mathieu Othacehe <othacehe@gnu.org> writes:
>> Still, I’m surprised a mere ‘rm -rf’ can take this long. ‘strace -T’ on
>> the child guix-daemon process doesn’t reveal anything obviously wrong:
>> no long pauses or similar.
>
> I noticed that the process isn't around anymore. Did anyone kill it,
> or did it just crash? I also noticed some trash-removal commands
> running. Regardless of the root cause of the problem, getting rid of
> the trash directory before the next evaluation seems like a good idea.
Yeah, I don’t know what happened to the GC process, but it disappeared
earlier today. Then I started things like this:
unshare -m sh -c 'mount --bind -o remount,rw /gnu/store; rm -rf --one-file-system /gnu/store/trash/[abc]*'
which is equally slow (perhaps slightly slower: it seems to make more
syscalls per removed file than the guix-daemon code).
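A variant that might be worth trying is to parallelize the removal
across top-level trash entries; a sketch, using the same remount trick,
with an arbitrary degree of parallelism of 8:
--8<---------------cut here---------------start------------->8---
unshare -m sh -c 'mount --bind -o remount,rw /gnu/store;
  find /gnu/store/trash -mindepth 1 -maxdepth 1 -print0 |
    xargs -0 -P 8 -n 1 rm -rf --one-file-system'
--8<---------------cut here---------------end--------------->8---
Whether this helps depends on how well the underlying RAID handles
concurrent metadata writes.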
Anyway, I’ll let it run, hoping it’ll be done before the next GC.
Ludo’.
* bug#51787: GC takes more than 9 hours on berlin
From: Mathieu Othacehe @ 2021-12-11 9:42 UTC
To: Ludovic Courtès; +Cc: 51787
Hey,
> Yeah, I don’t know what happened to the GC process, but it disappeared
> earlier today. Then I started things like this:
>
> unshare -m sh -c 'mount --bind -o remount,rw /gnu/store; rm -rf --one-file-system /gnu/store/trash/[abc]*'
>
> which is equally slow (perhaps slightly slower: it seems to make more
> syscalls per removed file than the guix-daemon code).
>
> Anyway, I’ll let it run, hoping it’ll be done before the next GC.
OK, thanks. Looks like it just finished removing the trash directory
contents. I started a GC process from my session to monitor it closely.
Mathieu
* bug#51787: GC takes more than 9 hours on berlin
From: Mathieu Othacehe @ 2021-12-12 17:09 UTC
To: 51787
Hey,
> OK, thanks. Looks like it just finished removing the trash directory
> contents. I started a GC process from my session to monitor it closely.
Daily GC recap:
* The GC process I started yesterday collected 5.5 TiB in
approximately 24 hours; that data is now in the /gnu/store/trash
directory.
* The /gnu/store/trash directory contains 288910 entries. If those
items are removed at the same rate as on the previous days, it will
take many days, if not months, to delete them all.
* I noticed that the upstream Nix GC process can now operate without
locking. I think it shouldn't be too hard to port it to our fork, or
maybe rewrite the process in Guile while we are at it.
That will not fix the slow hard-drive issue, though.
Thanks,
Mathieu
* bug#51787: GC takes more than 9 hours on berlin
From: Christian Thäter @ 2021-12-13 16:13 UTC
To: 51787
On 2021-12-12 18:09, Mathieu Othacehe wrote:
>Hey,
>
>> OK, thanks. Looks like it just finished removing the trash directory
>> contents. I started a GC process from my session to monitor it
>> closely.
>
>Daily GC recap:
>
>* The GC process I started yesterday collected 5.5 TiB in
> approximately 24 hours; that data is now in the /gnu/store/trash
> directory.
>
>* The /gnu/store/trash directory contains 288910 entries. If those
> items are removed at the same rate as on the previous days, it will
> take many days, if not months, to delete them all.
>
>* I noticed that the upstream Nix GC process can now operate without
> locking. I think it shouldn't be too hard to port it to our fork, or
> maybe rewrite the process in Guile while we are at it.
>
> That will not fix the slow hard-drive issue, though.
While discussing this issue on IRC I came up with an idea:
'rmrfd', a system daemon that deletes huge trees in the background,
where '-rf' stands for --really --fast :)
Actually, this is a use case that comes up on my backup system too.
With that idea I just started coding and ran some experiments. To me
this looks quite feasible now, and I will keep working on this small
project over the next days. Any feedback or help would be welcome!
The initial ideas and experiments are at https://github.com/cehteh/rmrfd
Note that the important part is that it puts some effort into freeing
as much space as possible early in the process, unlike plain 'rm -rf',
where space may only be freed very late, once the last link count of
the data drops to zero.
Cheers
Christian
* bug#51787: GC takes more than 9 hours on berlin
From: Christian Thäter @ 2021-12-14 3:31 UTC
To: 51787
On 2021-12-12 18:09, Mathieu Othacehe wrote:
>Hey,
>
>> OK, thanks. Looks like it just finished removing the trash directory
>> contents. I started a GC process from my session to monitor it
>> closely.
>
>Daily GC recap:
>
>* The GC process I started yesterday collected 5.5 TiB in
> approximately 24 hours; that data is now in the /gnu/store/trash
> directory.
>
>* The /gnu/store/trash directory contains 288910 entries. If those
> items are removed at the same rate as on the previous days, it will
> take many days, if not months, to delete them all.
On another note: when 'guix gc' determines that objects are dead, are
they really dead, or can they be 'locally' dead while external clients
may still request them because the store serves as a substitute/offload
server?
In either case it would make sense to run the GC more frequently, and
for the latter case a --min-age option preserving objects that died
only recently could help. The GC could further consider the atime of
objects for removal.
And finally, while I'm at it: do you mount the guix store with
'relatime' or 'nodiratime'? If not, that could explain the slow
deletion process as well!
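Checking that is quick, assuming /gnu is the store's mount point:
--8<---------------cut here---------------start------------->8---
findmnt -o TARGET,OPTIONS /gnu
--8<---------------cut here---------------end--------------->8---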
Christian
>
>* I noticed that the upstream Nix GC process can now operate without
> locking. I think it shouldn't be too hard to port it to our fork, or
> maybe rewrite the process in Guile while we are at it.
>
> That will not fix the slow hard-drive issue, though.
>
>Thanks,
>
>Mathieu
* bug#51787: GC takes more than 9 hours on berlin
From: Mathieu Othacehe @ 2021-12-17 10:55 UTC
To: 51787; +Cc: rekado
Hey,
New day, new benchmark. Berlin has two hard drives, which are roughly
used this way:
/dev/sda -> / (916G)
/dev/sdb -> /gnu (37T)
I ran the fio benchmark tool on both of them. See the reports
attached, and the following summary:
--8<---------------cut here---------------start------------->8---
| | sda | sdb |
|-------+-----------+-----------|
| read | 1565KiB/s | 9695KiB/s |
| write | 523KiB/s | 3240KiB/s |
--8<---------------cut here---------------end--------------->8---
I'm not sure how slow those figures are relative to the underlying
drive technology. Ricardo, any idea about that?
In comparison, my own NVMe drive reaches 294 MiB/s read and 98.1 MiB/s
write with the same test.
I also tried to benchmark file removal this way, but this is not really
conclusive:
--8<---------------cut here---------------start------------->8---
# 3.8s on sdb
time for i in {0..100000}; do echo 'test' > "fs_test/test${i}.txt"; done
# 2.7s on sdb
time rm -rf fs_test
--8<---------------cut here---------------end--------------->8---
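A benchmark closer to the trash workload would spread the files over
many directories and drop the page cache before deleting, so that the
removal has to re-read cold metadata; a sketch:
--8<---------------cut here---------------start------------->8---
mkdir -p fs_test
for d in {0..99}; do
    mkdir "fs_test/dir${d}"
    for i in {0..999}; do echo test > "fs_test/dir${d}/f${i}"; done
done
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
time rm -rf fs_test
--8<---------------cut here---------------end--------------->8---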
Thanks,
Mathieu
[-- Attachment #2: fio_sda --]
[-- Type: application/octet-stream, Size: 2264 bytes --]
mathieu@berlin ~$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=1437KiB/s,w=428KiB/s][r=359,w=107 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=71924: Fri Dec 17 11:17:12 2021
read: IOPS=391, BW=1565KiB/s (1602kB/s)(3070MiB/2008868msec)
bw ( KiB/s): min= 304, max= 3190, per=100.00%, avg=1566.30, stdev=323.23, samples=4014
iops : min= 76, max= 797, avg=391.48, stdev=80.84, samples=4014
write: IOPS=130, BW=523KiB/s (536kB/s)(1026MiB/2008868msec); 0 zone resets
bw ( KiB/s): min= 120, max= 1146, per=100.00%, avg=523.43, stdev=124.21, samples=4014
iops : min= 30, max= 286, avg=130.80, stdev=31.06, samples=4014
cpu : usr=0.16%, sys=0.58%, ctx=1039425, majf=0, minf=589
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=1565KiB/s (1602kB/s), 1565KiB/s-1565KiB/s (1602kB/s-1602kB/s), io=3070MiB (3219MB), run=2008868-2008868msec
WRITE: bw=523KiB/s (536kB/s), 523KiB/s-523KiB/s (536kB/s-536kB/s), io=1026MiB (1076MB), run=2008868-2008868msec
Disk stats (read/write):
sda: ios=805734/364174, merge=7143/37100, ticks=137586969/41308, in_queue=137628277, util=99.71%
mathieu@berlin ~$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/var/cache/test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
fio: pid=0, err=13/file:filesetup.c:174, func=open, error=Permission denied
[-- Attachment #3: fio_sdb --]
[-- Type: application/octet-stream, Size: 2329 bytes --]
mathieu@berlin ~$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/var/cache/test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
fio: pid=0, err=13/file:filesetup.c:174, func=open, error=Permission denied
Run status group 0 (all jobs):
mathieu@berlin ~$ sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/var/cache/test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
Password:
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=6610KiB/s,w=2190KiB/s][r=1652,w=547 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=49274: Fri Dec 17 11:30:38 2021
read: IOPS=2423, BW=9695KiB/s (9927kB/s)(3070MiB/324270msec)
bw ( KiB/s): min= 2392, max=17808, per=100.00%, avg=9704.06, stdev=2476.08, samples=648
iops : min= 598, max= 4452, avg=2425.92, stdev=619.02, samples=648
write: IOPS=809, BW=3240KiB/s (3318kB/s)(1026MiB/324270msec); 0 zone resets
bw ( KiB/s): min= 1080, max= 5968, per=100.00%, avg=3243.27, stdev=833.97, samples=648
iops : min= 270, max= 1492, avg=810.71, stdev=208.49, samples=648
cpu : usr=0.55%, sys=2.84%, ctx=1004749, majf=0, minf=129
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=9695KiB/s (9927kB/s), 9695KiB/s-9695KiB/s (9927kB/s-9927kB/s), io=3070MiB (3219MB), run=324270-324270msec
WRITE: bw=3240KiB/s (3318kB/s), 3240KiB/s-3240KiB/s (3318kB/s-3318kB/s), io=1026MiB (1076MB), run=324270-324270msec
Disk stats (read/write):
sdb: ios=790930/270846, merge=2539/10683, ticks=21982026/264629, in_queue=22246666, util=100.00%
* bug#51787: GC takes more than 9 hours on berlin
From: Ricardo Wurmus @ 2021-12-17 13:06 UTC
To: Mathieu Othacehe; +Cc: 51787
Hi Mathieu,
> New day, new benchmark. Berlin has two hard drives, which are roughly
> used this way:
>
> /dev/sda -> / (916G)
> /dev/sdb -> /gnu (37T)
sda consists of two local hard disks that are combined into a RAID.
Here are the disk details:
--8<---------------cut here---------------start------------->8---
Disk.Bay.0:Enclosure.Internal.0-1:RAID.Slot.3-1
Status = Ok
DeviceDescription = Disk 0 in Backplane 1 of RAID Controller in Slot 3
RollupStatus = Ok
Name = Physical Disk 0:1:0
State = Online
OperationState = Not Applicable
PowerStatus = Spun-Up
Size = 931.000 GB
FailurePredicted = NO
RemainingRatedWriteEndurance = Not Applicable
SecurityStatus = Not Capable
BusProtocol = SATA
MediaType = HDD
UsedRaidDiskSpace = 931.000 GB
AvailableRaidDiskSpace = 0.001 GB
Hotspare = NO
Manufacturer = SEAGATE
ProductId = ST1000NX0443
Revision = NB33
SerialNumber = W470QK7K
PartNumber = CN08DN1YSGW0076S00L8A00
NegotiatedSpeed = 6.0 Gb/s
ManufacturedDay = 0
ManufacturedWeek = 0
ManufacturedYear = 0
ForeignKeyIdentifier = null
SasAddress = 0x4433221106000000
FormFactor = 2.5 Inch
RaidNominalMediumRotationRate = 7200
T10PICapability = Not Capable
BlockSizeInBytes = 512
MaxCapableSpeed = 6 Gb/s
RaidType = None
SystemEraseCapability = CryptographicErasePD
SelfEncryptingDriveCapability = Not Capable
EncryptionCapability = Not Capable
CryptographicEraseCapability = Capable
Disk.Bay.1:Enclosure.Internal.0-1:RAID.Slot.3-1
Status = Ok
DeviceDescription = Disk 1 in Backplane 1 of RAID Controller in Slot 3
RollupStatus = Ok
Name = Physical Disk 0:1:1
State = Online
OperationState = Not Applicable
PowerStatus = Spun-Up
Size = 931.000 GB
FailurePredicted = NO
RemainingRatedWriteEndurance = Not Applicable
SecurityStatus = Not Capable
BusProtocol = SATA
MediaType = HDD
UsedRaidDiskSpace = 931.000 GB
AvailableRaidDiskSpace = 0.001 GB
Hotspare = NO
Manufacturer = SEAGATE
ProductId = ST1000NX0443
Revision = NB33
SerialNumber = W470SYTP
PartNumber = CN08DN1YSGW0077F00FQA00
NegotiatedSpeed = 6.0 Gb/s
ManufacturedDay = 0
ManufacturedWeek = 0
ManufacturedYear = 0
ForeignKeyIdentifier = null
SasAddress = 0x4433221107000000
FormFactor = 2.5 Inch
RaidNominalMediumRotationRate = 7200
T10PICapability = Not Capable
BlockSizeInBytes = 512
MaxCapableSpeed = 6 Gb/s
RaidType = None
SystemEraseCapability = CryptographicErasePD
SelfEncryptingDriveCapability = Not Capable
EncryptionCapability = Not Capable
CryptographicEraseCapability = Capable
--8<---------------cut here---------------end--------------->8---
sdb is an external storage array (Dell MD3400) filled with 10 hard disks
(SAS) in a RAID 10 configuration (36.36 TB effective capacity). There
are two hot spares that are currently unassigned. They are used
automatically when the RAID is degraded. The two RAID controllers have
read and write caches enabled. The enclosure has two redundant host
interfaces.
Berlin has two host bus adapter (HBA) cards, of which *one* is
connected to the array. Why only one? Because we don’t have multipathd
configured so that the system could *boot* off the external array with
multipath. Without multipath the storage would appear as one disk
device per card, and it would not be safe to mount them both at the
same time.
If we wanted to make use of the redundant connection here, we would
have to figure out how to add multipathd to the initrd and set up
multipath *before* handing off control to the main system. This would
effectively double our bandwidth to the storage.
My guess is that we’re not even close to saturating the available
bandwidth.
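Whether the second HBA already exposes a duplicate block device can be
checked with something like this (a sketch; device names vary):
--8<---------------cut here---------------start------------->8---
ls -l /dev/disk/by-path/          # one entry per SCSI path to the array
lsblk -o NAME,SIZE,TRAN,VENDOR    # a twin of sdb would show up here
--8<---------------cut here---------------end--------------->8---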
> I ran the fio benchmark tool on both of them. See the reports
> attached, and the following summary:
>
> | | sda | sdb |
> |-------+-----------+-----------|
> | read | 1565KiB/s | 9695KiB/s |
> | write | 523KiB/s | 3240KiB/s |
>
>
> I'm not sure how slow those figures are relative to the underlying
> drive technology. Ricardo, any idea about that?
It seems awfully slow. Especially the performance of sda is abysmal,
and it is a local disk. sdb is the fancy external disk array that’s
hooked up to two HBA cards; it should not perform *better* than sda.
I’ll run this on a few of the build nodes to get some more comparisons.
--
Ricardo
* bug#51787: GC takes more than 9 hours on berlin
From: Ricardo Wurmus @ 2021-12-17 14:08 UTC
To: Mathieu Othacehe; +Cc: 51787
Ricardo Wurmus <rekado@elephly.net> writes:
> I’ll run this on a few of the build nodes to get some more comparisons.
I ran “guix deploy” for hydra-guix-107, a node that was bought at the
same time as the one that is now the head node of ci.guix.gnu.org. I
copied over a current Guix and installed “fio”
(/gnu/store/qs9cyy5s95n2fbjmxs48iccqvsvj6wxr-fio-3.28/bin/fio) there.
Here’s the output:
--8<---------------cut here---------------start------------->8---
root@hydra-guix-107 ~# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=172MiB/s,w=56.9MiB/s][r=44.2k,w=14.6k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=45365: Fri Dec 17 14:50:41 2021
read: IOPS=42.9k, BW=167MiB/s (176MB/s)(3070MiB/18331msec)
bw ( KiB/s): min=21208, max=199928, per=100.00%, avg=171621.56, stdev=46624.05, samples=36
iops : min= 5302, max=49982, avg=42905.39, stdev=11656.01, samples=36
write: IOPS=14.3k, BW=56.0MiB/s (58.7MB/s)(1026MiB/18331msec); 0 zone resets
bw ( KiB/s): min= 7424, max=66720, per=100.00%, avg=57364.22, stdev=15612.42, samples=36
iops : min= 1856, max=16680, avg=14341.06, stdev=3903.10, samples=36
cpu : usr=6.27%, sys=24.78%, ctx=121626, majf=0, minf=11
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=167MiB/s (176MB/s), 167MiB/s-167MiB/s (176MB/s-176MB/s), io=3070MiB (3219MB), run=18331-18331msec
WRITE: bw=56.0MiB/s (58.7MB/s), 56.0MiB/s-56.0MiB/s (58.7MB/s-58.7MB/s), io=1026MiB (1076MB), run=18331-18331msec
Disk stats (read/write):
sda: ios=779469/260494, merge=0/6, ticks=932265/195191, in_queue=1127456, util=99.50%
--8<---------------cut here---------------end--------------->8---
Here sda is RAID of two SSDs:
--8<---------------cut here---------------start------------->8---
racadm>>storage get pdisks -o
Disk.Bay.0:Enclosure.Internal.0-1:RAID.Integrated.1-1
Status = Ok
DeviceDescription = Disk 0 in Backplane 1 of Integrated RAID Controller 1
RollupStatus = Ok
Name = Solid State Disk 0:1:0
State = Online
OperationState = Not Applicable
PowerStatus = On
Size = 223.001 GB
FailurePredicted = NO
RemainingRatedWriteEndurance = 99 %
SecurityStatus = Not Capable
BusProtocol = SATA
MediaType = SSD
UsedRaidDiskSpace = 223.001 GB
AvailableRaidDiskSpace = 0.001 GB
Hotspare = NO
Manufacturer = INTEL
ProductId = SSDSC2KG240G8R
Revision = XCV1DL67
SerialNumber = BTYG91520CHD240AGN
PartNumber = CN0T1WH8PESIT95302LTA01
NegotiatedSpeed = 6.0 Gb/s
ManufacturedDay = 0
ManufacturedWeek = 0
ManufacturedYear = 0
ForeignKeyIdentifier = null
SasAddress = 0x4433221104000000
WWN = 0x4433221104000000
FormFactor = 2.5 Inch
RaidNominalMediumRotationRate = 1
T10PICapability = Not Capable
BlockSizeInBytes = 512
MaxCapableSpeed = 6 Gb/s
RaidType = Unknown
SystemEraseCapability = CryptographicErasePD
SelfEncryptingDriveCapability = Not Capable
EncryptionCapability = Not Capable
CryptographicEraseCapability = Capable
Certified = Yes
NonRAIDDiskCachePolicy = Not Applicable
EncryptionProtocol = None
Disk.Bay.1:Enclosure.Internal.0-1:RAID.Integrated.1-1
Status = Ok
DeviceDescription = Disk 1 in Backplane 1 of Integrated RAID Controller 1
RollupStatus = Ok
Name = Solid State Disk 0:1:1
State = Online
OperationState = Not Applicable
PowerStatus = On
Size = 223.001 GB
FailurePredicted = NO
RemainingRatedWriteEndurance = 99 %
SecurityStatus = Not Capable
BusProtocol = SATA
MediaType = SSD
UsedRaidDiskSpace = 223.001 GB
AvailableRaidDiskSpace = 0.001 GB
Hotspare = NO
Manufacturer = INTEL
ProductId = SSDSC2KG240G8R
Revision = XCV1DL67
SerialNumber = BTYG915502D9240AGN
PartNumber = CN0T1WH8PESIT95303BSA01
NegotiatedSpeed = 6.0 Gb/s
ManufacturedDay = 0
ManufacturedWeek = 0
ManufacturedYear = 0
ForeignKeyIdentifier = null
SasAddress = 0x4433221100000000
WWN = 0x4433221100000000
FormFactor = 2.5 Inch
RaidNominalMediumRotationRate = 1
T10PICapability = Not Capable
BlockSizeInBytes = 512
MaxCapableSpeed = 6 Gb/s
RaidType = Unknown
SystemEraseCapability = CryptographicErasePD
SelfEncryptingDriveCapability = Not Capable
EncryptionCapability = Not Capable
CryptographicEraseCapability = Capable
Certified = Yes
NonRAIDDiskCachePolicy = Not Applicable
EncryptionProtocol = None
--8<---------------cut here---------------end--------------->8---
--
Ricardo
* bug#51787: GC takes more than 9 hours on berlin
From: Ricardo Wurmus @ 2021-12-20 21:12 UTC
To: 51787
My colleague extended the SAN slice to 5TB for more realistic testing.
I formatted the disk with btrfs, and mounted it like this:
mount /dev/sdd /mnt_test/
Then I ran the test with block size 512k:
--8<---------------cut here---------------start------------->8---
root@berlin ~# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/mnt_test/test --bs=512k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=64
fio-3.6
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=802MiB/s,w=274MiB/s][r=1603,w=547 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=16949: Mon Dec 20 22:18:28 2021
read: IOPS=1590, BW=795MiB/s (834MB/s)(3055MiB/3842msec)
bw ( KiB/s): min=747520, max=857088, per=99.83%, avg=812763.43, stdev=44213.07, samples=7
iops : min= 1460, max= 1674, avg=1587.43, stdev=86.35, samples=7
write: IOPS=542, BW=271MiB/s (284MB/s)(1042MiB/3842msec)
bw ( KiB/s): min=262144, max=297984, per=100.00%, avg=278820.57, stdev=15115.88, samples=7
iops : min= 512, max= 582, avg=544.57, stdev=29.52, samples=7
cpu : usr=1.98%, sys=96.28%, ctx=1096, majf=0, minf=6
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=6109,2083,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=795MiB/s (834MB/s), 795MiB/s-795MiB/s (834MB/s-834MB/s), io=3055MiB (3203MB), run=3842-3842msec
WRITE: bw=271MiB/s (284MB/s), 271MiB/s-271MiB/s (284MB/s-284MB/s), io=1042MiB (1092MB), run=3842-3842msec
--8<---------------cut here---------------end--------------->8---
Because this is fun I reran it with the same arguments:
--8<---------------cut here---------------start------------->8---
root@berlin ~# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/mnt_test/test --bs=512k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=64
fio-3.6
Starting 1 process
Jobs: 1 (f=0): [f(1)][-.-%][r=756MiB/s,w=260MiB/s][r=1511,w=519 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=17488: Mon Dec 20 22:18:56 2021
read: IOPS=1647, BW=824MiB/s (864MB/s)(3055MiB/3708msec)
bw ( KiB/s): min=738304, max=929792, per=99.28%, avg=837485.71, stdev=73710.05, samples=7
iops : min= 1442, max= 1816, avg=1635.71, stdev=143.96, samples=7
write: IOPS=561, BW=281MiB/s (295MB/s)(1042MiB/3708msec)
bw ( KiB/s): min=234496, max=320512, per=99.79%, avg=287012.57, stdev=29009.60, samples=7
iops : min= 458, max= 626, avg=560.57, stdev=56.66, samples=7
cpu : usr=1.38%, sys=96.47%, ctx=1394, majf=0, minf=16420
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=6109,2083,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=824MiB/s (864MB/s), 824MiB/s-824MiB/s (864MB/s-864MB/s), io=3055MiB (3203MB), run=3708-3708msec
WRITE: bw=281MiB/s (295MB/s), 281MiB/s-281MiB/s (295MB/s-295MB/s), io=1042MiB (1092MB), run=3708-3708msec
--8<---------------cut here---------------end--------------->8---
Then I mounted with compression and space cache:
mount /dev/sdd -o compress-force=zstd,space_cache=v2 /mnt_test/
The numbers don’t differ much at all.
--8<---------------cut here---------------start------------->8---
Run status group 0 (all jobs):
READ: bw=882MiB/s (925MB/s), 882MiB/s-882MiB/s (925MB/s-925MB/s), io=3055MiB (3203MB), run=3464-3464msec
WRITE: bw=301MiB/s (315MB/s), 301MiB/s-301MiB/s (315MB/s-315MB/s), io=1042MiB (1092MB), run=3464-3464msec
--8<---------------cut here---------------end--------------->8---
I then erased the file system and put a big ext4 on it again:
--8<---------------cut here---------------start------------->8---
root@berlin ~# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/mnt_test/test --bs=512k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=64
fio-3.6
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][-.-%][r=1539MiB/s,w=526MiB/s][r=3078,w=1052 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=20672: Mon Dec 20 22:23:29 2021
read: IOPS=3077, BW=1539MiB/s (1614MB/s)(3055MiB/1985msec)
bw ( MiB/s): min= 1530, max= 1548, per=100.00%, avg=1539.33, stdev= 9.02, samples=3
iops : min= 3060, max= 3096, avg=3078.67, stdev=18.04, samples=3
write: IOPS=1049, BW=525MiB/s (550MB/s)(1042MiB/1985msec)
bw ( KiB/s): min=533504, max=557056, per=100.00%, avg=546133.33, stdev=11868.39, samples=3
iops : min= 1042, max= 1088, avg=1066.67, stdev=23.18, samples=3
cpu : usr=2.17%, sys=11.24%, ctx=4787, majf=0, minf=8
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=6109,2083,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=1539MiB/s (1614MB/s), 1539MiB/s-1539MiB/s (1614MB/s-1614MB/s), io=3055MiB (3203MB), run=1985-1985msec
WRITE: bw=525MiB/s (550MB/s), 525MiB/s-525MiB/s (550MB/s-550MB/s), io=1042MiB (1092MB), run=1985-1985msec
Disk stats (read/write):
sdd: ios=5926/2087, merge=1/0, ticks=119183/3276, in_queue=122460, util=94.87%
--8<---------------cut here---------------end--------------->8---
No idea why btrfs performs so much worse in comparison.
I’ll copy over /gnu/store/trash next.
--
Ricardo
* bug#51787: GC takes more than 9 hours on berlin
From: Maxim Cournoyer @ 2023-08-16 10:53 UTC
To: Mathieu Othacehe; +Cc: 51787-done
Hi,
Mathieu Othacehe <othacehe@gnu.org> writes:
> Hello,
>
> On berlin, the daily GC command is still running even though it was
> started 9 hours ago.
>
> guix processes
> [...]
> SessionPID: 37231
> ClientPID: 37195
> ClientCommand: /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/libexec/guix/guile \ /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/bin/guix gc -F10995116277760
>
>
> and
>
> mathieu@berlin ~$ ps auxww|grep 37195
> root 37195 0.0 0.0 183260 33440 ? Sl 03:59 0:00 /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/libexec/guix/guile \ /gnu/store/49vfv8l1j96bbd73ssbqanpnjz83arss-guix-1.3.0-11.014f1b6/bin/guix gc -F10995116277760
>
>
> That's really problematic as it is blocking some other berlin services
> such as Cuirass, which has 4564 packages in its fetch queue:
>
> mathieu@berlin ~$ less /var/log/cuirass-remote-server.log
> [...]
> 2021-11-12T12:47:01 period update: 0 resumable, 0 failed builds.
> 2021-11-12T12:47:01 period update: 4564 items in the fetch queue.
I'm more than happy to close this old, no longer relevant bug :-).
Since it was reported, the storage was migrated to a much faster array,
and it now uses the Btrfs file system with zstd compression, which
avoids the inode exhaustion issue we had with ext4 in the past.
A full garbage collection (GC) is run daily to keep the store growth in
check and so that the GC always takes a similar amount of time (less
than 20 minutes) every day.
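(For reference, inode pressure of the kind mentioned above can be
checked at any time with:)
--8<---------------cut here---------------start------------->8---
df -i /gnu
--8<---------------cut here---------------end--------------->8---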
--
Thanks,
Maxim