unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: Christopher Baines <mail@cbaines.net>
Cc: Mathieu Othacehe <othacehe@gnu.org>, 51787@debbugs.gnu.org
Subject: bug#51787: GC takes more than 9 hours on berlin
Date: Sat, 27 Nov 2021 12:23:00 +0100	[thread overview]
Message-ID: <87zgpp971n.fsf@gnu.org> (raw)
In-Reply-To: <87wnkv15k3.fsf@cbaines.net> (Christopher Baines's message of "Thu, 25 Nov 2021 13:24:17 +0000")

Hi Chris,

Christopher Baines <mail@cbaines.net> skribis:

> So, as I understand it, the WAL is made up of pages, and checking for
> this db, I think they're the current default size of 4096 bytes.
>
> sqlite> PRAGMA page_size;
> 4096
>
> From looking at the code, the wal_autocheckpoint value is set to 40000:
>
>     /* Increase the auto-checkpoint interval to 40000 pages.  This
>        seems enough to ensure that instantiating the NixOS system
>        derivation is done in a single fsync(). */
>     if (mode == "wal" && sqlite3_exec(db, "pragma wal_autocheckpoint = 40000;", 0, 0, 0) != SQLITE_OK)
>         throwSQLiteError(db, "setting autocheckpoint interval");
>
> https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/local-store.cc#n253
>
> This means you'd expect the WAL to be in the region of 40000*4096 bytes,
> or ~160MB. Assuming the autocheckpointing is keeping up... it doesn't
> look to be, since the file is now much larger than this.
>
> As described here [1], the automatic checkpoints are PASSIVE ones, which
> has the advantage of not interrupting any readers or writers, but can
> also do nothing if it's being blocked by readers or writers.
>
> 1: https://www.sqlite.org/wal.html#application_initiated_checkpoints
>
> What I've found while writing the Guix Build Coordinator, is that when
> the service is busy (usually new builds being submitted, plus lots of
> builds happening), the PASSIVE checkpoints aren't sufficient. To
> supplement them, there's a regular check that looks at the size of the
> WAL file, and runs a TRUNCATE checkpoint, which is a FULL checkpoint
> (which blocks new writers), plus truncating the WAL file once it's
> finished checkpointing the entire WAL. The truncating is mostly so that
> it's easier to monitor the size of the WAL, by checking the size of the
> file.

OK.  That may well be what happens on berlin these days: the database is
kept busy all day long, so presumably checkpoints don’t happen and the
WAL file grows.

> I feel like I need to defend SQLite at this point. Tuning the
> configuration of a database to get acceptable performance is the norm, I
> had to tune the PostgreSQL configuration for data.guix.gnu.org to
> improve the performance. It's easier to get in to trouble with SQLite
> because it's a lower level too, and requires you to actually initiate
> things like checkpoints or periodic optimisation if you want them to
> happen.

Understood.  It’s really not about defending software X against Y, but
rather about finding ways to address the issues we experience.

>>   • ‘db.sqlite’ weighs in at 19 GiB (!) so perhaps there’s something to
>>     do, like the “VACUUM” thing maybe.  Chris?
>
> Doing a VACCUM might address some fragmentation and improve performance,
> it's probably worth trying.

Alright, let’s give it a try.

>>   • Stracing the session’s guix-daemon process during GC suggests that
>>     most of the time goes into I/O from ‘db.sqlite’.  It’s not
>>     surprising because that GC phase is basically about browsing the
>>     database, but it does seem to take a little too long for each store
>>     item.
>
> At least the way I've approached finding and fixing the poor performance
> issues in the Guix Build Coordinator is through adding instrumentation,
> so just recording the time that calling particular procedures takes, and
> then logging if it's longer than some threshold.

Yeah, that makes sense.  I think we always took these bits of the daemon
for granted because they’d been used on large stores even before Guix
existed.

> Since this issue is about Cuirass, there's also the possibility of
> avoiding the problems of a large store, by avoiding having a large
> store. That's what bordeaux.guix.gnu.org does, and I thought it was part
> of the plan for ci.guix.gnu.org (at least around a year ago)?

That’s indeed the case: the store is smaller than it used to be (but
still 27 TiB), it’s GC’d more aggressively than before, and instead we
rely on /var/cache/guix/publish for long-term storage.

Perhaps we should go further and keep the store smaller though.

Thanks for your feedback!

Ludo’.




  reply	other threads:[~2021-11-27 11:24 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-12 11:49 bug#51787: GC takes more than 9 hours on berlin Mathieu Othacehe
2021-11-12 19:17 ` Mathieu Othacehe
2021-11-22  9:16 ` zimoun
2021-11-23 17:48 ` Ludovic Courtès
2021-11-25 13:24   ` Christopher Baines
2021-11-27 11:23     ` Ludovic Courtès [this message]
2021-12-03  9:45       ` Mathieu Othacehe
2021-12-10  6:24         ` Mathieu Othacehe
2021-12-10 10:21           ` Ludovic Courtès
2021-12-10 17:11             ` Mathieu Othacehe
2021-12-10 21:54               ` Ludovic Courtès
2021-12-11  9:42                 ` Mathieu Othacehe
2021-12-12 17:09                   ` Mathieu Othacehe
2021-12-13 16:13                     ` Christian Thäter
2021-12-14  3:31                     ` Christian Thäter
2021-12-17 10:55                     ` Mathieu Othacehe
2021-12-17 13:06                       ` Ricardo Wurmus
2021-12-17 14:08                         ` Ricardo Wurmus
2021-12-10 17:11             ` Mathieu Othacehe
2021-11-27 11:11   ` Ludovic Courtès
2021-12-10 19:38 ` Ricardo Wurmus
2021-12-20 21:12 ` Ricardo Wurmus
2023-08-16 10:53 ` Maxim Cournoyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zgpp971n.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=51787@debbugs.gnu.org \
    --cc=mail@cbaines.net \
    --cc=othacehe@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).