Ludovic Courtès writes:

>   • When discussing together with Chris Baines yesterday on IRC, we
>     found that the sqlite WAL file was 8 GiB.  I later ran:
>
>       PRAGMA wal_checkpoint(TRUNCATE);
>
>     which emptied it immediately.  However, GC time wasn’t particularly
>     different today.

So, as I understand it, the WAL is made up of pages, and checking for
this db, I think they're the current default size of 4096 bytes.

  sqlite> PRAGMA page_size;
  4096

From looking at the code, the wal_autocheckpoint value is set to 40000:

  /* Increase the auto-checkpoint interval to 40000 pages.  This
     seems enough to ensure that instantiating the NixOS system
     derivation is done in a single fsync(). */
  if (mode == "wal" && sqlite3_exec(db, "pragma wal_autocheckpoint = 40000;", 0, 0, 0) != SQLITE_OK)
      throwSQLiteError(db, "setting autocheckpoint interval");

https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/local-store.cc#n253

This means you'd expect the WAL to be in the region of 40000 * 4096
bytes, or ~160 MB.  That assumes the autocheckpointing is keeping
up... which it doesn't look to be, since the file is now much larger
than that.

As described here [1], the automatic checkpoints are PASSIVE ones,
which have the advantage of not interrupting any readers or writers,
but can also end up doing nothing if they're blocked by readers or
writers.

1: https://www.sqlite.org/wal.html#application_initiated_checkpoints

What I've found while writing the Guix Build Coordinator is that when
the service is busy (usually new builds being submitted, plus lots of
builds happening), the PASSIVE checkpoints aren't sufficient.  To
supplement them, there's a regular check that looks at the size of the
WAL file and runs a TRUNCATE checkpoint when it has grown too large.
A TRUNCATE checkpoint is a FULL checkpoint (which blocks new writers)
that additionally truncates the WAL file once the entire WAL has been
checkpointed.  The truncating is mostly so that it's easier to monitor
the size of the WAL, by checking the size of the file.  (There's a
rough sketch of this at the end of this message.)

I feel like I need to defend SQLite at this point.  Tuning the
configuration of a database to get acceptable performance is the norm;
I had to tune the PostgreSQL configuration for data.guix.gnu.org to
improve its performance.  It's easier to get into trouble with SQLite
because it's lower level too, and it requires you to actually initiate
things like checkpoints or periodic optimisation if you want them to
happen.

Unfortunately, I don't know enough about the internals of the daemon
to say anything specific though.

>   • ‘db.sqlite’ weighs in at 19 GiB (!) so perhaps there’s something to
>     do, like the “VACUUM” thing maybe.  Chris?

Doing a VACUUM might address some fragmentation and improve
performance; it's probably worth trying.

>   • Stracing the session’s guix-daemon process during GC suggests that
>     most of the time goes into I/O from ‘db.sqlite’.  It’s not
>     surprising because that GC phase is basically about browsing the
>     database, but it does seem to take a little too long for each store
>     item.

The way I've approached finding and fixing poor performance issues in
the Guix Build Coordinator, at least, is through adding
instrumentation: recording the time that calls to particular
procedures take, and logging when it's longer than some threshold.

Since this issue is about Cuirass, there's also the possibility of
avoiding the problems of a large store by avoiding having a large
store.  That's what bordeaux.guix.gnu.org does, and I thought it was
part of the plan for ci.guix.gnu.org (at least around a year ago)?
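To make a couple of the things above a bit more concrete, here are
some rough sketches.  They're not code from the Coordinator or the
daemon, just C++ illustrations (the first and last use the sqlite3 C
API, like the local-store.cc snippet quoted above), and the
thresholds, names and log messages in them are made up.

First, the WAL-size check plus TRUNCATE checkpoint: if the "-wal" file
next to the database has grown past some limit, ask SQLite for a
TRUNCATE checkpoint.  Since that blocks new writers while it runs, it
would be called from a periodic task rather than on every query.

  #include <sqlite3.h>
  #include <sys/stat.h>
  #include <iostream>
  #include <string>

  /* Rough sketch: run a TRUNCATE checkpoint once the WAL file has
     grown past `thresholdBytes' (an arbitrary placeholder).  The WAL
     lives next to the database as "<database>-wal". */
  static void maybeTruncateWal(sqlite3 *db, const std::string &dbPath,
                               off_t thresholdBytes)
  {
      struct stat st;
      std::string walPath = dbPath + "-wal";

      if (stat(walPath.c_str(), &st) != 0 || st.st_size < thresholdBytes)
          return;

      /* TRUNCATE: checkpoint the whole WAL (blocking new writers while
         it does so), then truncate the WAL file to zero bytes. */
      if (sqlite3_wal_checkpoint_v2(db, nullptr, SQLITE_CHECKPOINT_TRUNCATE,
                                    nullptr, nullptr) != SQLITE_OK)
          std::cerr << "WAL checkpoint failed: "
                    << sqlite3_errmsg(db) << "\n";
  }

In the Coordinator this kind of check just runs periodically; something
similar could presumably be bolted on to the daemon, though as said
above I don't know its internals well enough to say where it would fit.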
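Second, the instrumentation approach: nothing clever, just wrapping a
call, timing it, and logging when it's slower than some threshold.  A
minimal sketch, where the label, the threshold and the
queryReferrers() call in the usage comment are all hypothetical:

  #include <chrono>
  #include <iostream>
  #include <utility>

  /* Call `f', returning its result, and log a message when the call
     takes longer than `threshold'.  Assumes `f' returns a value. */
  template <typename F>
  auto timed(const char *label, std::chrono::milliseconds threshold, F &&f)
  {
      auto start = std::chrono::steady_clock::now();
      auto result = std::forward<F>(f)();
      auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
          std::chrono::steady_clock::now() - start);

      if (elapsed > threshold)
          std::cerr << label << " took " << elapsed.count() << " ms\n";

      return result;
  }

  /* Hypothetical usage:
       auto refs = timed("query referrers", std::chrono::milliseconds(100),
                         [&] { return queryReferrers(path); });  */

Once a few of the slow spots show up in the logs, it's much easier to
tell whether the problem is a particular query, checkpointing, or just
the sheer size of the database.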
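Lastly, on VACUUM and periodic optimisation: this is just what running
them through the sqlite3 API looks like, not something the daemon does
today as far as I know.  Bear in mind that VACUUM rebuilds the whole
database file, so it needs roughly the database's size in free disk
space and will take a while on a 19 GiB file; "PRAGMA optimize"
re-runs ANALYZE where SQLite thinks it would help.

  #include <sqlite3.h>
  #include <iostream>

  /* One-off maintenance pass: "PRAGMA optimize" to refresh the query
     planner statistics, then VACUUM to rebuild the database file. */
  static void runMaintenance(sqlite3 *db)
  {
      const char *statements[] = { "PRAGMA optimize;", "VACUUM;" };

      for (const char *sql : statements) {
          char *err = nullptr;
          if (sqlite3_exec(db, sql, nullptr, nullptr, &err) != SQLITE_OK) {
              std::cerr << sql << " failed: "
                        << (err ? err : "unknown error") << "\n";
              sqlite3_free(err);
          }
      }
  }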