From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH] lei/store: use WAL for over.sqlite3
Date: Fri, 29 Nov 2024 06:45:11 +0000 [thread overview]
Message-ID: <20241129064511.M940572@dcvr> (raw)
In-Reply-To: <20241115222315.2761178-1-e@80x24.org>
Eric Wong <e@80x24.org> wrote:
> DBD::SQLite (not SQLite itself) sets a 30s busy_timeout which we
> currently do not override. This means readers can wait up to
> 30s for a writer to finish. For long imports exceeding 30s,
> SQLite readers (for deduplication during import) can die with a
> "database is locked" message while the lei/store process holds a
> long write transaction open.
Well, this is a problem for non-WAL (write-ahead log), but lei
is the one part of public-inbox which is WAL-friendly since each
lei/store is only intended to be accessed by a single user.
> Forcing commits every 5s ought to fix the problem in most cases,
> assuming commits themselves happen in under 25s (which isn't
> always true on slow devices). 5 seconds was chosen since it
> matches the default commit interval on ext* filesystems and the
> vm.dirty_writeback_centisecs sysctl.
Yeah, it's probably no longer necessary if everything is WAL
(but I think `public-inbox-* --reindex' should need it)
> Many (but not all) failures around long-running `lei import'
> processes.
Eep, broken sentence :x
What I meant was it fixes most (but not all) `lei import'
failures; the rest seem fixed in 99fc3d76
(v2writable: done: force synchronous awaitpid, 2024-11-19)
-----8<-----
Subject: [PATCH] lei/store: use WAL for over.sqlite3
WAL (write-ahead log) improves parallelism for readers when
they also have write access to the SQLite DB. While we
can't use WAL for public-inboxes where the -netd processes
are intended to only have read-only permssions, lei/store
always assumes read-write access.
The lei/store */ei15/over.sqlite3 DB was the only SQLite DB used
by lei without WAL. lei already set WAL for mail_sync.sqlite3
and the saved-searches/*/over.sqlite3 DBs.
Now that all SQLite DBs used by lei are WAL, commit 807abf67
(lei/store: auto-commit for long-running imports, 2024-11-15)
is no longer strictly necessary for parallelism during
long-running imports. However, 807abf67 may continue to be
useful to minimize the need to refetch after a power outage
during `lei import').
For saved-searches, we'll make use of the new mechanism for
setting {journal_mode} per-instance.
---
lib/PublicInbox/LeiSavedSearch.pm | 6 ++----
lib/PublicInbox/Over.pm | 10 ++++++++--
lib/PublicInbox/OverIdx.pm | 3 ++-
lib/PublicInbox/V2Writable.pm | 2 +-
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/LeiSavedSearch.pm b/lib/PublicInbox/LeiSavedSearch.pm
index ab0a2858..83e19357 100644
--- a/lib/PublicInbox/LeiSavedSearch.pm
+++ b/lib/PublicInbox/LeiSavedSearch.pm
@@ -234,11 +234,9 @@ sub prepare_dedupe {
my $lk = $self->lock_for_scope; # git-config doesn't wait
my $oidx = PublicInbox::OverIdx->new($self->{-ovf});
$oidx->{-no_fsync} = 1;
+ $oidx->{journal_mode} = 'WAL';
$oidx->dbh;
- if ($creat) {
- $oidx->{dbh}->do('PRAGMA journal_mode = WAL');
- $oidx->eidx_prep; # for xref3
- }
+ $oidx->eidx_prep if $creat; # for xref3
$oidx
};
}
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index f68964c2..ff5332e7 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -62,8 +62,14 @@ sub dbh_new {
# If an admin is willing to give read-only daemons R/W
# permissions; they can enable WAL manually and we will
# respect that by not clobbering it.
- my $jm = $dbh->selectrow_array('PRAGMA journal_mode');
- $dbh->do('PRAGMA journal_mode = TRUNCATE') if $jm ne 'wal';
+ my $jm = $self->{journal_mode}; # set by lei
+ if (defined $jm) {
+ $dbh->do('PRAGMA journal_mode = '.$jm);
+ } else {
+ $jm = $dbh->selectrow_array('PRAGMA journal_mode');
+ $jm eq 'wal' or
+ $dbh->do('PRAGMA journal_mode = TRUNCATE');
+ }
$dbh->do('PRAGMA synchronous = OFF') if $rw > 1;
}
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 879ae045..10cf8c39 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -464,7 +464,7 @@ sub dbh_close {
}
sub create {
- my ($self) = @_;
+ my ($self, $opt) = @_;
my $fn = $self->{filename} // do {
croak('BUG: no {filename}') unless $self->{dbh};
return;
@@ -474,6 +474,7 @@ sub create {
my ($dir) = ($fn =~ m!(.*?/)[^/]+\z!);
File::Path::mkpath($dir);
}
+ $self->{journal_mode} = 'WAL' if $opt->{-private};
# create the DB:
PublicInbox::Over::dbh($self);
$self->dbh_close;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 9f686bfa..25751966 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -219,7 +219,7 @@ sub v2_num_for_harder {
sub _idx_init { # with_umask callback
my ($self, $opt) = @_;
$self->lock_acquire unless $opt && $opt->{-skip_lock};
- $self->{oidx}->create;
+ $self->{oidx}->create($opt);
# xcpdb can change shard count while -watch is idle
my $nshards = count_shards($self);
prev parent reply other threads:[~2024-11-29 6:45 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-15 22:23 [PATCH] lei/store: auto-commit for long-running imports Eric Wong
2024-11-29 6:45 ` Eric Wong [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://public-inbox.org/README
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241129064511.M940572@dcvr \
--to=e@80x24.org \
--cc=meta@public-inbox.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).