From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF shortcircuit=no autolearn=ham autolearn_force=no version=3.4.6 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 1E6951F4C1 for ; Fri, 29 Nov 2024 06:45:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1732862712; bh=p2JNBrK8AfJWAibgBLVJDve2yBA34yvnRLlRZ7S3Vbo=; h=Date:From:To:Subject:References:In-Reply-To:From; b=OmrY4hdmBZYSwfjYepw/50ZIsybjUIsMjRLMlvQeOXh1YFQsKULZJ/hRTIar98Gtu rBPaw6QOePe6iwrfbZxUSYJbd2n+H8oeR2sezjRwxcLaGlBxjTosVHP9HXCROaa/Nh GyVe05P+hHUSWFGOFypsQjbgZJp1u2DZvy+0SFjQ= Date: Fri, 29 Nov 2024 06:45:11 +0000 From: Eric Wong To: meta@public-inbox.org Subject: [PATCH] lei/store: use WAL for over.sqlite3 Message-ID: <20241129064511.M940572@dcvr> References: <20241115222315.2761178-1-e@80x24.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20241115222315.2761178-1-e@80x24.org> List-Id: Eric Wong wrote: > DBD::SQLite (not SQLite itself) sets a 30s busy_timeout which we > currently do not override. This means readers can wait up to > 30s for a writer to finish. For long imports exceeding 30s, > SQLite readers (for deduplication during import) can die with a > "database is locked" message while the lei/store process holds a > long write transaction open. Well, this is a problem for non-WAL (write-ahead log), but lei is the one part of public-inbox which is WAL-friendly since each lei/store is only intended to be accessed by a single user. > Forcing commits every 5s ought to fix the problem in most cases, > assuming commits themselves happen in under 25s (which isn't > always true on slow devices). 5 seconds was chosen since it > matches the default commit interval on ext* filesystems and the > vm.dirty_writeback_centisecs sysctl. Yeah, it's probably no longer necessary if everything is WAL (but I think `public-inbox-* --reindex' should need it) > Many (but not all) failures around long-running `lei import' > processes. Eep, broken sentence :x What I meant was it fixes most (but not all) `lei import' failures; the rest seem fixed in 99fc3d76 (v2writable: done: force synchronous awaitpid, 2024-11-19) -----8<----- Subject: [PATCH] lei/store: use WAL for over.sqlite3 WAL (write-ahead log) improves parallelism for readers when they also have write access to the SQLite DB. While we can't use WAL for public-inboxes where the -netd processes are intended to only have read-only permssions, lei/store always assumes read-write access. The lei/store */ei15/over.sqlite3 DB was the only SQLite DB used by lei without WAL. lei already set WAL for mail_sync.sqlite3 and the saved-searches/*/over.sqlite3 DBs. Now that all SQLite DBs used by lei are WAL, commit 807abf67 (lei/store: auto-commit for long-running imports, 2024-11-15) is no longer strictly necessary for parallelism during long-running imports. However, 807abf67 may continue to be useful to minimize the need to refetch after a power outage during `lei import'). For saved-searches, we'll make use of the new mechanism for setting {journal_mode} per-instance. --- lib/PublicInbox/LeiSavedSearch.pm | 6 ++---- lib/PublicInbox/Over.pm | 10 ++++++++-- lib/PublicInbox/OverIdx.pm | 3 ++- lib/PublicInbox/V2Writable.pm | 2 +- 4 files changed, 13 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/LeiSavedSearch.pm b/lib/PublicInbox/LeiSavedSearch.pm index ab0a2858..83e19357 100644 --- a/lib/PublicInbox/LeiSavedSearch.pm +++ b/lib/PublicInbox/LeiSavedSearch.pm @@ -234,11 +234,9 @@ sub prepare_dedupe { my $lk = $self->lock_for_scope; # git-config doesn't wait my $oidx = PublicInbox::OverIdx->new($self->{-ovf}); $oidx->{-no_fsync} = 1; + $oidx->{journal_mode} = 'WAL'; $oidx->dbh; - if ($creat) { - $oidx->{dbh}->do('PRAGMA journal_mode = WAL'); - $oidx->eidx_prep; # for xref3 - } + $oidx->eidx_prep if $creat; # for xref3 $oidx }; } diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm index f68964c2..ff5332e7 100644 --- a/lib/PublicInbox/Over.pm +++ b/lib/PublicInbox/Over.pm @@ -62,8 +62,14 @@ sub dbh_new { # If an admin is willing to give read-only daemons R/W # permissions; they can enable WAL manually and we will # respect that by not clobbering it. - my $jm = $dbh->selectrow_array('PRAGMA journal_mode'); - $dbh->do('PRAGMA journal_mode = TRUNCATE') if $jm ne 'wal'; + my $jm = $self->{journal_mode}; # set by lei + if (defined $jm) { + $dbh->do('PRAGMA journal_mode = '.$jm); + } else { + $jm = $dbh->selectrow_array('PRAGMA journal_mode'); + $jm eq 'wal' or + $dbh->do('PRAGMA journal_mode = TRUNCATE'); + } $dbh->do('PRAGMA synchronous = OFF') if $rw > 1; } diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm index 879ae045..10cf8c39 100644 --- a/lib/PublicInbox/OverIdx.pm +++ b/lib/PublicInbox/OverIdx.pm @@ -464,7 +464,7 @@ sub dbh_close { } sub create { - my ($self) = @_; + my ($self, $opt) = @_; my $fn = $self->{filename} // do { croak('BUG: no {filename}') unless $self->{dbh}; return; @@ -474,6 +474,7 @@ sub create { my ($dir) = ($fn =~ m!(.*?/)[^/]+\z!); File::Path::mkpath($dir); } + $self->{journal_mode} = 'WAL' if $opt->{-private}; # create the DB: PublicInbox::Over::dbh($self); $self->dbh_close; diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm index 9f686bfa..25751966 100644 --- a/lib/PublicInbox/V2Writable.pm +++ b/lib/PublicInbox/V2Writable.pm @@ -219,7 +219,7 @@ sub v2_num_for_harder { sub _idx_init { # with_umask callback my ($self, $opt) = @_; $self->lock_acquire unless $opt && $opt->{-skip_lock}; - $self->{oidx}->create; + $self->{oidx}->create($opt); # xcpdb can change shard count while -watch is idle my $nshards = count_shards($self);