unofficial mirror of meta@public-inbox.org
* [PATCH 0/5] fix extindex reindex harder
@ 2021-10-12 22:44 Eric Wong
  2021-10-12 22:44 ` [PATCH 1/5] extindex: flush pending reindex before unref Eric Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
  To: meta

1/5 may affect some users
3/5 quiets things down for users on SQLite <3.18
5/5 is a good usability fix for me

Eric Wong (5):
  extindex: flush pending reindex before unref
  lei/store: use remove_doc to save some LoC
  index: optimize after all SQLite DB commits
  doc: relnotes: note some recent improvements
  lei up --all: show output for warnings

 Documentation/RelNotes/v1.7.0.wip | 14 +++++++++++++-
 lib/PublicInbox/ExtSearchIdx.pm   |  4 ++++
 lib/PublicInbox/LEI.pm            | 12 ++++++++----
 lib/PublicInbox/LeiMailSync.pm    |  2 +-
 lib/PublicInbox/LeiStore.pm       |  3 +--
 lib/PublicInbox/LeiUp.pm          |  7 +++++++
 lib/PublicInbox/OverIdx.pm        |  1 +
 lib/PublicInbox/SearchIdx.pm      |  1 +
 lib/PublicInbox/V2Writable.pm     | 16 +++++-----------
 9 files changed, 41 insertions(+), 19 deletions(-)


* [PATCH 1/5] extindex: flush pending reindex before unref
  2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
@ 2021-10-12 22:44 ` Eric Wong
  2021-10-12 22:44 ` [PATCH 2/5] lei/store: use remove_doc to save some LoC Eric Wong
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
  To: meta

This prevents unnecessary message renumbering and I/O.

Without this change, there is a small window for long-running
WWW streaming requests to miss a message that was unref-ed
before reindexing.  If we expose an "All Mail" mailbox via
IMAP/JMAP, this will save client traffic.
---
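The ordering problem described above can be sketched with a toy
model.  Everything here is hypothetical (add_ref, unref, wait_all and
the data layout are illustration only, not the ExtSearchIdx or
PublicInbox::Git code); it only shows why flushing queued work before
dropping a reference lets a message keep its number:

```perl
use strict;
use warnings;

# Toy model only: these names are hypothetical and are not the
# PublicInbox::ExtSearchIdx or PublicInbox::Git implementation.
my (%refs, %docid_of, @pending);
my $next_docid = 1;

sub add_ref { # reuse the existing docid, or allocate a new one
	my ($msg, $ibx) = @_;
	my $docid = $docid_of{$msg} //= $next_docid++;
	$refs{$docid}->{$ibx} = 1;
	$docid;
}

sub unref { # drop one reference; forget the docid with the last one
	my ($msg, $ibx) = @_;
	my $docid = $docid_of{$msg} // return;
	delete $refs{$docid}->{$ibx};
	return if keys %{$refs{$docid}};
	delete $refs{$docid};
	delete $docid_of{$msg};
}

sub wait_all { while (my $cb = shift @pending) { $cb->() } }

sub run {
	my ($flush_first) = @_;
	%refs = (); %docid_of = (); @pending = ();
	$next_docid = 1;
	add_ref('msg', 'inbox-a'); # allocates docid 1
	push @pending, sub { add_ref('msg', 'inbox-b') }; # queued reindex
	wait_all() if $flush_first;
	unref('msg', 'inbox-a');
	wait_all();
	$docid_of{'msg'};
}

printf "flush first: docid %d\n", run(1);
printf "no flush:    docid %d\n", run(0);
```

In this model the message keeps docid 1 when the queue is flushed
first, and is renumbered to docid 2 when the unref wins the race.
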
 lib/PublicInbox/ExtSearchIdx.pm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index c2ab0447e176..40489eab4c66 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -193,6 +193,7 @@ sub do_xpost ($$) {
 		$idx->ipc_do('add_eidx_info', $docid, $eidx_key, $eml);
 		apply_boost($req, $smsg) if $req->{boost_in_use};
 	} else { # 'd' no {xnum}
+		$self->git->async_wait_all;
 		$oid = pack('H*', $oid);
 		_unref_doc($req, $docid, $xibx, undef, $oid, $eml);
 	}
@@ -261,6 +262,7 @@ sub _blob_missing ($$) { # called when a known $smsg->{blob} is gone
 	# xnum and ibx are unknown, we only call this when an entry from
 	# /ei*/over.sqlite3 is bad, not on entries from xap*/over.sqlite3
 	my $oidbin = pack('H*', $smsg->{blob});
+	$req->{self}->git->async_wait_all;
 	_unref_doc($req, $smsg, undef, undef, $oidbin);
 }
 
@@ -552,6 +554,7 @@ sub _reindex_finalize ($$$) {
 	}
 	return if $nr == 1; # likely, all good
 
+	$self->git->async_wait_all;
 	warn "W: #$docid split into $nr due to deduplication change\n";
 	my @todo;
 	for my $ary (values %$by_chash) {
@@ -896,6 +899,7 @@ ibx_id = ? AND xnum >= ? AND xnum <= ?
 		}
 		return if $sync->{quit};
 		next unless scalar keys %x3m;
+		$self->git->async_wait_all; # wait for reindex_unseen
 
 		# eliminate stale/mismatched entries
 		my %mismatch = map { $_->{num} => $_->{blob} } @$msgs;


* [PATCH 2/5] lei/store: use remove_doc to save some LoC
  2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
  2021-10-12 22:44 ` [PATCH 1/5] extindex: flush pending reindex before unref Eric Wong
@ 2021-10-12 22:44 ` Eric Wong
  2021-10-12 22:44 ` [PATCH 3/5] index: optimize after all SQLite DB commits Eric Wong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
  To: meta

---
 lib/PublicInbox/LeiStore.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index 613d1d31f581..bf41dcf53094 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -281,8 +281,7 @@ sub remove_docids ($;@) {
 	my ($self, @docids) = @_;
 	my $eidx = eidx_init($self);
 	for my $docid (@docids) {
-		$eidx->idx_shard($docid)->ipc_do('xdb_remove', $docid);
-		$eidx->{oidx}->delete_by_num($docid);
+		$eidx->remove_doc($docid);
 		$eidx->{oidx}->{dbh}->do(<<EOF, undef, $docid);
 DELETE FROM xref3 WHERE docid = ?
 EOF


* [PATCH 3/5] index: optimize after all SQLite DB commits
  2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
  2021-10-12 22:44 ` [PATCH 1/5] extindex: flush pending reindex before unref Eric Wong
  2021-10-12 22:44 ` [PATCH 2/5] lei/store: use remove_doc to save some LoC Eric Wong
@ 2021-10-12 22:44 ` Eric Wong
  2021-10-12 22:44 ` [PATCH 4/5] doc: relnotes: note some recent improvements Eric Wong
  2021-10-12 22:45 ` [PATCH 5/5] lei up --all: show output for warnings Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
  To: meta

This covers v1 inboxes as well.  We also guard the execution
since "PRAGMA optimize" was only introduced in SQLite 3.18.0
(2017-03-30).
---
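A minimal sketch of the guard used below: the optimize call is
best-effort, so it is wrapped in eval and any exception is discarded
instead of turning a successful commit into an error.  The
OldSQLiteStub package is a hypothetical stand-in for a DBI handle and
is assumed to die on the pragma; the sketch does not establish which
SQLite or DBD::SQLite versions actually raise here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI database handle; a real handle
# would come from DBI->connect.  It is assumed to die on the pragma.
package OldSQLiteStub;
sub new { bless {}, shift }
sub commit { 1 }
sub do {
	my ($self, $sql) = @_;
	die "unsupported: $sql\n" if $sql eq 'PRAGMA optimize';
	1;
}

package main;
my $dbh = OldSQLiteStub->new;
my $committed = $dbh->commit;

# eval returns undef if the block dies; the error is ignored on purpose
my $optimized = eval { $dbh->do('PRAGMA optimize') };
print "committed=$committed optimized=", ($optimized ? 1 : 0), "\n";
```

With the stub above this prints "committed=1 optimized=0" and the
script keeps running.
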
 lib/PublicInbox/LeiMailSync.pm |  2 +-
 lib/PublicInbox/OverIdx.pm     |  1 +
 lib/PublicInbox/SearchIdx.pm   |  1 +
 lib/PublicInbox/V2Writable.pm  | 16 +++++-----------
 4 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/LeiMailSync.pm b/lib/PublicInbox/LeiMailSync.pm
index c6cd1bc58d0a..f7e37ad9ca80 100644
--- a/lib/PublicInbox/LeiMailSync.pm
+++ b/lib/PublicInbox/LeiMailSync.pm
@@ -48,7 +48,7 @@ sub lms_pause {
 	my ($self) = @_;
 	$self->{fmap} = {};
 	my $dbh = delete $self->{dbh};
-	$dbh->do('PRAGMA optimize') if $dbh;
+	eval { $dbh->do('PRAGMA optimize') } if $dbh;
 }
 
 sub create_tables {
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index d6d706f7fed0..9fdb26c0d5c2 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -434,6 +434,7 @@ sub commit_lazy {
 	my ($self) = @_;
 	delete $self->{txn} or return;
 	$self->{dbh}->commit;
+	eval { $self->{dbh}->do('PRAGMA optimize') };
 }
 
 sub begin_lazy {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index a2ed94993993..928152ec4df4 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -793,6 +793,7 @@ sub v1_checkpoint ($$;$) {
 	${$sync->{max}} = $self->{batch_bytes};
 
 	$self->{mm}->{dbh}->commit;
+	eval { $self->{mm}->{dbh}->do('PRAGMA optimize') };
 	my $xdb = $self->{xdb};
 	if ($newest && $xdb) {
 		my $cur = $xdb->get_metadata('last_commit');
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index efcc1fc21a18..3914383cc9d3 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -547,11 +547,11 @@ sub checkpoint ($;$) {
 	}
 	my $shards = $self->{idx_shards};
 	if ($shards) {
-		my $mm = $self->{mm};
-		my $dbh = $mm->{dbh} if $mm;
+		my $dbh = $self->{mm}->{dbh} if $self->{mm};
 
 		# SQLite msgmap data is second in importance
 		$dbh->commit if $dbh;
+		eval { $dbh->do('PRAGMA optimize') };
 
 		# SQLite overview is third
 		$self->{oidx}->commit_lazy;
@@ -620,16 +620,10 @@ sub done {
 		my $m = $err ? 'rollback' : 'commit';
 		eval { $mm->{dbh}->$m };
 		$err .= "msgmap $m: $@\n" if $@;
-		eval { $mm->{dbh}->do('PRAGMA optimize') };
-		$err .= "msgmap optimize: $@\n" if $@;
 	}
-	if ($self->{oidx} && $self->{oidx}->{dbh}) {
-		if ($err) {
-			eval { $self->{oidx}->rollback_lazy };
-			$err .= "overview rollback: $@\n" if $@;
-		}
-		eval { $self->{oidx}->{dbh}->do('PRAGMA optimize') };
-		$err .= "overview optimize: $@\n" if $@;
+	if ($self->{oidx} && $self->{oidx}->{dbh} && $err) {
+		eval { $self->{oidx}->rollback_lazy };
+		$err .= "overview rollback: $@\n" if $@;
 	}
 
 	my $shards = delete $self->{idx_shards};


* [PATCH 4/5] doc: relnotes: note some recent improvements
  2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
                   ` (2 preceding siblings ...)
  2021-10-12 22:44 ` [PATCH 3/5] index: optimize after all SQLite DB commits Eric Wong
@ 2021-10-12 22:44 ` Eric Wong
  2021-10-12 22:45 ` [PATCH 5/5] lei up --all: show output for warnings Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:44 UTC
  To: meta

---
 Documentation/RelNotes/v1.7.0.wip | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/Documentation/RelNotes/v1.7.0.wip b/Documentation/RelNotes/v1.7.0.wip
index f71f447feb4d..854c2fce7c88 100644
--- a/Documentation/RelNotes/v1.7.0.wip
+++ b/Documentation/RelNotes/v1.7.0.wip
@@ -8,7 +8,11 @@ Another big release focused on multi-inbox search and scalability.
 
 * general changes
 
-  config file parsing is 2x faster with 50K inboxes
+  - config file parsing is 2x faster with 50K inboxes
+
+  - deduplication ignores whitespace differences within address fields
+
+  - "PRAGMA optimize" is now issued on commits for SQLite 3.18+
 
 * read-only public-inbox-daemon (-httpd, -nntpd, -imapd):
 
@@ -47,6 +51,14 @@ Another big release focused on multi-inbox search and scalability.
   filesystem or over HTTP(S).  See lei(1), lei-overview(7), and other
   lei-* manpages for details.
 
+* public-inbox-index
+
+  - non-strict (Subject-based) threading supports non-ASCII characters,
+    reindexing is necessary for old messages with non-ASCII subjects.
+
+  - --batch-size is now 8M on 64-bit systems for throughput improvements,
+    higher values are still advised for more powerful hardware.
+
 * public-inbox-watch
 
   - IMAP and NNTP code shared with lei, fixing an off-by-one error


* [PATCH 5/5] lei up --all: show output for warnings
  2021-10-12 22:44 [PATCH 0/5] fix extindex reindex harder Eric Wong
                   ` (3 preceding siblings ...)
  2021-10-12 22:44 ` [PATCH 4/5] doc: relnotes: note some recent improvements Eric Wong
@ 2021-10-12 22:45 ` Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2021-10-12 22:45 UTC
  To: meta

This helps users tell which saved search a given warning
came from.

Since I often create and discard externals, some warnings
from saved searches were confusing to me without output context:

  "`$FOO' is unknown"
  "$FOO not indexed by Xapian"
---
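The wrapping technique can be tried standalone.  This is a sketch
only: with_output_context is a hypothetical helper that does not exist
in lei, the output path is made up, and the fallback handler prints
to stderr rather than using the fallback in the patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of lei): run $code with every warning
# tagged by the output it belongs to.
sub with_output_context {
	my ($out, $code) = @_;
	# chain to the existing handler, or print to stderr if none is set
	my $cb = $SIG{__WARN__} // sub { print STDERR @_ };
	my $o = " (output: $out)";
	local $SIG{__WARN__} = sub {
		my @m = @_;
		# put the suffix before a trailing newline, else append it
		push(@m, $o) if !@m || $m[-1] !~ s/\n\z/$o\n/;
		$cb->(@m);
	};
	$code->();
}

my @seen; # captured warnings, for demonstration only
{
	local $SIG{__WARN__} = sub { push @seen, join('', @_) };
	with_output_context('/tmp/example-maildir', sub {
		warn "`\$FOO' is unknown\n";
	});
}
print @seen;
```

The captured warning reads
"`$FOO' is unknown (output: /tmp/example-maildir)", and "local"
restores the previous handler once the helper returns.
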
 lib/PublicInbox/LEI.pm   | 12 ++++++++----
 lib/PublicInbox/LeiUp.pm |  7 +++++++
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm
index 51b0e95e1728..183cb545fe55 100644
--- a/lib/PublicInbox/LEI.pm
+++ b/lib/PublicInbox/LEI.pm
@@ -522,7 +522,7 @@ sub sigint_reap {
 sub fail ($$;$) {
 	my ($self, $buf, $exit_code) = @_;
 	$self->{failed}++;
-	err($self, $buf) if defined $buf;
+	warn($buf, "\n") if defined $buf;
 	$self->{pkt_op_p}->pkt_do('fail_handler') if $self->{pkt_op_p};
 	x_it($self, ($exit_code // 1) << 8);
 	undef;
@@ -542,7 +542,7 @@ sub puts ($;@) { out(shift, map { "$_\n" } @_) }
 sub child_error { # passes non-fatal curl exit codes to user
 	my ($self, $child_error, $msg) = @_; # child_error is $?
 	$child_error ||= 1 << 8;
-	$self->err($msg) if $msg;
+	warn($msg, "\n") if defined $msg;
 	if ($self->{pkt_op_p}) { # to top lei-daemon
 		$self->{pkt_op_p}->pkt_do('child_error', $child_error);
 	} elsif ($self->{sock}) { # to lei(1) client
@@ -588,8 +588,12 @@ sub _lei_atfork_child {
 	eval 'no warnings; undef $PublicInbox::LeiNoteEvent::to_flush';
 	undef $errors_log;
 	$quit = \&CORE::exit;
-	$self->{-eml_noisy} or # only "lei import" sets this atm
-		$SIG{__WARN__} = PublicInbox::Eml::warn_ignore_cb();
+	if (!$self->{-eml_noisy}) { # only "lei import" sets this atm
+		my $cb = $SIG{__WARN__} // \&CORE::warn;
+		$SIG{__WARN__} = sub {
+			$cb->(@_) unless PublicInbox::Eml::warn_ignore(@_)
+		};
+	}
 	$current_lei = $persist ? undef : $self; # for SIG{__WARN__}
 }
 
diff --git a/lib/PublicInbox/LeiUp.pm b/lib/PublicInbox/LeiUp.pm
index 3e1ca21e29e7..3011300dd836 100644
--- a/lib/PublicInbox/LeiUp.pm
+++ b/lib/PublicInbox/LeiUp.pm
@@ -159,6 +159,13 @@ sub event_step { # runs via PublicInbox::DS::requeue
 	delete $l->{opt}->{all};
 	$l->qerr("# updating $self->{out}");
 	$l->{up_op_p} = $self->{op_p}; # ($l => $lei => script/lei)
+	my $cb = $SIG{__WARN__} // \&CORE::warn;
+	my $o = " (output: $self->{out})";
+	local $SIG{__WARN__} = sub {
+		my @m = @_;
+		push(@m, $o) if !@m || $m[-1] !~ s/\n\z/$o\n/;
+		$cb->(@m);
+	};
 	eval { $l->dispatch('up', $self->{out}) };
 	$lei->child_error(0, $@) if $@ || $l->{failed}; # lei->fail()
 

