* [PATCH 0/2] extsearch: avoid stale Xapian results
@ 2020-12-27 11:01 Eric Wong
  2020-12-27 11:01 ` [PATCH 1/2] extsearch: unconditionally reopen on access Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2020-12-27 11:01 UTC
  To: meta

I noticed recent messages weren't showing up in search results
on http://lore.czquwvybam4bgbro.onion/all/

These should fix it, and we'll probably get rid of the
cleanup timers for per-inbox search and follow this
strategy.

Eric Wong (2):
  extsearch: unconditionally reopen on access
  miscsearch: take reopen from Search and use it

 lib/PublicInbox/ExtSearch.pm  | 4 +---
 lib/PublicInbox/MiscSearch.pm | 4 ++++
 lib/PublicInbox/WwwListing.pm | 3 +++
 3 files changed, 8 insertions(+), 3 deletions(-)
* [PATCH 1/2] extsearch: unconditionally reopen on access
  2020-12-27 11:01 [PATCH 0/2] extsearch: avoid stale Xapian results Eric Wong
@ 2020-12-27 11:01 ` Eric Wong
  2020-12-27 11:01 ` [PATCH 2/2] miscsearch: take reopen from Search and use it Eric Wong
  2020-12-28 15:32 ` [PATCH 0/2] extsearch: avoid stale Xapian results Kyle Meyer
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-12-27 11:01 UTC
  To: meta

Since ExtSearch lacks the janky cleanup timer of
PublicInbox::Inbox objects, its search results get stale.

Reopen the Xapian DB on every ->search call for now, as
reducing reopen calls doesn't seem worth the complexity.

The Xapian::Database::reopen operation itself takes only ~50us
on my old workstation with 3 shards totaling <200GB.  Other
parts of Xapian dominate the search time, so the reopen seems
inconsequential with single-digit shard counts.
---
 lib/PublicInbox/ExtSearch.pm | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/PublicInbox/ExtSearch.pm b/lib/PublicInbox/ExtSearch.pm
index a2b97798..7c9586a6 100644
--- a/lib/PublicInbox/ExtSearch.pm
+++ b/lib/PublicInbox/ExtSearch.pm
@@ -29,8 +29,6 @@ sub misc {
 	$self->{misc} //= PublicInbox::MiscSearch->new("$self->{xpfx}/misc");
 }

-sub search { $_[0] } # self
-
 # overrides PublicInbox::Search::_xdb
 sub _xdb {
 	my ($self) = @_;
@@ -126,6 +124,6 @@ no warnings 'once';
 *recent = \&PublicInbox::Inbox::recent;

 *max_git_epoch = *nntp_usable = *msg_by_path = \&mm; # undef
-*isrch = *search;
+*isrch = *search = \&PublicInbox::Search::reopen;

 1;
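Replacing "sub search { $_[0] }" with an alias to PublicInbox::Search::reopen only works as a drop-in if reopen hands back its invocant. Below is a minimal, self-contained sketch of that reopen-on-access pattern, assuming reopen refreshes the handle held in $self->{xdb} and returns $self. The FakeXdb, FakeSearch and FakeExtSearch packages are hypothetical stand-ins (no Xapian required) that count reopens instead of touching a real database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Xapian::Database handle; it only counts
# how many times reopen was called
package FakeXdb;
sub new { bless { reopens => 0 }, shift }
sub reopen { $_[0]->{reopens}++ }

# Hypothetical stand-in for PublicInbox::Search
package FakeSearch;
sub new { bless { xdb => FakeXdb->new }, shift }

# Refresh the open DB handle (if any), then return $self so the
# result can be used anywhere the object itself is expected
sub reopen {
	my ($self) = @_;
	if (my $xdb = $self->{xdb}) {
		$xdb->reopen;
	}
	$self;
}

# Hypothetical stand-in for ExtSearch: ->search (and its ->isrch
# alias) now reopen on every call instead of just returning $self
package FakeExtSearch;
our @ISA = ('FakeSearch');
no warnings 'once';
*isrch = *search = \&FakeSearch::reopen;

package main;
my $es = FakeExtSearch->new;
my $srch = $es->search; # reopens, then returns $es itself
$es->isrch;             # reopens again
print $srch == $es ? "same object\n" : "different object\n";
print "reopens: $es->{xdb}->{reopens}\n";
```

Reopening unconditionally costs one cheap call per query (~50us in the measurement quoted in the commit message) but avoids tracking when the index last changed, which is the state a timer-based scheme would need.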
* [PATCH 2/2] miscsearch: take reopen from Search and use it
  2020-12-27 11:01 [PATCH 0/2] extsearch: avoid stale Xapian results Eric Wong
  2020-12-27 11:01 ` [PATCH 1/2] extsearch: unconditionally reopen on access Eric Wong
@ 2020-12-27 11:01 ` Eric Wong
  2020-12-28 15:32 ` [PATCH 0/2] extsearch: avoid stale Xapian results Kyle Meyer
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-12-27 11:01 UTC
  To: meta

As with ExtSearch, MiscSearch lacks the janky cleanup timer of
PublicInbox::Inbox objects, leading to info about
inboxes/newsgroups going stale.

Fortunately, we don't use MiscSearch very heavily, yet.  In the
future, we may be able to detect new inboxes without having to
SIGHUP or restart daemons using MiscSearch.
---
 lib/PublicInbox/MiscSearch.pm | 4 ++++
 lib/PublicInbox/WwwListing.pm | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/lib/PublicInbox/MiscSearch.pm b/lib/PublicInbox/MiscSearch.pm
index c6ce255f..6683d564 100644
--- a/lib/PublicInbox/MiscSearch.pm
+++ b/lib/PublicInbox/MiscSearch.pm
@@ -73,6 +73,7 @@ sub misc_enquire_once { # retry_reopen callback
 sub mset {
 	my ($self, $qs, $opt) = @_;
 	$opt ||= {};
+	reopen($self);
 	my $qp = $self->{qp} //= mi_qp_new($self);
 	$qs = 'type:inbox' if $qs eq '';
 	my $qr = $qp->parse_query($qs, $PublicInbox::Search::QP_FLAGS);
@@ -184,4 +185,7 @@ sub nntpd_cache_load {
 	retry_reopen($self, \&_nntpd_cache_load);
 }

+no warnings 'once';
+*reopen = \&PublicInbox::Search::reopen;
+
 1;
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index fce0e530..4b3f1674 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -69,6 +69,9 @@ sub hide_key { 'www' }
 sub response {
 	my ($class, $ctx) = @_;
 	bless $ctx, $class;
+	if (my $ALL = $ctx->{www}->{pi_cfg}->ALL) {
+		$ALL->misc->reopen;
+	}
 	my $re = $ctx->url_regexp or return $ctx->psgi_triple;
 	my $iter = PublicInbox::ConfigIter->new($ctx->{www}->{pi_cfg},
 			\&list_match_i, $re, $ctx);
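MiscSearch does not inherit from PublicInbox::Search, so the patch borrows reopen with a glob assignment rather than subclassing. Below is a minimal sketch of that idiom, assuming PublicInbox::Search::reopen refreshes the handle in $self->{xdb} and returns $self. The Searchish and Miscish packages and the "generation" counter are hypothetical stand-ins for illustration only. The sketch shows that the borrowed sub works both as a plain function call inside mset, as in the MiscSearch hunk, and as a method, as in the WwwListing hunk.

```perl
use strict;
use warnings;

# Hypothetical stand-in for PublicInbox::Search::reopen: refresh the
# handle in $self->{xdb}, if one is open, and return $self
package Searchish;
sub reopen {
	my ($self) = @_;
	if (my $xdb = $self->{xdb}) {
		$xdb->{generation}++; # pretend to pick up new index data
	}
	$self;
}

# Hypothetical stand-in for MiscSearch: NOT a subclass of Searchish,
# it only borrows the sub by aliasing the glob
package Miscish;
sub new { bless { xdb => { generation => 0 } }, shift }
sub mset {
	my ($self, $qs) = @_;
	reopen($self); # plain function call, resolves to Miscish::reopen
	"results for '$qs' at generation $self->{xdb}->{generation}";
}
no warnings 'once';
*reopen = \&Searchish::reopen; # runs before any mset call below

package main;
my $misc = Miscish->new;
print $misc->mset('type:inbox'), "\n"; # generation 1
$misc->reopen;                         # method call also works
print $misc->mset('type:inbox'), "\n"; # generation 3
```

The call inside mset compiles as an ordinary sub call even though the alias is installed later in the file; it resolves at run time, after the glob assignment has executed.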
* Re: [PATCH 0/2] extsearch: avoid stale Xapian results
  2020-12-27 11:01 [PATCH 0/2] extsearch: avoid stale Xapian results Eric Wong
  2020-12-27 11:01 ` [PATCH 1/2] extsearch: unconditionally reopen on access Eric Wong
  2020-12-27 11:01 ` [PATCH 2/2] miscsearch: take reopen from Search and use it Eric Wong
@ 2020-12-28 15:32 ` Kyle Meyer
  2 siblings, 0 replies; 4+ messages in thread
From: Kyle Meyer @ 2020-12-28 15:32 UTC
  To: Eric Wong; +Cc: meta

Eric Wong writes:

> I noticed recent messages weren't showing up in search results
> on http://lore.czquwvybam4bgbro.onion/all/
>
> These should fix it, and we'll probably get rid of the
> cleanup timers for per-inbox search and follow this
> strategy.

I noticed that too but hadn't gotten around to reporting it.  This
seems to resolve the issue on my end.  Thanks!