* [PATCH 0/3] force reindex for threading changes
@ 2017-02-06 21:55 Eric Wong
2017-02-06 21:55 ` [PATCH 1/3] searchidx: reindex clobbers old thread IDs Eric Wong
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2017-02-06 21:55 UTC
To: meta
We cannot rely on in-place --reindex to handle thread_id
changes when we fix threading bugs in the search indexer
like in commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers").
So, bump the schema version and pay the cost of requiring
extra disk space to create a new index in parallel.
* [PATCH 1/3] searchidx: reindex clobbers old thread IDs
From: Eric Wong @ 2017-02-06 21:55 UTC
To: meta
We cannot always reuse thread IDs since our threading
logic may change as bugs are fixed.
---
lib/PublicInbox/SearchIdx.pm | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 1142ca7..bc003c6 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -157,6 +157,10 @@ sub add_message {
# it will also clobber any existing regular message
$doc_id = $smsg->{doc_id};
$old_tid = $smsg->thread_id;
+
+ # no need to remove_term for old_tid, we use a new
+ # doc to replace the old one when reindexing:
+ $old_tid = undef if $self->{reindex};
}
$smsg = PublicInbox::SearchMsg->new($mime);
my $doc = $smsg->{doc};
@@ -464,7 +468,7 @@ sub _git_log {
sub _index_sync {
my ($self, $opts) = @_;
my $tip = $opts->{ref} || 'HEAD';
- my $reindex = $opts->{reindex};
+ $self->{reindex} = $opts->{reindex};
my ($mkey, $last_commit, $lx, $xlog);
$self->{git}->batch_prepare;
my $xdb = _xdb_acquire($self);
@@ -474,7 +478,7 @@ sub _index_sync {
$mkey = 'last_commit';
$last_commit = $xdb->get_metadata('last_commit');
$lx = $last_commit;
- if ($reindex) {
+ if ($self->{reindex}) {
$lx = '';
$mkey = undef if $last_commit ne '';
}
--
EW
* [PATCH 2/3] Revert "searchidx: reindex clobbers old thread IDs"
From: Eric Wong @ 2017-02-06 21:55 UTC
To: meta
Oops, that's broken, too. I guess the only way to reindex
after fixing the thread detection is to start from scratch.
This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
---
lib/PublicInbox/SearchIdx.pm | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index bc003c6..1142ca7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -157,10 +157,6 @@ sub add_message {
# it will also clobber any existing regular message
$doc_id = $smsg->{doc_id};
$old_tid = $smsg->thread_id;
-
- # no need to remove_term for old_tid, we use a new
- # doc to replace the old one when reindexing:
- $old_tid = undef if $self->{reindex};
}
$smsg = PublicInbox::SearchMsg->new($mime);
my $doc = $smsg->{doc};
@@ -468,7 +464,7 @@ sub _git_log {
sub _index_sync {
my ($self, $opts) = @_;
my $tip = $opts->{ref} || 'HEAD';
- $self->{reindex} = $opts->{reindex};
+ my $reindex = $opts->{reindex};
my ($mkey, $last_commit, $lx, $xlog);
$self->{git}->batch_prepare;
my $xdb = _xdb_acquire($self);
@@ -478,7 +474,7 @@ sub _index_sync {
$mkey = 'last_commit';
$last_commit = $xdb->get_metadata('last_commit');
$lx = $last_commit;
- if ($self->{reindex}) {
+ if ($reindex) {
$lx = '';
$mkey = undef if $last_commit ne '';
}
--
EW
* [PATCH 3/3] search: schema version bump for empty References/In-Reply-To
From: Eric Wong @ 2017-02-06 21:55 UTC
To: meta
We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
---
lib/PublicInbox/Search.pm | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index c909424..8c72fa1 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -39,7 +39,9 @@ use constant {
# 10 - optimize doc for NNTP overviews
# 11 - merge threads when vivifying ghosts
# 12 - change YYYYMMDD value column to numeric
- SCHEMA_VERSION => 12,
+ # 13 - fix threading for empty References/In-Reply-To
+ # (commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0)
+ SCHEMA_VERSION => 13,
# n.b. FLAG_PURE_NOT is expensive not suitable for a public website
# as it could become a denial-of-service vector
--
EW