From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Cc: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
Subject: [PATCH 08/14] search: cleanup uniqueness checking
Date: Thu, 29 Mar 2018 10:28:13 +0000
Message-ID: <20180329102819.15234-9-e@80x24.org>
In-Reply-To: <20180329102819.15234-1-e@80x24.org>
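The only Xapian term which should be unique is the NNTP article
number, so we no longer need find_unique_doc_id.  Instead,
lookup_article checks article numbers for uniqueness itself and
warns (rather than dying) when a number maps to more than one
document.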
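For illustration only (this is not part of the patch): the check
lookup_article now performs can be modelled with a plain Perl array
standing in for a Xapian posting list.  %postings and lookup_doc_id
are hypothetical names; position 0 plays the role of postlist_begin
and the list length plays the role of postlist_end.

```perl
use strict;
use warnings;

# Hypothetical in-memory posting lists standing in for a Xapian DB:
# each term maps to the sorted document IDs indexed under it.
my %postings = (
	XNUM1 => [ 10 ],
	XNUM2 => [ 11, 12 ], # simulated duplicate article number
);

# Mirrors the iterator logic in lookup_article: take the first
# document ID, advance once, and warn if the end was not reached.
sub lookup_doc_id {
	my ($term) = @_;
	my $list = $postings{$term} || [];
	my ($head, $tail) = (0, scalar @$list);
	return if $head == $tail; # term not indexed at all
	my $doc_id = $list->[$head];
	$head++; # like $head->inc
	warn "$term is not unique\n" if $head != $tail; # warn, don't die
	$doc_id;
}

my @got = map { lookup_doc_id($_) // 'undef' } qw(XNUM1 XNUM2 XNUM3);
print "@got\n";
```

The first match is still returned for a duplicated term, so readers
keep working while the warning flags the inconsistency.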
---
lib/PublicInbox/Search.pm | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index a4e2498..584a508 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -396,9 +396,16 @@ sub lookup_article {
 	retry_reopen($self, sub {
 		my $db = $self->{skel} || $self->{xdb};
 		my $head = $db->postlist_begin($term);
-		return if $head == $db->postlist_end($term);
+		my $tail = $db->postlist_end($term);
+		return if $head->equal($tail);
 		my $doc_id = $head->get_docid;
 		return unless defined $doc_id;
+		$head->inc;
+		if ($head->nequal($tail)) {
+			my $loc = $self->{mainrepo} .
+				($self->{skel} ? 'skel' : 'xdb');
+			warn "article #$num is not unique in $loc\n";
+		}
 		# raises on error:
 		my $doc = $db->get_document($doc_id);
 		$smsg = PublicInbox::SearchMsg->wrap($doc);
@@ -432,21 +439,6 @@ sub each_smsg_by_mid {
 	}
 }
 
-sub find_unique_doc_id {
-	my ($self, $termval) = @_;
-
-	my ($begin, $end) = $self->find_doc_ids($termval);
-
-	return undef if $begin->equal($end); # not found
-
-	my $rv = $begin->get_docid;
-
-	# sanity check
-	$begin->inc;
-	$begin->equal($end) or die "Term '$termval' is not unique\n";
-	$rv;
-}
-
 # returns begin and end PostingIterator
 sub find_doc_ids {
 	my ($self, $termval) = @_;
--
EW
Thread overview: 15+ messages
2018-03-29 10:28 [PATCH 00/14] purging support, v1 conversions, cleanups + more Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 01/14] www: remove unnecessary ghost checks Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 02/14] v2writable: append, instead of prepending generated Message-ID Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 03/14] lookup by Message-ID favors the "primary" one Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 04/14] www: fix attachment downloads for conflicted Message-IDs Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 05/14] searchmsg: document why we store To: and Cc: for NNTP Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 06/14] public-inbox-convert: tool for converting old to new inboxes Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 07/14] v2writable: support purging messages from git entirely Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 08/14] search: cleanup uniqueness checking Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-29 10:28 ` [PATCH 09/14] search: get rid of most lookup_* subroutines Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 10/14] search: move find_doc_ids to searchidx Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 11/14] v2writable: cleanup: get rid of unused fields Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 12/14] mbox: avoid extracting Message-ID for linkification Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 13/14] www: cleanup expensive fallback for legacy URLs Eric Wong (Contractor, The Linux Foundation)
2018-03-29 10:28 ` [PATCH 14/14] view: get rid of some unnecessary imports Eric Wong (Contractor, The Linux Foundation)