From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 22/34] nntp: use NNTP article numbers for lookups
Date: Tue,  6 Mar 2018 08:42:30 +0000
Message-ID: <20180306084242.19988-23-e@80x24.org>
In-Reply-To: <20180306084242.19988-1-e@80x24.org>

Since Message-IDs are no longer unique within Xapian
(but are still unique within the SQLite Msgmap), favor
NNTP article numbers for internal lookups.  This prevents
lookups from resolving to the "wrong" internal Message-ID.
---
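A rough illustration of the ambiguity described above, for readers
following along.  This is only a sketch with hypothetical in-memory
hashes (%by_num, %by_mid) and made-up helper names; it does not use
the PublicInbox::Search API or Xapian.  It assumes several documents
can share one Message-ID while each article number maps to at most
one document.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, NOT the public-inbox data structures:
# %by_num models one document per NNTP article number,
# %by_mid models Message-ID terms that may match several documents.
my %by_num = (
	1 => { mid => 'dup@example', subject => 'first copy' },
	2 => { mid => 'dup@example', subject => 'second copy' },
	3 => { mid => 'uniq@example', subject => 'unrelated' },
);

my %by_mid;
for my $num (sort { $a <=> $b } keys %by_num) {
	push @{$by_mid{$by_num{$num}->{mid}}}, $num;
}

# A Message-ID lookup may return more than one article number
sub nums_for_mid {
	my ($mid) = @_;
	@{$by_mid{$mid} || []};
}

# An article number lookup returns at most one document
sub lookup_article_sketch {
	my ($num) = @_;
	$by_num{$num};
}

my @ambiguous = nums_for_mid('dup@example');
printf "%s matches %d articles\n", 'dup@example', scalar(@ambiguous);
my $smsg = lookup_article_sketch(2);
print "article 2: $smsg->{subject}\n";
```

In the sketch, a caller that only knows the Message-ID has to pick
one of the matches, whereas a caller holding the article number gets
exactly the document it asked for (or undef when none exists).
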
 lib/PublicInbox/NNTP.pm   | 29 ++++++++++++++---------------
 lib/PublicInbox/Search.pm | 21 +++++++++++++++++++++
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 56d8e01..895e502 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -463,18 +463,16 @@ find_mid:
 		defined $mid or return $err;
 	}
 found:
-	my $bytes;
-	my $s = eval { $ng->msg_by_mid($mid, \$bytes) } or return $err;
-	$s = Email::Simple->new($s);
-	my $lines;
+	my $smsg = $ng->search->lookup_article($n) or return $err;
+	my $msg = $ng->msg_by_smsg($smsg) or return $err;
+	my $s = Email::Simple->new($msg);
 	if ($set_headers) {
 		set_nntp_headers($s->header_obj, $ng, $n, $mid);
-		$lines = $s->body =~ tr!\n!\n!;
 
 		# must be last
 		$s->body_set('') if ($set_headers == 2);
 	}
-	[ $n, $mid, $s, $bytes, $lines, $ng ];
+	[ $n, $mid, $s, $smsg->bytes, $smsg->lines, $ng ];
 }
 
 sub simple_body_write ($$) {
@@ -693,8 +691,8 @@ sub hdr_xref ($$$) { # optimize XHDR Xref [range] for rtin
 }
 
 sub search_header_for {
-	my ($srch, $mid, $field) = @_;
-	my $smsg = $srch->lookup_mail($mid) or return;
+	my ($srch, $num, $field) = @_;
+	my $smsg = $srch->lookup_article($num) or return;
 	$smsg->$field;
 }
 
@@ -702,8 +700,8 @@ sub hdr_searchmsg ($$$$) {
 	my ($self, $xhdr, $field, $range) = @_;
 	if (defined $range && $range =~ /\A<(.+)>\z/) { # Message-ID
 		my ($ng, $n) = mid_lookup($self, $1);
-		return r430 unless $n;
-		my $v = search_header_for($ng->search, $range, $field);
+		return r430 unless defined $n;
+		my $v = search_header_for($ng->search, $n, $field);
 		hdr_mid_response($self, $xhdr, $ng, $n, $range, $v);
 	} else { # numeric range
 		$range = $self->{article} unless defined $range;
@@ -803,9 +801,10 @@ sub cmd_xrover ($;$) {
 	more($self, '224 Overview information follows');
 	long_response($self, $beg, $end, sub {
 		my ($i) = @_;
-		my $mid = $mm->mid_for($$i) or return;
-		my $h = search_header_for($srch, $mid, 'references');
-		more($self, "$$i $h");
+		my $num = $$i;
+		my $h = search_header_for($srch, $num, 'references');
+		defined $h or return;
+		more($self, "$num $h");
 	});
 }
 
@@ -829,8 +828,8 @@ sub cmd_over ($;$) {
 	my ($self, $range) = @_;
 	if ($range && $range =~ /\A<(.+)>\z/) {
 		my ($ng, $n) = mid_lookup($self, $1);
-		my $smsg = $ng->search->lookup_mail($range) or
-			return '430 No article with that message-id';
+		defined $n or return r430;
+		my $smsg = $ng->search->lookup_article($n) or return r430;
 		more($self, '224 Overview information follows (multi-line)');
 
 		# Only set article number column if it's the current group
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index a1c423c..802984b 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -372,6 +372,27 @@ sub lookup_mail { # no ghosts!
 	});
 }
 
+sub lookup_article {
+	my ($self, $num) = @_;
+	my $term = 'XNUM'.$num;
+	my $smsg;
+	eval {
+		retry_reopen($self, sub {
+			my $db = $self->{skel} || $self->{xdb};
+			my $head = $db->postlist_begin($term);
+			return if $head == $db->postlist_end($term);
+			my $doc_id = $head->get_docid;
+			return unless defined $doc_id;
+			# raises on error:
+			my $doc = $db->get_document($doc_id);
+			$smsg = PublicInbox::SearchMsg->wrap($doc);
+			$smsg->load_expand;
+			$smsg->{doc_id} = $doc_id;
+		});
+	};
+	$smsg;
+}
+
 sub each_smsg_by_mid {
 	my ($self, $mid, $cb) = @_;
 	my $xdb = $self->{xdb};
-- 
EW


Thread overview: 36+ messages
2018-03-06  8:42 [v2 PATCH 00/34] duplicate handling, smaller Xapian DBs, date fixes Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 01/34] v2writable: delete ::Import obj when ->done Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 02/34] search: remove informational "warning" message Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 03/34] searchidx: add PID to error message when die-ing Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 04/34] content_id: special treatment for Message-Id headers Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 05/34] evcleanup: disable outside of daemon Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 06/34] v2writable: deduplicate detection on add Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 07/34] evcleanup: do not create event loop if nothing was registered Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 08/34] mid: add `mids' and `references' methods for extraction Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 09/34] content_id: use `mids' and `references' for MID extraction Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 10/34] searchidx: use new `references' method for parsing References Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 11/34] content_id: no need to be human-friendly Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 12/34] v2writable: inject new Message-IDs on true duplicates Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 13/34] search: revert to using 'Q' as a uniQue id per-Xapian conventions Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 14/34] searchidx: support indexing multiple MIDs Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 15/34] mid: be strict with References, but loose on Message-Id Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 16/34] searchidx: avoid excessive XNQ indexing with diffs Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 17/34] searchidxskeleton: add a note about locking Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 18/34] v2writable: generated Message-ID goes first Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 19/34] searchidx: use add_boolean_term for internal terms Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 20/34] searchidx: add NNTP article number as a searchable term Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 21/34] mid: truncate excessively long MIDs early Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-06  8:42 ` [PATCH 23/34] nntp: fix NEWNEWS command Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 24/34] searchidx: store the primary MID in doc data for NNTP Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 25/34] import: consolidate object info for v2 imports Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 26/34] v2: avoid redundant/repeated configs for git partition repos Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 27/34] INSTALL: document more optional dependencies Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 28/34] search: favor skeleton DB for lookup_mail Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 29/34] search: each_smsg_by_mid uses skeleton if available Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 30/34] v2writable: remove unnecessary skeleton commit Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 31/34] favor Received: date over Date: header globally Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 32/34] import: fall back to Sender for extracting name and email Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 33/34] scripts/import_vger_from_mbox: perform mboxrd or mboxo escaping Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:42 ` [PATCH 34/34] v2writable: detect and use previous partition count Eric Wong (Contractor, The Linux Foundation)
2018-03-06  8:53 ` [v2 PATCH 00/34] duplicate handling, smaller Xapian DBs, date fixes Eric Wong
