unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* [PATCH 0/5] nntp: round 2 of ->ALL extindex speedups
@ 2020-11-28  5:09 Eric Wong
  2020-11-28  5:09 ` [PATCH 1/5] nntp: NEWGROUPS uses long_response Eric Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Eric Wong @ 2020-11-28  5:09 UTC (permalink / raw)
  To: meta

All of the O(n) iterations through newsgroups can either go
through the ->ALL extindex or is broken out into long_response
to not monopolize event loops.

So 50-100K newsgroups ought to be usable...

Further speeding up NEWGROUPS will require some MiscIdx
indexing additions, but that no longer hogs up an event
loop iteration.

One remaining problems is startup time with 50-100K newsgroups;
and that affects everything (especially -mda...)

Eric Wong (5):
  nntp: NEWGROUPS uses long_response
  nntp: speed up mid_lookup() using ->ALL extindex
  nntp: art_lookup: use mid_lookup and simplify
  nntp: XPATH uses ->ALL extindex, too
  nntpd: remove redundant {groups} shortcut

 lib/PublicInbox/NNTP.pm  | 125 +++++++++++++++++++++++++--------------
 lib/PublicInbox/NNTPD.pm |   8 +--
 t/extsearch.t            |   4 +-
 t/nntp.t                 |   1 -
 4 files changed, 87 insertions(+), 51 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/5] nntp: NEWGROUPS uses long_response
  2020-11-28  5:09 [PATCH 0/5] nntp: round 2 of ->ALL extindex speedups Eric Wong
@ 2020-11-28  5:09 ` Eric Wong
  2020-11-28  5:09 ` [PATCH 2/5] nntp: speed up mid_lookup() using ->ALL extindex Eric Wong
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2020-11-28  5:09 UTC (permalink / raw)
  To: meta

We can amortize the cost of NEWGROUPS time filtering using the
long_response API.  This lets us handle hundreds/thousands of
inboxes without monopolizing the event loop for this command.

Further speedup is possible using MiscSearch, but that requires
not-yet-done indexing changes to MiscIdx.
---
 lib/PublicInbox/NNTP.pm  | 21 +++++++++++++++------
 lib/PublicInbox/NNTPD.pm |  5 +----
 t/nntp.t                 |  1 -
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 8eec6b91..cc6534b9 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -263,6 +263,19 @@ sub group_line ($$) {
 	more($self, "$ng->{newsgroup} $max $min n");
 }
 
+sub newgroups_i {
+	my ($self, $ts, $i, $groupnames) = @_;
+	my $end = $$i + 100;
+	my $groups = $self->{nntpd}->{pi_config}->{-by_newsgroup};
+	while ($$i < $end) {
+		my $ngname = $groupnames->[$$i++] // return;
+		my $ibx = $groups->{$ngname} or next; # expired on reload
+		next unless (eval { $ibx->uidvalidity } // 0) > $ts;
+		group_line($self, $ibx);
+	}
+	1;
+}
+
 sub cmd_newgroups ($$$;$$) {
 	my ($self, $date, $time, $gmt, $dists) = @_;
 	my $ts = eval { parse_time($date, $time, $gmt) };
@@ -270,12 +283,8 @@ sub cmd_newgroups ($$$;$$) {
 
 	# TODO dists
 	more($self, '231 list of new newsgroups follows');
-	foreach my $ng (@{$self->{nntpd}->{grouplist}}) {
-		my $c = eval { $ng->uidvalidity } // 0;
-		next unless $c > $ts;
-		group_line($self, $ng);
-	}
-	'.'
+	long_response($self, \&newgroups_i, $ts, \(my $i = 0),
+				$self->{nntpd}->{groupnames});
 }
 
 sub wildmat2re (;$) {
diff --git a/lib/PublicInbox/NNTPD.pm b/lib/PublicInbox/NNTPD.pm
index 4de1944b..33bc5fda 100644
--- a/lib/PublicInbox/NNTPD.pm
+++ b/lib/PublicInbox/NNTPD.pm
@@ -24,7 +24,6 @@ sub new {
 		groups => {},
 		err => \*STDERR,
 		out => \*STDOUT,
-		grouplist => [],
 		pi_config => $pi_config,
 		servername => $name,
 		greet => \"201 $name ready - post via email\r\n",
@@ -60,9 +59,7 @@ sub refresh_groups {
 			delete $groups->{$ngname};
 		}
 	});
-	my @names = sort(keys %$groups);
-	$self->{grouplist} = [ map { $groups->{$_} } @names ];
-	$self->{groupnames} = \@names;
+	$self->{groupnames} = [ sort(keys %$groups) ];
 	$self->{pi_config} = $pi_config;
 	# this will destroy old groups that got deleted
 	$self->{groups} = $groups;
diff --git a/t/nntp.t b/t/nntp.t
index 91a2aff7..ea2ef876 100644
--- a/t/nntp.t
+++ b/t/nntp.t
@@ -112,7 +112,6 @@ use PublicInbox::Config;
 	my $hdr = $mime->header_obj;
 	my $mock_self = {
 		nntpd => {
-			grouplist => [],
 			servername => 'example.com',
 			pi_config => bless {}, 'PublicInbox::Config',
 		},

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/5] nntp: speed up mid_lookup() using ->ALL extindex
  2020-11-28  5:09 [PATCH 0/5] nntp: round 2 of ->ALL extindex speedups Eric Wong
  2020-11-28  5:09 ` [PATCH 1/5] nntp: NEWGROUPS uses long_response Eric Wong
@ 2020-11-28  5:09 ` Eric Wong
  2020-11-28  5:09 ` [PATCH 3/5] nntp: art_lookup: use mid_lookup and simplify Eric Wong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2020-11-28  5:09 UTC (permalink / raw)
  To: meta

We can reuse "xref3" information in extindex to quickly match
messages matching a given Message-ID across hundreds or
thousands of newsgroups with a few SQL statements.

"XHDR Xref $MESSAGE_ID" is around 40% faster, on top of
previous speedups.
---
 lib/PublicInbox/NNTP.pm | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index cc6534b9..7b3b1ffe 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -730,10 +730,36 @@ sub mid_lookup ($$) {
 		my $n = $self_ng->mm->num_for($mid);
 		return ($self_ng, $n) if defined $n;
 	}
-	foreach my $ng (values %{$self->{nntpd}->{groups}}) {
-		next if defined $self_ng && $ng eq $self_ng;
-		my $n = $ng->mm->num_for($mid);
-		return ($ng, $n) if defined $n;
+	my $pi_cfg = $self->{nntpd}->{pi_config};
+	if (my $ALL = $pi_cfg->ALL) {
+		my ($id, $prev);
+		while (my $smsg = $ALL->over->next_by_mid($mid, \$id, \$prev)) {
+			my $xr3 = $ALL->over->get_xref3($smsg->{num});
+			if (my @x = grep(/:$smsg->{blob}\z/, @$xr3)) {
+				my ($ngname, $xnum) = split(/:/, $x[0]);
+				my $ibx = $pi_cfg->{-by_newsgroup}->{$ngname};
+				return ($ibx, $xnum) if $ibx;
+				# fall through to trying all xref3s
+			} else {
+				warn <<EOF;
+W: xref3 missing for <$mid> ($smsg->{blob}) in $ALL->{topdir}, -extindex bug?
+EOF
+			}
+			# try all xref3s
+			for my $x (@$xr3) {
+				my ($ngname, $xnum) = split(/:/, $x);
+				my $ibx = $pi_cfg->{-by_newsgroup}->{$ngname};
+				return ($ibx, $xnum) if $ibx;
+				warn "W: `$ngname' does not exist for #$xnum\n";
+			}
+		}
+		# no warning here, $mid is just invalid
+	} else { # slow path for non-ALL users
+		foreach my $ibx (values %{$self->{nntpd}->{groups}}) {
+			next if defined $self_ng && $ibx eq $self_ng;
+			my $n = $ibx->mm->num_for($mid);
+			return ($ibx, $n) if defined $n;
+		}
 	}
 	(undef, undef);
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/5] nntp: art_lookup: use mid_lookup and simplify
  2020-11-28  5:09 [PATCH 0/5] nntp: round 2 of ->ALL extindex speedups Eric Wong
  2020-11-28  5:09 ` [PATCH 1/5] nntp: NEWGROUPS uses long_response Eric Wong
  2020-11-28  5:09 ` [PATCH 2/5] nntp: speed up mid_lookup() using ->ALL extindex Eric Wong
@ 2020-11-28  5:09 ` Eric Wong
  2020-11-28  5:09 ` [PATCH 4/5] nntp: XPATH uses ->ALL extindex, too Eric Wong
  2020-11-28  5:09 ` [PATCH 5/5] nntpd: remove redundant {groups} shortcut Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2020-11-28  5:09 UTC (permalink / raw)
  To: meta

This lets us take advantage of mid_lookup speedup from the
previous commit.

While we're at it, start moving towards using `$ibx' as the
abbreviation for PublicInbox::Inbox objects even in the NNTP
code, since they've been shared with the WWW code for several
years, now.
---
 lib/PublicInbox/NNTP.pm | 36 ++++++++++++------------------------
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 7b3b1ffe..c014eff0 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -483,42 +483,30 @@ sub set_nntp_headers ($$) {
 
 sub art_lookup ($$$) {
 	my ($self, $art, $code) = @_;
-	my $ng = $self->{ng};
-	my ($n, $mid);
+	my ($ibx, $n);
 	my $err;
 	if (defined $art) {
 		if ($art =~ /\A[0-9]+\z/) {
 			$err = '423 no such article number in this group';
 			$n = int($art);
-			goto find_mid;
+			goto find_ibx;
 		} elsif ($art =~ $ONE_MSGID) {
-			$mid = $1;
-			$err = r430;
-			$n = $ng->mm->num_for($mid) if $ng;
-			goto found if defined $n;
-			foreach my $g (values %{$self->{nntpd}->{groups}}) {
-				$n = $g->mm->num_for($mid);
-				if (defined $n) {
-					$ng = $g;
-					goto found;
-				}
-			}
-			return $err;
+			($ibx, $n) = mid_lookup($self, $1);
+			goto found if $ibx;
+			return r430;
 		} else {
 			return r501;
 		}
 	} else {
 		$err = '420 no current article has been selected';
-		$n = $self->{article};
-		defined $n or return $err;
-find_mid:
-		$ng or return '412 no newsgroup has been selected';
-		$mid = $ng->mm->mid_for($n);
-		defined $mid or return $err;
+		$n = $self->{article} // return $err;
+find_ibx:
+		$ibx = $self->{ng} or
+				return '412 no newsgroup has been selected';
 	}
 found:
-	my $smsg = $ng->over->get_art($n) or return $err;
-	$smsg->{-ibx} = $ng;
+	my $smsg = $ibx->over->get_art($n) or return $err;
+	$smsg->{-ibx} = $ibx;
 	if ($code == 223) { # STAT
 		set_art($self, $n);
 		"223 $n <$smsg->{mid}> article retrieved - " .
@@ -528,7 +516,7 @@ found:
 		$smsg->{nntp_code} = $code;
 		set_art($self, $art);
 		# this dereferences to `undef'
-		${git_async_cat($ng->git, $smsg->{blob}, \&blob_cb, $smsg)};
+		${git_async_cat($ibx->git, $smsg->{blob}, \&blob_cb, $smsg)};
 	}
 }
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/5] nntp: XPATH uses ->ALL extindex, too
  2020-11-28  5:09 [PATCH 0/5] nntp: round 2 of ->ALL extindex speedups Eric Wong
                   ` (2 preceding siblings ...)
  2020-11-28  5:09 ` [PATCH 3/5] nntp: art_lookup: use mid_lookup and simplify Eric Wong
@ 2020-11-28  5:09 ` Eric Wong
  2020-11-28  5:09 ` [PATCH 5/5] nntpd: remove redundant {groups} shortcut Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2020-11-28  5:09 UTC (permalink / raw)
  To: meta

Another 30-40% speedup when testing against a local
lore.kernel.org mirror.  In either case, we'll consistently sort
the response for ease-of-testing and client-side
cache-friendliness.
---
 lib/PublicInbox/NNTP.pm | 24 ++++++++++++++++++++----
 t/extsearch.t           |  4 +++-
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index c014eff0..d314a3d1 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -994,12 +994,28 @@ sub cmd_xpath ($$) {
 	return r501 unless $mid =~ $ONE_MSGID;
 	$mid = $1;
 	my @paths;
-	foreach my $ng (values %{$self->{nntpd}->{groups}}) {
-		my $n = $ng->mm->num_for($mid);
-		push @paths, "$ng->{newsgroup}/$n" if defined $n;
+	my $pi_cfg = $self->{nntpd}->{pi_config};
+	my $groups = $pi_cfg->{-by_newsgroup};
+	if (my $ALL = $pi_cfg->ALL) {
+		my ($id, $prev, %seen);
+		while (my $smsg = $ALL->over->next_by_mid($mid, \$id, \$prev)) {
+			my $xr3 = $ALL->over->get_xref3($smsg->{num});
+			for my $x (@$xr3) {
+				my ($ngname, $n) = split(/:/, $x);
+				$x = "$ngname/$n";
+				if ($groups->{$ngname} && !$seen{$x}++) {
+					push(@paths, $x);
+				}
+			}
+		}
+	} else { # slow path, no point in using long_response
+		for my $ibx (values %$groups) {
+			my $n = $ibx->mm->num_for($mid) // next;
+			push @paths, "$ibx->{newsgroup}/$n";
+		}
 	}
 	return '430 no such article on server' unless @paths;
-	'223 '.join(' ', @paths);
+	'223 '.join(' ', sort(@paths));
 }
 
 sub res ($$) { do_write($_[0], $_[1] . "\r\n") }
diff --git a/t/extsearch.t b/t/extsearch.t
index f9f74e5c..b2439971 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -74,9 +74,11 @@ EOF
 SKIP: {
 	require_mods(qw(Net::NNTP), 1);
 	my ($out, $err) = ("$home/nntpd.out.log", "$home/nntpd.err.log");
-	my $cmd = [ '-nntpd', '-W0' ]; #, "--stdout=$out", "--stderr=$err" ];
+	my $cmd = [ '-nntpd', '-W0', "--stdout=$out", "--stderr=$err" ];
 	my $td = start_script($cmd, undef, { 3 => $sock });
 	my $n = Net::NNTP->new($host_port);
+	my @xp = $n->xpath('<testmessage@example.com>');
+	is_deeply(\@xp, [ qw(v1.example/1 v2.example/1) ]);
 	$n->group('v1.example');
 	my $res = $n->head(1);
 	@$res = grep(/^Xref: /, @$res);

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 5/5] nntpd: remove redundant {groups} shortcut
  2020-11-28  5:09 [PATCH 0/5] nntp: round 2 of ->ALL extindex speedups Eric Wong
                   ` (3 preceding siblings ...)
  2020-11-28  5:09 ` [PATCH 4/5] nntp: XPATH uses ->ALL extindex, too Eric Wong
@ 2020-11-28  5:09 ` Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2020-11-28  5:09 UTC (permalink / raw)
  To: meta

It's not worth confusing hackers reading the source to have
two ways to access the same (large) hash table.  So just
go through PublicInbox::Config objects for now since the
extra hash lookup isn't going to be noticeable.

I've also started favoring "for" instead of "foreach"
since they're the equivalent perlop and less wear on
my fingers + keyboard.
---
 lib/PublicInbox/NNTP.pm  | 12 ++++++------
 lib/PublicInbox/NNTPD.pm |  3 +--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index d314a3d1..3b16a66a 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -321,7 +321,7 @@ sub ngpat2re (;$) {
 sub newnews_i {
 	my ($self, $names, $ts, $prev) = @_;
 	my $ngname = $names->[0];
-	if (my $ibx = $self->{nntpd}->{groups}->{$ngname}) {
+	if (my $ibx = $self->{nntpd}->{pi_config}->{-by_newsgroup}->{$ngname}) {
 		if (my $over = $ibx->over) {
 			my $msgs = $over->query_ts($ts, $$prev);
 			if (scalar @$msgs) {
@@ -360,13 +360,13 @@ sub cmd_newnews ($$$$;$$) {
 
 sub cmd_group ($$) {
 	my ($self, $group) = @_;
-	my $no_such = '411 no such news group';
 	my $nntpd = $self->{nntpd};
-	my $ng = $nntpd->{groups}->{$group} or return $no_such;
+	my $ibx = $nntpd->{pi_config}->{-by_newsgroup}->{$group} or
+		return '411 no such news group';
 	$nntpd->idler_start;
 
-	$self->{ng} = $ng;
-	my ($min, $max) = $ng->mm->minmax;
+	$self->{ng} = $ibx;
+	my ($min, $max) = $ibx->mm->minmax;
 	$self->{article} = $min;
 	my $est_size = $max - $min;
 	"211 $est_size $min $max $group";
@@ -743,7 +743,7 @@ EOF
 		}
 		# no warning here, $mid is just invalid
 	} else { # slow path for non-ALL users
-		foreach my $ibx (values %{$self->{nntpd}->{groups}}) {
+		for my $ibx (values %{$pi_cfg->{-by_newsgroup}}) {
 			next if defined $self_ng && $ibx eq $self_ng;
 			my $n = $ibx->mm->num_for($mid);
 			return ($ibx, $n) if defined $n;
diff --git a/lib/PublicInbox/NNTPD.pm b/lib/PublicInbox/NNTPD.pm
index 33bc5fda..5e287857 100644
--- a/lib/PublicInbox/NNTPD.pm
+++ b/lib/PublicInbox/NNTPD.pm
@@ -60,9 +60,8 @@ sub refresh_groups {
 		}
 	});
 	$self->{groupnames} = [ sort(keys %$groups) ];
-	$self->{pi_config} = $pi_config;
 	# this will destroy old groups that got deleted
-	$self->{groups} = $groups;
+	$self->{pi_config} = $pi_config;
 }
 
 sub idler_start {

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-11-28  5:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-28  5:09 [PATCH 0/5] nntp: round 2 of ->ALL extindex speedups Eric Wong
2020-11-28  5:09 ` [PATCH 1/5] nntp: NEWGROUPS uses long_response Eric Wong
2020-11-28  5:09 ` [PATCH 2/5] nntp: speed up mid_lookup() using ->ALL extindex Eric Wong
2020-11-28  5:09 ` [PATCH 3/5] nntp: art_lookup: use mid_lookup and simplify Eric Wong
2020-11-28  5:09 ` [PATCH 4/5] nntp: XPATH uses ->ALL extindex, too Eric Wong
2020-11-28  5:09 ` [PATCH 5/5] nntpd: remove redundant {groups} shortcut Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).