From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org
Subject: [PATCH] www: use correct threadid for per-thread search
Date: Fri, 16 Jun 2023 23:13:01 +0000
Message-ID: <20230616231301.M394415@dcvr>
In-Reply-To: <20230616-rudy-comedy-vision-2b9f92@meerkat>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Mar 30, 2023 at 11:29:51AM +0000, Eric Wong wrote:
> > This implements the mbox.gz retrieval.  I didn't want to deal
> > with HTML nor figuring out how to expose more <form> elements,
> > yet; but I figure mbox.gz is the most important.
> > 
> > Now deployed on 80x24.org/lore:
> > 
> > MSGID=20230327080502.GA570847@ziqianlu-desk2
> > curl -d '' -sSf \
> >    https://80x24.org/lore/all/"$MSGID/?x=m&q=rt:2023-03-29.." | \
> >    zcat | grep -i ^Message-ID:
> 
> Eric:
> 
> Reviving this old thread for some clarification. I noticed that this only
> works for /all/, but not for individual inboxes. E.g.:
> 
>     $ curl -d '' -sSf \
>       https://lore.kernel.org/all/"$MSGID/?x=m&q=rt:2023-03-29.." \
>       | zgrep -i ^Message-ID:
>     Message-ID: <cfcf852c-e9f0-f560-542d-0f72777a85b2@leemhuis.info>
> 
> but with /lkml/ I get a 404:
> 
>     $ curl -d '' -sSf \
>       https://lore.kernel.org/lkml/"$MSGID/?x=m&q=rt:2023-03-29.." \
>       | zgrep -i ^Message-ID:
>     curl: (22) The requested URL returned error: 404
> 
> Is that intentionally restricted to just extindex?

It's a bug; the fix is below and is deployed to https://80x24.org/lore/
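
To re-check a per-inbox endpoint, the curl commands quoted above could
be wrapped roughly as follows.  This is only a sketch: the function and
variable names are hypothetical, only the URL layout
(BASE/INBOX/MSGID/?x=m&q=QUERY) is taken from this thread, and the
query is assumed to already be URL-safe.

```shell
#!/bin/sh
# Hypothetical helper (not part of public-inbox): build the per-thread
# search URL in the same shape as the curl examples quoted above.
thread_search_url() {
	base=$1 inbox=$2 msgid=$3 query=$4
	# strip one trailing slash from the base so we never emit "//"
	printf '%s/%s/%s/?x=m&q=%s\n' "${base%/}" "$inbox" "$msgid" "$query"
}

# Hypothetical helper: POST an empty body to that URL, decompress the
# mbox.gz response and print the Message-ID headers it contains.
# With -f, curl reports an HTTP error such as 404 instead of a body.
thread_msgids() {
	curl -d '' -sSf "$(thread_search_url "$@")" |
		zcat | grep -i '^Message-ID:'
}
```

For example, `thread_msgids https://lore.kernel.org lkml "$MSGID" 'rt:2023-03-29..'`
would repeat the /lkml/ request from the report.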

---------8<---------
Subject: [PATCH] www: use correct threadid for per-thread search

For individual public-inboxes relying on extindex for per-inbox
search, we must use the threadid from the extindex over.sqlite3
rather than the per-inbox over.sqlite3 file.

Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20230616-rudy-comedy-vision-2b9f92@meerkat/
---
 lib/PublicInbox/Mbox.pm | 10 +++++++---
 t/extindex-psgi.t       | 39 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index e1abf7ec..bf61bb0e 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -225,15 +225,19 @@ sub mbox_all {
 	return mbox_all_ids($ctx) if $q_string !~ /\S/;
 	my $srch = $ctx->{ibx}->isrch or
 		return PublicInbox::WWW::need($ctx, 'Search');
-	my $over = $ctx->{ibx}->over or
-		return PublicInbox::WWW::need($ctx, 'Overview');
 
 	my $qopts = $ctx->{qopts} = { relevance => -2 }; # ORDER BY docid DESC
 
 	# {threadid} limits results to a given thread
 	# {threads} collapses results from messages in the same thread,
 	# allowing us to use ->expand_thread w/o duplicates in our own code
-	$qopts->{threadid} = $over->mid2tid($ctx->{mid}) if defined($ctx->{mid});
+	if (defined($ctx->{mid})) {
+		my $over = ($ctx->{ibx}->{isrch} ?
+				$ctx->{ibx}->{isrch}->{es}->over :
+				$ctx->{ibx}->over) or
+			return PublicInbox::WWW::need($ctx, 'Overview');
+		$qopts->{threadid} = $over->mid2tid($ctx->{mid});
+	}
 	$qopts->{threads} = 1 if $q->{t};
 	$srch->query_approxidate($ctx->{ibx}->git, $q_string);
 	my $mset = $srch->mset($q_string, $qopts);
diff --git a/t/extindex-psgi.t b/t/extindex-psgi.t
index 98dc2e48..f10ffbb6 100644
--- a/t/extindex-psgi.t
+++ b/t/extindex-psgi.t
@@ -1,5 +1,5 @@
 #!perl -w
-# Copyright (C) 2020-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use v5.10.1;
@@ -21,7 +21,28 @@ mkdir "$home/.public-inbox" or BAIL_OUT $!;
 my $pi_config = "$home/.public-inbox/config";
 cp($cfg_path, $pi_config) or BAIL_OUT;
 my $env = { HOME => $home };
-run_script([qw(-extindex --all), "$tmpdir/eidx"], $env) or BAIL_OUT;
+my $m2t = create_inbox 'mid2tid', version => 2, indexlevel => 'basic', sub {
+	my ($im, $ibx) = @_;
+	for my $n (1..3) {
+		$im->add(PublicInbox::Eml->new(<<EOM)) or xbail 'add';
+Date: Fri, 02 Oct 1993 00:0$n:00 +0000
+Message-ID: <t\@$n>
+Subject: tid $n
+From: x\@example.com
+References: <a-mid\@b>
+
+$n
+EOM
+		$im->add(PublicInbox::Eml->new(<<EOM)) or xbail 'add';
+Date: Fri, 02 Oct 1993 00:0$n:00 +0000
+Message-ID: <ut\@$n>
+Subject: unrelated tid $n
+From: x\@example.com
+References: <b-mid\@b>
+
+EOM
+	}
+};
 {
 	open my $cfgfh, '>>', $pi_config or BAIL_OUT;
 	$cfgfh->autoflush(1);
@@ -32,8 +53,14 @@ run_script([qw(-extindex --all), "$tmpdir/eidx"], $env) or BAIL_OUT;
 [publicinbox]
 	wwwlisting = all
 	grokManifest = all
+[publicinbox "m2t"]
+	inboxdir = $m2t->{inboxdir}
+	address = $m2t->{-primary_address}
 EOM
+	close $cfgfh or xbail "close: $!";
 }
+
+run_script([qw(-extindex --all), "$tmpdir/eidx"], $env) or BAIL_OUT;
 my $www = PublicInbox::WWW->new(PublicInbox::Config->new($pi_config));
 my $client = sub {
 	my ($cb) = @_;
@@ -83,6 +110,14 @@ my $client = sub {
 		't2 manifest');
 	is_deeply([ sort keys %{$m->{'/t1'}} ], [ '/t1' ],
 		't2 manifest');
+
+	# ensure ibx->{isrch}->{es}->over is used instead of ibx->over:
+	$res = $cb->(POST("/m2t/t\@1/?q=dt:19931002000259..&x=m"));
+	is($res->code, 200, 'hit on mid2tid query');
+	$res = $cb->(POST("/m2t/t\@1/?q=dt:19931002000400..&x=m"));
+	is($res->code, 404, '404 on out-of-range mid2tid query');
+	$res = $cb->(POST("/m2t/t\@1/?q=s:unrelated&x=m"));
+	is($res->code, 404, '404 on cross-thread search');
 };
 test_psgi(sub { $www->call(@_) }, $client);
 %$env = (%$env, TMPDIR => $tmpdir, PI_CONFIG => $pi_config);
