From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org
Subject: [RFC] www: add topics.html endpoint [was: Query to see all new "topics"]
Date: Thu, 9 Nov 2023 02:45:08 +0000
Message-ID: <20231109024508.M429662@dcvr>
In-Reply-To: <20231107-skilled-cobra-of-swiftness-a6ff26@meerkat>
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hello:
>
> Following the discussion on the ksummit list [1], I wanted to give someone a query
> they could use to keep an eye on any new threads. Is there a xapian query that
> can be used to effectively say "return just top-level messages and exclude any
> follow-ups"? It's not quite as simple as "s:* AND NOT s:Re:" because we also
> want to exclude threaded patches. Some kind of equivalent of "any messages
> without an in-reply-to/references header"?
Not easily with the current Xapian schema.

It can get kinda close, but you don't get the thread root with:

  https://yhbt.net/lore/all/?q=rt:yesterday..&o=-1&t=1

The above isn't very useful IMHO, and it's also very expensive...

SQLite can actually do it pretty quickly, but it's WWW-only
(patch below): https://yhbt.net/lore/all/topics.html

I don't know if it can work with the way lei is supposed to dump
output for MUAs to consume... So maybe a custom TUI is the way
forward, but that comes with all the problems of
developing+maintaining a TUI that I wrote about[1] previously...
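As a rough illustration of the SQLite approach (this is not the code in
the patch, which is Perl), here is a minimal Python sketch using the
stdlib sqlite3 module. It runs the same shape of query against a toy
`over` table and then groups rows per thread by walking the result
backwards. Assumptions: the toy table stores `subject` as a plain column
(the patch selects the `ddd` column instead), higher `tid` is treated as
"more recent" as the patch does, and `make_demo_db` and `recent_topics`
are hypothetical names used only for this example.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the overview DB (toy schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE over (num INTEGER PRIMARY KEY, tid INTEGER,"
        " ts INTEGER, ds INTEGER, subject TEXT)"
    )
    conn.executemany(
        "INSERT INTO over (num, tid, ts, ds, subject) VALUES (?,?,?,?,?)",
        [
            (1, 1, 100, 100, "first topic"),
            (2, 1, 110, 110, "Re: first topic"),
            (3, 3, 120, 120, "[PATCH 0/1] cover"),
            (4, 3, 121, 121, "[PATCH 1/1] change"),
            (5, 5, 130, 130, "second topic"),
        ],
    )
    return conn


def recent_topics(conn, max_threads=200):
    """Return one (ds, subject, message_count) tuple per thread."""
    rows = conn.execute(
        """
        SELECT num, ts, ds, tid, subject FROM over WHERE tid IN
          (SELECT DISTINCT tid FROM over WHERE tid > 0
           ORDER BY tid DESC LIMIT ?)
        AND +num > 0
        ORDER BY tid, ts ASC
        """,
        (max_threads,),
    ).fetchall()

    topics = []
    prev, nr = None, 0
    # Rows are sorted by (tid, ts) ascending.  Popping from the end visits
    # the highest tid first and, within a thread, the newest message first,
    # so the last row seen for a tid is that thread's earliest message.
    while rows:
        row = rows.pop()
        if prev is not None and row[3] != prev[3]:
            topics.append((prev[2], prev[4], nr))
            nr = 0
        nr += 1
        prev = row
    if prev is not None:
        topics.append((prev[2], prev[4], nr))
    return topics


if __name__ == "__main__":
    for ds, subject, nr in recent_topics(make_demo_db()):
        print(ds, subject, nr)
```

The single sorted query plus one pass over the rows avoids a per-thread
lookup; whether that holds up on a large inbox is something only the
real schema and data can answer.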
-------8<------
Subject: [PATCH] www: add topics.html endpoint
This seems like an easy (but WWW-specific) way to get recent
topics as suggested by Konstantin. Perhaps an Atom endpoint
will also be useful.

To do this with Xapian would require new columns and
reindexing; and I'm not sure if the current lei handling of
search results by dumping results to a format readable by common
MUAs would work well with this.
Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Link: https://public-inbox.org/meta/20231107-skilled-cobra-of-swiftness-a6ff26@meerkat/
---
MANIFEST | 1 +
lib/PublicInbox/WWW.pm | 9 ++++++
lib/PublicInbox/WwwStream.pm | 1 +
lib/PublicInbox/WwwTopics.pm | 55 ++++++++++++++++++++++++++++++++++++
t/extindex-psgi.t | 6 ++++
t/plack.t | 9 ++++--
6 files changed, 78 insertions(+), 3 deletions(-)
create mode 100644 lib/PublicInbox/WwwTopics.pm
diff --git a/MANIFEST b/MANIFEST
index 51dcffaf..e1c3dc97 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -371,6 +371,7 @@ lib/PublicInbox/WwwListing.pm
lib/PublicInbox/WwwStatic.pm
lib/PublicInbox/WwwStream.pm
lib/PublicInbox/WwwText.pm
+lib/PublicInbox/WwwTopics.pm
lib/PublicInbox/XapClient.pm
lib/PublicInbox/XapHelper.pm
lib/PublicInbox/XapHelperCxx.pm
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index d2bd68ea..dcaf93cb 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -101,6 +101,8 @@ sub call {
invalid_inbox($ctx, $1) || get_atom($ctx);
} elsif ($path_info =~ m!$INBOX_RE/new\.html\z!o) {
invalid_inbox($ctx, $1) || get_new($ctx);
+ } elsif ($path_info =~ m!$INBOX_RE/topics\.html\z!o) {
+ invalid_inbox($ctx, $1) || get_topics($ctx);
} elsif ($path_info =~ m!$INBOX_RE/description\z!o) {
get_description($ctx, $1);
} elsif ($path_info =~ m!$INBOX_RE/(?:(?:git/)?([0-9]+)(?:\.git)?/)?
@@ -270,6 +272,13 @@ sub get_new {
PublicInbox::Feed::new_html($ctx);
}
+# /$INBOX/topics.html -> HTML only
+sub get_topics {
+ my ($ctx) = @_;
+ require PublicInbox::WwwTopics;
+ PublicInbox::WwwTopics::topics_html($ctx) || r404($ctx);
+}
+
# /$INBOX/?r=$GIT_COMMIT -> HTML only
sub get_index {
my ($ctx) = @_;
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 4cbdda99..3a1d6edf 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -113,6 +113,7 @@ sub html_top ($) {
qq(<a\nid=mirror) .
qq(\nhref="${upfx}_/text/mirror/">mirror</a>$code / ).
qq(<a\nhref="$atom">Atom feed</a>);
+ $links .= delete($ctx->{-html_more_links}) if $ctx->{-html_more_links};
if ($ibx->isrch) {
my $q_val = delete($ctx->{-q_value_html}) // '';
$q_val = qq(\nvalue="$q_val") if $q_val ne '';
diff --git a/lib/PublicInbox/WwwTopics.pm b/lib/PublicInbox/WwwTopics.pm
new file mode 100644
index 00000000..5605cfbe
--- /dev/null
+++ b/lib/PublicInbox/WwwTopics.pm
@@ -0,0 +1,55 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+package PublicInbox::WwwTopics;
+use v5.12;
+use autodie qw(open);
+use PublicInbox::Hval qw(ascii_html mid_href fmt_ts);
+use PublicInbox::WwwStream;
+
+sub add_topic_line ($$$) {
+ my (undef, $prev, $nr) = @_;
+ my $s = ascii_html($prev->{subject});
+ $s = '(no subject)' if $s eq '';
+ $_[0] .= "\n".fmt_ts($prev->{ds}).
+ qq{ <a\nhref="}.mid_href($prev->{mid}).qq{/#r">$s</a>};
+ $_[0] .= " $nr+ messages" if $nr > 1;
+}
+
+sub topics_html { # GET /$INBOX_NAME/topics.html
+ my ($ctx) = @_;
+ my $over = $ctx->{ibx}->over or
+ return $ctx->{www}->can('need')->($ctx,'Overview');
+
+ # XXX there is likely faster ways to do this.
+ # OTOH SQLite tends to be faster with multiple simple queries
+ # rather than more complex ones
+ my $msgs = $over->do_get(<<EOS, { limit => 10000 });
+SELECT num,ts,ds,tid,ddd FROM over WHERE tid IN
+(SELECT DISTINCT(tid) FROM over WHERE tid > 0 ORDER BY tid DESC LIMIT 200)
+AND +num > 0
+ORDER BY tid,ts ASC
+EOS
+ # can't use SQL to filter references since our schema wasn't designed
+ # for it, but our SQL sorts by ascending time to favor top-level
+ # messages while our final result (post-references filter) favors
+ # recent messages
+ chomp($ctx->{-html_more_links} = <<EOM);
+\n- recent:[<a href="./">subjects (threaded)</a>|topics] (all times UTC)
+EOM
+ my $buf = '<pre>';
+ my ($nr, $prev);
+ while (my $smsg = pop @$msgs) {
+ if ($prev && $smsg->{tid} != $prev->{tid}) {
+ add_topic_line($buf, $prev, $nr);
+ $nr = 0;
+ }
+ ++$nr;
+ $prev = $smsg;
+ }
+ add_topic_line($buf, $prev, $nr) if $prev;
+ $buf .= '</pre>';
+ PublicInbox::WwwStream::html_oneshot($ctx, 200, $buf);
+}
+
+1;
diff --git a/t/extindex-psgi.t b/t/extindex-psgi.t
index f71210a5..9e0c7dc3 100644
--- a/t/extindex-psgi.t
+++ b/t/extindex-psgi.t
@@ -118,6 +118,12 @@ my $client = sub {
is($res->code, 404, '404 on out-of-range mid2tid query');
$res = $cb->(POST("/m2t/t\@1/?q=s:unrelated&x=m"));
is($res->code, 404, '404 on cross-thread search');
+
+
+ $res = $cb->(GET('/m2t/topics.html'));
+ is($res->code, 200, 'topics.html on basic v2');
+ $res = $cb->(GET('/all/topics.html'));
+ is($res->code, 200, 'topics.html on extindex');
};
test_psgi(sub { $www->call(@_) }, $client);
%$env = (%$env, TMPDIR => $tmpdir, PI_CONFIG => $pi_config);
diff --git a/t/plack.t b/t/plack.t
index 7f80f488..7ec35e7a 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -204,9 +204,12 @@ my $c1 = sub {
my $raw = PublicInbox::Eml->new(\$body);
is($raw->body_raw, $eml->body_raw, 'ISO-2022-JP body unmodified');
- $res = $cb->(GET($pfx . '/blah@example.com/t.mbox.gz'));
- is(501, $res->code, '501 when overview missing');
- like($res->content, qr!\bOverview\b!, 'overview omission noted');
+ for my $u (qw(blah@example.com/t.mbox.gz topics.html)) {
+ $res = $cb->(GET("$pfx/$u"));
+ is(501, $res->code, "501 on /$u when overview missing");
+ like($res->content, qr!\bOverview\b!,
+ "overview omission noted for /$u");
+ }
# legacy redirects
for my $t (qw(m f)) {
[1] https://public-inbox.org/meta/20230922203353.M780211@dcvr/