From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,
	T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.6
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id E91171F406;
	Fri, 10 Nov 2023 03:09:59 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org;
	s=selector1; t=1699585800;
	bh=Lx0rozLY6xnzTI3xlCjAUE1Bs+p134bDcZum1XAl0c4=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=5tv6Rhwf64M1BYLEwsSG7lftKD6FVy2ulrfT7FmWMd9rN7zHszGv4Obdl+zJeK+3W
	 YLS6Bi6oE/qiFGDlxh0fSj+vNn5h/ey0mDj8eRMDDTNLkc0aEoANMEIZD4PJBDxGal
	 yVQ7F0yvwC3/mLe4rVoTTdlywZsbFnZAXYxIdX2I=
Date: Fri, 10 Nov 2023 03:09:59 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: [RFC v2] www: add topics_(new|active).(html|atom) endpoints
Message-ID: <20231110030959.M879021@dcvr>
References: <20231107-skilled-cobra-of-swiftness-a6ff26@meerkat>
 <20231109024508.M429662@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
List-Id:

Konstantin Ryabitsev wrote:
> On Thu, Nov 09, 2023 at 02:45:08AM +0000, Eric Wong wrote:
> > This seems like a easy (but WWW-specific) way to get recent
> > topics as suggested by Konstantin.  Perhaps an Atom endpoint
> > will also be useful.
>
> Yes, actually thinking about this some more, perhaps it makes sense to expose
> this as an RSS feed feature (maybe even exclusively as an RSS feed feature?).

I assume Atom is OK?  I don't know of any widely-used feed
readers which only do RSS without Atom support.  IIRC Atom is
less ambiguous and supports the in-reply-to extension.
That said, the Atom feeds generated by this RFC include full
messages because that's the easiest way to tie into our existing
Atom generation code, so it's currently slower than the HTML
version which never retrieves git blobs.

> Have two different feeds:
>
> - new topics: just all the new threads
> - hot topics: NN most active threads (kinda lkml.org's "hottest messages")

I'm not sure if `hot' means it's the most read (not just
replied-to); but tracking read counts isn't something that
scales on decentralized systems.  So I'm naming it "active"
instead...

> Have this available per-list and for the extindex -- I think this would be
> a great feature that we can point people at as a mechanism to keep an eye on
> overall activity.

Yeah, lots of the WWW and lei code works transparently between
extindex and regular inboxes:

extindex:
	https://yhbt.net/lore/all/topics_new.atom
	https://yhbt.net/lore/all/topics_active.atom
	https://yhbt.net/lore/all/topics_new.html
	https://yhbt.net/lore/all/topics_active.html

v2:
	https://yhbt.net/lore/lkml/topics_new.atom
	https://yhbt.net/lore/lkml/topics_active.atom
	https://yhbt.net/lore/lkml/topics_new.html
	https://yhbt.net/lore/lkml/topics_active.html

v1:
	https://public-inbox.org/git/topics_new.atom
	https://public-inbox.org/git/topics_active.atom
	https://public-inbox.org/git/topics_new.html
	https://public-inbox.org/git/topics_active.html

> I haven't tried your patch yet -- doubt I will be able before coming back from
> Plumbers next week.

No worries.  master seems pretty stable these days and I should
really get my brain around getting cindex wired up to WWW to
cut a release...

Anyways, this replaces my prior RFC.  I've remembered to use
"GROUP BY" so the SQL is a bit faster than before.

-----------8<----------
Subject: [RFC v2] www: add topics_(new|active).(html|atom) endpoints

This seems like an easy (but WWW-specific) way to get recently
created and recently active topics as suggested by Konstantin.
To do this with Xapian will require new columns and reindexing;
and I'm not sure if the current lei handling of search results
by dumping results to a format readable by common MUAs would
work well with this.  A new TUI may be required...

Suggested-by: Konstantin Ryabitsev
Link: https://public-inbox.org/meta/20231107-skilled-cobra-of-swiftness-a6ff26@meerkat/
---
 MANIFEST                         |  1 +
 lib/PublicInbox/WWW.pm           | 15 +++++-
 lib/PublicInbox/WwwAtomStream.pm | 11 ++--
 lib/PublicInbox/WwwStream.pm     |  1 +
 lib/PublicInbox/WwwTopics.pm     | 86 ++++++++++++++++++++++++++++++++
 t/extindex-psgi.t                |  8 +++
 t/plack.t                        | 10 ++--
 7 files changed, 122 insertions(+), 10 deletions(-)
 create mode 100644 lib/PublicInbox/WwwTopics.pm

diff --git a/MANIFEST b/MANIFEST
index 51dcffaf..e1c3dc97 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -371,6 +371,7 @@ lib/PublicInbox/WwwListing.pm
 lib/PublicInbox/WwwStatic.pm
 lib/PublicInbox/WwwStream.pm
 lib/PublicInbox/WwwText.pm
+lib/PublicInbox/WwwTopics.pm
 lib/PublicInbox/XapClient.pm
 lib/PublicInbox/XapHelper.pm
 lib/PublicInbox/XapHelperCxx.pm
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index d2bd68ea..6b616bd4 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -101,6 +101,9 @@ sub call {
 		invalid_inbox($ctx, $1) || get_atom($ctx);
 	} elsif ($path_info =~ m!$INBOX_RE/new\.html\z!o) {
 		invalid_inbox($ctx, $1) || get_new($ctx);
+	} elsif ($path_info =~
+			m!$INBOX_RE/topics_(new|active)\.(atom|html)\z!o) {
+		get_topics($ctx, $1, $2, $3);
 	} elsif ($path_info =~ m!$INBOX_RE/description\z!o) {
 		get_description($ctx, $1);
 	} elsif ($path_info =~ m!$INBOX_RE/(?:(?:git/)?([0-9]+)(?:\.git)?/)?
@@ -270,6 +273,13 @@ sub get_new {
 	PublicInbox::Feed::new_html($ctx);
 }
 
+# /$INBOX/topics_(new|active).(html|atom)
+sub get_topics {
+	my ($ctx, $ibx_name, $category, $type) = @_;
+	require PublicInbox::WwwTopics;
+	PublicInbox::WwwTopics::response($ctx, $ibx_name, $category, $type);
+}
+
 # /$INBOX/?r=$GIT_COMMIT -> HTML only
 sub get_index {
 	my ($ctx) = @_;
@@ -338,11 +348,12 @@ sub get_altid_dump {
 }
 
 sub need {
-	my ($ctx, $extra) = @_;
+	my ($ctx, $extra, $upref) = @_;
 	require PublicInbox::WwwStream;
+	$upref //= '../';
 	PublicInbox::WwwStream::html_oneshot($ctx, 501, <$extra is not available for this public-inbox
-Return to index
+Return to index
 EOF
 }
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 737cc6cb..26b366f5 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -99,15 +99,16 @@ sub atom_header {
 		$base_url .= '?' . $search_q->qs_html(x => undef);
 		$self_url .= '?' . $search_q->qs_html;
 		$page_id = to_uuid("q\n".$query);
+	} elsif (defined(my $cat = $ctx->{topic_category})) {
+		$title = title_tag("$cat topics - ".$ibx->description);
+		$self_url .= "topics_$cat.atom";
 	} else {
 		$title = title_tag($ibx->description);
 		$self_url .= 'new.atom';
-		if (defined(my $addr = $ibx->{-primary_address})) {
-			$page_id = "mailto:$addr";
-		} else {
-			$page_id = to_uuid($self_url);
-		}
+		my $addr = $ibx->{-primary_address};
+		$page_id = "mailto:$addr" if defined $addr;
 	}
+	$page_id //= to_uuid($self_url);
 	qq(\n) .
 	qq() .
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index 4cbdda99..3a1d6edf 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -113,6 +113,7 @@ sub html_top ($) {
 		qq(mirror$code / ).
 		qq(Atom feed);
+	$links .= delete($ctx->{-html_more_links}) if $ctx->{-html_more_links};
 	if ($ibx->isrch) {
 		my $q_val = delete($ctx->{-q_value_html}) // '';
 		$q_val = qq(\nvalue="$q_val") if $q_val ne '';
diff --git a/lib/PublicInbox/WwwTopics.pm b/lib/PublicInbox/WwwTopics.pm
new file mode 100644
index 00000000..ad85a46d
--- /dev/null
+++ b/lib/PublicInbox/WwwTopics.pm
@@ -0,0 +1,86 @@
+# Copyright (C) all contributors
+# License: AGPL-3.0+
+
+package PublicInbox::WwwTopics;
+use v5.12;
+use PublicInbox::Hval qw(ascii_html mid_href fmt_ts);
+
+sub add_topic_html ($$) {
+	my (undef, $smsg) = @_;
+	my $s = ascii_html($smsg->{subject});
+	$s = '(no subject)' if $s eq '';
+	$_[0] .= "\n".fmt_ts($smsg->{'MAX(ds)'} // $smsg->{ds}) .
+		qq{ {mid}).qq{/#r">$s};
+	my $nr = $smsg->{'COUNT(num)'};
+	$_[0] .= " $nr+ messages" if $nr > 1;
+}
+
+# n.b. the `SELECT DISTINCT(tid)' subquery is critical for performance
+# with giant inboxes and extindices
+sub topics_new ($) {
+	$_[0]->do_get(< 0 ORDER BY ts DESC LIMIT 200)
+AND +num > 0
+GROUP BY tid
+ORDER BY ds ASC
+EOS
+}
+
+sub topics_active ($) {
+	$_[0]->do_get(< 0 ORDER BY ts DESC LIMIT 200)
+AND +num > 0
+GROUP BY tid
+ORDER BY ds ASC
+EOS
+}
+
+sub topics_i { pop @{$_[0]->{msgs}} }
+
+sub topics_atom { # GET /$INBOX_NAME/topics_(new|active).atom
+	my ($ctx) = @_;
+	require PublicInbox::WwwAtomStream;
+	my ($hdr, $smsg, $val);
+	$_->{ds} //= $_->{'MAX(ds)'} // 0 for @{$ctx->{msgs}};
+	PublicInbox::WwwAtomStream->response($ctx, \&topics_i);
+}
+
+sub topics_html { # GET /$INBOX_NAME/topics_(new|active).html
+	my ($ctx) = @_;
+	require PublicInbox::WwwStream;
+	my $buf = '
';
+	$ctx->{-html_more_links} = qq{\n- recent:[subjects (threaded)|};
+
+	if ($ctx->{topic_category} eq 'new') {
+		$ctx->{-html_more_links} .= qq{topics (new)|topics (active)]};
+	} else { # topic_category eq "active" - topics with recent replies
+		$ctx->{-html_more_links} .= qq{topics (new)|topics (active)]};
+	}
+	# can't use SQL to filter references since our schema wasn't designed
+	# for it, but our SQL sorts by ascending time to favor top-level
+	# messages while our final result (post-references filter) favors
+	# recent messages
+	my $msgs = delete $ctx->{msgs};
+	add_topic_html($buf, pop @$msgs) while scalar(@$msgs);
+	$buf .= '
';
+	PublicInbox::WwwStream::html_oneshot($ctx, 200, $buf);
+}
+
+sub response {
+	my ($ctx, $ibx_name, $category, $type) = @_;
+	my ($ret, $over);
+	$ret = PublicInbox::WWW::invalid_inbox($ctx, $ibx_name) and return $ret;
+	$over = $ctx->{ibx}->over or
+		return PublicInbox::WWW::need($ctx, 'Overview', './');
+	$ctx->{msgs} = $category eq 'new' ? topics_new($over) :
+			topics_active($over);
+	$ctx->{topic_category} = $category;
+	$type eq 'atom' ? topics_atom($ctx) : topics_html($ctx);
+}
+
+1;
diff --git a/t/extindex-psgi.t b/t/extindex-psgi.t
index f71210a5..896c46ff 100644
--- a/t/extindex-psgi.t
+++ b/t/extindex-psgi.t
@@ -118,6 +118,14 @@ my $client = sub {
 	is($res->code, 404, '404 on out-of-range mid2tid query');
 	$res = $cb->(POST("/m2t/t\@1/?q=s:unrelated&x=m"));
 	is($res->code, 404, '404 on cross-thread search');
+
+
+	for my $c (qw(new active)) {
+		$res = $cb->(GET("/m2t/topics_$c.html"));
+		is($res->code, 200, "topics_$c.html on basic v2");
+		$res = $cb->(GET("/all/topics_$c.html"));
+		is($res->code, 200, "topics_$c.html on extindex");
+	}
 };
 test_psgi(sub { $www->call(@_) }, $client);
 %$env = (%$env, TMPDIR => $tmpdir, PI_CONFIG => $pi_config);
diff --git a/t/plack.t b/t/plack.t
index 7f80f488..07cab12a 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -204,9 +204,13 @@ my $c1 = sub {
 	my $raw = PublicInbox::Eml->new(\$body);
 	is($raw->body_raw, $eml->body_raw, 'ISO-2022-JP body unmodified');
 
-	$res = $cb->(GET($pfx . '/blah@example.com/t.mbox.gz'));
-	is(501, $res->code, '501 when overview missing');
-	like($res->content, qr!\bOverview\b!, 'overview omission noted');
+	for my $u (qw(blah@example.com/t.mbox.gz topics_new.html
+			topics_active.html)) {
+		$res = $cb->(GET("$pfx/$u"));
+		is(501, $res->code, "501 on /$u when overview missing");
+		like($res->content, qr!\bOverview\b!,
+			"overview omission noted for /$u");
+	}
 
 	# legacy redirects
 	for my $t (qw(m f)) {
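
The ordering used by topics_html and topics_i in the patch (rows
fetched with "ORDER BY ds ASC", then consumed with pop) can be
illustrated with a small standalone sketch.  The rows and their
values below are made up for illustration and the script does not
touch public-inbox itself; only the pop order and the
'MAX(ds)' // ds fallback mirror what the patch does.

```perl
use v5.12;
use warnings;

# Hypothetical result rows, already sorted by ascending date the way
# "ORDER BY ds ASC" would return them (all values are invented).
my @msgs = (
	{ ds => 100, subject => 'oldest topic' },
	{ ds => 200, subject => 'middle topic', 'MAX(ds)' => 250 },
	{ ds => 300, subject => 'newest topic' },
);

my @shown;
while (scalar @msgs) {
	my $smsg = pop @msgs;	# take from the end: newest row first
	# same fallback as add_topic_html: prefer MAX(ds) when present
	my $ts = $smsg->{'MAX(ds)'} // $smsg->{ds};
	push @shown, "$ts $smsg->{subject}";
}
say for @shown;
```

Popping from the end of an ascending list emits the most recent
rows first without needing a second sort.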