From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,
	T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.6
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id BCE521F7BE;
	Mon, 17 Jun 2024 00:01:40 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org;
	s=selector1; t=1718582500;
	bh=LLloIkzPy71fw6i1d6OpLhra5nCwiK3XEwEooQ4hjZo=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=uF+2rMPOS5V14q6fRC8tasWL7UnXYOESXFe6LCL3k2cbeC/kJRIaULNARo70g8nO2
	 E/Yxp/l8NLQQqJmZZGYelqVZMQLi9dvrsXTT+VQRxXo+S5Q4ToFcURBrJ6dlQaVFFR
	 YJXI/Porx5THtiPk5ZkiIHxFsi6N0ZLL4OaTaqRg=
Date: Mon, 17 Jun 2024 00:01:40 +0000
From: Eric Wong
To: Junio C Hamano
Cc: meta@public-inbox.org
Subject: [PATCH] www: strip and redirect on `<' and `>' in MSGID of URL
Message-ID: <20240617000140.M652375@dcvr>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
List-Id:

Junio C Hamano wrote:
> When I have a specific message on a mailing list, and I am
> interested in the discussion around the message, I often go to the
> URL of that message in public-inbox powered mailing list archive.
> For example, I go to
>
> https://public-inbox.org/meta/20240606074416.3900983-1-e@80x24.org/
>
> when I find "Message-ID: <20240606074416.3900983-1-e@80x24.org>"
>
> It would be immensely convenient if cutting and pasting including
> the surrounding <>, i.e.
>
> https://public-inbox.org/meta/<20240606074416.3900983-1-e@80x24.org>/
>
> is silently accepted and redirected to
>
> https://public-inbox.org/meta/20240606074416.3900983-1-e@80x24.org/
>
> instead of the "partial matches found" page.

Seems reasonable; especially since sr.ht uses <> in URLs nowadays
and some users may be conditioned to include them.

I don't see 404s in my logs from this, but I don't keep a lot of logs.

------8<------
Subject: [PATCH] www: strip and redirect on `<' and `>' in MSGID of URL

Some users may needlessly include `<' and `>' braces in URLs,
so account for this common mistake and redirect users to the
non-braced URL.  This common mistake could be learned behavior
from other sites (e.g. sr.ht) which include `<' and `>' in URLs.

Reported-by: Junio C Hamano
Link: https://public-inbox.org/meta/xmqqtthvh4r6.fsf@gitster.g/
---
 lib/PublicInbox/View.pm | 10 +++++++---
 t/psgi_search.t         | 11 +++++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index dcceb311..cc1ab79a 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -74,9 +74,13 @@ sub msg_page {
 	my ($id, $prev);
 	my $next_arg = $ctx->{next_arg} = [ $ctx->{mid}, \$id, \$prev ];
 
-	my $smsg = $ctx->{smsg} = $over->next_by_mid(@$next_arg) or
-		return; # undef == 404
-
+	my $smsg = $ctx->{smsg} = $over->next_by_mid(@$next_arg);
+	if (!$smsg && $ctx->{mid} =~ /\A\<(.+)\>\z/ and
+			($next_arg->[0] = $1) and
+			($over->next_by_mid(@$next_arg))) {
+		return PublicInbox::WWW::r301($ctx, undef, $next_arg->[0]);
+	}
+	$smsg or return; # undef=404
 	# allow user to easily browse the range around this message if
 	# they have ->over
 	$ctx->{-t_max} = $smsg->{ts};
diff --git a/t/psgi_search.t b/t/psgi_search.t
index 8c981c6c..759dab78 100644
--- a/t/psgi_search.t
+++ b/t/psgi_search.t
@@ -179,6 +179,17 @@ test_psgi(sub { $www->call(@_) }, sub {
 
 	$res = $cb->(GET(q{/test/?q=%22s'more%22&x=A}));
 	is $res->code, 200, 'single quote inside phrase';
+
+	$res = $cb->(GET("/test/<$mid>/"));
+	is $res->code, 301, "redirect for raw `<' and `>' in msgid";
+	like $res->header('location'), qr!/test/\Q$mid\E/\z!,
+		"redirected to URL without raw `<' and `>'";
+
+	$res = $cb->(GET("/test/%3c$mid%3e/"));
+	is $res->code, 301, "redirect for escaped `<' and `>' in msgid";
+	like $res->header('location'), qr!/test/\Q$mid\E/\z!,
+		"redirected to URL without escaped `<' and `>'";
+
 	# TODO: more tests and odd cases
 });
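
For anybody who wants to poke at the fallback order outside of the
WWW code, below is a minimal standalone sketch of the same idea: try
the Message-ID as given, retry with the surrounding `<' and `>'
stripped only if the first lookup misses, and 404 otherwise.  This is
not the code in View.pm; %known, lookup_mid() and the "/meta/" prefix
are hypothetical stand-ins for the ->over lookup and r301().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the overview index: a set of known Message-IDs
my %known = map { $_ => 1 } ('20240606074416.3900983-1-e@80x24.org');

# Takes the Message-ID portion of a URL path, returns [ status, location ]
sub lookup_mid {
	my ($mid) = @_;
	return [ 200, undef ] if $known{$mid}; # exact match always wins

	# Retry only when BOTH `<' and `>' wrap a non-empty ID and the
	# stripped ID actually exists; "/meta/" is an assumed inbox prefix
	if ($mid =~ /\A<(.+)>\z/ && $known{$1}) {
		return [ 301, "/meta/$1/" ];
	}
	[ 404, undef ]; # nothing matched, same as before
}

for my $mid ('20240606074416.3900983-1-e@80x24.org',
		'<20240606074416.3900983-1-e@80x24.org>',
		'<bogus@example>') {
	my ($code, $loc) = @{lookup_mid($mid)};
	print "$mid => $code", (defined $loc ? " $loc" : ''), "\n";
}
```

Trying the exact match first means a Message-ID which legitimately
contains `<' or `>' is still served as-is, and the redirect only fires
when the stripped form is known to exist.  The patch's own test covers
both the raw and the %3c/%3e forms, so the sketch assumes both reach
the lookup as the same unescaped string.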