unofficial mirror of meta@public-inbox.org
* [PATCH 0/5] prefer shorter, less-ambiguous URLs
@ 2015-08-27  4:33 Eric Wong
  2015-08-27  4:33 ` [PATCH 1/5] www: minor cleanups to shorten code Eric Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Eric Wong @ 2015-08-27  4:33 UTC (permalink / raw)
  To: meta

Unfortunately, Message-IDs may end in '.txt', '.html' or some
other suffix we might use.  Instead of '.html', use '/' as a
suffix, which also allows '/raw' for the raw (mbox) version
(following a lead from gmane).

In summary:

	/m/$MESSAGE_ID.html    -> /m/$MESSAGE_ID/
	/m/$MESSAGE_ID.txt     -> /m/$MESSAGE_ID/raw
	/f/$MESSAGE_ID.html    -> /f/$MESSAGE_ID/
	/t/$MESSAGE_ID.html    -> /t/$MESSAGE_ID/
	/t/$MESSAGE_ID.mbox.gz -> /t/$MESSAGE_ID/mbox.gz

Redirects for old URLs remain in place so existing links do not
break.
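For illustration only, the summary above can be expressed as a small
Perl function.  The name legacy_to_new() is hypothetical and not part
of public-inbox; the real redirects live in PublicInbox::WWW, and this
sketch covers only the rows listed in the summary.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of public-inbox): rewrite a legacy
# suffixed path into the new slash-terminated form from the summary.
# Returns undef when the path is not one of the legacy forms.
sub legacy_to_new {
	my ($path) = @_;

	# /t/$MESSAGE_ID.mbox.gz -> /t/$MESSAGE_ID/mbox.gz
	return "$1/t/$2/mbox.gz" if $path =~ m!\A(.*)/t/(\S+)\.mbox\.gz\z!;

	# /{m,f,t}/$MESSAGE_ID.html -> /{m,f,t}/$MESSAGE_ID/
	return "$1/$2/$3/" if $path =~ m!\A(.*)/(m|f|t)/(\S+)\.html\z!;

	# /m/$MESSAGE_ID.txt -> /m/$MESSAGE_ID/raw
	return "$1/m/$2/raw" if $path =~ m!\A(.*)/m/(\S+)\.txt\z!;

	return undef;
}
```

For example, '/test/m/a%40example.com.txt' maps to
'/test/m/a%40example.com/raw', while an already-new path such as
'/test/m/a%40example.com/' returns undef.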

Eric Wong (5):
      www: minor cleanups to shorten code
      wire up shorter, less ambiguous URLs
      mid: extract Message-ID from inside '<>'
      wire up to display non-suffixed Message-ID links
      implement legacy redirects for old URLs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/5] www: minor cleanups to shorten code
  2015-08-27  4:33 [PATCH 0/5] prefer shorter, less-ambiguous URLs Eric Wong
@ 2015-08-27  4:33 ` Eric Wong
  2015-08-27  4:33 ` [PATCH 2/5] wire up shorter, less ambiguous URLs Eric Wong
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2015-08-27  4:33 UTC (permalink / raw)
  To: meta; +Cc: Eric Wong

Less scrolling is more efficient.
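The one-line form 'my $x = mid2blob($ctx) or return r404();' is
equivalent to the two-line version because 'or' has lower precedence
than '=': the assignment happens first, and the early return fires
only when the assigned value is false.  A minimal sketch, using
hypothetical stand-ins for mid2blob() and r404():

```perl
use strict;
use warnings;

# Hypothetical stand-ins for mid2blob() and r404(); only the control
# flow of the one-line form matters here.
sub fake_mid2blob { my ($found) = @_; $found ? \"blob contents" : undef }
sub fake_r404 { [404, [], ['Not Found']] }

sub get_page {
	my ($found) = @_;
	# Parses as: (my $x = fake_mid2blob($found)) or return fake_r404();
	my $x = fake_mid2blob($found) or return fake_r404();
	return [200, [], [$$x]];
}
```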
---
 lib/PublicInbox/WWW.pm | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index d1ee2ff..527d213 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -138,7 +138,6 @@ sub mid2blob {
 	my $path = PublicInbox::MID::mid2path($ctx->{mid});
 	my @cmd = ('git', "--git-dir=$ctx->{git_dir}",
 			qw(cat-file blob), "HEAD:$path");
-	my $cmd = join(' ', @cmd);
 	my $pid = open my $fh, '-|';
 	defined $pid or die "fork failed: $!\n";
 	if ($pid == 0) {
@@ -162,8 +161,7 @@ sub get_mid_txt {
 # /$LISTNAME/m/$MESSAGE_ID.html                   -> HTML content (short quotes)
 sub get_mid_html {
 	my ($ctx) = @_;
-	my $x = mid2blob($ctx);
-	return r404() unless $x;
+	my $x = mid2blob($ctx) or return r404();
 
 	require PublicInbox::View;
 	my $pfx = msg_pfx($ctx);
@@ -178,8 +176,8 @@ sub get_mid_html {
 # /$LISTNAME/f/$MESSAGE_ID.html                   -> HTML content (fullquotes)
 sub get_full_html {
 	my ($ctx) = @_;
-	my $x = mid2blob($ctx);
-	return r404() unless $x;
+	my $x = mid2blob($ctx) or return r404();
+
 	require PublicInbox::View;
 	my $foot = footer($ctx);
 	require Email::MIME;
-- 
EW



* [PATCH 2/5] wire up shorter, less ambiguous URLs
  2015-08-27  4:33 [PATCH 0/5] prefer shorter, less-ambiguous URLs Eric Wong
  2015-08-27  4:33 ` [PATCH 1/5] www: minor cleanups to shorten code Eric Wong
@ 2015-08-27  4:33 ` Eric Wong
  2015-08-27  4:34 ` [PATCH 3/5] mid: extract Message-ID from inside '<>' Eric Wong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2015-08-27  4:33 UTC (permalink / raw)
  To: meta; +Cc: Eric Wong

We will prefer URLs without suffixes for now to avoid ambiguity
in case a Message-ID ends with ".html", ".txt", ".mbox.gz" or
any other suffix we may use.

Static file compatibility is preserved by using a trailing slash,
since most servers fall back to an index.html file for such
paths.

For raw messages, we follow gmane's lead with "/raw".
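A simplified sketch of the new dispatch, assuming a bare '/LISTNAME'
prefix in place of $LISTNAME_RE and omitting list validation and the
legacy suffixed routes.  It shows that a Message-ID ending in '.html'
is captured intact once no suffix is appended; route() and its
handler names are hypothetical.

```perl
use strict;
use warnings;

# Simplified router (hypothetical).  Returns [handler, message_id].
sub route {
	my ($path) = @_;
	if ($path =~ m!\A/[^/]+/m/(\S+)/\z!) {
		return ['mid_html', $1];
	} elsif ($path =~ m!\A/[^/]+/m/(\S+)/raw\z!) {
		return ['mid_txt', $1];
	} elsif ($path =~ m!\A/[^/]+/f/(\S+)/\z!) {
		return ['full_html', $1];
	} elsif ($path =~ m!\A/[^/]+/t/(\S+)/mbox(\.gz)?\z!) {
		return ['thread_mbox', $1];
	}
	return ['404', undef];
}
```

With these patterns, '/test/m/foo.html/' and '/test/m/foo.html/raw'
both yield the Message-ID 'foo.html', with no guessing about whether
'.html' belongs to the ID or to the URL.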
---
 lib/PublicInbox/WWW.pm | 13 ++++++++++---
 t/cgi.t                |  2 +-
 t/plack.t              | 19 +++++++++++++++++++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 527d213..ca338fb 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -40,12 +40,18 @@ sub run {
 		invalid_list(\%ctx, $1) || get_atom(\%ctx);
 
 	# single-message pages
+	} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)/\z!o) {
+		invalid_list_mid(\%ctx, $1, $2) || get_mid_html(\%ctx);
+	} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)/raw\z!o) {
+		invalid_list_mid(\%ctx, $1, $2) || get_mid_txt(\%ctx);
 	} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)\.txt\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_mid_txt(\%ctx);
 	} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)\.html\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_mid_html(\%ctx);
 
 	# full-message page
+	} elsif ($path_info =~ m!$LISTNAME_RE/f/(\S+)/\z!o) {
+		invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
 	} elsif ($path_info =~ m!$LISTNAME_RE/f/(\S+)\.html\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
 
@@ -53,7 +59,8 @@ sub run {
 	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)\.html\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
 
-	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)\.mbox(\.gz)?\z!o) {
+	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/mbox(\.gz)?\z!ox ||
+	         $path_info =~ m!$LISTNAME_RE/t/(\S+)\.mbox(\.gz)?\z!o) {
 		my $sfx = $3;
 		invalid_list_mid(\%ctx, $1, $2) ||
 			get_thread_mbox(\%ctx, $sfx);
@@ -325,8 +332,8 @@ sub msg_pfx {
 	"../f/$href.html";
 }
 
-# /$LISTNAME/t/$MESSAGE_ID.mbox           -> thread as mbox
-# /$LISTNAME/t/$MESSAGE_ID.mbox.gz        -> thread as gzipped mbox
+# /$LISTNAME/t/$MESSAGE_ID/mbox           -> thread as mbox
+# /$LISTNAME/t/$MESSAGE_ID/mbox.gz        -> thread as gzipped mbox
 # note: I'm not a big fan of other compression formats since they're
 # significantly more expensive on CPU than gzip and less-widely available,
 # especially on older systems.  Stick to zlib since that's what git uses.
diff --git a/t/cgi.t b/t/cgi.t
index e87f7dc..020dfe7 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -183,7 +183,7 @@ EOF
 {
 	local $ENV{HOME} = $home;
 	local $ENV{PATH} = $main_path;
-	my $path = "/test/t/blahblah%40example.com.mbox.gz";
+	my $path = "/test/t/blahblah%40example.com/mbox.gz";
 	my $res = cgi_run($path);
 	like($res->{head}, qr/^Status: 501 /, "search not-yet-enabled");
 	my $indexed = system($index, $maindir) == 0;
diff --git a/t/plack.t b/t/plack.t
index 85dd337..ed41ab1 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -101,6 +101,25 @@ EOF
 			qr!link\s+href="\Q$pfx\E/m/blah%40example\.com\.html"!s,
 			'atom feed generated correct URL');
 	});
+
+	foreach my $t (qw(f m)) {
+		test_psgi($app, sub {
+			my ($cb) = @_;
+			my $pfx = 'http://example.com/test';
+			my $path = "/$t/blah%40example.com/";
+			my $res = $cb->(GET($pfx . $path));
+			is(200, $res->code, "success for $path");
+			like($res->content, qr!<title>hihi - Me</title>!,
+				"HTML returned");
+		});
+	}
+	test_psgi($app, sub {
+		my ($cb) = @_;
+		my $pfx = 'http://example.com/test';
+		my $res = $cb->(GET($pfx . '/m/blah%40example.com/raw'));
+		is(200, $res->code, 'success response received for /m/*/raw');
+		like($res->content, qr!\AFrom !, "mbox returned");
+	});
 }
 
 done_testing();
-- 
EW



* [PATCH 3/5] mid: extract Message-ID from inside '<>'
  2015-08-27  4:33 [PATCH 0/5] prefer shorter, less-ambiguous URLs Eric Wong
  2015-08-27  4:33 ` [PATCH 1/5] www: minor cleanups to shorten code Eric Wong
  2015-08-27  4:33 ` [PATCH 2/5] wire up shorter, less ambiguous URLs Eric Wong
@ 2015-08-27  4:34 ` Eric Wong
  2015-08-27  4:34 ` [PATCH 4/5] wire up to display non-suffixed Message-ID links Eric Wong
  2015-08-27  4:34 ` [PATCH 5/5] implement legacy redirects for old URLs Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2015-08-27  4:34 UTC (permalink / raw)
  To: meta; +Cc: Eric Wong

This is necessary for some mailers which include comment text
in the In-Reply-To header; merely assuming there is nothing
outside of '<>', as we were doing, is not enough.
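To illustrate, the two subs below copy the mid_clean() logic removed
and added by this patch, so they can be compared on an In-Reply-To
value carrying trailing comment text.  The sub names and the sample
header value are made up for the example.

```perl
use strict;
use warnings;

# Before this patch: strip an optional leading "<" and an optional
# trailing ">" (plus surrounding whitespace) and keep everything else.
sub mid_clean_old {
	my ($mid) = @_;
	defined($mid) or die "no Message-ID";
	$mid =~ s/\A\s*<?//;
	$mid =~ s/>?\s*\z//;
	$mid;
}

# After this patch: if a <...> pair is present, keep only the text
# inside the first pair; otherwise leave the value unchanged.
sub mid_clean_new {
	my ($mid) = @_;
	defined($mid) or die "no Message-ID";
	if ($mid =~ /<([^>]+)>/) {
		$mid = $1;
	}
	$mid;
}

my $irt = q{<abc@example.com> (Foo Bar's message of "Wed, 26 Aug 2015")};
print mid_clean_old($irt), "\n"; # abc@example.com> (Foo Bar's message ...
print mid_clean_new($irt), "\n"; # abc@example.com
```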
---
 lib/PublicInbox/MID.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
index 02ac709..8ca3c57 100644
--- a/lib/PublicInbox/MID.pm
+++ b/lib/PublicInbox/MID.pm
@@ -12,8 +12,9 @@ sub mid_clean {
 	my ($mid) = @_;
 	defined($mid) or die "no Message-ID";
 	# MDA->precheck did more checking for us
-	$mid =~ s/\A\s*<?//;
-	$mid =~ s/>?\s*\z//;
+	if ($mid =~ /<([^>]+)>/) {
+		$mid = $1;
+	}
 	$mid;
 }
 
-- 
EW



* [PATCH 4/5] wire up to display non-suffixed Message-ID links
  2015-08-27  4:33 [PATCH 0/5] prefer shorter, less-ambiguous URLs Eric Wong
                   ` (2 preceding siblings ...)
  2015-08-27  4:34 ` [PATCH 3/5] mid: extract Message-ID from inside '<>' Eric Wong
@ 2015-08-27  4:34 ` Eric Wong
  2015-08-27  4:34 ` [PATCH 5/5] implement legacy redirects for old URLs Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2015-08-27  4:34 UTC (permalink / raw)
  To: meta; +Cc: Eric Wong

These URLs are preferable in case somebody decides to get cute and
use a suffix we would've used, to prevent others from linking to
their message.  The common /m/$MESSAGE_ID/ URLs are also 4 characters
shorter, so they may fit better on terminals.
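Because pages now sit one directory deeper (/m/$MESSAGE_ID/ rather
than /m/$MESSAGE_ID.html), relative links in View.pm gain an extra
'../'.  The resolver below is a hypothetical, simplified take on
RFC 3986 reference resolution (plain and '..' segments only, no './',
query or fragment), meant just to check where a relative href lands.

```perl
use strict;
use warnings;

# Hypothetical helper: resolve a relative href against the path of the
# page that contains it.  Handles only plain segments and "..".
sub resolve {
	my ($base, $rel) = @_;
	# directory of the base page: drop whatever follows its last "/"
	my @dir = split m!/!, $base, -1;
	pop @dir;
	my @rel = split m!/!, $rel, -1;
	my $last = pop @rel;	# final segment ("" when $rel ends in "/")
	for my $seg (@rel) {
		if ($seg eq '..') {
			pop @dir if @dir > 1;	# never climb above the root
		} else {
			push @dir, $seg;
		}
	}
	return join('/', @dir, $last);
}
```

For example, resolve('/test/m/foo/', '../../t/foo/') gives
'/test/t/foo/', whereas keeping the old single '../' would give
'/test/m/t/foo/'.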
---
 lib/PublicInbox/Feed.pm |  4 ++--
 lib/PublicInbox/View.pm | 40 ++++++++++++++++++++--------------------
 lib/PublicInbox/WWW.pm  |  7 +++++--
 t/cgi.t                 | 28 ++++++++++++++--------------
 t/feed.t                |  2 +-
 t/plack.t               |  4 ++--
 t/view.t                |  7 ++++---
 7 files changed, 48 insertions(+), 44 deletions(-)

diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index d34978c..9e56747 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -273,7 +273,7 @@ sub add_to_feed {
 	my $mid = $header_obj->header('Message-ID');
 	defined $mid or return 0;
 	$mid = PublicInbox::Hval->new_msgid($mid);
-	my $href = $mid->as_href . '.html';
+	my $href = $mid->as_href . '/';
 	my $content = PublicInbox::View->feed_entry($mime, $fullurl . $href);
 	defined($content) or return 0;
 	$mime = undef;
@@ -362,7 +362,7 @@ sub dump_topics {
 		$mid = PublicInbox::Hval->new($mid)->as_href;
 		$subj = PublicInbox::Hval->new($subj)->as_html;
 		$u = PublicInbox::Hval->new($u)->as_html;
-		$dst .= "\n<a\nhref=\"t/$mid.html#u\"><b>$subj</b></a>\n- ";
+		$dst .= "\n<a\nhref=\"t/$mid/#u\"><b>$subj</b></a>\n- ";
 		$ts = POSIX::strftime('%Y-%m-%d %H:%M', gmtime($ts));
 		if ($n == 1) {
 			$dst .= "created by $u @ $ts UTC\n"
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 7412ccf..8ccdcfa 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -72,7 +72,7 @@ sub index_entry {
 	$subj = PublicInbox::Hval->new_oneline($subj)->as_html;
 	my $more = 'permalink';
 	my $root_anchor = $state->{root_anchor};
-	my $path = $root_anchor ? '../' : '';
+	my $path = $root_anchor ? '../../' : '';
 	my $href = $mid->as_href;
 	my $irt = $header_obj->header('In-Reply-To');
 	my ($anchor_idx, $anchor, $t_anchor);
@@ -84,7 +84,7 @@ sub index_entry {
 		$t_anchor = '';
 	}
 	if ($srch) {
-		$subj = "<a\nhref=\"${path}t/$href.html#u\">$subj</a>";
+		$subj = "<a\nhref=\"${path}t/$href/#u\">$subj</a>";
 	}
 	if ($root_anchor && $root_anchor eq $id) {
 		$subj = "<u\nid=\"u\">$subj</u>";
@@ -110,9 +110,9 @@ sub index_entry {
 	$fh->write($rv .= "\n\n");
 
 	my ($fhref, $more_ref);
-	my $mhref = "${path}m/$href.html";
+	my $mhref = "${path}m/$href/";
 	if ($level > 0) {
-		$fhref = "${path}f/$href.html";
+		$fhref = "${path}f/$href/";
 		$more_ref = \$more;
 	}
 	# scan through all parts, looking for displayable text
@@ -121,7 +121,7 @@ sub index_entry {
 	});
 	$mime->body_set('');
 
-	my $txt = "${path}m/$href.txt";
+	my $txt = "${path}m/$href/raw";
 	$rv = "\n<a\nhref=\"$mhref\">$more</a> <a\nhref=\"$txt\">raw</a> ";
 	$rv .= html_footer($mime, 0, undef, $ctx);
 
@@ -129,14 +129,14 @@ sub index_entry {
 		unless (defined $anchor) {
 			my $v = PublicInbox::Hval->new_msgid($irt);
 			$v = $v->as_href;
-			$anchor = "${path}m/$v.html";
+			$anchor = "${path}m/$v/";
 			$seen->{$anchor_idx} = $anchor;
 		}
 		$rv .= " <a\nhref=\"$anchor\">parent</a>";
 	}
 
 	if ($srch) {
-		$rv .= " <a\nhref=\"${path}t/$href.html$t_anchor\">" .
+		$rv .= " <a\nhref=\"${path}t/$href/$t_anchor\">" .
 		       "threadlink</a>";
 	}
 
@@ -173,9 +173,9 @@ sub emit_thread_html {
 	my $final_anchor = $state->{anchor_idx};
 	my $next = "<a\nid=\"s$final_anchor\">";
 	$next .= $final_anchor == 1 ? 'only message in' : 'end of';
-	$next .= " thread</a>, back to <a\nhref=\"../\">index</a>\n";
-	$mid = PublicInbox::Hval->new_msgid($mid)->as_href;
-	$next .= "download: <a\nhref=\"$mid.mbox.gz\">mbox.gz</a>\n\n";
+	$next .= " thread</a>, back to <a\nhref=\"../../\">index</a>\n";
+	# $mid = PublicInbox::Hval->new_msgid($mid)->as_href;
+	$next .= "download: <a\nhref=\"mbox.gz\">mbox.gz</a>\n\n";
 	$fh->write("<hr />" . PRE_WRAP . $next . $foot .
 		   "</pre></body></html>");
 	$fh->close;
@@ -361,7 +361,7 @@ sub headers_to_html_header {
 		} elsif ($h eq 'Subject') {
 			$title[0] = $v->as_html;
 			if ($srch) {
-				$rv .= "$h: <a\nhref=\"../t/$mid_href.html\">";
+				$rv .= "$h: <a\nhref=\"../../t/$mid_href/\">";
 				$rv .= $v->as_html . "</a>\n";
 				next;
 			}
@@ -371,8 +371,8 @@ sub headers_to_html_header {
 	}
 
 	$rv .= 'Message-ID: &lt;' . $mid->as_html . '&gt; ';
-	$mid_href = "../m/$mid_href" unless $full_pfx;
-	$rv .= "(<a\nhref=\"$mid_href.txt\">raw</a>)\n";
+	my $raw_ref = $full_pfx ? 'raw' : "../../m/$mid_href/raw";
+	$rv .= "(<a\nhref=\"$raw_ref\">raw</a>)\n";
 
 	my $irt = $header_obj->header('In-Reply-To');
 	if (defined $irt) {
@@ -380,7 +380,7 @@ sub headers_to_html_header {
 		my $html = $v->as_html;
 		my $href = $v->as_href;
 		$rv .= "In-Reply-To: &lt;";
-		$rv .= "<a\nhref=\"$href.html\">$html</a>&gt;\n";
+		$rv .= "<a\nhref=\"../$href/\">$html</a>&gt;\n";
 	}
 
 	my $refs = $header_obj->header('References');
@@ -437,12 +437,12 @@ sub html_footer {
 	my $href = "mailto:$to?In-Reply-To=$irt&Cc=${cc}&Subject=$subj";
 
 	my $srch = $ctx->{srch} if $ctx;
-	my $idx = $standalone ? " <a\nhref=\"../\">index</a>" : '';
+	my $idx = $standalone ? " <a\nhref=\"../../\">index</a>" : '';
 	if ($idx && $srch) {
 		$irt = $mime->header('In-Reply-To') || '';
 		$mid = mid_compress(mid_clean($mid));
 		my $t_anchor = length $irt ? T_ANCHOR : '';
-		$idx = " <a\nhref=\"../t/$mid.html$t_anchor\">".
+		$idx = " <a\nhref=\"../../t/$mid/$t_anchor\">".
 		       "threadlink</a>$idx";
 		my $res = $srch->get_followups($mid);
 		if (my $c = $res->{total}) {
@@ -461,7 +461,7 @@ sub html_footer {
 		if ($irt) {
 			$irt = PublicInbox::Hval->new_msgid($irt);
 			$irt = $irt->as_href;
-			$irt = "<a\nhref=\"$irt\">parent</a> ";
+			$irt = "<a\nhref=\"../$irt/\">parent</a> ";
 		} else {
 			$irt = ' ' x length('parent ');
 		}
@@ -476,7 +476,7 @@ sub linkify_ref {
 	my $v = PublicInbox::Hval->new_msgid($_[0]);
 	my $html = $v->as_html;
 	my $href = $v->as_href;
-	"&lt;<a\nhref=\"$href.html\">$html</a>&gt;";
+	"&lt;<a\nhref=\"../$href/\">$html</a>&gt;";
 }
 
 sub anchor_for {
@@ -511,7 +511,7 @@ sub simple_dump {
 			my $m = PublicInbox::Hval->new_msgid($mid);
 			$f = PublicInbox::Hval->new($f);
 			$d = PublicInbox::Hval->new($d);
-			$m = $m->as_href . '.html';
+			$m = $m->as_href . '/';
 			$f = $f->as_html;
 			$d = $d->as_html . ' UTC';
 			if (length($s) == 0) {
@@ -592,7 +592,7 @@ sub missing_thread {
 	my $title = 'Thread does not exist';
 	$cb->([404, ['Content-Type' => 'text/html']])->write(<<EOF);
 <html><head><title>$title</title></head><body><pre>$title
-<a href="../">Return to index</a></pre></body></html>
+<a href="../../">Return to index</a></pre></body></html>
 EOF
 }
 
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index ca338fb..ceb34d6 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -56,6 +56,9 @@ sub run {
 		invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
 
 	# thread display
+	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/\z!o) {
+		invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
+
 	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)\.html\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
 
@@ -220,7 +223,7 @@ sub redirect_mid {
 	if (lc($pfx) eq 't') {
 		$anchor = '#u'; # <u id='#u'> is used to highlight in View.pm
 	}
-	do_redirect($url . ".html$anchor");
+	do_redirect($url . "/$anchor");
 }
 
 # only hit when somebody tries to guess URLs manually:
@@ -329,7 +332,7 @@ EOF
 sub msg_pfx {
 	my ($ctx) = @_;
 	my $href = PublicInbox::Hval::ascii_html(uri_escape_utf8($ctx->{mid}));
-	"../f/$href.html";
+	"../../f/$href/";
 }
 
 # /$LISTNAME/t/$MESSAGE_ID/mbox           -> thread as mbox
diff --git a/t/cgi.t b/t/cgi.t
index 020dfe7..fc28ae3 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -152,27 +152,27 @@ EOF
 	}
 	local $ENV{GIT_DIR} = $maindir;
 
-	my $res = cgi_run("/test/m/slashy%2fasdf%40example.com.txt");
+	my $res = cgi_run("/test/m/slashy%2fasdf%40example.com/raw");
 	like($res->{body}, qr/Message-Id: <\Q$slashy_mid\E>/,
-		"slashy mid.txt hit");
+		"slashy mid raw hit");
 
-	$res = cgi_run("/test/m/blahblah\@example.com.txt");
+	$res = cgi_run("/test/m/blahblah\@example.com/raw");
 	like($res->{body}, qr/Message-Id: <blahblah\@example\.com>/,
-		"mid.txt hit");
-	$res = cgi_run("/test/m/blahblah\@example.con.txt");
-	like($res->{head}, qr/Status: 404 Not Found/, "mid.txt miss");
+		"mid raw hit");
+	$res = cgi_run("/test/m/blahblah\@example.con/raw");
+	like($res->{head}, qr/Status: 404 Not Found/, "mid raw miss");
 
-	$res = cgi_run("/test/m/blahblah\@example.com.html");
-	like($res->{body}, qr/\A<html>/, "mid.html hit");
+	$res = cgi_run("/test/m/blahblah\@example.com/");
+	like($res->{body}, qr/\A<html>/, "mid html hit");
 	like($res->{head}, qr/Status: 200 OK/, "200 response");
-	$res = cgi_run("/test/m/blahblah\@example.con.html");
-	like($res->{head}, qr/Status: 404 Not Found/, "mid.html miss");
+	$res = cgi_run("/test/m/blahblah\@example.con/");
+	like($res->{head}, qr/Status: 404 Not Found/, "mid html miss");
 
-	$res = cgi_run("/test/f/blahblah\@example.com.html");
-	like($res->{body}, qr/\A<html>/, "mid.html hit");
+	$res = cgi_run("/test/f/blahblah\@example.com/");
+	like($res->{body}, qr/\A<html>/, "mid html");
 	like($res->{head}, qr/Status: 200 OK/, "200 response");
-	$res = cgi_run("/test/f/blahblah\@example.con.html");
-	like($res->{head}, qr/Status: 404 Not Found/, "mid.html miss");
+	$res = cgi_run("/test/f/blahblah\@example.con/");
+	like($res->{head}, qr/Status: 404 Not Found/, "mid html miss");
 
 	$res = cgi_run("/test/");
 	like($res->{body}, qr/slashy%2Fasdf%40example\.com/,
diff --git a/t/feed.t b/t/feed.t
index 6102e8a..a9955f0 100644
--- a/t/feed.t
+++ b/t/feed.t
@@ -77,7 +77,7 @@ EOF
 		}
 
 		unlike($feed, qr/drop me/, "long quoted text dropped");
-		like($feed, qr!/f/\d%40example\.com\.html\b!,
+		like($feed, qr!/f/\d%40example\.com/#q!,
 			"/f/ url generated for long quoted text");
 		like($feed, qr/inline me here/, "short quoted text kept");
 		like($feed, qr/keep me/, "unquoted text saved");
diff --git a/t/plack.t b/t/plack.t
index ed41ab1..ee77291 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -88,7 +88,7 @@ EOF
 		is(200, $res->code, 'success response received');
 		like($res->content, qr!href="\Q$atomurl\E"!,
 			'atom URL generated');
-		like($res->content, qr!href="m/blah%40example\.com\.html"!,
+		like($res->content, qr!href="m/blah%40example\.com/"!,
 			'index generated');
 	});
 
@@ -98,7 +98,7 @@ EOF
 		my $res = $cb->(GET($pfx . '/atom.xml'));
 		is(200, $res->code, 'success response received for atom');
 		like($res->content,
-			qr!link\s+href="\Q$pfx\E/m/blah%40example\.com\.html"!s,
+			qr!link\s+href="\Q$pfx\E/m/blah%40example\.com/"!s,
 			'atom feed generated correct URL');
 	});
 
diff --git a/t/view.t b/t/view.t
index 151fa77..77cf3a3 100644
--- a/t/view.t
+++ b/t/view.t
@@ -44,17 +44,18 @@ EOF
 	my $html = PublicInbox::View::msg_html(undef, $mime);
 
 	# ghetto tests
-	like($html, qr!<a\nhref="\.\./m/hello%40!s, "MID link present");
+	like($html, qr!<a\nhref="\.\./\.\./m/hello%40!s, "MID link present");
 	like($html, qr/hello world\b/, "body present");
 	like($html, qr/&gt; keep this inline/, "short quoted text is inline");
 	like($html, qr/<a\nid=[^>]+><\/a>&gt; Long and wordy/,
 		"long quoted text is anchored");
 
 	# short page
-	my $pfx = "http://example.com/test/f";
+	my $pfx = "../../f/hello%40example.com/";
 	$mime = Email::MIME->new($s);
 	my $short = PublicInbox::View::msg_html(undef, $mime, $pfx);
-	like($short, qr!<a\nhref="hello%40!s, "MID link present");
+	like($short, qr!<a\nhref="\.\./\.\./f/hello%40example\.com/!s,
+		"MID link present");
 	like($short, qr/\n&gt; keep this inline/,
 		"short quoted text is inline");
 	like($short, qr/<a\nhref="\Q$pfx\E#[^>]+>Long and wordy/,
-- 
EW



* [PATCH 5/5] implement legacy redirects for old URLs
  2015-08-27  4:33 [PATCH 0/5] prefer shorter, less-ambiguous URLs Eric Wong
                   ` (3 preceding siblings ...)
  2015-08-27  4:34 ` [PATCH 4/5] wire up to display non-suffixed Message-ID links Eric Wong
@ 2015-08-27  4:34 ` Eric Wong
  4 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2015-08-27  4:34 UTC (permalink / raw)
  To: meta; +Cc: Eric Wong

We should not break existing URLs.  Redirect them to
the newer, less-ambiguous URLs to improve cache hit
ratios.
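A side-effect-free sketch of the rewrite performed by the patched
redirect_mid(): replace the legacy suffix, then append '#u' for thread
pages.  redirect_target() is a hypothetical name; the real sub reads
the URL from self_url() and calls do_redirect().  The mbox example
passes an anchored qr// because the patch passes the matched suffix
string ($end) directly, where '.' acts as a wildcard and the match is
not anchored to the end of the URL.

```perl
use strict;
use warnings;

# Hypothetical pure-function version of redirect_mid().
sub redirect_target {
	my ($url, $pfx, $old, $sfx) = @_;
	my $anchor = '';
	# <u id="u"> highlights the requested message on thread pages
	if (lc($pfx) eq 't' && $sfx eq '/') {
		$anchor = '#u';
	}
	$url =~ s/$old/$sfx/;	# swap the legacy suffix for the new one
	return $url . $anchor;
}
```

Convenience redirects for bare paths use qr/\z/ with '/', which simply
appends the trailing slash.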
---
 lib/PublicInbox/WWW.pm | 37 ++++++++++++++++++++-----------------
 t/plack.t              | 36 +++++++++++++++++++++++++++++++++---
 2 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index ceb34d6..8058f3e 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -44,37 +44,39 @@ sub run {
 		invalid_list_mid(\%ctx, $1, $2) || get_mid_html(\%ctx);
 	} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)/raw\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_mid_txt(\%ctx);
-	} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)\.txt\z!o) {
-		invalid_list_mid(\%ctx, $1, $2) || get_mid_txt(\%ctx);
-	} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)\.html\z!o) {
-		invalid_list_mid(\%ctx, $1, $2) || get_mid_html(\%ctx);
 
 	# full-message page
 	} elsif ($path_info =~ m!$LISTNAME_RE/f/(\S+)/\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
-	} elsif ($path_info =~ m!$LISTNAME_RE/f/(\S+)\.html\z!o) {
-		invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
 
 	# thread display
 	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/\z!o) {
 		invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
 
-	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)\.html\z!o) {
-		invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
-
-	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/mbox(\.gz)?\z!ox ||
-	         $path_info =~ m!$LISTNAME_RE/t/(\S+)\.mbox(\.gz)?\z!o) {
+	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/mbox(\.gz)?\z!x) {
 		my $sfx = $3;
 		invalid_list_mid(\%ctx, $1, $2) ||
 			get_thread_mbox(\%ctx, $sfx);
 
-	} elsif ($path_info =~ m!$LISTNAME_RE/f/\S+\.txt\z!o) {
-		invalid_list_mid(\%ctx, $1, $2) || redirect_mid_txt(\%ctx);
+	# legacy redirects
+	} elsif ($path_info =~ m!$LISTNAME_RE/(t|m|f)/(\S+)\.html\z!o) {
+		my $pfx = $2;
+		invalid_list_mid(\%ctx, $1, $3) ||
+			redirect_mid(\%ctx, $pfx, qr/\.html\z/, '/');
+	} elsif ($path_info =~ m!$LISTNAME_RE/(m|f)/(\S+)\.txt\z!o) {
+		my $pfx = $2;
+		invalid_list_mid(\%ctx, $1, $3) ||
+			redirect_mid(\%ctx, $pfx, qr/\.txt\z/, '/raw');
+	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)(\.mbox(?:\.gz)?)\z!o) {
+		my $end = $3;
+		invalid_list_mid(\%ctx, $1, $2) ||
+			redirect_mid(\%ctx, 't', $end, '/mbox.gz');
 
 	# convenience redirects, order matters
 	} elsif ($path_info =~ m!$LISTNAME_RE/(m|f|t|s)/(\S+)\z!o) {
 		my $pfx = $2;
-		invalid_list_mid(\%ctx, $1, $3) || redirect_mid(\%ctx, $2);
+		invalid_list_mid(\%ctx, $1, $3) ||
+			redirect_mid(\%ctx, $pfx, qr/\z/, '/');
 
 	} else {
 		r404();
@@ -217,13 +219,14 @@ sub redirect_list_index {
 }
 
 sub redirect_mid {
-	my ($ctx, $pfx) = @_;
+	my ($ctx, $pfx, $old, $sfx) = @_;
 	my $url = self_url($ctx->{cgi});
 	my $anchor = '';
-	if (lc($pfx) eq 't') {
+	if (lc($pfx) eq 't' && $sfx eq '/') {
 		$anchor = '#u'; # <u id='#u'> is used to highlight in View.pm
 	}
-	do_redirect($url . "/$anchor");
+	$url =~ s/$old/$sfx/;
+	do_redirect($url . $anchor);
 }
 
 # only hit when somebody tries to guess URLs manually:
diff --git a/t/plack.t b/t/plack.t
index ee77291..b3c8764 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -92,9 +92,9 @@ EOF
 			'index generated');
 	});
 
+	my $pfx = 'http://example.com/test';
 	test_psgi($app, sub {
 		my ($cb) = @_;
-		my $pfx = 'http://example.com/test';
 		my $res = $cb->(GET($pfx . '/atom.xml'));
 		is(200, $res->code, 'success response received for atom');
 		like($res->content,
@@ -105,7 +105,6 @@ EOF
 	foreach my $t (qw(f m)) {
 		test_psgi($app, sub {
 			my ($cb) = @_;
-			my $pfx = 'http://example.com/test';
 			my $path = "/$t/blah%40example.com/";
 			my $res = $cb->(GET($pfx . $path));
 			is(200, $res->code, "success for $path");
@@ -115,11 +114,42 @@ EOF
 	}
 	test_psgi($app, sub {
 		my ($cb) = @_;
-		my $pfx = 'http://example.com/test';
 		my $res = $cb->(GET($pfx . '/m/blah%40example.com/raw'));
 		is(200, $res->code, 'success response received for /m/*/raw');
 		like($res->content, qr!\AFrom !, "mbox returned");
 	});
+
+	# legacy redirects
+	foreach my $t (qw(m f)) {
+		test_psgi($app, sub {
+			my ($cb) = @_;
+			my $res = $cb->(GET($pfx . "/$t/blah%40example.com.txt"));
+			is(301, $res->code, "redirect for old $t .txt link");
+			my $location = $res->header('Location');
+			like($location, qr!/$t/blah%40example\.com/raw\z!,
+				".txt redirected to /raw");
+		});
+	}
+	foreach my $t (qw(m f t)) {
+		test_psgi($app, sub {
+			my ($cb) = @_;
+			my $res = $cb->(GET($pfx . "/$t/blah%40example.com.html"));
+			is(301, $res->code, "redirect for old $t .html link");
+			my $location = $res->header('Location');
+			like($location, qr!/$t/blah%40example\.com/(?:#u)?\z!,
+				".html redirected to /");
+		});
+	}
+	foreach my $sfx (qw(mbox mbox.gz)) {
+		test_psgi($app, sub {
+			my ($cb) = @_;
+			my $res = $cb->(GET($pfx . "/t/blah%40example.com.$sfx"));
+			is(301, $res->code, 'redirect for old thread link');
+			my $location = $res->header('Location');
+			like($location, qr!/t/blah%40example\.com/mbox\.gz\z!,
+				"$sfx redirected to /mbox.gz");
+		});
+	}
 }
 
 done_testing();
-- 
EW

