* [PATCH 0/5] prefer shorter, less-ambiguous URLs
From: Eric Wong @ 2015-08-27 4:33 UTC
To: meta
Unfortunately, Message-IDs may end in '.txt', '.html' or some other
suffix we might use. Instead of '.html', use a trailing '/' so that
'/raw' can serve the raw (mbox-formatted) message, following a lead
from gmane.
In summary:
/m/$MESSAGE_ID.html -> /m/$MESSAGE_ID/
/m/$MESSAGE_ID.txt -> /m/$MESSAGE_ID/raw
/f/$MESSAGE_ID.html -> /f/$MESSAGE_ID/
/t/$MESSAGE_ID.html -> /t/$MESSAGE_ID/
/t/$MESSAGE_ID.mbox.gz -> /t/$MESSAGE_ID/mbox.gz
Redirects for old URLs remain in place so that existing links do
not break.
Eric Wong (5):
www: minor cleanups to shorten code
wire up shorter, less ambiguous URLs
mid: extract Message-ID from inside '<>'
wire up to display non-suffixed Message-ID links
implement legacy redirects for old URLs
* [PATCH 1/5] www: minor cleanups to shorten code
From: Eric Wong @ 2015-08-27 4:33 UTC
To: meta; +Cc: Eric Wong
Less scrolling is more efficient.
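A minimal sketch of the idiom used below, assuming hypothetical stand-ins fake_mid2blob() and fake_r404() for the real mid2blob() and r404(): because `or` binds more loosely than `=`, `my $x = mid2blob($ctx) or return r404();` assigns first and then returns the 404 whenever the result is false, just like the two-line form it replaces.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real helpers in PublicInbox::WWW:
# mid2blob() returns a blob (or undef), r404() returns a 404 response.
sub fake_mid2blob { my ($ctx) = @_; return $ctx->{blob} }
sub fake_r404 { return [404, [], ['Not Found']] }

sub get_mid_html_sketch {
	my ($ctx) = @_;
	# parses as: (my $x = fake_mid2blob($ctx)) or return fake_r404();
	my $x = fake_mid2blob($ctx) or return fake_r404();
	return [200, [], [$x]];
}

my $hit = get_mid_html_sketch({ blob => 'From: a@example.com' });
my $miss = get_mid_html_sketch({ blob => undef });
print "$hit->[0] $miss->[0]\n"; # prints "200 404"
```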
---
lib/PublicInbox/WWW.pm | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index d1ee2ff..527d213 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -138,7 +138,6 @@ sub mid2blob {
my $path = PublicInbox::MID::mid2path($ctx->{mid});
my @cmd = ('git', "--git-dir=$ctx->{git_dir}",
qw(cat-file blob), "HEAD:$path");
- my $cmd = join(' ', @cmd);
my $pid = open my $fh, '-|';
defined $pid or die "fork failed: $!\n";
if ($pid == 0) {
@@ -162,8 +161,7 @@ sub get_mid_txt {
# /$LISTNAME/m/$MESSAGE_ID.html -> HTML content (short quotes)
sub get_mid_html {
my ($ctx) = @_;
- my $x = mid2blob($ctx);
- return r404() unless $x;
+ my $x = mid2blob($ctx) or return r404();
require PublicInbox::View;
my $pfx = msg_pfx($ctx);
@@ -178,8 +176,8 @@ sub get_mid_html {
# /$LISTNAME/f/$MESSAGE_ID.html -> HTML content (fullquotes)
sub get_full_html {
my ($ctx) = @_;
- my $x = mid2blob($ctx);
- return r404() unless $x;
+ my $x = mid2blob($ctx) or return r404();
+
require PublicInbox::View;
my $foot = footer($ctx);
require Email::MIME;
--
EW
* [PATCH 2/5] wire up shorter, less ambiguous URLs
From: Eric Wong @ 2015-08-27 4:33 UTC
To: meta; +Cc: Eric Wong
We will prefer URLs without suffixes for now to avoid ambiguity
in case a Message-ID ends with ".html", ".txt", ".mbox.gz" or
any other suffix we may use.
Static file compatibility is preserved by using a trailing slash,
since most servers fall back to serving an index.html file for
such paths.
For raw text files, we follow gmane's lead and use "/raw".
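A rough sketch of why the suffix-free routes are unambiguous, assuming a simplified $LISTNAME_RE (the real one in PublicInbox::WWW may differ) and a hypothetical route() helper: the Message-ID is whatever sits between the type letter and the trailing '/', '/raw' or '/mbox(.gz)', so a Message-ID that itself ends in '.html' is taken literally.

```perl
use strict;
use warnings;

# Assumption: list names are simple words; the real $LISTNAME_RE differs.
my $LISTNAME_RE = qr!\A/([\w\-]+)!;

# Hypothetical router: returns (type, list, mid, format) or an empty list.
sub route {
	my ($path_info) = @_;
	if ($path_info =~ m!$LISTNAME_RE/(m|f|t)/(\S+)/\z!) {
		return ($2, $1, $3, 'html');
	} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)/raw\z!) {
		return ('m', $1, $2, 'raw');
	} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/mbox(\.gz)?\z!) {
		return ('t', $1, $2, $3 ? 'mbox.gz' : 'mbox');
	}
	return; # not a suffix-free URL
}

my @r = route('/test/m/blah%40example.html/');
print "@r\n"; # m test blah%40example.html html
```

Under the old scheme, /m/blah%40example.html could mean either the HTML page for "blah@example" or a bare link to "blah@example.html"; with the trailing slash there is only one reading.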
---
lib/PublicInbox/WWW.pm | 13 ++++++++++---
t/cgi.t | 2 +-
t/plack.t | 19 +++++++++++++++++++
3 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 527d213..ca338fb 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -40,12 +40,18 @@ sub run {
invalid_list(\%ctx, $1) || get_atom(\%ctx);
# single-message pages
+ } elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)/\z!o) {
+ invalid_list_mid(\%ctx, $1, $2) || get_mid_html(\%ctx);
+ } elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)/raw\z!o) {
+ invalid_list_mid(\%ctx, $1, $2) || get_mid_txt(\%ctx);
} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)\.txt\z!o) {
invalid_list_mid(\%ctx, $1, $2) || get_mid_txt(\%ctx);
} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)\.html\z!o) {
invalid_list_mid(\%ctx, $1, $2) || get_mid_html(\%ctx);
# full-message page
+ } elsif ($path_info =~ m!$LISTNAME_RE/f/(\S+)/\z!o) {
+ invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
} elsif ($path_info =~ m!$LISTNAME_RE/f/(\S+)\.html\z!o) {
invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
@@ -53,7 +59,8 @@ sub run {
} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)\.html\z!o) {
invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
- } elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)\.mbox(\.gz)?\z!o) {
+ } elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/mbox(\.gz)?\z!ox ||
+ $path_info =~ m!$LISTNAME_RE/t/(\S+)\.mbox(\.gz)?\z!o) {
my $sfx = $3;
invalid_list_mid(\%ctx, $1, $2) ||
get_thread_mbox(\%ctx, $sfx);
@@ -325,8 +332,8 @@ sub msg_pfx {
"../f/$href.html";
}
-# /$LISTNAME/t/$MESSAGE_ID.mbox -> thread as mbox
-# /$LISTNAME/t/$MESSAGE_ID.mbox.gz -> thread as gzipped mbox
+# /$LISTNAME/t/$MESSAGE_ID/mbox -> thread as mbox
+# /$LISTNAME/t/$MESSAGE_ID/mbox.gz -> thread as gzipped mbox
# note: I'm not a big fan of other compression formats since they're
# significantly more expensive on CPU than gzip and less-widely available,
# especially on older systems. Stick to zlib since that's what git uses.
diff --git a/t/cgi.t b/t/cgi.t
index e87f7dc..020dfe7 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -183,7 +183,7 @@ EOF
{
local $ENV{HOME} = $home;
local $ENV{PATH} = $main_path;
- my $path = "/test/t/blahblah%40example.com.mbox.gz";
+ my $path = "/test/t/blahblah%40example.com/mbox.gz";
my $res = cgi_run($path);
like($res->{head}, qr/^Status: 501 /, "search not-yet-enabled");
my $indexed = system($index, $maindir) == 0;
diff --git a/t/plack.t b/t/plack.t
index 85dd337..ed41ab1 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -101,6 +101,25 @@ EOF
qr!link\s+href="\Q$pfx\E/m/blah%40example\.com\.html"!s,
'atom feed generated correct URL');
});
+
+ foreach my $t (qw(f m)) {
+ test_psgi($app, sub {
+ my ($cb) = @_;
+ my $pfx = 'http://example.com/test';
+ my $path = "/$t/blah%40example.com/";
+ my $res = $cb->(GET($pfx . $path));
+ is(200, $res->code, "success for $path");
+ like($res->content, qr!<title>hihi - Me</title>!,
+ "HTML returned");
+ });
+ }
+ test_psgi($app, sub {
+ my ($cb) = @_;
+ my $pfx = 'http://example.com/test';
+ my $res = $cb->(GET($pfx . '/m/blah%40example.com/raw'));
+ is(200, $res->code, 'success response received for /m/*/raw');
+ like($res->content, qr!\AFrom !, "mbox returned");
+ });
}
done_testing();
--
EW
* [PATCH 3/5] mid: extract Message-ID from inside '<>'
From: Eric Wong @ 2015-08-27 4:34 UTC
To: meta; +Cc: Eric Wong
This is necessary for some mailers which include comment text
in the In-Reply-To header; merely assuming there is nothing
outside of '<>', as we were doing, is not enough.
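The difference can be shown with a small standalone comparison of the old and new cleanup logic from this patch. The function names mid_clean_old/mid_clean_new and the sample header value are made up for illustration.

```perl
use strict;
use warnings;

# Old approach: strip an optional '<' / '>' at the edges only.
sub mid_clean_old {
	my ($mid) = @_;
	$mid =~ s/\A\s*<?//;
	$mid =~ s/>?\s*\z//;
	$mid;
}

# New approach: take whatever is inside the first '<...>' pair.
sub mid_clean_new {
	my ($mid) = @_;
	if ($mid =~ /<([^>]+)>/) {
		$mid = $1;
	}
	$mid;
}

my $irt = '<20150827.1234@example.com> (message from Alice)';
print mid_clean_old($irt), "\n"; # 20150827.1234@example.com> (message from Alice)
print mid_clean_new($irt), "\n"; # 20150827.1234@example.com
```

Note that the new code returns a bare Message-ID without angle brackets unchanged, including any surrounding whitespace the old code would have trimmed.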
---
lib/PublicInbox/MID.pm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
index 02ac709..8ca3c57 100644
--- a/lib/PublicInbox/MID.pm
+++ b/lib/PublicInbox/MID.pm
@@ -12,8 +12,9 @@ sub mid_clean {
my ($mid) = @_;
defined($mid) or die "no Message-ID";
# MDA->precheck did more checking for us
- $mid =~ s/\A\s*<?//;
- $mid =~ s/>?\s*\z//;
+ if ($mid =~ /<([^>]+)>/) {
+ $mid = $1;
+ }
$mid;
}
--
EW
* [PATCH 4/5] wire up to display non-suffixed Message-ID links
From: Eric Wong @ 2015-08-27 4:34 UTC
To: meta; +Cc: Eric Wong
These URLs are preferable in case somebody decides to get cute and
uses a Message-ID ending in one of our suffixes to prevent others
from linking to their message. The common /m/$MESSAGE_ID/ URLs are
also 4 characters shorter than /m/$MESSAGE_ID.html, so they may fit
better on terminals.
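Because each message page now lives one directory deeper (/m/$MESSAGE_ID/ instead of /m/$MESSAGE_ID.html), relative links such as the index link change from "../" to "../../". The hypothetical resolve() below is a simplified resolver (a small subset of RFC 3986 reference resolution that ignores schemes, queries, fragments and absolute references) showing where each form lands.

```perl
use strict;
use warnings;

# Hypothetical resolver for path-only URLs; only handles leading
# "../" and "./" segments in the relative reference.
sub resolve {
	my ($base, $rel) = @_;
	# drop the last path segment of the base ("" if it ends in '/')
	(my $dir = $base) =~ s![^/]*\z!!;
	my @seg = grep { length } split m!/!, $dir;
	# consume leading "../" (go up one level) and "./" (stay put)
	while ($rel =~ s!\A(\.\.?)/!!) {
		pop @seg if $1 eq '..';
	}
	return '/' . join('', map { "$_/" } @seg) . $rel;
}

my $old = '/test/m/blah%40example.com.html';
my $new = '/test/m/blah%40example.com/';
print resolve($old, '../'), "\n"; # /test/
print resolve($new, '../'), "\n"; # /test/m/
print resolve($new, '../../'), "\n"; # /test/
```

With the new layout, the old "../" would land on /test/m/ instead of the list index, which is why View.pm and WWW.pm switch to "../../".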
---
lib/PublicInbox/Feed.pm | 4 ++--
lib/PublicInbox/View.pm | 40 ++++++++++++++++++++--------------------
lib/PublicInbox/WWW.pm | 7 +++++--
t/cgi.t | 28 ++++++++++++++--------------
t/feed.t | 2 +-
t/plack.t | 4 ++--
t/view.t | 7 ++++---
7 files changed, 48 insertions(+), 44 deletions(-)
diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index d34978c..9e56747 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -273,7 +273,7 @@ sub add_to_feed {
my $mid = $header_obj->header('Message-ID');
defined $mid or return 0;
$mid = PublicInbox::Hval->new_msgid($mid);
- my $href = $mid->as_href . '.html';
+ my $href = $mid->as_href . '/';
my $content = PublicInbox::View->feed_entry($mime, $fullurl . $href);
defined($content) or return 0;
$mime = undef;
@@ -362,7 +362,7 @@ sub dump_topics {
$mid = PublicInbox::Hval->new($mid)->as_href;
$subj = PublicInbox::Hval->new($subj)->as_html;
$u = PublicInbox::Hval->new($u)->as_html;
- $dst .= "\n<a\nhref=\"t/$mid.html#u\"><b>$subj</b></a>\n- ";
+ $dst .= "\n<a\nhref=\"t/$mid/#u\"><b>$subj</b></a>\n- ";
$ts = POSIX::strftime('%Y-%m-%d %H:%M', gmtime($ts));
if ($n == 1) {
$dst .= "created by $u @ $ts UTC\n"
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 7412ccf..8ccdcfa 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -72,7 +72,7 @@ sub index_entry {
$subj = PublicInbox::Hval->new_oneline($subj)->as_html;
my $more = 'permalink';
my $root_anchor = $state->{root_anchor};
- my $path = $root_anchor ? '../' : '';
+ my $path = $root_anchor ? '../../' : '';
my $href = $mid->as_href;
my $irt = $header_obj->header('In-Reply-To');
my ($anchor_idx, $anchor, $t_anchor);
@@ -84,7 +84,7 @@ sub index_entry {
$t_anchor = '';
}
if ($srch) {
- $subj = "<a\nhref=\"${path}t/$href.html#u\">$subj</a>";
+ $subj = "<a\nhref=\"${path}t/$href/#u\">$subj</a>";
}
if ($root_anchor && $root_anchor eq $id) {
$subj = "<u\nid=\"u\">$subj</u>";
@@ -110,9 +110,9 @@ sub index_entry {
$fh->write($rv .= "\n\n");
my ($fhref, $more_ref);
- my $mhref = "${path}m/$href.html";
+ my $mhref = "${path}m/$href/";
if ($level > 0) {
- $fhref = "${path}f/$href.html";
+ $fhref = "${path}f/$href/";
$more_ref = \$more;
}
# scan through all parts, looking for displayable text
@@ -121,7 +121,7 @@ sub index_entry {
});
$mime->body_set('');
- my $txt = "${path}m/$href.txt";
+ my $txt = "${path}m/$href/raw";
$rv = "\n<a\nhref=\"$mhref\">$more</a> <a\nhref=\"$txt\">raw</a> ";
$rv .= html_footer($mime, 0, undef, $ctx);
@@ -129,14 +129,14 @@ sub index_entry {
unless (defined $anchor) {
my $v = PublicInbox::Hval->new_msgid($irt);
$v = $v->as_href;
- $anchor = "${path}m/$v.html";
+ $anchor = "${path}m/$v/";
$seen->{$anchor_idx} = $anchor;
}
$rv .= " <a\nhref=\"$anchor\">parent</a>";
}
if ($srch) {
- $rv .= " <a\nhref=\"${path}t/$href.html$t_anchor\">" .
+ $rv .= " <a\nhref=\"${path}t/$href/$t_anchor\">" .
"threadlink</a>";
}
@@ -173,9 +173,9 @@ sub emit_thread_html {
my $final_anchor = $state->{anchor_idx};
my $next = "<a\nid=\"s$final_anchor\">";
$next .= $final_anchor == 1 ? 'only message in' : 'end of';
- $next .= " thread</a>, back to <a\nhref=\"../\">index</a>\n";
- $mid = PublicInbox::Hval->new_msgid($mid)->as_href;
- $next .= "download: <a\nhref=\"$mid.mbox.gz\">mbox.gz</a>\n\n";
+ $next .= " thread</a>, back to <a\nhref=\"../../\">index</a>\n";
+ # $mid = PublicInbox::Hval->new_msgid($mid)->as_href;
+ $next .= "download: <a\nhref=\"mbox.gz\">mbox.gz</a>\n\n";
$fh->write("<hr />" . PRE_WRAP . $next . $foot .
"</pre></body></html>");
$fh->close;
@@ -361,7 +361,7 @@ sub headers_to_html_header {
} elsif ($h eq 'Subject') {
$title[0] = $v->as_html;
if ($srch) {
- $rv .= "$h: <a\nhref=\"../t/$mid_href.html\">";
+ $rv .= "$h: <a\nhref=\"../../t/$mid_href/\">";
$rv .= $v->as_html . "</a>\n";
next;
}
@@ -371,8 +371,8 @@ sub headers_to_html_header {
}
$rv .= 'Message-ID: <' . $mid->as_html . '> ';
- $mid_href = "../m/$mid_href" unless $full_pfx;
- $rv .= "(<a\nhref=\"$mid_href.txt\">raw</a>)\n";
+ my $raw_ref = $full_pfx ? 'raw' : "../../m/$mid_href/raw";
+ $rv .= "(<a\nhref=\"$raw_ref\">raw</a>)\n";
my $irt = $header_obj->header('In-Reply-To');
if (defined $irt) {
@@ -380,7 +380,7 @@ sub headers_to_html_header {
my $html = $v->as_html;
my $href = $v->as_href;
$rv .= "In-Reply-To: <";
- $rv .= "<a\nhref=\"$href.html\">$html</a>>\n";
+ $rv .= "<a\nhref=\"../$href/\">$html</a>>\n";
}
my $refs = $header_obj->header('References');
@@ -437,12 +437,12 @@ sub html_footer {
my $href = "mailto:$to?In-Reply-To=$irt&Cc=${cc}&Subject=$subj";
my $srch = $ctx->{srch} if $ctx;
- my $idx = $standalone ? " <a\nhref=\"../\">index</a>" : '';
+ my $idx = $standalone ? " <a\nhref=\"../../\">index</a>" : '';
if ($idx && $srch) {
$irt = $mime->header('In-Reply-To') || '';
$mid = mid_compress(mid_clean($mid));
my $t_anchor = length $irt ? T_ANCHOR : '';
- $idx = " <a\nhref=\"../t/$mid.html$t_anchor\">".
+ $idx = " <a\nhref=\"../../t/$mid/$t_anchor\">".
"threadlink</a>$idx";
my $res = $srch->get_followups($mid);
if (my $c = $res->{total}) {
@@ -461,7 +461,7 @@ sub html_footer {
if ($irt) {
$irt = PublicInbox::Hval->new_msgid($irt);
$irt = $irt->as_href;
- $irt = "<a\nhref=\"$irt\">parent</a> ";
+ $irt = "<a\nhref=\"../$irt/\">parent</a> ";
} else {
$irt = ' ' x length('parent ');
}
@@ -476,7 +476,7 @@ sub linkify_ref {
my $v = PublicInbox::Hval->new_msgid($_[0]);
my $html = $v->as_html;
my $href = $v->as_href;
- "<<a\nhref=\"$href.html\">$html</a>>";
+ "<<a\nhref=\"../$href/\">$html</a>>";
}
sub anchor_for {
@@ -511,7 +511,7 @@ sub simple_dump {
my $m = PublicInbox::Hval->new_msgid($mid);
$f = PublicInbox::Hval->new($f);
$d = PublicInbox::Hval->new($d);
- $m = $m->as_href . '.html';
+ $m = $m->as_href . '/';
$f = $f->as_html;
$d = $d->as_html . ' UTC';
if (length($s) == 0) {
@@ -592,7 +592,7 @@ sub missing_thread {
my $title = 'Thread does not exist';
$cb->([404, ['Content-Type' => 'text/html']])->write(<<EOF);
<html><head><title>$title</title></head><body><pre>$title
-<a href="../">Return to index</a></pre></body></html>
+<a href="../../">Return to index</a></pre></body></html>
EOF
}
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index ca338fb..ceb34d6 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -56,6 +56,9 @@ sub run {
invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
# thread display
+ } elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/\z!o) {
+ invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
+
} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)\.html\z!o) {
invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
@@ -220,7 +223,7 @@ sub redirect_mid {
if (lc($pfx) eq 't') {
$anchor = '#u'; # <u id='#u'> is used to highlight in View.pm
}
- do_redirect($url . ".html$anchor");
+ do_redirect($url . "/$anchor");
}
# only hit when somebody tries to guess URLs manually:
@@ -329,7 +332,7 @@ EOF
sub msg_pfx {
my ($ctx) = @_;
my $href = PublicInbox::Hval::ascii_html(uri_escape_utf8($ctx->{mid}));
- "../f/$href.html";
+ "../../f/$href/";
}
# /$LISTNAME/t/$MESSAGE_ID/mbox -> thread as mbox
diff --git a/t/cgi.t b/t/cgi.t
index 020dfe7..fc28ae3 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -152,27 +152,27 @@ EOF
}
local $ENV{GIT_DIR} = $maindir;
- my $res = cgi_run("/test/m/slashy%2fasdf%40example.com.txt");
+ my $res = cgi_run("/test/m/slashy%2fasdf%40example.com/raw");
like($res->{body}, qr/Message-Id: <\Q$slashy_mid\E>/,
- "slashy mid.txt hit");
+ "slashy mid raw hit");
- $res = cgi_run("/test/m/blahblah\@example.com.txt");
+ $res = cgi_run("/test/m/blahblah\@example.com/raw");
like($res->{body}, qr/Message-Id: <blahblah\@example\.com>/,
- "mid.txt hit");
- $res = cgi_run("/test/m/blahblah\@example.con.txt");
- like($res->{head}, qr/Status: 404 Not Found/, "mid.txt miss");
+ "mid raw hit");
+ $res = cgi_run("/test/m/blahblah\@example.con/raw");
+ like($res->{head}, qr/Status: 404 Not Found/, "mid raw miss");
- $res = cgi_run("/test/m/blahblah\@example.com.html");
- like($res->{body}, qr/\A<html>/, "mid.html hit");
+ $res = cgi_run("/test/m/blahblah\@example.com/");
+ like($res->{body}, qr/\A<html>/, "mid html hit");
like($res->{head}, qr/Status: 200 OK/, "200 response");
- $res = cgi_run("/test/m/blahblah\@example.con.html");
- like($res->{head}, qr/Status: 404 Not Found/, "mid.html miss");
+ $res = cgi_run("/test/m/blahblah\@example.con/");
+ like($res->{head}, qr/Status: 404 Not Found/, "mid html miss");
- $res = cgi_run("/test/f/blahblah\@example.com.html");
- like($res->{body}, qr/\A<html>/, "mid.html hit");
+ $res = cgi_run("/test/f/blahblah\@example.com/");
+ like($res->{body}, qr/\A<html>/, "mid html");
like($res->{head}, qr/Status: 200 OK/, "200 response");
- $res = cgi_run("/test/f/blahblah\@example.con.html");
- like($res->{head}, qr/Status: 404 Not Found/, "mid.html miss");
+ $res = cgi_run("/test/f/blahblah\@example.con/");
+ like($res->{head}, qr/Status: 404 Not Found/, "mid html miss");
$res = cgi_run("/test/");
like($res->{body}, qr/slashy%2Fasdf%40example\.com/,
diff --git a/t/feed.t b/t/feed.t
index 6102e8a..a9955f0 100644
--- a/t/feed.t
+++ b/t/feed.t
@@ -77,7 +77,7 @@ EOF
}
unlike($feed, qr/drop me/, "long quoted text dropped");
- like($feed, qr!/f/\d%40example\.com\.html\b!,
+ like($feed, qr!/f/\d%40example\.com/#q!,
"/f/ url generated for long quoted text");
like($feed, qr/inline me here/, "short quoted text kept");
like($feed, qr/keep me/, "unquoted text saved");
diff --git a/t/plack.t b/t/plack.t
index ed41ab1..ee77291 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -88,7 +88,7 @@ EOF
is(200, $res->code, 'success response received');
like($res->content, qr!href="\Q$atomurl\E"!,
'atom URL generated');
- like($res->content, qr!href="m/blah%40example\.com\.html"!,
+ like($res->content, qr!href="m/blah%40example\.com/"!,
'index generated');
});
@@ -98,7 +98,7 @@ EOF
my $res = $cb->(GET($pfx . '/atom.xml'));
is(200, $res->code, 'success response received for atom');
like($res->content,
- qr!link\s+href="\Q$pfx\E/m/blah%40example\.com\.html"!s,
+ qr!link\s+href="\Q$pfx\E/m/blah%40example\.com/"!s,
'atom feed generated correct URL');
});
diff --git a/t/view.t b/t/view.t
index 151fa77..77cf3a3 100644
--- a/t/view.t
+++ b/t/view.t
@@ -44,17 +44,18 @@ EOF
my $html = PublicInbox::View::msg_html(undef, $mime);
# ghetto tests
- like($html, qr!<a\nhref="\.\./m/hello%40!s, "MID link present");
+ like($html, qr!<a\nhref="\.\./\.\./m/hello%40!s, "MID link present");
like($html, qr/hello world\b/, "body present");
like($html, qr/> keep this inline/, "short quoted text is inline");
like($html, qr/<a\nid=[^>]+><\/a>> Long and wordy/,
"long quoted text is anchored");
# short page
- my $pfx = "http://example.com/test/f";
+ my $pfx = "../../f/hello%40example.com/";
$mime = Email::MIME->new($s);
my $short = PublicInbox::View::msg_html(undef, $mime, $pfx);
- like($short, qr!<a\nhref="hello%40!s, "MID link present");
+ like($short, qr!<a\nhref="\.\./\.\./f/hello%40example\.com/!s,
+ "MID link present");
like($short, qr/\n> keep this inline/,
"short quoted text is inline");
like($short, qr/<a\nhref="\Q$pfx\E#[^>]+>Long and wordy/,
--
EW
* [PATCH 5/5] implement legacy redirects for old URLs
From: Eric Wong @ 2015-08-27 4:34 UTC
To: meta; +Cc: Eric Wong
We should not break existing URLs. Redirect them to
the newer, less-ambiguous URLs, which also improves cache
hit ratios since each message is then served from one URL.
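A minimal sketch of the redirect rules this patch implements, written as a hypothetical standalone legacy_redirect() helper rather than the actual redirect_mid() code: ".html" goes to "/", ".txt" to "/raw", and both ".mbox" and ".mbox.gz" to "/mbox.gz", with "#u" appended for thread pages so the requested message stays highlighted.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the legacy redirect rules.
# Returns the new URL, or undef if $url is not a legacy URL.
sub legacy_redirect {
	my ($url) = @_;
	if ($url =~ m!/(t|m|f)/\S+\.html\z!) {
		my $type = $1; # save before s/// resets $1
		$url =~ s!\.html\z!/!;
		# thread pages highlight the requested message via <u id="u">
		return $type eq 't' ? "$url#u" : $url;
	} elsif ($url =~ m!/(?:m|f)/\S+\.txt\z!) {
		$url =~ s!\.txt\z!/raw!;
		return $url;
	} elsif ($url =~ m!/t/\S+\.mbox(?:\.gz)?\z!) {
		# both .mbox and .mbox.gz go to the gzipped mbox
		$url =~ s!\.mbox(?:\.gz)?\z!/mbox.gz!;
		return $url;
	}
	return undef;
}

print legacy_redirect('http://example.com/test/t/blah%40example.com.html'), "\n";
# http://example.com/test/t/blah%40example.com/#u
```

The sketch anchors every substitution at the end of the URL with \z, so a ".html" or ".mbox" appearing earlier in the URL is left alone.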
---
lib/PublicInbox/WWW.pm | 37 ++++++++++++++++++++-----------------
t/plack.t | 36 +++++++++++++++++++++++++++++++++---
2 files changed, 53 insertions(+), 20 deletions(-)
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index ceb34d6..8058f3e 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -44,37 +44,39 @@ sub run {
invalid_list_mid(\%ctx, $1, $2) || get_mid_html(\%ctx);
} elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)/raw\z!o) {
invalid_list_mid(\%ctx, $1, $2) || get_mid_txt(\%ctx);
- } elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)\.txt\z!o) {
- invalid_list_mid(\%ctx, $1, $2) || get_mid_txt(\%ctx);
- } elsif ($path_info =~ m!$LISTNAME_RE/m/(\S+)\.html\z!o) {
- invalid_list_mid(\%ctx, $1, $2) || get_mid_html(\%ctx);
# full-message page
} elsif ($path_info =~ m!$LISTNAME_RE/f/(\S+)/\z!o) {
invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
- } elsif ($path_info =~ m!$LISTNAME_RE/f/(\S+)\.html\z!o) {
- invalid_list_mid(\%ctx, $1, $2) || get_full_html(\%ctx);
# thread display
} elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/\z!o) {
invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
- } elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)\.html\z!o) {
- invalid_list_mid(\%ctx, $1, $2) || get_thread(\%ctx);
-
- } elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/mbox(\.gz)?\z!ox ||
- $path_info =~ m!$LISTNAME_RE/t/(\S+)\.mbox(\.gz)?\z!o) {
+ } elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)/mbox(\.gz)?\z!x) {
my $sfx = $3;
invalid_list_mid(\%ctx, $1, $2) ||
get_thread_mbox(\%ctx, $sfx);
- } elsif ($path_info =~ m!$LISTNAME_RE/f/\S+\.txt\z!o) {
- invalid_list_mid(\%ctx, $1, $2) || redirect_mid_txt(\%ctx);
+ # legacy redirects
+ } elsif ($path_info =~ m!$LISTNAME_RE/(t|m|f)/(\S+)\.html\z!o) {
+ my $pfx = $2;
+ invalid_list_mid(\%ctx, $1, $3) ||
+ redirect_mid(\%ctx, $pfx, qr/\.html\z/, '/');
+ } elsif ($path_info =~ m!$LISTNAME_RE/(m|f)/(\S+)\.txt\z!o) {
+ my $pfx = $2;
+ invalid_list_mid(\%ctx, $1, $3) ||
+ redirect_mid(\%ctx, $pfx, qr/\.txt\z/, '/raw');
+ } elsif ($path_info =~ m!$LISTNAME_RE/t/(\S+)(\.mbox(?:\.gz)?)\z!o) {
+ my $end = $3;
+ invalid_list_mid(\%ctx, $1, $2) ||
+ redirect_mid(\%ctx, 't', $end, '/mbox.gz');
# convenience redirects, order matters
} elsif ($path_info =~ m!$LISTNAME_RE/(m|f|t|s)/(\S+)\z!o) {
my $pfx = $2;
- invalid_list_mid(\%ctx, $1, $3) || redirect_mid(\%ctx, $2);
+ invalid_list_mid(\%ctx, $1, $3) ||
+ redirect_mid(\%ctx, $pfx, qr/\z/, '/');
} else {
r404();
@@ -217,13 +219,14 @@ sub redirect_list_index {
}
sub redirect_mid {
- my ($ctx, $pfx) = @_;
+ my ($ctx, $pfx, $old, $sfx) = @_;
my $url = self_url($ctx->{cgi});
my $anchor = '';
- if (lc($pfx) eq 't') {
+ if (lc($pfx) eq 't' && $sfx eq '/') {
$anchor = '#u'; # <u id='#u'> is used to highlight in View.pm
}
- do_redirect($url . "/$anchor");
+ $url =~ s/$old/$sfx/;
+ do_redirect($url . $anchor);
}
# only hit when somebody tries to guess URLs manually:
diff --git a/t/plack.t b/t/plack.t
index ee77291..b3c8764 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -92,9 +92,9 @@ EOF
'index generated');
});
+ my $pfx = 'http://example.com/test';
test_psgi($app, sub {
my ($cb) = @_;
- my $pfx = 'http://example.com/test';
my $res = $cb->(GET($pfx . '/atom.xml'));
is(200, $res->code, 'success response received for atom');
like($res->content,
@@ -105,7 +105,6 @@ EOF
foreach my $t (qw(f m)) {
test_psgi($app, sub {
my ($cb) = @_;
- my $pfx = 'http://example.com/test';
my $path = "/$t/blah%40example.com/";
my $res = $cb->(GET($pfx . $path));
is(200, $res->code, "success for $path");
@@ -115,11 +114,42 @@ EOF
}
test_psgi($app, sub {
my ($cb) = @_;
- my $pfx = 'http://example.com/test';
my $res = $cb->(GET($pfx . '/m/blah%40example.com/raw'));
is(200, $res->code, 'success response received for /m/*/raw');
like($res->content, qr!\AFrom !, "mbox returned");
});
+
+ # legacy redirects
+ foreach my $t (qw(m f)) {
+ test_psgi($app, sub {
+ my ($cb) = @_;
+ my $res = $cb->(GET($pfx . "/$t/blah%40example.com.txt"));
+ is(301, $res->code, "redirect for old $t .txt link");
+ my $location = $res->header('Location');
+ like($location, qr!/$t/blah%40example\.com/raw\z!,
+ ".txt redirected to /raw");
+ });
+ }
+ foreach my $t (qw(m f t)) {
+ test_psgi($app, sub {
+ my ($cb) = @_;
+ my $res = $cb->(GET($pfx . "/$t/blah%40example.com.html"));
+ is(301, $res->code, "redirect for old $t .html link");
+ my $location = $res->header('Location');
+ like($location, qr!/$t/blah%40example\.com/(?:#u)?\z!,
+ ".html redirected to /");
+ });
+ }
+ foreach my $sfx (qw(mbox mbox.gz)) {
+ test_psgi($app, sub {
+ my ($cb) = @_;
+ my $res = $cb->(GET($pfx . "/t/blah%40example.com.$sfx"));
+ is(301, $res->code, 'redirect for old thread link');
+ my $location = $res->header('Location');
+ like($location, qr!/t/blah%40example\.com/mbox\.gz\z!,
+ "$sfx redirected to /mbox.gz");
+ });
+ }
}
done_testing();
--
EW