From: Eric Wong <e@80x24.org>
To: Leah Neukirchen <leah@vuxu.org>
Cc: meta@public-inbox.org
Subject: [PATCH] www: use undecoded paths for Message-ID extraction
Date: Wed, 13 Jun 2018 22:43:56 +0000
Message-ID: <20180613224356.jz7abxkyg4i3tlf5@dcvr>
In-Reply-To: <20180613214055.2nudcx5e7w2y4q73@dcvr>
> Leah Neukirchen <leah@vuxu.org> wrote:
> > During testing, we also found another thing when obscure characters
> > are used in Message-IDs, esp. / and ?.
> >
> > E.g. using a Message-ID of <F1WYEAZPOF.3LOD2T7ZHY9I1@localdomain/raw/T>
> > will create a corrupt link. Some more "ideas" are at
> > https://inbox.vuxu.org/pi-test/
I guess I'm spoiled by Rack where PATH_INFO is undecoded :x
However, REQUEST_URI is specified in PSGI specs(*)
Very lightly tested, but this seems to work; additions to the
test suite will be necessary...
------8<----
Subject: [PATCH] www: use undecoded paths for Message-ID extraction
In PSGI, PATH_INFO contains URI-decoded paths which cause
problems when Message-IDs contain ambiguous characters used
for routing. Instead, extract the undecoded path from
REQUEST_URI and use that.
Reported-by: Leah Neukirchen <leah@vuxu.org>
https://public-inbox.org/meta/8736xsb5s5.fsf@vuxu.org/
---
lib/PublicInbox/WWW.pm | 40 ++++++++++++++++++++++++++++------------
t/cgi.t | 2 ++
2 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 24e24f1..c1c3926 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -36,6 +36,17 @@ sub run {
PublicInbox::WWW->new->call($req->env);
}
+my %path_re_cache;
+
+sub path_re ($) {
+ my $sn = $_[0]->{SCRIPT_NAME};
+ $path_re_cache{$sn} ||= do {
+ $sn = '/'.$sn unless index($sn, '/') == 0;
+ $sn =~ s!/\z!!;
+ qr!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!;
+ };
+}
+
sub call {
my ($self, $env) = @_;
my $ctx = { env => $env, www => $self };
@@ -50,7 +61,8 @@ sub call {
} split(/[&;]+/, $env->{QUERY_STRING});
$ctx->{qp} = \%qp;
- my $path_info = $env->{PATH_INFO};
+ # not using $env->{PATH_INFO} here since that's already decoded
+ my ($path_info) = ($env->{REQUEST_URI} =~ path_re($env));
my $method = $env->{REQUEST_METHOD};
if ($method eq 'POST') {
@@ -91,13 +103,13 @@ sub call {
invalid_inbox_mid($ctx, $1, $2) || get_attach($ctx, $idx, $fn);
# in case people leave off the trailing slash:
} elsif ($path_info =~ m!$INBOX_RE/$MID_RE/(T|t)\z!o) {
- my ($inbox, $mid, $suffix) = ($1, $2, $3);
+ my ($inbox, $mid_ue, $suffix) = ($1, $2, $3);
$suffix .= $suffix =~ /\A[tT]\z/ ? '/#u' : '/';
- r301($ctx, $inbox, $mid, $suffix);
+ r301($ctx, $inbox, $mid_ue, $suffix);
} elsif ($path_info =~ m!$INBOX_RE/$MID_RE/R/?\z!o) {
- my ($inbox, $mid) = ($1, $2);
- r301($ctx, $inbox, $mid, '#R');
+ my ($inbox, $mid_ue) = ($1, $2);
+ r301($ctx, $inbox, $mid_ue, '#R');
} elsif ($path_info =~ m!$INBOX_RE/$MID_RE/f/?\z!o) {
r301($ctx, $1, $2);
@@ -164,11 +176,11 @@ sub invalid_inbox ($$) {
# returns undef if valid, array ref response if invalid
sub invalid_inbox_mid {
- my ($ctx, $inbox, $mid) = @_;
+ my ($ctx, $inbox, $mid_ue) = @_;
my $ret = invalid_inbox($ctx, $inbox);
return $ret if $ret;
- $ctx->{mid} = $mid;
+ my $mid = $ctx->{mid} = uri_unescape($mid_ue);
my $ibx = $ctx->{-inbox};
if ($mid =~ m!\A([a-f0-9]{2})([a-f0-9]{38})\z!) {
my ($x2, $x38) = ($1, $2);
@@ -177,7 +189,7 @@ sub invalid_inbox_mid {
require Email::Simple;
my $s = Email::Simple->new($str);
$mid = PublicInbox::MID::mid_clean($s->header('Message-ID'));
- return r301($ctx, $inbox, $mid);
+ return r301($ctx, $inbox, mid_escape($mid));
}
undef;
}
@@ -352,7 +364,7 @@ sub legacy_redirects {
}
sub r301 {
- my ($ctx, $inbox, $mid, $suffix) = @_;
+ my ($ctx, $inbox, $mid_ue, $suffix) = @_;
my $obj = $ctx->{-inbox};
unless ($obj) {
my $r404 = invalid_inbox($ctx, $inbox);
@@ -361,7 +373,11 @@ sub r301 {
}
my $url = $obj->base_url($ctx->{env});
my $qs = $ctx->{env}->{QUERY_STRING};
- $url .= (mid_escape($mid) . '/') if (defined $mid);
+ if (defined $mid_ue) {
+ # common, and much nicer as '@' than '%40':
+ $mid_ue =~ s/%40/@/g;
+ $url .= $mid_ue . '/';
+ }
$url .= $suffix if (defined $suffix);
$url .= "?$qs" if $qs ne '';
@@ -371,9 +387,9 @@ sub r301 {
}
sub msg_page {
- my ($ctx, $inbox, $mid, $e) = @_;
+ my ($ctx, $inbox, $mid_ue, $e) = @_;
my $ret;
- $ret = invalid_inbox_mid($ctx, $inbox, $mid) and return $ret;
+ $ret = invalid_inbox_mid($ctx, $inbox, $mid_ue) and return $ret;
'' eq $e and return get_mid_html($ctx);
'T/' eq $e and return get_thread($ctx, 1);
't/' eq $e and return get_thread($ctx);
diff --git a/t/cgi.t b/t/cgi.t
index bd92ca3..2e2476d 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -225,6 +225,8 @@ sub cgi_run {
my %env = (
PATH_INFO => $_[0],
QUERY_STRING => $_[1] || "",
+ SCRIPT_NAME => '',
+ REQUEST_URI => $_[0] . ($_[1] ? "?$_[1]" : ''),
REQUEST_METHOD => $_[2] || "GET",
GATEWAY_INTERFACE => 'CGI/1.1',
HTTP_ACCEPT => '*/*',
--
(*) git clone https://github.com/plack/psgi-specs.git
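
For anyone following along, here is a minimal standalone sketch (not part
of the patch) of why routing on the decoded path breaks. The regex in
path_re() is copied from the patch, minus the cache. Everything else is an
assumption for illustration: unescape() is a core-Perl stand-in for
uri_unescape, and the single /raw route in extract_mid() is a simplified
substitute for the real $INBOX_RE/$MID_RE routes in WWW.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Escape::uri_unescape (core Perl only)
sub unescape {
	my ($s) = @_;
	$s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
	$s;
}

# Same pattern as path_re() in the patch, without the per-SCRIPT_NAME cache
sub path_re {
	my ($sn) = @_;
	$sn = '/'.$sn unless index($sn, '/') == 0;
	$sn =~ s!/\z!!;
	qr!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!;
}

# Simplified route (assumption): /$INBOX/$MESSAGE_ID/raw
# Returns the decoded Message-ID, or undef if the route does not match.
sub extract_mid {
	my ($path) = @_;
	$path =~ m!\A/([^/]+)/([^/]+)/raw\z! or return undef;
	unescape($2); # decode only after routing has split the path
}

my $request_uri =
	'/pi-test/F1WYEAZPOF.3LOD2T7ZHY9I1@localdomain%2Fraw%2FT/raw?x=1';

# Undecoded path, query string stripped by the regex
my ($undecoded) = ($request_uri =~ path_re(''));

# What PATH_INFO would look like after the server decodes it
my $decoded = unescape($undecoded);

for my $path ($undecoded, $decoded) {
	my $mid = extract_mid($path);
	print "$path => ", (defined $mid ? $mid : '(no route)'), "\n";
}
```

With the undecoded path, the escaped slashes stay inside a single path
segment, so the route can split first and decode afterwards. With the
decoded path, the slashes from the Message-ID are indistinguishable from
routing separators and this simplified route no longer matches.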