* [RFC 0/2] support for /~/$MESSAGE_ID endpoint
From: Eric Wong @ 2019-01-09 11:43 UTC
To: Konstantin Ryabitsev; +Cc: meta

Only lightly tested at the moment, and I'm unsure about the "/~/"
(I suppose "/_/" could conflict with a valid inbox name).

Eric Wong (2):
  config: inbox name checking matches git.git more closely
  www: add /~/$MESSAGE_ID global redirector endpoint

 MANIFEST                  |  1 +
 lib/PublicInbox/Config.pm | 20 ++++++++++--
 lib/PublicInbox/WWW.pm    | 48 ++++++++++++++++++++++++---
 t/config.t                | 36 ++++++++++++++++++++
 t/psgi_scan_all.t         | 69 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 168 insertions(+), 6 deletions(-)
 create mode 100644 t/psgi_scan_all.t

--
EW
* [RFC 1/2] config: inbox name checking matches git.git more closely
From: Eric Wong @ 2019-01-09 11:43 UTC
To: Konstantin Ryabitsev; +Cc: meta

Actually, it turns out git.git/remote.c::valid_remote_nick rules
alone are insufficient.  More checking is performed on refname
components in git.git/refs.c::check_refname_component.

I also considered rejecting URL-unfriendly inbox names entirely,
but realized some users may intentionally configure names not
handled by our WWW endpoint for archives they don't want
accessible over HTTP.
---
 lib/PublicInbox/Config.pm | 20 ++++++++++++++++++--
 lib/PublicInbox/WWW.pm    |  4 +++-
 t/config.t                | 36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index a2b721d..bea2617 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -152,6 +152,23 @@ sub git_config_dump {
 	\%rv;
 }
 
+sub valid_inbox_name ($) {
+	my ($name) = @_;
+
+	# Similar rules found in git.git/remote.c::valid_remote_nick
+	# and git.git/refs.c::check_refname_component
+	# We don't reject /\.lock\z/, however, since we don't lock refs
+	if ($name eq '' || $name =~ /\@\{/ ||
+	    $name =~ /\.\./ || $name =~ m![/:\?\[\]\^~\s\f[:cntrl:]\*]! ||
+	    $name =~ /\A\./ || $name =~ /\.\z/) {
+		return 0;
+	}
+
+	# Note: we allow URL-unfriendly characters; users may configure
+	# non-HTTP-accessible inboxes
+	1;
+}
+
 sub _fill {
 	my ($self, $pfx) = @_;
 	my $rv = {};
@@ -185,8 +202,7 @@ sub _fill {
 	my $name = $pfx;
 	$name =~ s/\Apublicinbox\.//;
 
-	# same rules as git.git/remote.c::valid_remote_nick
-	if ($name eq '' || $name =~ m!/! || $name eq '.' || $name eq '..') {
+	if (!valid_inbox_name($name)) {
 		warn "invalid inbox name: '$name'\n";
 		return;
 	}
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index c1c3926..3562e46 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -19,7 +19,9 @@ use URI::Escape qw(uri_unescape);
 use PublicInbox::MID qw(mid_escape);
 require PublicInbox::Git;
 use PublicInbox::GitHTTPBackend;
-our $INBOX_RE = qr!\A/([\w\.\-]+)!;
+
+# TODO: consider a routing tree now that we have more endpoints:
+our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
 our $MID_RE = qr!([^/]+)!;
 our $END_RE = qr!(T/|t/|t\.mbox(?:\.gz)?|t\.atom|raw|)!;
 our $ATTACH_RE = qr!(\d[\.\d]*)-([[:alnum:]][\w\.-]+[[:alnum:]])!i;
diff --git a/t/config.t b/t/config.t
index 6a6b98c..5f0a95b 100644
--- a/t/config.t
+++ b/t/config.t
@@ -114,4 +114,40 @@ my $tmpdir = tempdir('pi-config-XXXXXX', TMPDIR => 1, CLEANUP => 1);
 	}, 'known addresses populated');
 }
 
+my @invalid = (
+	# git rejects this because it locks refnames, but we don't have
+	# this problem with inbox names:
+	# 'inbox.lock',
+
+	# git rejects these:
+	'', '..', '.', 'stash@{9}', 'inbox.', '^caret', '~tilde',
+	'*asterisk', 's p a c e s', ' leading-space', 'trailing-space ',
+	'question?', 'colon:', '[square-brace]', "\fformfeed",
+	"\0zero", "\bbackspace",
+
+);
+
+require Data::Dumper;
+for my $s (@invalid) {
+	my $d = Data::Dumper->new([$s])->Terse(1)->Indent(0)->Dump;
+	ok(!PublicInbox::Config::valid_inbox_name($s), "$d name rejected");
+}
+
+# obviously-valid examples
+my @valid = qw(a a@example a@example.com);
+
+# Rejecting more was considered, but then it dawned on me that
+# people may intentionally use inbox names which are not URL-friendly
+# to prevent the PSGI interface from displaying them...
+# URL-unfriendly
+# '<', '>', '%', '#', '?', '&', '(', ')',
+
+# maybe these aren't so bad, they're common in Message-IDs, even:
+# '!', '$', '=', '+'
+push @valid, qw[bang! ca$h less< more> 1% (parens) &more eql= +plus], '#hash';
+for my $s (@valid) {
+	my $d = Data::Dumper->new([$s])->Terse(1)->Indent(0)->Dump;
+	ok(PublicInbox::Config::valid_inbox_name($s), "$d name accepted");
+}
+
 done_testing();
--
EW
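The one-line $INBOX_RE change in WWW.pm means a path segment can no longer begin with a dot, so a request such as "/../etc" or "/.hidden/" is not treated as an inbox name by the WWW routes.  The standalone script below compares the two patterns copied from the hunk above; inbox_from_path is a hypothetical helper written only for this illustration and is not part of public-inbox.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patterns copied verbatim from the WWW.pm hunk (old and new).
my $OLD_INBOX_RE = qr!\A/([\w\.\-]+)!;
my $NEW_INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;

# Hypothetical helper: return the inbox name captured from a
# PATH_INFO, or undef when the pattern does not match.
sub inbox_from_path {
	my ($re, $path_info) = @_;
	return $path_info =~ $re ? $1 : undef;
}

for my $path ('/meta/', '/git-1.x/', '/../etc', '/.hidden/') {
	my $old = inbox_from_path($OLD_INBOX_RE, $path) // '(no match)';
	my $new = inbox_from_path($NEW_INBOX_RE, $path) // '(no match)';
	printf "%-12s old=%-10s new=%s\n", $path, $old, $new;
}
```

Both patterns still accept ordinary names such as "meta" or "git-1.x"; only a leading dot is rejected by the new one.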
* [RFC 2/2] www: add /~/$MESSAGE_ID global redirector endpoint
From: Eric Wong @ 2019-01-09 11:43 UTC
To: Konstantin Ryabitsev; +Cc: meta

The "/~/" is not finalized, yet.  Initially I chose "/_/", but
it could conflict with valid git remote names.

Perhaps even "/.$MESSAGE_ID" or "/~$MESSAGE_ID" could work
to save a byte.

Requested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
cf. https://public-inbox.org/meta/20190107190719.GE9442@pure.paranoia.local/
---
 MANIFEST               |  1 +
 lib/PublicInbox/WWW.pm | 44 +++++++++++++++++++++++++--
 t/psgi_scan_all.t      | 69 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 111 insertions(+), 3 deletions(-)
 create mode 100644 t/psgi_scan_all.t

diff --git a/MANIFEST b/MANIFEST
index e4f3df8..73d1047 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -193,6 +193,7 @@ t/psgi_attach.t
 t/psgi_bad_mids.t
 t/psgi_mount.t
 t/psgi_multipart_not.t
+t/psgi_scan_all.t
 t/psgi_search.t
 t/psgi_text.t
 t/psgi_v2.t
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 3562e46..9e0973f 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -360,11 +360,23 @@ sub legacy_redirects {
 		r301($ctx, $1, $2, $3 eq 't' ? 't/#u' : $3);
 	} elsif ($path_info =~ m!$INBOX_RE/(\S+/\S+)/f\z!o) {
 		r301($ctx, $1, $2);
+
+	# scan across all inboxes
+	# XXX '/~/$MESSAGE_ID' not finalized
+	} elsif ($path_info =~ m!\A/~/(\S+)\z!) {
+		scan_all($ctx, $1);
 	} else {
 		$ctx->{www}->news_www->call($ctx->{env});
 	}
 }
 
+sub redirect ($$) {
+	my ($code, $url) = @_;
+	[ $code,
+	  [ Location => $url, 'Content-Type' => 'text/plain' ],
+	  [ "Redirecting to $url\n" ] ]
+}
+
 sub r301 {
 	my ($ctx, $inbox, $mid_ue, $suffix) = @_;
 	my $obj = $ctx->{-inbox};
@@ -383,9 +395,7 @@ sub r301 {
 	$url .= $suffix if (defined $suffix);
 	$url .= "?$qs" if $qs ne '';
 
-	[ 301,
-	  [ Location => $url, 'Content-Type' => 'text/plain' ],
-	  [ "Redirecting to $url\n" ] ]
+	redirect(301, $url);
 }
 
 sub msg_page {
@@ -446,4 +456,32 @@ sub get_attach {
 	PublicInbox::WwwAttach::get_attach($ctx, $idx, $fn);
 }
 
+sub scan_all {
+	my ($ctx, $mid) = @_; # mid may have trailing slash
+
+	# TODO: user-sortable
+
+	my @found;
+	do {
+		$ctx->{www}->{pi_config}->each_inbox(sub {
+			my ($ibx) = @_;
+			# do not pass $env, since HTTP_HOST can be different
+			my $url = $ibx->base_url or next;
+
+			my $n = eval { $ibx->mm->num_for($mid) } or return;
+
+			# ambiguous, so 302 instead of 301:
+			push @found, redirect(302, $url .= "$mid/");
+		});
+
+	# account for trailing slash, since the rest of our API uses it
+	} while (!@found && $mid =~ s!/+\z!!);
+
+	# FIXME: It's possible for a message to have the same Message-ID but
+	# different content across multiple groups...
+	@found ? $found[0] : r404();
+
+	# n.b. we use trailing slash in most URLs to allow "wget -r" mirrors :)
+}
+
 1;
diff --git a/t/psgi_scan_all.t b/t/psgi_scan_all.t
new file mode 100644
index 0000000..bf03f22
--- /dev/null
+++ b/t/psgi_scan_all.t
@@ -0,0 +1,69 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use Email::MIME;
+use File::Temp qw/tempdir/;
+use PublicInbox::Config;
+my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape Search::Xapian
+	DBD::SQLite);
+foreach my $mod (@mods) {
+	eval "require $mod";
+	plan skip_all => "$mod missing for psgi_scan_all.t" if $@;
+}
+use_ok 'PublicInbox::V2Writable';
+foreach my $mod (@mods) { use_ok $mod; }
+my $tmp = tempdir('pi-scan_all-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+my $cfg = {};
+
+foreach my $i (1..2) {
+	my $cfgpfx = "publicinbox.test-$i";
+	my $addr = $cfg->{"$cfgpfx.address"} = "test-$i\@example.com";
+	my $mainrepo = $cfg->{"$cfgpfx.mainrepo"} = "$tmp/$i";
+	$cfg->{"$cfgpfx.url"} = "http://example.com/$i";
+	my $opt = {
+		mainrepo => $mainrepo,
+		name => "test-$i",
+		version => 2,
+		-primary_address => $addr,
+	};
+	my $ibx = PublicInbox::Inbox->new($opt);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	$im->{parallel} = 0;
+	$im->init_inbox(0);
+	my $mime = PublicInbox::MIME->new(<<EOF);
+From: a\@example.com
+To: $addr
+Subject: s$i
+Message-ID: <a-mid-$i\@b>
+Date: Fri, 02 Oct 1993 00:00:00 +0000
+
+hello world
+EOF
+
+	ok($im->add($mime), "added message to $i");
+	$im->done;
+}
+my $config = PublicInbox::Config->new($cfg);
+use_ok 'PublicInbox::WWW';
+my $www = PublicInbox::WWW->new($config);
+
+test_psgi(sub { $www->call(@_) }, sub {
+	my ($cb) = @_;
+	foreach my $i (1..2) {
+		foreach my $end ('', '/') {
+			my $res = $cb->(GET("/~/a-mid-$i\@b$end"));
+			is($res->code, 302, 'got 302');
+			is($res->header('Location'),
+				"http://example.com/$i/a-mid-$i\@b/",
+				"redirected OK to $i");
+		}
+	}
+	foreach my $x (qw(inv@lid inv@lid/ i/v/a l/i/d/)) {
+		my $res = $cb->(GET("/~/$x"));
+		is($res->code, 404, "404 on $x");
+	}
+});
+
+done_testing();
--
EW
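scan_all walks the configured inboxes in config order, redirects with a 302 to the first inbox whose message map knows the Message-ID, and retries with trailing slashes stripped when nothing matches.  The control flow can be sketched without a real inbox.  In the sketch below, @inboxes and scan_all_sketch are hypothetical stand-ins for each_inbox, base_url and mm->num_for.  Like this RFC version, it does not URL-escape the Message-ID; the later v2 patch adds mid_escape for that.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for configured inboxes, in config order.
# Real code would ask $ibx->base_url and $ibx->mm->num_for($mid).
my @inboxes = (
	{ url => 'http://example.com/1/', mids => { 'a-mid-1@b' => 1 } },
	{ url => 'http://example.com/2/', mids => { 'a-mid-2@b' => 1 } },
);

# PSGI response triple, same shape as redirect() in the patch.
sub redirect {
	my ($code, $url) = @_;
	[ $code,
	  [ Location => $url, 'Content-Type' => 'text/plain' ],
	  [ "Redirecting to $url\n" ] ];
}

# Redirect to the first inbox that knows $mid; if none does and $mid
# ends with '/', strip the slashes and try once more, else 404.
sub scan_all_sketch {
	my ($mid) = @_;
	while (1) {
		for my $ibx (@inboxes) {
			next unless $ibx->{mids}->{$mid};
			# 302, not 301: the same Message-ID may be in several inboxes
			return redirect(302, $ibx->{url} . "$mid/");
		}
		last unless $mid =~ s!/+\z!!;
	}
	[ 404, [ 'Content-Type' => 'text/plain' ], [ "404 Not Found\n" ] ];
}

my $res = scan_all_sketch('a-mid-2@b/');
print "$res->[0] ", { @{$res->[1]} }->{Location}, "\n";
```

Returning from inside the loop keeps the first match in config order, which is what picking $found[0] achieves in the patch.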
* Re: [RFC 2/2] www: add /~/$MESSAGE_ID global redirector endpoint
From: Eric Wong @ 2019-01-27  2:06 UTC
To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> The "/~/" is not finalized, yet.  Initially I chose "/_/", but
> it could conflict with valid git remote names.

Any thoughts on the use of "~"?  Thinking about combining this
with the "solver" feature for blob recreation...
* Re: [RFC 2/2] www: add /~/$MESSAGE_ID global redirector endpoint
From: Konstantin Ryabitsev @ 2019-01-28 13:50 UTC
To: Eric Wong; +Cc: meta

On Sun, 27 Jan 2019 at 07:06, Eric Wong <e@80x24.org> wrote:
>
> Eric Wong <e@80x24.org> wrote:
> > The "/~/" is not finalized, yet.  Initially I chose "/_/", but
> > it could conflict with valid git remote names.
>
> Any thoughts on the use of "~"?  Thinking about combining this
> with the "solver" feature for blob recreation...

I don't really have a strong opinion on this one -- it's purely
cosmetics. I'm wondering if it will be confusing to some folks due to
~ usually denoting $HOME. I would opt for /_mid/ to indicate that it's
a special message-id lookup URL. Maybe marry the two and call it
/~mid/? That would avoid clashing with potentially valid mailbox
names.

-K
* Re: [RFC 2/2] www: add /~/$MESSAGE_ID global redirector endpoint
From: Eric Wong @ 2019-02-01  9:00 UTC
To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > Eric Wong <e@80x24.org> wrote:
> > > The "/~/" is not finalized, yet.  Initially I chose "/_/", but
> > > it could conflict with valid git remote names.
> >
> > Any thoughts on the use of "~"?  Thinking about combining this
> > with the "solver" feature for blob recreation...
>
> I don't really have a strong opinion on this one -- it's purely
> cosmetics. I'm wondering if it will be confusing to some folks due to
> ~ usually denoting $HOME. I would opt for /_mid/ to indicate that it's
> a special message-id lookup URL. Maybe marry the two and call it
> /~mid/? That would avoid clashing with potentially valid mailbox
> names.

Both "_mid" and "~mid" can cause usability problems and
work badly when people try to select part of the URL using
a pointing device.  But I guess "/-/" or "/_/" are safe-enough
choices and we can disallow them as inbox names...

However, my screwing up solver hrefs in the Atom feeds got me
thinking this can even be a 404 handler at the top level
(similar to how PublicInbox::NewsWWW works).  That would allow
it to be mapped to any path (or domain) via the PSGI builder
file...
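A top-level fallback handler amounts to chaining PSGI apps: the regular inbox routes get the first chance, and only requests they cannot serve fall through to a Message-ID scan.  Below is a minimal hand-rolled sketch of that chaining, assuming every app returns a plain array-ref response (no streaming) and treating a 404 as "not handled".  Plack::App::Cascade provides a ready-made version of the same idea.  cascade, the two stand-in apps and the example.com URLs are hypothetical.

```perl
use strict;
use warnings;

# Wrap several PSGI apps so the first response that is not a 404 wins.
# Assumes each app returns [ $status, \@headers, \@body ] directly.
sub cascade {
	my @apps = @_;
	return sub {
		my ($env) = @_;
		my $res;
		for my $app (@apps) {
			$res = $app->($env);
			return $res if $res->[0] != 404;
		}
		return $res // [ 404, [ 'Content-Type' => 'text/plain' ],
				[ "404 Not Found\n" ] ];
	};
}

# Hypothetical stand-ins: an inbox app that only serves /meta/, and a
# fallback that treats any other path as a Message-ID to redirect to.
my $inbox_app = sub {
	my ($env) = @_;
	return $env->{PATH_INFO} =~ m!\A/meta/!
		? [ 200, [ 'Content-Type' => 'text/plain' ], [ "inbox\n" ] ]
		: [ 404, [ 'Content-Type' => 'text/plain' ], [ "404 Not Found\n" ] ];
};
my $mid_app = sub {
	my ($env) = @_;
	(my $mid = $env->{PATH_INFO}) =~ s!\A/!!;
	return [ 302, [ Location => "http://example.com/meta/$mid/" ], [] ];
};

my $app = cascade($inbox_app, $mid_app);
```

Because the fallback is just another app in the chain, a builder file can mount it at any path or hostname.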
* [PATCH v2] newswww: add /$MESSAGE_ID global redirector endpoint
From: Eric Wong @ 2019-02-01 18:31 UTC
To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> However, my screwing up solver hrefs in the Atom feeds got me
> thinking this can even be a 404 handler at the top level
> (similar to how PublicInbox::NewsWWW works).  That would allow
> it to be mapped to any path (or domain) via the PSGI builder
> file...

Or just use NewsWWW, because nntp://<HOSTNAME>/<Message-ID> is valid.

Going to think about it while I eat and do other things, but
will very likely merge it to master, soon.

--------8<-----------
Subject: [PATCH] newswww: add /$MESSAGE_ID global redirector endpoint

This is the fallback for the normal WWW endpoint.

Adding this at the top level seems to be alright, since lynx and
w3m both understand nntp://<HOSTNAME>/<Message-ID> anyway.
If newsgroup and inbox names conflict, then consider it the fault
of the original sender.

NewsWWW is intended to work around buggy linkifiers in mail
clients, which may turn nntp:// URLs into
http://<HOSTNAME>/<Message-ID>

Inbox ordering from the config file is preserved since commit
cfa8ff7c256e20f3240aed5f98d155c019788e3b
("config: each_inbox iteration preserves config order"),
so admins can rely on that to configure how scanning works.

Requested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
cf. https://public-inbox.org/meta/20190107190719.GE9442@pure.paranoia.local/
    nntp://news.public-inbox.org/20190107190719.GE9442@pure.paranoia.local
---
 MANIFEST                   |  1 +
 lib/PublicInbox/NewsWWW.pm | 50 ++++++++++++++++++++++-----
 t/psgi_scan_all.t          | 69 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 111 insertions(+), 9 deletions(-)
 create mode 100644 t/psgi_scan_all.t

diff --git a/MANIFEST b/MANIFEST
index c4a9349..6ff2bfe 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -208,6 +208,7 @@ t/psgi_attach.t
 t/psgi_bad_mids.t
 t/psgi_mount.t
 t/psgi_multipart_not.t
+t/psgi_scan_all.t
 t/psgi_search.t
 t/psgi_text.t
 t/psgi_v2.t
diff --git a/lib/PublicInbox/NewsWWW.pm b/lib/PublicInbox/NewsWWW.pm
index 01e34d7..d7fcb0d 100644
--- a/lib/PublicInbox/NewsWWW.pm
+++ b/lib/PublicInbox/NewsWWW.pm
@@ -1,4 +1,4 @@
-# Copyright (C) 2016-2018 all contributors <meta@public-inbox.org>
+# Copyright (C) 2016-2019 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 #
 # Plack app redirector for mapping /$NEWSGROUP requests to
@@ -17,16 +17,34 @@ sub new {
 	bless { pi_config => $pi_config }, $class;
 }
 
+sub redirect ($$) {
+	my ($code, $url) = @_;
+	[ $code,
+	  [ Location => $url, 'Content-Type' => 'text/plain' ],
+	  [ "Redirecting to $url\n" ] ]
+}
+
+sub try_inbox ($$) {
+	my ($ibx, $mid) = @_;
+	# do not pass $env since HTTP_HOST may differ
+	my $url = $ibx->base_url or return;
+
+	eval { $ibx->mm->num_for($mid) } or return;
+
+	# 302 since the same message may show up on
+	# multiple inboxes and inboxes can be added/reordered
+	redirect(302, $url .= mid_escape($mid) . '/');
+}
+
 sub call {
 	my ($self, $env) = @_;
-	my $path = $env->{PATH_INFO};
-	$path =~ s!\A/+!!;
-	$path =~ s!/+\z!!;
 
 	# some links may have the article number in them:
 	# /inbox.foo.bar/123456
-	my ($ng, $article) = split(m!/+!, $path, 2);
-	if (my $inbox = $self->{pi_config}->lookup_newsgroup($ng)) {
+	my (undef, @parts) = split(m!/!, $env->{PATH_INFO});
+	my ($ng, $article) = @parts;
+	my $pi_config = $self->{pi_config};
+	if (my $inbox = $pi_config->lookup_newsgroup($ng)) {
 		my $url = PublicInbox::Hval::prurl($env, $inbox->{url});
 		my $code = 301;
 		if (defined $article && $article =~ /\A\d+\z/) {
@@ -38,12 +56,26 @@ sub call {
 				$url .= mid_escape($mid) . '/';
 			}
 		}
+		return redirect($code, $url);
+	}
 
-		my $h = [ Location => $url, 'Content-Type' => 'text/plain' ];
+	my $res;
+	my @try = (join('/', @parts));
+
+	# trailing slash is in the rest of our WWW, so maybe some users
+	# will assume it:
+	if ($parts[-1] eq '') {
+		pop @parts;
+		push @try, join('/', @parts);
+	}
 
-		return [ $code, $h, [ "Redirecting to $url\n" ] ]
+	foreach my $mid (@try) {
+		$pi_config->each_inbox(sub {
+			$res ||= try_inbox($_[0], $mid);
+		});
+		last if defined $res;
 	}
-	[ 404, [ 'Content-Type' => 'text/plain' ], [ "404 Not Found\n" ] ];
+	$res || [ 404, [qw(Content-Type text/plain)], ["404 Not Found\n"] ];
 }
 
 1;
diff --git a/t/psgi_scan_all.t b/t/psgi_scan_all.t
new file mode 100644
index 0000000..e9c439e
--- /dev/null
+++ b/t/psgi_scan_all.t
@@ -0,0 +1,69 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use Email::MIME;
+use File::Temp qw/tempdir/;
+use PublicInbox::Config;
+my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape Search::Xapian
+	DBD::SQLite);
+foreach my $mod (@mods) {
+	eval "require $mod";
+	plan skip_all => "$mod missing for psgi_scan_all.t" if $@;
+}
+use_ok 'PublicInbox::V2Writable';
+foreach my $mod (@mods) { use_ok $mod; }
+my $tmp = tempdir('pi-scan_all-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+my $cfg = {};
+
+foreach my $i (1..2) {
+	my $cfgpfx = "publicinbox.test-$i";
+	my $addr = $cfg->{"$cfgpfx.address"} = "test-$i\@example.com";
+	my $mainrepo = $cfg->{"$cfgpfx.mainrepo"} = "$tmp/$i";
+	$cfg->{"$cfgpfx.url"} = "http://example.com/$i";
+	my $opt = {
+		mainrepo => $mainrepo,
+		name => "test-$i",
+		version => 2,
+		-primary_address => $addr,
+	};
+	my $ibx = PublicInbox::Inbox->new($opt);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	$im->{parallel} = 0;
+	$im->init_inbox(0);
+	my $mime = PublicInbox::MIME->new(<<EOF);
+From: a\@example.com
+To: $addr
+Subject: s$i
+Message-ID: <a-mid-$i\@b>
+Date: Fri, 02 Oct 1993 00:00:00 +0000
+
+hello world
+EOF
+
+	ok($im->add($mime), "added message to $i");
+	$im->done;
+}
+my $config = PublicInbox::Config->new($cfg);
+use_ok 'PublicInbox::WWW';
+my $www = PublicInbox::WWW->new($config);
+
+test_psgi(sub { $www->call(@_) }, sub {
+	my ($cb) = @_;
+	foreach my $i (1..2) {
+		foreach my $end ('', '/') {
+			my $res = $cb->(GET("/a-mid-$i\@b$end"));
+			is($res->code, 302, 'got 302');
+			is($res->header('Location'),
+				"http://example.com/$i/a-mid-$i\@b/",
+				"redirected OK to $i");
+		}
+	}
+	foreach my $x (qw(inv@lid inv@lid/ i/v/a l/i/d/)) {
+		my $res = $cb->(GET("/$x"));
+		is($res->code, 404, "404 on $x");
+	}
+});
+
+done_testing();
--
EW
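Perl's split discards trailing empty fields unless it is given a negative LIMIT.  With `split(m!/!, $env->{PATH_INFO})` as written above, a request for "/a-mid-1@b/" yields just "a-mid-1@b" (which still finds the message, as the test expects), so the `$parts[-1] eq ''` branch does not fire for such paths.  The sketch below passes -1 so that the slash-terminated form and the stripped form are both tried, in that order.  candidate_mids is a hypothetical helper, not part of NewsWWW.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a PATH_INFO such as "/a-mid-1@b/" into the
# Message-ID candidates to look up, most literal form first.
sub candidate_mids {
	my ($path_info) = @_;

	# LIMIT -1 keeps trailing empty fields, so a trailing slash
	# shows up as a final '' element instead of being discarded.
	my (undef, @parts) = split(m!/!, $path_info, -1);
	return () unless @parts;

	my @try = (join('/', @parts));
	if ($parts[-1] eq '') {	# the URL ended with '/'
		pop @parts;
		push @try, join('/', @parts);
	}
	# drop empty candidates (e.g. for a bare "/")
	return grep { length } @try;
}

for my $path ('/a-mid-1@b', '/a-mid-1@b/', '/i/v/a') {
	print "$path => ", join(', ', candidate_mids($path)), "\n";
}
```

Trying the literal form first means a Message-ID that genuinely ends in "/" is still found before the stripped variant is considered.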
* [PATCH v2] examples/newswww.psgi: demonstrate standalone NewsWWW usage
From: Eric Wong @ 2019-02-04 11:11 UTC
To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> Or just use NewsWWW, because nntp://<HOSTNAME>/<Message-ID> is valid.
>
> Going to think about it while I eat and do other things, but
> will very likely merge it to master, soon.

Yep.  It's in NewsWWW, now.

Also going to add this.  I think it'll be helpful for
nntp.lore.kernel.org to have this on ports 80/443, because
somebody could share NNTP URLs and some software somewhere will
interpret them as HTTP.

--------8<-------
Subject: [PATCH] examples/newswww.psgi: demonstrate standalone NewsWWW usage

Plack::Builder allows "mounting" on hostnames as well as
path names to enable virtual hosting.

This example demonstrates how ports 80/443 on "news.example.com"
can redirect browser requests when somebody attempts to use an
"nntp://" URL and the software assumes "http://".
---
 examples/newswww.psgi | 48 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 examples/newswww.psgi

diff --git a/examples/newswww.psgi b/examples/newswww.psgi
new file mode 100644
index 0000000..0f66782
--- /dev/null
+++ b/examples/newswww.psgi
@@ -0,0 +1,48 @@
+#!/usr/bin/perl -w
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>
+#
+# NewsWWW may be used independently of WWW.  This can be useful
+# for mapping HTTP/HTTPS requests to the hostname of an NNTP server
+# to redirect users to the proper HTTP/HTTPS endpoint for a given
+# inbox.  NewsWWW exists because people (or software) can mishandle
+# "nntp://" or "news://" URLs as "http://" (or "https://")
+#
+# Usage:
+#	plackup -I lib -o 127.0.0.1 -R lib -r examples/newswww.psgi
+use strict;
+use warnings;
+use Plack::Builder;
+use PublicInbox::WWW;
+use PublicInbox::NewsWWW;
+
+my $newswww = PublicInbox::NewsWWW->new;
+
+# Optional, (you may drop the "mount '/'" section below)
+my $www = PublicInbox::WWW->new;
+$www->preload;
+
+builder {
+	# HTTP/1.1 requests to "Host: news.example.com" will hit this:
+	mount 'http://news.example.com/' => builder {
+		enable 'Head';
+		sub { $newswww->call($_[0]) };
+	};
+
+	# rest of requests will hit this (optional) part for the
+	# regular PublicInbox::WWW code:
+	# see comments in examples/public-inbox.psgi for more info:
+	mount '/' => builder {
+		eval {
+			enable 'Deflater',
+				content_type => [ qw(
+					text/html
+					text/plain
+					application/atom+xml
+					)]
+		};
+		eval { enable 'ReverseProxy' };
+		enable 'Head';
+		sub { $www->call($_[0]) }
+	};
+}
--
EW
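In the psgi file, Plack::Builder's `mount` with a full URL routes requests for news.example.com to NewsWWW and everything else to WWW.  Stripped of Plack, that amounts to dispatching on the Host header.  The sketch below is a hand-rolled illustration only: by_host and the two stand-in apps are hypothetical, and in practice the Plack::Builder version above is simpler and also handles path prefixes.

```perl
use strict;
use warnings;

# Dispatch on the Host header, falling back to a default app.
# $hosts maps lower-case hostnames to PSGI apps (code refs).
sub by_host {
	my ($hosts, $default) = @_;
	return sub {
		my ($env) = @_;
		my $host = lc($env->{HTTP_HOST} // $env->{SERVER_NAME} // '');
		$host =~ s/:\d+\z//;	# "news.example.com:443" -> "news.example.com"
		my $app = $hosts->{$host} // $default;
		return $app->($env);
	};
}

# Hypothetical apps standing in for PublicInbox::NewsWWW and ::WWW.
my $news_app = sub { [ 302, [ Location => 'https://example.com/meta/' ], [] ] };
my $www_app  = sub { [ 200, [ 'Content-Type' => 'text/plain' ], [ "www\n" ] ] };

my $app = by_host({ 'news.example.com' => $news_app }, $www_app);
```

Any request whose Host does not match falls through to the default app, mirroring the optional `mount '/'` section of the psgi file.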
* Re: [RFC 2/2] www: add /~/$MESSAGE_ID global redirector endpoint
From: Konstantin Ryabitsev @ 2019-02-19 19:53 UTC
To: Eric Wong; +Cc: meta

On Fri, 1 Feb 2019 at 04:00, Eric Wong <e@80x24.org> wrote:
> Both "_mid" and "~mid" can cause usability problems and
> work badly when people try to select part of the URL using
> a pointing device.  But I guess "/-/" or "/_/" are safe-enough
> choices and we can disallow them as inbox names...
>
> However, my screwing up solver hrefs in the Atom feeds got me
> thinking this can even be a 404 handler at the top level
> (similar to how PublicInbox::NewsWWW works).  That would allow
> it to be mapped to any path (or domain) via the PSGI builder
> file...

So, what's the latest decision at this point? I got a little lost
looking through the latest commit messages. :) Is the feature going to
be supported through toplevel /message_id URL, or am I misreading the
code?

-K
* Re: [RFC 2/2] www: add /~/$MESSAGE_ID global redirector endpoint
From: Eric Wong @ 2019-02-19 22:55 UTC
To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> So, what's the latest decision at this point? I got a little lost
> looking through the latest commit messages. :) Is the feature going to
> be supported through toplevel /message_id URL, or am I misreading the
> code?

Yup, /$MESSAGE_ID should work.

But I suggest also looking into putting ports 80 and 443 on
nntp.lore.kernel.org up for software (or people) who think all
URLs are HTTP/HTTPS:

  https://public-inbox.org/meta/20190204111148.asdznzud6oblg3h4@dcvr/
end of thread [newest: 2019-02-19 22:55 UTC]

Thread overview: 10+ messages
2019-01-09 11:43 [RFC 0/2] support for /~/$MESSAGE_ID endpoint Eric Wong
2019-01-09 11:43 ` [RFC 1/2] config: inbox name checking matches git.git more closely Eric Wong
2019-01-09 11:43 ` [RFC 2/2] www: add /~/$MESSAGE_ID global redirector endpoint Eric Wong
2019-01-27  2:06   ` Eric Wong
2019-01-28 13:50     ` Konstantin Ryabitsev
2019-02-01  9:00       ` Eric Wong
2019-02-01 18:31         ` [PATCH v2] newswww: add /$MESSAGE_ID " Eric Wong
2019-02-04 11:11           ` [PATCH v2] examples/newswww.psgi: demonstrate standalone NewsWWW usage Eric Wong
2019-02-19 19:53         ` [RFC 2/2] www: add /~/$MESSAGE_ID global redirector endpoint Konstantin Ryabitsev
2019-02-19 22:55           ` Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).