* Some points on public-inbox @ 2018-06-09 17:06 Leah Neukirchen 2018-06-12 10:09 ` Eric Wong 0 siblings, 1 reply; 16+ messages in thread From: Leah Neukirchen @ 2018-06-09 17:06 UTC (permalink / raw) To: meta Hi, over the last few days I've set up a public-inbox 1.1.0pre1 instance, and noticed some things: 1) Makefile.PL only works properly when run from a checkout, not a tarball. I replaced the beginning with my @EXE_FILES = split("\n", `printf '%s\n' script/* 2>/dev/null`); my $PM_FILES = `find lib 2>/dev/null`; 2) public-inbox-mda returns with status 1 when it gets a mail it doesn't know where to deliver to. I think status 67 would be more appropriate (EX_NOUSER). 3) IPv6 support needs the Socket6 module, this is not stated anywhere. 4) I think it would be useful if the thread overview displayed the name of the initial poster, could this be added as an option? 5) Is there a way for the HTML view to list all served lists? / results in 404. How did you add links to meta/ and test/ on https://public-inbox.org/ ? 6) I have a user account that uses .forward to call public-inbox-mda, and use /etc/aliases to route the lists that are hosted primarily on the server to it. What's the best approach to do this for mailing lists I only mirror? Subscribe with a "secret" second address to the list, and add this second adress to publicinbox.<name>.address? Or can public-inbox-mda also scan for List-Id etc and sort by it somehow? Thank you very much for your work, -- Leah Neukirchen <leah@vuxu.org> http://leah.zone ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Some points on public-inbox 2018-06-09 17:06 Some points on public-inbox Leah Neukirchen @ 2018-06-12 10:09 ` Eric Wong 2018-06-12 11:31 ` Leah Neukirchen ` (3 more replies) 0 siblings, 4 replies; 16+ messages in thread From: Eric Wong @ 2018-06-12 10:09 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta Leah Neukirchen <leah@vuxu.org> wrote: > Hi, > > over the last few days I've set up a public-inbox 1.1.0pre1 instance, > and noticed some things: Hey Leah, thanks for giving it a try! Sorry for the late reply, been trying to avoid being at the computer too much for health reasons. > 1) Makefile.PL only works properly when run from a checkout, not a tarball. > I replaced the beginning with > > my @EXE_FILES = split("\n", `printf '%s\n' script/* 2>/dev/null`); > my $PM_FILES = `find lib 2>/dev/null`; Thanks, I'd probably add "-name '*.pm'" to find(1) to filter out directories. But I wonder if it's better to grep the MANIFEST file... > 2) public-inbox-mda returns with status 1 when it gets a mail it > doesn't know where to deliver to. I think status 67 would be more > appropriate (EX_NOUSER). Sure. There's a bunch of places where we just die() and ignore sysexits.h or similar. Could use some help checking for that and patches are welcome :> > 3) IPv6 support needs the Socket6 module, this is not stated anywhere. Oops, I thought this was standard :x Care to send a patch to INSTALL for that? > 4) I think it would be useful if the thread overview displayed > the name of the initial poster, could this be added as an option? If anything, I'd rather list ALL the recent participants in a thread (it wouldn't require extra lookups). But my philosophy is not to give anybody more credit/weight than anybody else; and I'd rather people follow links if the Subject seems interesting, not because who started a particular topic (especially when it comes to emails with [PATCH] subjects). I also prefer to avoid having too many options to reduce support/documentation costs, so if we do something like this, it would be the default. But also, it's more clutter. > 5) Is there a way for the HTML view to list all served lists? Not currently... I'm not sure how the UI or configuration should be or how to avoid clutter/scalability problems with many inboxes. NNTP has standardized commands and clients can decide how to show them, at least. > / results in 404. How did you add links to meta/ and test/ on > https://public-inbox.org/ ? mkdir /srv/public-inbox/{meta,test} # static directory listing Rack::Builder routes meta/ and test/ to varnish => public-inbox-httpd All the HTTPS, static files and reverse proxying is handled using yet-another-horribly-named-server written in Ruby and Rack :) ==> config.ru snippet <== # random crap from yahns extras/ require "autoindex" require "try_gzip_static" require "yahns/proxy_pass" autoindex = lambda do |path| Autoindex.new(TryGzipStatic.new(path), skip_dotfiles: true) end pi = Rack::Builder.new do run Yahns::ProxyPass.new('http://127.0.0.1:6081', # varnish response_headers: { 'Age' => :ignore, 'X-Varnish' => :ignore, 'Via' => :ignore }) end.to_app unsub = Rack::Builder.new do run Yahns::ProxyPass.new('unix:/run/unsubscribe-psgi.sock') end.to_app unsub_re = %r{\A/u/[^/]+/[^/]+\z} map('http://public-inbox.org/') do pfx = 'public-inbox' static = autoindex["/srv/#{pfx}"] txt_html = %r{\.txt\.html\z} # oops :x proxy = Yahns::ProxyPass.new("http://127.0.0.1:2080/$host$fullpath", proxy_buffering: false) cascade = Rack::Cascade.new([static, proxy]) run(lambda do |env| return redirect(env, nil, nil) if env['rack.url_scheme'] == -'http' case path_info = env["PATH_INFO"] when %r{\A/test(?:/?.*)\z} redirect(env, "try.public-inbox.org", path_info) when %r{\A/(?:git|meta|public-inbox(?:\.git)?)(?:/|\z)}x, '/HEAD', '/info/refs', '/git-upload-pack', '/description', '/cloneurl', %r{\A/objects/} pi.call(env) when unsub_re unsub.call(env) when txt_html redirect(env, 'public-inbox.org', path_info.sub(txt_html, '.html')) else cascade.call(env) end end) end ==> end snippet <= > 6) I have a user account that uses .forward to call public-inbox-mda, > and use /etc/aliases to route the lists that are hosted primarily on > the server to it. What's the best approach to do this for mailing > lists I only mirror? Subscribe with a "secret" second address to the > list, and add this second adress to publicinbox.<name>.address? > Or can public-inbox-mda also scan for List-Id etc and sort by it somehow? I prefer to use public-inbox-watch for mirroring existing lists. -mda is also a bit strict and opinionated (though I have plans to make it less so, optionally), so it's mainly for non-mirrored inboxes. -watch is also safer and less likely to lose/bounce mail since it hits a Maildir, first. -watch will scan for List-Id (or any other header, such as X-Mailing-List) and put it into the correct inbox. If space is a problem, a cronjob to remove old files will help, but maybe it can unlink-on-import-commit in the future. I haven't thought much about mirroring with -mda, but I suppose having a per-list subscriber address and extra publicinbox.<name>.address entry works, too. > Thank you very much for your work, No problem :> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Some points on public-inbox 2018-06-12 10:09 ` Eric Wong @ 2018-06-12 11:31 ` Leah Neukirchen 2018-06-13 2:07 ` [PATCH] Makefile.PL: do not depend on git Eric Wong 2018-06-13 21:40 ` Some points on public-inbox Eric Wong 2018-06-12 13:19 ` Some points on public-inbox Leah Neukirchen ` (2 subsequent siblings) 3 siblings, 2 replies; 16+ messages in thread From: Leah Neukirchen @ 2018-06-12 11:31 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > Leah Neukirchen <leah@vuxu.org> wrote: >> Hi, >> >> over the last few days I've set up a public-inbox 1.1.0pre1 instance, >> and noticed some things: > > Hey Leah, thanks for giving it a try! Sorry for the late reply, > been trying to avoid being at the computer too much for health > reasons. No problem, get well soon. >> 1) Makefile.PL only works properly when run from a checkout, not a tarball. >> I replaced the beginning with >> >> my @EXE_FILES = split("\n", `printf '%s\n' script/* 2>/dev/null`); >> my $PM_FILES = `find lib 2>/dev/null`; > > Thanks, I'd probably add "-name '*.pm'" to find(1) to filter out > directories. But I wonder if it's better to grep the MANIFEST > file... Yes, using MANIFEST is a better solution. >> 2) public-inbox-mda returns with status 1 when it gets a mail it >> doesn't know where to deliver to. I think status 67 would be more >> appropriate (EX_NOUSER). > > Sure. There's a bunch of places where we just die() and ignore > sysexits.h or similar. Could use some help checking for that > and patches are welcome :> I'll have a look at this. >> 3) IPv6 support needs the Socket6 module, this is not stated anywhere. > > Oops, I thought this was standard :x Care to send a patch to > INSTALL for that? Will do. >> 5) Is there a way for the HTML view to list all served lists? > > Not currently... I'm not sure how the UI or configuration > should be or how to avoid clutter/scalability problems with many > inboxes. NNTP has standardized commands and clients can decide > how to show them, at least. Yes, I was thinking of just having a list of "name - description", straight from the config file? >> / results in 404. How did you add links to meta/ and test/ on >> https://public-inbox.org/ ? > > mkdir /srv/public-inbox/{meta,test} # static directory listing I built something similar with nginx now. >> and use /etc/aliases to route the lists that are hosted primarily on >> the server to it. What's the best approach to do this for mailing >> lists I only mirror? Subscribe with a "secret" second address to the >> list, and add this second adress to publicinbox.<name>.address? >> Or can public-inbox-mda also scan for List-Id etc and sort by it somehow? > > I prefer to use public-inbox-watch for mirroring existing lists. > > -mda is also a bit strict and opinionated (though I have plans to > make it less so, optionally), so it's mainly for non-mirrored > inboxes. > > -watch is also safer and less likely to lose/bounce mail since > it hits a Maildir, first. -watch will scan for List-Id (or any > other header, such as X-Mailing-List) and put it into the > correct inbox. If space is a problem, a cronjob to remove > old files will help, but maybe it can unlink-on-import-commit > in the future. Space is not an issue, and scanning for special headers will avoid getting password reminders and administrative messages into the archive. I'll use watch then. During testing, we also found another thing when obscure characters are used in Message-IDs, esp. / and ?. E.g. using a Message-ID of <F1WYEAZPOF.3LOD2T7ZHY9I1@localdomain/raw/T> will create a corrupt link. Some more "ideas" are at https://inbox.vuxu.org/pi-test/ Thanks, -- Leah Neukirchen <leah@vuxu.org> http://leah.zone ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] Makefile.PL: do not depend on git 2018-06-12 11:31 ` Leah Neukirchen @ 2018-06-13 2:07 ` Eric Wong 2018-06-13 14:26 ` Leah Neukirchen 2018-06-13 21:40 ` Some points on public-inbox Eric Wong 1 sibling, 1 reply; 16+ messages in thread From: Eric Wong @ 2018-06-13 2:07 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta Leah Neukirchen <leah@vuxu.org> wrote: > Eric Wong <e@80x24.org> writes: > > been trying to avoid being at the computer too much for health > > reasons. > > No problem, get well soon. Thanks, but they're more preventative measures than anything. > >> 1) Makefile.PL only works properly when run from a checkout, not a tarball. > >> I replaced the beginning with > >> > >> my @EXE_FILES = split("\n", `printf '%s\n' script/* 2>/dev/null`); > >> my $PM_FILES = `find lib 2>/dev/null`; > > > > Thanks, I'd probably add "-name '*.pm'" to find(1) to filter out > > directories. But I wonder if it's better to grep the MANIFEST > > file... > > Yes, using MANIFEST is a better solution. OK, will push this out: ---------8<------- Subject: [PATCH] Makefile.PL: do not depend on git Otherwise, things do not work from a tarball distribution. Reported-by: Leah Neukirchen <leah@vuxu.org> https://public-inbox.org/meta/871sdfzy80.fsf@gmail.com/ --- Makefile.PL | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Makefile.PL b/Makefile.PL index a47e17b..027c3e6 100644 --- a/Makefile.PL +++ b/Makefile.PL @@ -3,9 +3,10 @@ # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> use strict; use ExtUtils::MakeMaker; -my @EXE_FILES = split("\n", `git ls-files 'script/*' 2>/dev/null`); -my $PM_FILES = `git ls-files lib '*.pm' 2>/dev/null`; -$PM_FILES =~ tr/\n/ /; +open my $m, '<', 'MANIFEST' or die "open(MANIFEST): $!\n"; +chomp(my @manifest = (<$m>)); +my @EXE_FILES = grep(m!^script/!, @manifest); +my $PM_FILES = join(' ', grep(m!^lib/.*\.pm$!, @manifest)); WriteMakefile( NAME => 'PublicInbox', -- EW ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] Makefile.PL: do not depend on git 2018-06-13 2:07 ` [PATCH] Makefile.PL: do not depend on git Eric Wong @ 2018-06-13 14:26 ` Leah Neukirchen 2018-06-13 21:04 ` Eric Wong 0 siblings, 1 reply; 16+ messages in thread From: Leah Neukirchen @ 2018-06-13 14:26 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > Subject: [PATCH] Makefile.PL: do not depend on git > > Otherwise, things do not work from a tarball distribution. The same issue also applies to ssoma. -- Leah Neukirchen <leah@vuxu.org> http://leah.zone ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Makefile.PL: do not depend on git 2018-06-13 14:26 ` Leah Neukirchen @ 2018-06-13 21:04 ` Eric Wong 2018-06-13 21:20 ` Leah Neukirchen 0 siblings, 1 reply; 16+ messages in thread From: Eric Wong @ 2018-06-13 21:04 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta Leah Neukirchen <leah@vuxu.org> wrote: > Eric Wong <e@80x24.org> writes: > > > Subject: [PATCH] Makefile.PL: do not depend on git > > > > Otherwise, things do not work from a tarball distribution. > > The same issue also applies to ssoma. Thanks, pushed an identical change to 29b74d2c4cda7e06524ec92d5bc517cc07a05979 to ssoma.git That said, I'm not sure if it's worth continuing work on ssoma... I think focusing on existing standards such as NNTP is a better use of our time. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Makefile.PL: do not depend on git 2018-06-13 21:04 ` Eric Wong @ 2018-06-13 21:20 ` Leah Neukirchen 0 siblings, 0 replies; 16+ messages in thread From: Leah Neukirchen @ 2018-06-13 21:20 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > Leah Neukirchen <leah@vuxu.org> wrote: >> Eric Wong <e@80x24.org> writes: >> >> > Subject: [PATCH] Makefile.PL: do not depend on git >> > >> > Otherwise, things do not work from a tarball distribution. >> >> The same issue also applies to ssoma. > > Thanks, pushed an identical change to > 29b74d2c4cda7e06524ec92d5bc517cc07a05979 to ssoma.git > > That said, I'm not sure if it's worth continuing work on ssoma... > I think focusing on existing standards such as NNTP is a better > use of our time. I wrote a prototype of a NNTP fetcher for mblaze: http://git.vuxu.org/mblaze/tree/contrib/msuck -- Leah Neukirchen <leah@vuxu.org> http://leah.zone ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Some points on public-inbox 2018-06-12 11:31 ` Leah Neukirchen 2018-06-13 2:07 ` [PATCH] Makefile.PL: do not depend on git Eric Wong @ 2018-06-13 21:40 ` Eric Wong 2018-06-13 22:43 ` [PATCH] www: use undecoded paths for Message-ID extraction Eric Wong 1 sibling, 1 reply; 16+ messages in thread From: Eric Wong @ 2018-06-13 21:40 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta Leah Neukirchen <leah@vuxu.org> wrote: > During testing, we also found another thing when obscure characters > are used in Message-IDs, esp. / and ?. > > E.g. using a Message-ID of <F1WYEAZPOF.3LOD2T7ZHY9I1@localdomain/raw/T> > will create a corrupt link. Some more "ideas" are at > https://inbox.vuxu.org/pi-test/ Thanks for this. I've overlooked some things and will investigate :x ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] www: use undecoded paths for Message-ID extraction 2018-06-13 21:40 ` Some points on public-inbox Eric Wong @ 2018-06-13 22:43 ` Eric Wong 2018-06-26 7:46 ` [PATCH] additional tests for bad Message-IDs in URLs Eric Wong 0 siblings, 1 reply; 16+ messages in thread From: Eric Wong @ 2018-06-13 22:43 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta > Leah Neukirchen <leah@vuxu.org> wrote: > > During testing, we also found another thing when obscure characters > > are used in Message-IDs, esp. / and ?. > > > > E.g. using a Message-ID of <F1WYEAZPOF.3LOD2T7ZHY9I1@localdomain/raw/T> > > will create a corrupt link. Some more "ideas" are at > > https://inbox.vuxu.org/pi-test/ I guess I'm spoiled by Rack where PATH_INFO is undecoded :x However, REQUEST_URI is specified in PSGI specs(*) Very lightly tested, but this seems to work; additions to the test suite will be necessary... ------8<---- Subject: [PATCH] www: use undecoded paths for Message-ID extraction In PSGI, PATH_INFO contains URI-decoded paths which cause problems when Message-IDs contain ambiguous characters for used for routing. Instead, extract the undecoded path from REQUEST_URI and use that. Reported-by: Leah Neukirchen <leah@vuxu.org> https://public-inbox.org/meta/8736xsb5s5.fsf@vuxu.org/ --- lib/PublicInbox/WWW.pm | 40 ++++++++++++++++++++++++++++------------ t/cgi.t | 2 ++ 2 files changed, 30 insertions(+), 12 deletions(-) diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm index 24e24f1..c1c3926 100644 --- a/lib/PublicInbox/WWW.pm +++ b/lib/PublicInbox/WWW.pm @@ -36,6 +36,17 @@ sub run { PublicInbox::WWW->new->call($req->env); } +my %path_re_cache; + +sub path_re ($) { + my $sn = $_[0]->{SCRIPT_NAME}; + $path_re_cache{$sn} ||= do { + $sn = '/'.$sn unless index($sn, '/') == 0; + $sn =~ s!/\z!!; + qr!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!; + }; +} + sub call { my ($self, $env) = @_; my $ctx = { env => $env, www => $self }; @@ -50,7 +61,8 @@ sub call { } split(/[&;]+/, $env->{QUERY_STRING}); $ctx->{qp} = \%qp; - my $path_info = $env->{PATH_INFO}; + # not using $env->{PATH_INFO} here since that's already decoded + my ($path_info) = ($env->{REQUEST_URI} =~ path_re($env)); my $method = $env->{REQUEST_METHOD}; if ($method eq 'POST') { @@ -91,13 +103,13 @@ sub call { invalid_inbox_mid($ctx, $1, $2) || get_attach($ctx, $idx, $fn); # in case people leave off the trailing slash: } elsif ($path_info =~ m!$INBOX_RE/$MID_RE/(T|t)\z!o) { - my ($inbox, $mid, $suffix) = ($1, $2, $3); + my ($inbox, $mid_ue, $suffix) = ($1, $2, $3); $suffix .= $suffix =~ /\A[tT]\z/ ? '/#u' : '/'; - r301($ctx, $inbox, $mid, $suffix); + r301($ctx, $inbox, $mid_ue, $suffix); } elsif ($path_info =~ m!$INBOX_RE/$MID_RE/R/?\z!o) { - my ($inbox, $mid) = ($1, $2); - r301($ctx, $inbox, $mid, '#R'); + my ($inbox, $mid_ue) = ($1, $2); + r301($ctx, $inbox, $mid_ue, '#R'); } elsif ($path_info =~ m!$INBOX_RE/$MID_RE/f/?\z!o) { r301($ctx, $1, $2); @@ -164,11 +176,11 @@ sub invalid_inbox ($$) { # returns undef if valid, array ref response if invalid sub invalid_inbox_mid { - my ($ctx, $inbox, $mid) = @_; + my ($ctx, $inbox, $mid_ue) = @_; my $ret = invalid_inbox($ctx, $inbox); return $ret if $ret; - $ctx->{mid} = $mid; + my $mid = $ctx->{mid} = uri_unescape($mid_ue); my $ibx = $ctx->{-inbox}; if ($mid =~ m!\A([a-f0-9]{2})([a-f0-9]{38})\z!) { my ($x2, $x38) = ($1, $2); @@ -177,7 +189,7 @@ sub invalid_inbox_mid { require Email::Simple; my $s = Email::Simple->new($str); $mid = PublicInbox::MID::mid_clean($s->header('Message-ID')); - return r301($ctx, $inbox, $mid); + return r301($ctx, $inbox, mid_escape($mid)); } undef; } @@ -352,7 +364,7 @@ sub legacy_redirects { } sub r301 { - my ($ctx, $inbox, $mid, $suffix) = @_; + my ($ctx, $inbox, $mid_ue, $suffix) = @_; my $obj = $ctx->{-inbox}; unless ($obj) { my $r404 = invalid_inbox($ctx, $inbox); @@ -361,7 +373,11 @@ sub r301 { } my $url = $obj->base_url($ctx->{env}); my $qs = $ctx->{env}->{QUERY_STRING}; - $url .= (mid_escape($mid) . '/') if (defined $mid); + if (defined $mid_ue) { + # common, and much nicer as '@' than '%40': + $mid_ue =~ s/%40/@/g; + $url .= $mid_ue . '/'; + } $url .= $suffix if (defined $suffix); $url .= "?$qs" if $qs ne ''; @@ -371,9 +387,9 @@ sub r301 { } sub msg_page { - my ($ctx, $inbox, $mid, $e) = @_; + my ($ctx, $inbox, $mid_ue, $e) = @_; my $ret; - $ret = invalid_inbox_mid($ctx, $inbox, $mid) and return $ret; + $ret = invalid_inbox_mid($ctx, $inbox, $mid_ue) and return $ret; '' eq $e and return get_mid_html($ctx); 'T/' eq $e and return get_thread($ctx, 1); 't/' eq $e and return get_thread($ctx); diff --git a/t/cgi.t b/t/cgi.t index bd92ca3..2e2476d 100644 --- a/t/cgi.t +++ b/t/cgi.t @@ -225,6 +225,8 @@ sub cgi_run { my %env = ( PATH_INFO => $_[0], QUERY_STRING => $_[1] || "", + SCRIPT_NAME => '', + REQUEST_URI => $_[0] . ($_[1] ? "?$_[1]" : ''), REQUEST_METHOD => $_[2] || "GET", GATEWAY_INTERFACE => 'CGI/1.1', HTTP_ACCEPT => '*/*', -- (*) git clone https://github.com/plack/psgi-specs.git ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH] additional tests for bad Message-IDs in URLs 2018-06-13 22:43 ` [PATCH] www: use undecoded paths for Message-ID extraction Eric Wong @ 2018-06-26 7:46 ` Eric Wong 0 siblings, 0 replies; 16+ messages in thread From: Eric Wong @ 2018-06-26 7:46 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta Followup-to: 73cfed86d8a8287a ("www: use undecoded paths for Message-ID extraction") Reported-by: Leah Neukirchen <leah@vuxu.org> https://public-inbox.org/meta/8736xsb5s5.fsf@vuxu.org/ --- Oops, forgot this earlier :x MANIFEST | 1 + t/psgi_bad_mids.t | 85 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 86 insertions(+) create mode 100644 t/psgi_bad_mids.t diff --git a/MANIFEST b/MANIFEST index 08a8ef4..68c79c9 100644 --- a/MANIFEST +++ b/MANIFEST @@ -182,6 +182,7 @@ t/perf-threading.t t/plack.t t/precheck.t t/psgi_attach.t +t/psgi_bad_mids.t t/psgi_mount.t t/psgi_search.t t/psgi_text.t diff --git a/t/psgi_bad_mids.t b/t/psgi_bad_mids.t new file mode 100644 index 0000000..5008f5b --- /dev/null +++ b/t/psgi_bad_mids.t @@ -0,0 +1,85 @@ +# Copyright (C) 2018 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use strict; +use warnings; +use Test::More; +use File::Temp qw/tempdir/; +use PublicInbox::MIME; +use PublicInbox::Config; +use PublicInbox::WWW; +my @mods = qw(DBD::SQLite Search::Xapian HTTP::Request::Common Plack::Test + URI::Escape Plack::Builder); +foreach my $mod (@mods) { + eval "require $mod"; + plan skip_all => "$mod missing for psgi_bad_mids.t" if $@; +} +use_ok($_) for @mods; +use_ok 'PublicInbox::V2Writable'; +my $mainrepo = tempdir('pi-bad-mids-XXXXXX', TMPDIR => 1, CLEANUP => 1); +my $cfgpfx = "publicinbox.bad-mids"; +my $ibx = { + mainrepo => $mainrepo, + name => 'bad-mids', + version => 2, + -primary_address => 'test@example.com', +}; +$ibx = PublicInbox::Inbox->new($ibx); +my $im = PublicInbox::V2Writable->new($ibx, 1); +$im->{parallel} = 0; + +my $msgs = <<''; +F1V5OR6NMF.3M649JTLO9IXD@tux.localdomain/hehe1"'<foo +F1V5NB0PTU.3U0DCVGAJ750Z@tux.localdomain"'<>/foo +F1V5MIHGCU.2ABINKW6WBE8N@tux.localdomain/raw +F1V5LF9D9C.2QT5PGXZQ050E@tux.localdomain/t.atom +F1V58X3CMU.2DCCVAKQZGADV@tux.localdomain/../../../../foo +F1TVKINT3G.2S6I36MXMHYG6@tux.localdomain" onclick="alert(1)" + +my @mids = split(/\n/, $msgs); +my $i = 0; +foreach my $mid (@mids) { + my $data = << ""; +Subject: test +Message-ID: <$mid> +From: a\@example.com +To: b\@example.com +Date: Fri, 02 Oct 1993 00:00:0$i +0000 + + + my $mime = PublicInbox::MIME->new(\$data); + ok($im->add($mime), "added $mid"); + $i++ +} +$im->done; + +my $cfg = { + "$cfgpfx.address" => $ibx->{-primary_address}, + "$cfgpfx.mainrepo" => $mainrepo, +}; +my $config = PublicInbox::Config->new($cfg); +my $www = PublicInbox::WWW->new($config); +test_psgi(sub { $www->call(@_) }, sub { + my ($cb) = @_; + my $res = $cb->(GET('/bad-mids/')); + is($res->code, 200, 'got 200 OK listing'); + my $raw = $res->content; + foreach my $mid (@mids) { + ok(index($raw, $mid) < 0, "escaped $mid"); + } + + my (@xmids) = ($raw =~ m!\bhref="([^"]+)/t\.mbox\.gz"!sg); + is(scalar(@xmids), scalar(@mids), + 'got escaped links to all messages'); + + @xmids = reverse @xmids; + foreach my $i (0..$#xmids) { + $res = $cb->(GET("/bad-mids/$xmids[$i]/raw")); + is($res->code, 200, 'got 200 OK raw message'); + like($res->content, qr/Message-ID: <\Q$mids[$i]\E>/s, + 'retrieved correct message'); + } +}); + +done_testing(); + +1; -- EW ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Some points on public-inbox 2018-06-12 10:09 ` Eric Wong 2018-06-12 11:31 ` Leah Neukirchen @ 2018-06-12 13:19 ` Leah Neukirchen 2019-01-05 8:39 ` Eric Wong 2018-06-12 17:05 ` Konstantin Ryabitsev 2019-04-18 8:25 ` [RFC] www: support listing of inboxes Eric Wong 3 siblings, 1 reply; 16+ messages in thread From: Leah Neukirchen @ 2018-06-12 13:19 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: >> 6) I have a user account that uses .forward to call public-inbox-mda, >> and use /etc/aliases to route the lists that are hosted primarily on >> the server to it. What's the best approach to do this for mailing >> lists I only mirror? Subscribe with a "secret" second address to the >> list, and add this second adress to publicinbox.<name>.address? >> Or can public-inbox-mda also scan for List-Id etc and sort by it somehow? > > I prefer to use public-inbox-watch for mirroring existing lists. > > -mda is also a bit strict and opinionated (though I have plans to > make it less so, optionally), so it's mainly for non-mirrored > inboxes. > > -watch is also safer and less likely to lose/bounce mail since > it hits a Maildir, first. -watch will scan for List-Id (or any > other header, such as X-Mailing-List) and put it into the > correct inbox. If space is a problem, a cronjob to remove > old files will help, but maybe it can unlink-on-import-commit > in the future. Ok, that was a bit more fiddly than expected, because now I needed to run two MDA under the same user. Also I noticed public-inbox-watch expects different Maildir for every list, so I had to put a maildrop in front of it to pre-sort by List-Id... couldn't -watch do that itself? -- Leah Neukirchen <leah@vuxu.org> http://leah.zone ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Some points on public-inbox 2018-06-12 13:19 ` Some points on public-inbox Leah Neukirchen @ 2019-01-05 8:39 ` Eric Wong 0 siblings, 0 replies; 16+ messages in thread From: Eric Wong @ 2019-01-05 8:39 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta Leah Neukirchen <leah@vuxu.org> wrote: > Also I noticed public-inbox-watch expects different Maildir for every > list, so I had to put a maildrop in front of it to pre-sort by > List-Id... couldn't -watch do that itself? Yeah, I'm trying to figure out why it was done that way in the past; since I have Maildirs with multiple lists going to them. But it's (finally) fixed, now. https://public-inbox.org/meta/20190105083649.27331-2-e@80x24.org/ Lightly-tested, but seems to work... Still slowly going through a bunch of old emails and requests... ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Some points on public-inbox 2018-06-12 10:09 ` Eric Wong 2018-06-12 11:31 ` Leah Neukirchen 2018-06-12 13:19 ` Some points on public-inbox Leah Neukirchen @ 2018-06-12 17:05 ` Konstantin Ryabitsev 2018-06-13 1:57 ` Eric Wong 2019-04-18 8:25 ` [RFC] www: support listing of inboxes Eric Wong 3 siblings, 1 reply; 16+ messages in thread From: Konstantin Ryabitsev @ 2018-06-12 17:05 UTC (permalink / raw) To: Eric Wong; +Cc: Leah Neukirchen, meta [-- Attachment #1: Type: text/plain, Size: 1617 bytes --] On Tue, Jun 12, 2018 at 10:09:15AM +0000, Eric Wong wrote: >I prefer to use public-inbox-watch for mirroring existing lists. > >-mda is also a bit strict and opinionated (though I have plans to >make it less so, optionally), so it's mainly for non-mirrored >inboxes. > >-watch is also safer and less likely to lose/bounce mail since >it hits a Maildir, first. -watch will scan for List-Id (or any >other header, such as X-Mailing-List) and put it into the >correct inbox. If space is a problem, a cronjob to remove >old files will help, but maybe it can unlink-on-import-commit >in the future. I opted in favour of -mda over -watch because Maildir performance usually degrades linearly with the number of messages. A month of LKML mail is anywhere from 25,000 to 40,000 messages, and maildirs tend to handle that poorly due to peformance overhead of listing tens of thousands of files in a single folder. Obviously, I can set up an archival job, but then I'd have to worry about messages that weren't actually imported into the archive (because they didn't pass spam tests, but are actually ham, for example). The -mda script gives me this for free, with such messages being put into the emergency folder for later review. >I haven't thought much about mirroring with -mda, but I suppose >having a per-list subscriber address and extra >publicinbox.<name>.address entry works, too. It works, but cloning details at the bottom of the page expose both addresses: public-inbox-init -V2 lkml lkml/ https://[not-live-yet].kernel.org/lkml \ linux-kernel@[not-live-yet].kernel.org linux-kernel@vger.kernel.org -K [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Some points on public-inbox 2018-06-12 17:05 ` Konstantin Ryabitsev @ 2018-06-13 1:57 ` Eric Wong 0 siblings, 0 replies; 16+ messages in thread From: Eric Wong @ 2018-06-13 1:57 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: Leah Neukirchen, meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > On Tue, Jun 12, 2018 at 10:09:15AM +0000, Eric Wong wrote: > > I prefer to use public-inbox-watch for mirroring existing lists. > > I opted in favour of -mda over -watch because Maildir performance > usually degrades linearly with the number of messages. A month of LKML > mail is anywhere from 25,000 to 40,000 messages, and maildirs tend to > handle that poorly due to peformance overhead of listing tens of > thousands of files in a single folder. Right; but with inotify, getdents/readdir overhead is not a problem outside of initial startup (or rescanning via SIGUSR1 after config changes). > Obviously, I can set up an archival job, but then I'd have to worry > about messages that weren't actually imported into the archive (because > they didn't pass spam tests, but are actually ham, for example). The > -mda script gives me this for free, with such messages being put into > the emergency folder for later review. Interesting take on it, thanks for sharing. I prefer to keep the Maildir messages around for a bit and do my own reading off that, for now[1]. I occasionally review syslog for spam notices from -watch, but probably not enough :x > > I haven't thought much about mirroring with -mda, but I suppose > > having a per-list subscriber address and extra > > publicinbox.<name>.address entry works, too. > > It works, but cloning details at the bottom of the page expose both > addresses: > > public-inbox-init -V2 lkml lkml/ https://[not-live-yet].kernel.org/lkml \ > linux-kernel@[not-live-yet].kernel.org linux-kernel@vger.kernel.org Hmm, I intended the multi-address support to work as a way to have inboxes hosted simultaneously on multiple domains, either temporarily as a migration strategy or permanently for redundancy. So maybe there should be a way to specify an email address as "hidden" for that, but still let -mda use it for routing. Any thoughts on how to do it? I'm thinking something like replacing '@' with '!' in the .public-inbox/config file. [1] I've thought about a Mairix/notmuch-like tool which extracts messages from public-inboxes, so I won't need a redundant copy in the Maildir. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC] www: support listing of inboxes 2018-06-12 10:09 ` Eric Wong ` (2 preceding siblings ...) 2018-06-12 17:05 ` Konstantin Ryabitsev @ 2019-04-18 8:25 ` Eric Wong 2019-05-05 23:36 ` Eric Wong 3 siblings, 1 reply; 16+ messages in thread From: Eric Wong @ 2019-04-18 8:25 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta Eric Wong <e@80x24.org> wrote: > Leah Neukirchen <leah@vuxu.org> wrote: > > 5) Is there a way for the HTML view to list all served lists? > > Not currently... I'm not sure how the UI or configuration > should be or how to avoid clutter/scalability problems with many > inboxes. NNTP has standardized commands and clients can decide > how to show them, at least. Better late than never :x I don't like adding more options, but having thought about this for a while... the below seems like the best choice here taking backwards compatibility, clutter, and virtual-hosting into account. --------8<--------- Subject: [PATCH] www: support listing of inboxes We will still return a 404 by default to '/' for compatibility with users of Plack::App::Cascade or similar. Inboxes are sorted by modification times to help users detect activity (similar to the /$INBOX/ topic view). New configuration options: * publicinbox.wwwlisting - configure the listing type * publicinbox.<name>.hide - hide a particular inbox from the listing See changes to public-inbox-config.pod for full descriptions of the new options. Requested-by: Leah Neukirchen <leah@vuxu.org> https://public-inbox.org/meta/871sdfzy80.fsf@gmail.com/ --- Documentation/public-inbox-config.pod | 37 +++++++++ MANIFEST | 1 + lib/PublicInbox/Config.pm | 2 +- lib/PublicInbox/Inbox.pm | 9 +++ lib/PublicInbox/WWW.pm | 14 +++- lib/PublicInbox/WwwListing.pm | 104 ++++++++++++++++++++++++++ lib/PublicInbox/WwwStream.pm | 10 ++- 7 files changed, 171 insertions(+), 6 deletions(-) create mode 100644 lib/PublicInbox/WwwListing.pm diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod index e7dcd8a..1477027 100644 --- a/Documentation/public-inbox-config.pod +++ b/Documentation/public-inbox-config.pod @@ -188,6 +188,14 @@ be treated as the default value. Default: 25 +=item publicinbox.<name>.hide + +A comma-delimited list of listings to hide the inbox from. + +Valid values are currently "www". + +Default: none + =item coderepo.<nick>.dir The path to a git repository for "publicinbox.<name>.coderepo" @@ -217,6 +225,35 @@ directive is configured. Default: /var/www/htdocs/cgit/cgit.gti or /usr/lib/cgit/cgit.cgi +=item publicinbox.wwwlisting + +Enable a HTML listing style when the root path of the URL '/' is accessed. +Valid values are: + +=over 8 + +=item all + +Show all inboxes + +=item 404 + +Return a 404 page. This is useful to allow customization with +L<Plack::App::Cascade(3pm)> + +=item match=domain + +Only show inboxes with URLs which belong to the domain of the HTTP +request + +=for TODO comment + +support showing cgit listing + +=back + +Default: 404 + =back =head2 NAMED LIMITER (PSGI) diff --git a/MANIFEST b/MANIFEST index 150e337..881d2f0 100644 --- a/MANIFEST +++ b/MANIFEST @@ -127,6 +127,7 @@ lib/PublicInbox/WatchMaildir.pm lib/PublicInbox/WwwAtomStream.pm lib/PublicInbox/WwwAttach.pm lib/PublicInbox/WwwHighlight.pm +lib/PublicInbox/WwwListing.pm lib/PublicInbox/WwwStream.pm lib/PublicInbox/WwwText.pm sa_config/Makefile diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm index 631c788..09f9179 100644 --- a/lib/PublicInbox/Config.pm +++ b/lib/PublicInbox/Config.pm @@ -389,7 +389,7 @@ sub _fill { } # TODO: more arrays, we should support multi-value for # more things to encourage decentralization - foreach my $k (qw(address altid nntpmirror coderepo)) { + foreach my $k (qw(address altid nntpmirror coderepo hide)) { if (defined(my $v = $self->{"$pfx.$k"})) { $ibx->{$k} = _array($v); } diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm index 0d28dd0..286555f 100644 --- a/lib/PublicInbox/Inbox.pm +++ b/lib/PublicInbox/Inbox.pm @@ -95,6 +95,15 @@ sub new { if (defined $dir && -f "$dir/inbox.lock") { $opts->{version} = 2; } + + # allow any combination of multi-line or comma-delimited hide entries + my $hide = {}; + if (defined(my $h = $opts->{hide})) { + foreach my $v (@$h) { + $hide->{$_} = 1 foreach (split(/\s*,\s*/, $v)); + } + $opts->{-hide} = $hide; + } bless $opts, $class; } diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm index 6e69001..793466e 100644 --- a/lib/PublicInbox/WWW.pm +++ b/lib/PublicInbox/WWW.pm @@ -68,8 +68,9 @@ sub call { } split(/[&;]+/, $env->{QUERY_STRING}); $ctx->{qp} = \%qp; - # not using $env->{PATH_INFO} here since that's already decoded + # avoiding $env->{PATH_INFO} here since that's already decoded my ($path_info) = ($env->{REQUEST_URI} =~ path_re($env)); + $path_info //= $env->{PATH_INFO}; my $method = $env->{REQUEST_METHOD}; if ($method eq 'POST') { @@ -87,7 +88,7 @@ sub call { # top-level indices and feeds if ($path_info eq '/') { - r404(); + www_listing($self)->call($env); } elsif ($path_info =~ m!$INBOX_RE\z!o) { invalid_inbox($ctx, $1) || r301($ctx, $1); } elsif ($path_info =~ m!$INBOX_RE(?:/|/index\.html)?\z!o) { @@ -157,6 +158,7 @@ sub preload { if (ref($self)) { $self->cgit; $self->stylesheets_prepare($_) for ('', '../', '../../'); + $self->www_listing; } } @@ -489,6 +491,14 @@ sub cgit { } } +sub www_listing { + my ($self) = @_; + $self->{www_listing} ||= do { + require PublicInbox::WwwListing; + PublicInbox::WwwListing->new($self); + } +} + sub get_attach { my ($ctx, $idx, $fn) = @_; require PublicInbox::WwwAttach; diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm new file mode 100644 index 0000000..e8dad4b --- /dev/null +++ b/lib/PublicInbox/WwwListing.pm @@ -0,0 +1,104 @@ +# Copyright (C) 2019 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +# Provide an HTTP-accessible listing of inboxes. +# Used by PublicInbox::WWW +package PublicInbox::WwwListing; +use strict; +use warnings; +use PublicInbox::Hval qw(ascii_html); +use PublicInbox::Linkify; +use PublicInbox::View; + +sub list_all ($$) { + my ($self, undef) = @_; + my @list; + $self->{pi_config}->each_inbox(sub { + my ($ibx) = @_; + push @list, $ibx unless $ibx->{-hide}->{www}; + }); + \@list; +} + +sub list_match_domain ($$) { + my ($self, $env) = @_; + my @list; + my $host = $env->{HTTP_HOST} // $env->{SERVER_NAME}; + $host =~ s/:\d+\z//; + my $re = qr!\A(?:https?:)?//\Q$host\E(?::\d+)?/!i; + $self->{pi_config}->each_inbox(sub { + my ($ibx) = @_; + push @list, $ibx if !$ibx->{-hide}->{www} && $ibx->{url} =~ $re; + }); + \@list; +} + +sub list_404 ($$) { [] } + +# TODO: +cgit +my %VALID = ( + all => *list_all, + 'match=domain' => *list_match_domain, + 404 => *list_404, +); + +sub new { + my ($class, $www) = @_; + my $k = 'publicinbox.wwwListing'; + my $pi_config = $www->{pi_config}; + my $v = $pi_config->{lc($k)} // 404; + bless { + pi_config => $pi_config, + style => $www->style("\0"), + list_cb => $VALID{$v} || do { + warn <<""; +`$v' is not a valid value for `$k' +$k be one of `all', `match=domain', or `404' + + *list_404; + }, + }, $class; +} + +sub ibx_entry { + my ($mtime, $ibx, $env) = @_; + my $ts = PublicInbox::View::fmt_ts($mtime); + my $url = PublicInbox::Hval::prurl($env, $ibx->{url}); + my $tmp = <<""; +* $ts - $url + ${\$ibx->description} + + if (defined(my $info_url = $ibx->{info_url})) { + $tmp .= "\n$info_url"; + } + $tmp; +} + +# not really a stand-alone PSGI app, but maybe it could be... +sub call { + my ($self, $env) = @_; + my $h = [ 'Content-Type', 'text/html; charset=UTF-8' ]; + my $list = $self->{list_cb}->($self, $env); + my $code = 404; + my $title = 'public-inbox'; + my $out = ''; + if (@$list) { + # Swartzian transform since ->modified is expensive + @$list = sort { + $b->[0] <=> $a->[0] + } map { [ $_->modified, $_ ] } @$list; + + $code = 200; + $title .= ' - listing'; + my $tmp = join("\n", map { ibx_entry(@$_, $env) } @$list); + my $l = PublicInbox::Linkify->new; + $l->linkify_1($tmp); + $out = '<pre>'.$l->linkify_2(ascii_html($tmp)).'</pre><hr>'; + } + $out = "<html><head><title>$title</title></head><body>" . $out; + $out .= '<pre>'. PublicInbox::WwwStream::code_footer($env) . + '</pre></body></html>'; + [ $code, $h, [ $out ] ] +} + +1; diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index 8b79923..c708c21 100644 --- a/lib/PublicInbox/WwwStream.pm +++ b/lib/PublicInbox/WwwStream.pm @@ -71,6 +71,12 @@ sub _html_top ($) { "</head><body>". $top . $tip; } +sub code_footer ($) { + my ($env) = @_; + my $u = PublicInbox::Hval::prurl($env, $CODE_URL); + qq(AGPL code for this site: git clone <a\nhref="$u">$u</a> $PROJECT) +} + sub _html_end { my ($self) = @_; my $urls = 'Archives are clonable:'; @@ -139,12 +145,10 @@ EOF $urls .= qq[<a\nhref="$TOR2WEB_URL">$TOR2WEB_URL</a>]; } } - my $url = PublicInbox::Hval::prurl($ctx->{env}, $CODE_URL); '<hr><pre>'.join("\n\n", $desc, $urls, - 'AGPL code for this site: '. - qq(git clone <a\nhref="$url">$url</a> $PROJECT) + code_footer($ctx->{env}) ).'</pre></body></html>'; } -- EW ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC] www: support listing of inboxes 2019-04-18 8:25 ` [RFC] www: support listing of inboxes Eric Wong @ 2019-05-05 23:36 ` Eric Wong 0 siblings, 0 replies; 16+ messages in thread From: Eric Wong @ 2019-05-05 23:36 UTC (permalink / raw) To: Leah Neukirchen; +Cc: meta Merged as e1ed79d1e99e2c58b3edb370e60904cf656dd823 ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2019-05-05 23:36 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-06-09 17:06 Some points on public-inbox Leah Neukirchen 2018-06-12 10:09 ` Eric Wong 2018-06-12 11:31 ` Leah Neukirchen 2018-06-13 2:07 ` [PATCH] Makefile.PL: do not depend on git Eric Wong 2018-06-13 14:26 ` Leah Neukirchen 2018-06-13 21:04 ` Eric Wong 2018-06-13 21:20 ` Leah Neukirchen 2018-06-13 21:40 ` Some points on public-inbox Eric Wong 2018-06-13 22:43 ` [PATCH] www: use undecoded paths for Message-ID extraction Eric Wong 2018-06-26 7:46 ` [PATCH] additional tests for bad Message-IDs in URLs Eric Wong 2018-06-12 13:19 ` Some points on public-inbox Leah Neukirchen 2019-01-05 8:39 ` Eric Wong 2018-06-12 17:05 ` Konstantin Ryabitsev 2018-06-13 1:57 ` Eric Wong 2019-04-18 8:25 ` [RFC] www: support listing of inboxes Eric Wong 2019-05-05 23:36 ` Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).