unofficial mirror of meta@public-inbox.org
* [PATCH 0/6] wwwstatic: support directory listings
@ 2020-01-01 10:38 Eric Wong
  2020-01-01 10:38 ` [PATCH 1/6] wwwstatic: implement Last-Modified and If-Modified-Since Eric Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Eric Wong @ 2020-01-01 10:38 UTC (permalink / raw)
  To: meta

Now it'll be possible to replicate the timeless web design
of https://public-inbox.org/ with our own PSGI code!

I imagine per-inbox docroots might be useful for serving git
bundles, tarball releases, and maybe altid snapshots, too.

Eric Wong (6):
  wwwstatic: implement Last-Modified and If-Modified-Since
  www: move more logic into path_info_raw
  wwwstatic: move r(...) functions here
  wwwstatic: do not open() files for HEAD requests
  wwwstatic: avoid TOCTTOU for FIFO check
  wwwstatic: add directory listing + index.html support

 MANIFEST                          |   1 +
 lib/PublicInbox/Cgit.pm           |   9 +-
 lib/PublicInbox/GitHTTPBackend.pm |  19 +--
 lib/PublicInbox/WWW.pm            |  23 +--
 lib/PublicInbox/WwwHighlight.pm   |   9 +-
 lib/PublicInbox/WwwStatic.pm      | 256 ++++++++++++++++++++++++++++--
 t/www_static.t                    |  96 +++++++++++
 xt/git-http-backend.t             |  20 +++
 8 files changed, 368 insertions(+), 65 deletions(-)
 create mode 100644 t/www_static.t


* [PATCH 1/6] wwwstatic: implement Last-Modified and If-Modified-Since
  2020-01-01 10:38 [PATCH 0/6] wwwstatic: support directory listings Eric Wong
@ 2020-01-01 10:38 ` Eric Wong
  2020-01-01 19:07   ` Eric Wong
  2020-01-01 10:38 ` [PATCH 2/6] www: move more logic into path_info_raw Eric Wong
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2020-01-01 10:38 UTC (permalink / raw)
  To: meta

We're already serving static files for cgit, and will serve more
static files soon.
---
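
A minimal sketch of the conditional-GET check described above, using only
core Perl.  The names http_date() and not_modified() are hypothetical and
not part of this patch; http_date() merely stands in for
HTTP::Date::time2str.  As in the patch, the comparison is an exact string
match against the Last-Modified value, not a date comparison.

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# hypothetical stand-in for HTTP::Date::time2str: epoch seconds to an
# HTTP date string such as "Thu, 01 Jan 1970 00:00:00 GMT"
sub http_date {
	my ($epoch) = @_;
	my ($s, $m, $h, $mday, $mon, $year, $wday) = gmtime($epoch);
	sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
		$DAY[$wday], $mday, $MON[$mon], $year + 1900, $h, $m, $s);
}

# returns an empty 304 PSGI response when If-Modified-Since is
# byte-for-byte equal to our Last-Modified value, undef otherwise
sub not_modified {
	my ($env, $mtime_epoch) = @_;
	my $ims = $env->{HTTP_IF_MODIFIED_SINCE};
	return unless defined $ims;
	$ims eq http_date($mtime_epoch) ? [ 304, [], [] ] : undef;
}
```

An exact match keeps the check cheap and is enough for clients that echo
back the Last-Modified value they were given; a stricter server could
parse the date and compare timestamps instead.
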
 lib/PublicInbox/WwwStatic.pm | 10 ++++++++--
 xt/git-http-backend.t        | 20 ++++++++++++++++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index 58db58b4..b8efcf62 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -4,6 +4,7 @@
 package PublicInbox::WwwStatic;
 use strict;
 use Fcntl qw(:seek);
+use HTTP::Date qw(time2str);
 
 sub prepare_range {
 	my ($env, $in, $h, $beg, $end, $size) = @_;
@@ -50,9 +51,14 @@ sub response {
 	my ($env, $h, $path, $type) = @_;
 	return unless -f $path && -r _; # just in case it's a FIFO :P
 
-	# TODO: If-Modified-Since and Last-Modified?
 	open my $in, '<', $path or return;
 	my $size = -s $in;
+	my $mtime = time2str((stat(_))[9]);
+
+	if (my $ims = $env->{HTTP_IF_MODIFIED_SINCE}) {
+		return [ 304, [], [] ] if $mtime eq $ims;
+	}
+
 	my $len = $size;
 	my $code = 200;
 	push @$h, 'Content-Type', $type;
@@ -63,7 +69,7 @@ sub response {
 			return [ 416, $h, [] ];
 		}
 	}
-	push @$h, 'Content-Length', $len;
+	push @$h, 'Content-Length', $len, 'Last-Modified', $mtime;
 	my $body = bless {
 		initial_rd => 65536,
 		len => $len,
diff --git a/xt/git-http-backend.t b/xt/git-http-backend.t
index 421c6316..7f34d452 100644
--- a/xt/git-http-backend.t
+++ b/xt/git-http-backend.t
@@ -8,6 +8,7 @@ use warnings;
 use Test::More;
 use POSIX qw(setsid);
 use PublicInbox::TestCommon;
+use PublicInbox::Spawn qw(which);
 
 my $git_dir = $ENV{GIANT_GIT_DIR};
 plan 'skip_all' => 'GIANT_GIT_DIR not defined' unless $git_dir;
@@ -74,6 +75,25 @@ SKIP: {
 	}
 }
 
+SKIP: { # make sure Last-Modified + If-Modified-Since works with curl
+	my $nr = 6;
+	skip 'no description', $nr unless -f "$git_dir/description";
+	my $mtime = (stat(_))[9];
+	my $curl = which('curl');
+	skip 'curl(1) not found', $nr unless $curl;
+	my $url = "http://$host:$port/description";
+	my $dst = "$tmpdir/desc";
+	is(system($curl, qw(-RsSf), '-o', $dst, $url), 0, 'curl -R');
+	is((stat($dst))[9], $mtime, 'curl used remote mtime');
+	is(system($curl, qw(-sSf), '-z', $dst, '-o', "$dst.2", $url), 0,
+		'curl -z noop');
+	ok(!-e "$dst.2", 'no modification, nothing retrieved');
+	utime(0, 0, $dst) or die "utime failed: $!";
+	is(system($curl, qw(-sSfR), '-z', $dst, '-o', "$dst.2", $url), 0,
+		'curl -z updates');
+	ok(-e "$dst.2", 'faked modification, got new file retrieved');
+}
+
 {
 	my $c = fork;
 	if ($c == 0) {


* [PATCH 2/6] www: move more logic into path_info_raw
  2020-01-01 10:38 [PATCH 0/6] wwwstatic: support directory listings Eric Wong
  2020-01-01 10:38 ` [PATCH 1/6] wwwstatic: implement Last-Modified and If-Modified-Since Eric Wong
@ 2020-01-01 10:38 ` Eric Wong
  2020-01-01 10:38 ` [PATCH 3/6] wwwstatic: move r(...) functions here Eric Wong
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2020-01-01 10:38 UTC (permalink / raw)
  To: meta

It'll be easier to reuse in future code.
---
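
For illustration, a standalone sketch of the same idea: PATH_INFO is
already percent-decoded, so the raw path is recovered from REQUEST_URI
instead.  The name raw_path_info() is hypothetical, and the per-SCRIPT_NAME
regex cache used by the patch is left out to keep the sketch short.

```perl
use strict;
use warnings;

# recover the still-encoded path from REQUEST_URI, falling back to the
# decoded PATH_INFO when REQUEST_URI does not match
sub raw_path_info {
	my ($env) = @_;
	my $sn = $env->{SCRIPT_NAME} // '';
	$sn = "/$sn" unless index($sn, '/') == 0;
	$sn =~ s!/\z!!; # "/" becomes "", "/inbox/" becomes "/inbox"
	$env->{REQUEST_URI} =~ m!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!
		? $1 : $env->{PATH_INFO};
}
```

With REQUEST_URI set to "/a%2Fb/c?q=1", this yields "/a%2Fb/c", whereas
the decoded PATH_INFO would no longer distinguish the escaped slash from
a path separator.
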
 lib/PublicInbox/WWW.pm | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 251979d5..13b66ee6 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -42,15 +42,17 @@ sub run {
 	PublicInbox::WWW->new->call($req->env);
 }
 
+# PATH_INFO is decoded, and we want the undecoded original
 my %path_re_cache;
-
-sub path_re ($) {
-	my $sn = $_[0]->{SCRIPT_NAME};
-	$path_re_cache{$sn} ||= do {
+sub path_info_raw ($) {
+	my ($env) = @_;
+	my $sn = $env->{SCRIPT_NAME};
+	my $re = $path_re_cache{$sn} ||= do {
 		$sn = '/'.$sn unless index($sn, '/') == 0;
 		$sn =~ s!/\z!!;
 		qr!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!;
 	};
+	$env->{REQUEST_URI} =~ $re ? $1 : $env->{PATH_INFO};
 }
 
 sub call {
@@ -67,9 +69,7 @@ sub call {
 		$k => $v;
 	} split(/[&;]+/, $env->{QUERY_STRING});
 
-	# avoiding $env->{PATH_INFO} here since that's already decoded
-	my ($path_info) = ($env->{REQUEST_URI} =~ path_re($env));
-	$path_info //= $env->{PATH_INFO};
+	my $path_info = path_info_raw($env);
 	my $method = $env->{REQUEST_METHOD};
 
 	if ($method eq 'POST') {


* [PATCH 3/6] wwwstatic: move r(...) functions here
  2020-01-01 10:38 [PATCH 0/6] wwwstatic: support directory listings Eric Wong
  2020-01-01 10:38 ` [PATCH 1/6] wwwstatic: implement Last-Modified and If-Modified-Since Eric Wong
  2020-01-01 10:38 ` [PATCH 2/6] www: move more logic into path_info_raw Eric Wong
@ 2020-01-01 10:38 ` Eric Wong
  2020-01-01 10:38 ` [PATCH 4/6] wwwstatic: do not open() files for HEAD requests Eric Wong
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2020-01-01 10:38 UTC (permalink / raw)
  To: meta

Remove redundant "r" functions for generating short error
responses.  These responses will no longer be cached by clients,
which is probably a good thing since most errors ought to be
transient, anyways.  This also fixes error responses for our
cgit wrapper when static files are missing.
---
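
A rough, core-only sketch of what the consolidated helper produces.  The
name short_error() and the %MSG table are hypothetical; the patch itself
uses HTTP::Status::status_message for the default text.  The no-cache
header values are copied as they appear in the patch.

```perl
use strict;
use warnings;

# same header/value pairs as @NO_CACHE in the patch
my @NO_CACHE = ('Expires', 'Fri, 01 Jan 1980 00:00:00 GMT',
		'Pragma', 'no-cache',
		'Cache-Control', 'no-cache, max-age=0, must-revalidate');

# hypothetical stand-in for HTTP::Status::status_message, covering
# only the status codes that appear in this series
my %MSG = (403 => 'Forbidden', 404 => 'Not Found',
	405 => 'Method Not Allowed', 500 => 'Internal Server Error');

# short text/plain PSGI response which clients are told not to cache
sub short_error {
	my ($code, $msg) = @_;
	$msg ||= $MSG{$code} // "Error $code";
	[ $code, [ 'Content-Type', 'text/plain',
		'Content-Length', length($msg), @NO_CACHE ], [ $msg ] ];
}
```

Calling short_error(404) returns a three-element PSGI arrayref whose body
is "Not Found" and whose headers carry the no-cache directives.
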
 lib/PublicInbox/Cgit.pm           |  3 +--
 lib/PublicInbox/GitHTTPBackend.pm | 19 +++----------------
 lib/PublicInbox/WWW.pm            |  8 +++-----
 lib/PublicInbox/WwwHighlight.pm   |  9 +--------
 lib/PublicInbox/WwwStatic.pm      | 28 ++++++++++++++++++++++------
 5 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/lib/PublicInbox/Cgit.pm b/lib/PublicInbox/Cgit.pm
index 36239438..c0b1a73b 100644
--- a/lib/PublicInbox/Cgit.pm
+++ b/lib/PublicInbox/Cgit.pm
@@ -10,13 +10,12 @@ use strict;
 use PublicInbox::GitHTTPBackend;
 use PublicInbox::Git;
 # not bothering with Exporter for a one-off
-*r = *PublicInbox::GitHTTPBackend::r;
 *input_prepare = *PublicInbox::GitHTTPBackend::input_prepare;
 *parse_cgi_headers = *PublicInbox::GitHTTPBackend::parse_cgi_headers;
 *serve = *PublicInbox::GitHTTPBackend::serve;
 use warnings;
 use PublicInbox::Qspawn;
-use PublicInbox::WwwStatic;
+use PublicInbox::WwwStatic qw(r);
 use Plack::MIME;
 
 sub locate_cgit ($) {
diff --git a/lib/PublicInbox/GitHTTPBackend.pm b/lib/PublicInbox/GitHTTPBackend.pm
index 8883ec34..d1132fb7 100644
--- a/lib/PublicInbox/GitHTTPBackend.pm
+++ b/lib/PublicInbox/GitHTTPBackend.pm
@@ -9,10 +9,9 @@ use warnings;
 use Fcntl qw(:seek);
 use IO::Handle;
 use HTTP::Date qw(time2str);
-use HTTP::Status qw(status_message);
 use PublicInbox::Qspawn;
 use PublicInbox::Tmpfile;
-use PublicInbox::WwwStatic;
+use PublicInbox::WwwStatic qw(r @NO_CACHE);
 
 # 32 is same as the git-daemon connection limit
 my $default_limiter = PublicInbox::Qspawn::Limiter->new(32);
@@ -32,18 +31,6 @@ our $ANY = join('|', @binary, @text, 'git-upload-pack');
 my $BIN = join('|', @binary);
 my $TEXT = join('|', @text);
 
-my @no_cache = ('Expires', 'Fri, 01 Jan 1980 00:00:00 GMT',
-		'Pragma', 'no-cache',
-		'Cache-Control', 'no-cache, max-age=0, must-revalidate');
-
-sub r ($;$) {
-	my ($code, $msg) = @_;
-	$msg ||= status_message($code);
-	my $len = length($msg);
-	[ $code, [qw(Content-Type text/plain Content-Length), $len, @no_cache],
-		[$msg] ]
-}
-
 sub serve {
 	my ($env, $git, $path) = @_;
 
@@ -88,12 +75,12 @@ sub serve_dumb {
 		cache_one_year($h);
 	} elsif ($path =~ /\A(?:$TEXT)\z/o) {
 		$type = 'text/plain';
-		push @$h, @no_cache;
+		push @$h, @NO_CACHE;
 	} else {
 		return r(404);
 	}
 	$path = "$git->{git_dir}/$path";
-	PublicInbox::WwwStatic::response($env, $h, $path, $type) // r(404);
+	PublicInbox::WwwStatic::response($env, $h, $path, $type);
 }
 
 sub git_parse_hdr { # {parse_hdr} for Qspawn
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 13b66ee6..99f9f1dc 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -22,6 +22,7 @@ use PublicInbox::MID qw(mid_escape);
 require PublicInbox::Git;
 use PublicInbox::GitHTTPBackend;
 use PublicInbox::UserContent;
+use PublicInbox::WwwStatic qw(r);
 
 # TODO: consider a routing tree now that we have more endpoints:
 our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
@@ -83,7 +84,7 @@ sub call {
 		}
 	}
 	elsif ($method !~ /\AGET|HEAD\z/) {
-		return r(405, 'Method Not Allowed');
+		return r(405);
 	}
 
 	# top-level indices and feeds
@@ -176,12 +177,9 @@ sub r404 {
 		require PublicInbox::ExtMsg;
 		return PublicInbox::ExtMsg::ext_msg($ctx);
 	}
-	r(404, 'Not Found');
+	r(404);
 }
 
-# simple response for errors
-sub r { [ $_[0], ['Content-Type' => 'text/plain'], [ join(' ', @_, "\n") ] ] }
-
 sub news_cgit_fallback ($) {
 	my ($ctx) = @_;
 	my $www = $ctx->{www};
diff --git a/lib/PublicInbox/WwwHighlight.pm b/lib/PublicInbox/WwwHighlight.pm
index bc349f8a..6312edae 100644
--- a/lib/PublicInbox/WwwHighlight.pm
+++ b/lib/PublicInbox/WwwHighlight.pm
@@ -22,22 +22,15 @@ package PublicInbox::WwwHighlight;
 use strict;
 use warnings;
 use bytes (); # only for bytes::length
-use HTTP::Status qw(status_message);
 use parent qw(PublicInbox::HlMod);
 use PublicInbox::Linkify qw();
 use PublicInbox::Hval qw(ascii_html);
+use PublicInbox::WwwStatic qw(r);
 
 # TODO: support highlight(1) for distros which don't package the
 # SWIG extension.  Also, there may be admins who don't want to
 # have ugly SWIG-generated code in a long-lived Perl process.
 
-sub r ($) {
-	my ($code) = @_;
-	my $msg = status_message($code);
-	my $len = length($msg);
-	[ $code, [qw(Content-Type text/plain Content-Length), $len], [$msg] ]
-}
-
 # another slurp API hogging up all my memory :<
 # This is capped by whatever the PSGI server allows,
 # $ENV{GIT_HTTP_MAX_REQUEST_BUFFER} for PublicInbox::HTTP (10 MB)
diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index b8efcf62..c605e64f 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -3,8 +3,23 @@
 
 package PublicInbox::WwwStatic;
 use strict;
+use parent qw(Exporter);
 use Fcntl qw(:seek);
 use HTTP::Date qw(time2str);
+use HTTP::Status qw(status_message);
+our @EXPORT_OK = qw(@NO_CACHE r);
+
+our @NO_CACHE = ('Expires', 'Fri, 01 Jan 1980 00:00:00 GMT',
+		'Pragma', 'no-cache',
+		'Cache-Control', 'no-cache, max-age=0, must-revalidate');
+
+sub r ($;$) {
+	my ($code, $msg) = @_;
+	$msg ||= status_message($code);
+	[ $code, [ qw(Content-Type text/plain), 'Content-Length', length($msg),
+		@NO_CACHE ],
+	  [ $msg ] ]
+}
 
 sub prepare_range {
 	my ($env, $in, $h, $beg, $end, $size) = @_;
@@ -36,7 +51,7 @@ sub prepare_range {
 		if ($len <= 0) {
 			$code = 416;
 		} else {
-			sysseek($in, $beg, SEEK_SET) or return [ 500, [], [] ];
+			sysseek($in, $beg, SEEK_SET) or return r(500);
 			push @$h, qw(Accept-Ranges bytes Content-Range);
 			push @$h, "bytes $beg-$end/$size";
 
@@ -44,12 +59,16 @@ sub prepare_range {
 			$env->{'psgix.no-compress'} = 1;
 		}
 	}
+	if ($code == 416) {
+		push @$h, 'Content-Range', "bytes */$size";
+		return [ 416, $h, [] ];
+	}
 	($code, $len);
 }
 
 sub response {
 	my ($env, $h, $path, $type) = @_;
-	return unless -f $path && -r _; # just in case it's a FIFO :P
+	return r(404) unless -f $path && -r _; # just in case it's a FIFO :P
 
 	open my $in, '<', $path or return;
 	my $size = -s $in;
@@ -64,10 +83,7 @@ sub response {
 	push @$h, 'Content-Type', $type;
 	if (($env->{HTTP_RANGE} || '') =~ /\bbytes=([0-9]*)-([0-9]*)\z/) {
 		($code, $len) = prepare_range($env, $in, $h, $1, $2, $size);
-		if ($code == 416) {
-			push @$h, 'Content-Range', "bytes */$size";
-			return [ 416, $h, [] ];
-		}
+		return $code if ref($code);
 	}
 	push @$h, 'Content-Length', $len, 'Last-Modified', $mtime;
 	my $body = bless {


* [PATCH 4/6] wwwstatic: do not open() files for HEAD requests
  2020-01-01 10:38 [PATCH 0/6] wwwstatic: support directory listings Eric Wong
                   ` (2 preceding siblings ...)
  2020-01-01 10:38 ` [PATCH 3/6] wwwstatic: move r(...) functions here Eric Wong
@ 2020-01-01 10:38 ` Eric Wong
  2020-01-01 10:38 ` [PATCH 5/6] wwwstatic: avoid TOCTTOU for FIFO check Eric Wong
  2020-01-01 10:38 ` [PATCH 6/6] wwwstatic: add directory listing + index.html support Eric Wong
  5 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2020-01-01 10:38 UTC (permalink / raw)
  To: meta

open() is a much more expensive syscall than stat(), so avoid it
for HEAD requests, which only need the size and mtime.
---
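
A small sketch of the HEAD/GET split, assuming the caller has already
filtered out other methods.  The name size_mtime_fh() is hypothetical;
the patch performs the equivalent steps inline in response().

```perl
use strict;
use warnings;

# returns ($size, $mtime, $fh); $fh stays undef for HEAD since stat(2)
# metadata alone is enough to fill in the response headers
sub size_mtime_fh {
	my ($method, $path) = @_;
	if ($method eq 'HEAD') {
		my @st = stat($path) or return;
		return ($st[7], $st[9], undef);
	}
	open(my $fh, '<', $path) or return;
	my @st = stat($fh) or return; # fstat(2) on the open handle
	($st[7], $st[9], $fh);
}
```

The HEAD branch never creates a file descriptor, so there is nothing to
close and no descriptor is held while the response is written.
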
 lib/PublicInbox/WwwStatic.pm | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index c605e64f..093a7920 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -51,7 +51,9 @@ sub prepare_range {
 		if ($len <= 0) {
 			$code = 416;
 		} else {
-			sysseek($in, $beg, SEEK_SET) or return r(500);
+			if ($in) {
+				sysseek($in, $beg, SEEK_SET) or return r(500);
+			}
 			push @$h, qw(Accept-Ranges bytes Content-Range);
 			push @$h, "bytes $beg-$end/$size";
 
@@ -70,8 +72,13 @@ sub response {
 	my ($env, $h, $path, $type) = @_;
 	return r(404) unless -f $path && -r _; # just in case it's a FIFO :P
 
-	open my $in, '<', $path or return;
-	my $size = -s $in;
+	my ($size, $in);
+	if ($env->{REQUEST_METHOD} eq 'HEAD') {
+		$size = -s _;
+	} else { # GET, callers should've already filtered out other methods
+		open $in, '<', $path or return r(403);
+		$size = -s $in;
+	}
 	my $mtime = time2str((stat(_))[9]);
 
 	if (my $ims = $env->{HTTP_IF_MODIFIED_SINCE}) {
@@ -86,13 +93,13 @@ sub response {
 		return $code if ref($code);
 	}
 	push @$h, 'Content-Length', $len, 'Last-Modified', $mtime;
-	my $body = bless {
+	my $body = $in ? bless {
 		initial_rd => 65536,
 		len => $len,
 		in => $in,
 		path => $path,
 		env => $env,
-	}, __PACKAGE__;
+	}, __PACKAGE__ : [];
 	[ $code, $h, $body ];
 }
 


* [PATCH 5/6] wwwstatic: avoid TOCTTOU for FIFO check
  2020-01-01 10:38 [PATCH 0/6] wwwstatic: support directory listings Eric Wong
                   ` (3 preceding siblings ...)
  2020-01-01 10:38 ` [PATCH 4/6] wwwstatic: do not open() files for HEAD requests Eric Wong
@ 2020-01-01 10:38 ` Eric Wong
  2020-01-01 10:38 ` [PATCH 6/6] wwwstatic: add directory listing + index.html support Eric Wong
  5 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2020-01-01 10:38 UTC (permalink / raw)
  To: meta

We can use Perl's sysopen function to pass O_NONBLOCK to open(2)
and avoid blocking on FIFOs.  This avoids a TOCTTOU race where
somebody can change a regular file to a FIFO in between the
stat(2) and open(2) syscalls.
---
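
The open-then-check ordering can be sketched on its own as follows.  The
name open_regular() and its return convention are hypothetical; the errno
mapping mirrors the one in this patch.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_NONBLOCK);
use Errno qw(ENOENT ENOTDIR EACCES);

# open first, then check the type of what was actually opened;
# returns ($fh, undef) for a regular file, or (undef, $http_code)
sub open_regular {
	my ($path) = @_;
	sysopen(my $fh, $path, O_RDONLY|O_NONBLOCK) or do {
		return (undef, 404) if $! == ENOENT || $! == ENOTDIR;
		return (undef, 403) if $! == EACCES;
		return (undef, 500);
	};
	# -f on a handle uses fstat(2), so it describes the file we
	# hold open even if $path has been replaced since
	return (undef, 404) unless -f $fh;
	($fh, undef);
}
```

Because O_NONBLOCK makes opening a FIFO for reading return immediately,
a FIFO is rejected by the -f check instead of stalling the process while
it waits for a writer.
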
 lib/PublicInbox/WwwStatic.pm | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index 093a7920..ce4bfe9b 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -4,9 +4,10 @@
 package PublicInbox::WwwStatic;
 use strict;
 use parent qw(Exporter);
-use Fcntl qw(:seek);
+use Fcntl qw(SEEK_SET O_RDONLY O_NONBLOCK);
 use HTTP::Date qw(time2str);
 use HTTP::Status qw(status_message);
+use Errno qw(EACCES ENOTDIR ENOENT);
 our @EXPORT_OK = qw(@NO_CACHE r);
 
 our @NO_CACHE = ('Expires', 'Fri, 01 Jan 1980 00:00:00 GMT',
@@ -70,15 +71,19 @@ sub prepare_range {
 
 sub response {
 	my ($env, $h, $path, $type) = @_;
-	return r(404) unless -f $path && -r _; # just in case it's a FIFO :P
 
-	my ($size, $in);
+	my $in;
 	if ($env->{REQUEST_METHOD} eq 'HEAD') {
-		$size = -s _;
+		return r(404) unless -f $path && -r _; # in case it's a FIFO :P
 	} else { # GET, callers should've already filtered out other methods
-		open $in, '<', $path or return r(403);
-		$size = -s $in;
+		if (!sysopen($in, $path, O_RDONLY|O_NONBLOCK)) {
+			return r(404) if $! == ENOENT || $! == ENOTDIR;
+			return r(403) if $! == EACCES;
+			return r(500);
+		}
+		return r(404) unless -f $in;
 	}
+	my $size = -s _; # bare "_" reuses "struct stat" from "-f" above
 	my $mtime = time2str((stat(_))[9]);
 
 	if (my $ims = $env->{HTTP_IF_MODIFIED_SINCE}) {


* [PATCH 6/6] wwwstatic: add directory listing + index.html support
  2020-01-01 10:38 [PATCH 0/6] wwwstatic: support directory listings Eric Wong
                   ` (4 preceding siblings ...)
  2020-01-01 10:38 ` [PATCH 5/6] wwwstatic: avoid TOCTTOU for FIFO check Eric Wong
@ 2020-01-01 10:38 ` Eric Wong
  5 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2020-01-01 10:38 UTC (permalink / raw)
  To: meta

It's now possible to use WwwStatic as a standalone PSGI
app to serve static files and recreate the award-winning
web design of https://public-inbox.org/ :>
---
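
Serving a docroot as a standalone app means every request path has to be
validated before it touches the filesystem.  Below is a simplified sketch
of the checks that call() performs in this patch.  The name fs_path_for()
is hypothetical, and unlike the patch it drops the leading empty path
component so the joined path has no doubled slash.

```perl
use strict;
use warnings;

# map a decoded PATH_INFO onto the filesystem under $docroot;
# returns undef when the request should be rejected (403 in the patch)
sub fs_path_for {
	my ($docroot, $path_info) = @_;
	return if index($path_info, "\0") >= 0; # embedded NUL byte
	# limit of -1 keeps trailing empty fields, so "/dir/" ends in ''
	my @parts = split(m!/+!, $path_info, -1);
	return if !@parts || $parts[0] ne ''; # must begin with a slash
	return if grep { $_ eq '..' } @parts; # no parent-directory hops
	shift @parts; # drop the empty component before the first slash
	join('/', $docroot, @parts);
}
```

A result ending in "/" denotes a directory request, which is where the
patch goes on to try index.html and then the optional autoindex listing.
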
 MANIFEST                     |   1 +
 lib/PublicInbox/Cgit.pm      |   6 +-
 lib/PublicInbox/WWW.pm       |  15 +--
 lib/PublicInbox/WwwStatic.pm | 198 ++++++++++++++++++++++++++++++++++-
 t/www_static.t               |  96 +++++++++++++++++
 5 files changed, 294 insertions(+), 22 deletions(-)
 create mode 100644 t/www_static.t

diff --git a/MANIFEST b/MANIFEST
index f649bbef..16c92c36 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -290,6 +290,7 @@ t/watch_filter_rubylang.t
 t/watch_maildir.t
 t/watch_maildir_v2.t
 t/www_listing.t
+t/www_static.t
 t/xcpdb-reshard.t
 xt/git-http-backend.t
 xt/git_async_cmp.t
diff --git a/lib/PublicInbox/Cgit.pm b/lib/PublicInbox/Cgit.pm
index c0b1a73b..c42f8847 100644
--- a/lib/PublicInbox/Cgit.pm
+++ b/lib/PublicInbox/Cgit.pm
@@ -16,7 +16,6 @@ use PublicInbox::Git;
 use warnings;
 use PublicInbox::Qspawn;
 use PublicInbox::WwwStatic qw(r);
-use Plack::MIME;
 
 sub locate_cgit ($) {
 	my ($pi_config) = @_;
@@ -114,9 +113,8 @@ sub call {
 		}
 	} elsif ($path_info =~ m!$self->{static}! &&
 		 defined($cgit_data = $self->{cgit_data})) {
-		my $f = $1;
-		return PublicInbox::WwwStatic::response($env, [], $cgit_data.$f,
-						Plack::MIME->mime_type($f));
+		my $f = $cgit_data.$1; # {static} only matches leading slash
+		return PublicInbox::WwwStatic::response($env, [], $f);
 	}
 
 	my $cgi_env = { PATH_INFO => $path_info };
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 99f9f1dc..efe7c8ca 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -22,7 +22,7 @@ use PublicInbox::MID qw(mid_escape);
 require PublicInbox::Git;
 use PublicInbox::GitHTTPBackend;
 use PublicInbox::UserContent;
-use PublicInbox::WwwStatic qw(r);
+use PublicInbox::WwwStatic qw(r path_info_raw);
 
 # TODO: consider a routing tree now that we have more endpoints:
 our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
@@ -43,19 +43,6 @@ sub run {
 	PublicInbox::WWW->new->call($req->env);
 }
 
-# PATH_INFO is decoded, and we want the undecoded original
-my %path_re_cache;
-sub path_info_raw ($) {
-	my ($env) = @_;
-	my $sn = $env->{SCRIPT_NAME};
-	my $re = $path_re_cache{$sn} ||= do {
-		$sn = '/'.$sn unless index($sn, '/') == 0;
-		$sn =~ s!/\z!!;
-		qr!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!;
-	};
-	$env->{REQUEST_URI} =~ $re ? $1 : $env->{PATH_INFO};
-}
-
 sub call {
 	my ($self, $env) = @_;
 	my $ctx = { env => $env, www => $self };
diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index ce4bfe9b..bc42236e 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -1,19 +1,48 @@
 # Copyright (C) 2016-2019 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
+# This package can either be a PSGI response body for a static file
+# OR a standalone PSGI app which returns the above PSGI response body
+# (or an HTML directory listing).
+#
+# It encapsulates the "autoindex", "index", and "gzip_static"
+# functionality of nginx.
 package PublicInbox::WwwStatic;
 use strict;
 use parent qw(Exporter);
+use bytes ();
 use Fcntl qw(SEEK_SET O_RDONLY O_NONBLOCK);
+use POSIX qw(strftime lround);
 use HTTP::Date qw(time2str);
 use HTTP::Status qw(status_message);
 use Errno qw(EACCES ENOTDIR ENOENT);
-our @EXPORT_OK = qw(@NO_CACHE r);
+use URI::Escape qw(uri_escape_utf8);
+use PublicInbox::Hval qw(ascii_html);
+use Plack::MIME;
+our @EXPORT_OK = qw(@NO_CACHE r path_info_raw);
 
 our @NO_CACHE = ('Expires', 'Fri, 01 Jan 1980 00:00:00 GMT',
 		'Pragma', 'no-cache',
 		'Cache-Control', 'no-cache, max-age=0, must-revalidate');
 
+our $STYLE = <<'EOF';
+<style>
+@media screen {
+	*{background:#000;color:#ccc}
+	a{color:#69f;text-decoration:none}
+	a:visited{color:#96f}
+}
+@media screen AND (prefers-color-scheme:light) {
+	*{background:#fff;color:#333}
+	a{color:#00f;text-decoration:none}
+	a:visited{color:#808}
+}
+</style>
+EOF
+
+$STYLE =~ s/^\s*//gm;
+$STYLE =~ tr/\n//d;
+
 sub r ($;$) {
 	my ($code, $msg) = @_;
 	$msg ||= status_message($code);
@@ -69,8 +98,28 @@ sub prepare_range {
 	($code, $len);
 }
 
-sub response {
+# returns a PSGI arrayref response iff .gz and non-.gz mtimes match
+sub try_gzip_static ($$$$) {
 	my ($env, $h, $path, $type) = @_;
+	return unless ($env->{HTTP_ACCEPT_ENCODING} // '') =~ /\bgzip\b/i;
+	my $mtime;
+	return unless -f $path && defined(($mtime = (stat(_))[9]));
+	my $gz = "$path.gz";
+	return unless -f $gz && (stat(_))[9] == $mtime;
+	my $res = response($env, $h, $gz, $type);
+	return if ($res->[0] > 300 || $res->[0] < 200);
+	push @{$res->[1]}, qw(Cache-Control no-transform Content-Encoding gzip);
+	$res;
+}
+
+sub response ($$$;$) {
+	my ($env, $h, $path, $type) = @_;
+	$type //= Plack::MIME->mime_type($path) // 'application/octet-stream';
+	if ($path !~ /\.gz\z/i) {
+		if (my $res = try_gzip_static($env, $h, $path, $type)) {
+			return $res;
+		}
+	}
 
 	my $in;
 	if ($env->{REQUEST_METHOD} eq 'HEAD') {
@@ -108,7 +157,7 @@ sub response {
 	[ $code, $h, $body ];
 }
 
-# called by PSGI servers:
+# called by PSGI servers on each response chunk:
 sub getline {
 	my ($self) = @_;
 	my $len = $self->{len} or return; # undef, tells server we're done
@@ -132,6 +181,147 @@ sub getline {
 	undef;
 }
 
-sub close {} # noop, just let everything go out-of-scope
+sub close {} # noop, called by PSGI server, just let everything go out-of-scope
+
+# OO interface for use as a Plack app
+sub new {
+	my ($class, %opt) = @_;
+	my $index = $opt{'index'} // [ 'index.html' ];
+	$index = [ $index ] if defined($index) && ref($index) ne 'ARRAY';
+	$index = undef if scalar(@$index) == 0;
+	my $style = $opt{style};
+	if (defined $style) {
+		$style = \$style unless ref($style);
+	}
+	my $docroot = $opt{docroot};
+	die "`docroot' not set" unless defined($docroot) && $docroot ne '';
+	bless {
+		docroot => $docroot,
+		index => $index,
+		autoindex => $opt{autoindex},
+		style => $style // \$STYLE,
+	}, $class;
+}
+
+# PATH_INFO is decoded, and we want the undecoded original
+my %path_re_cache;
+sub path_info_raw ($) {
+	my ($env) = @_;
+	my $sn = $env->{SCRIPT_NAME};
+	my $re = $path_re_cache{$sn} ||= do {
+		$sn = '/'.$sn unless index($sn, '/') == 0;
+		$sn =~ s!/\z!!;
+		qr!\A(?:https?://[^/]+)?\Q$sn\E(/[^\?\#]+)!;
+	};
+	$env->{REQUEST_URI} =~ $re ? $1 : $env->{PATH_INFO};
+}
+
+sub redirect_slash ($) {
+	my ($env) = @_;
+	my $url = $env->{'psgi.url_scheme'} . '://';
+	my $host_port = $env->{HTTP_HOST} //
+		"$env->{SERVER_NAME}:$env->{SERVER_PORT}";
+	$url .= $host_port . path_info_raw($env) . '/';
+	my $body = "Redirecting to $url\n";
+	[ 302, [ qw(Content-Type text/plain), 'Location', $url,
+		'Content-Length', length($body) ], [ $body ] ]
+}
+
+sub human_size ($) {
+	my ($size) = @_;
+	my $suffix = '';
+	for my $s (qw(K M G T P)) {
+		last if $size < 1024;
+		$size /= 1024;
+		if ($size <= 1024) {
+			$suffix = $s;
+			last;
+		}
+	}
+	lround($size).$suffix;
+}
+
+# By default, this serves "index.html" if it exists in the given directory.
+# Otherwise, it generates a directory listing when autoindex is enabled
+# (autoindex => 1), and returns 404 when it is not.
+sub dir_response ($$$) {
+	my ($self, $env, $fs_path) = @_;
+	if (my $index = $self->{'index'}) { # serve index.html or similar
+		for my $html (@$index) {
+			my $p = $fs_path . $html;
+			my $res = response($env, [], $p);
+			return $res if $res->[0] != 404;
+		}
+	}
+	return r(404) unless $self->{autoindex};
+	opendir(my $dh, $fs_path) or do {
+		return r(404) if ($! == ENOENT || $! == ENOTDIR);
+		return r(403) if $! == EACCES;
+		return r(500);
+	};
+	my @entries = grep(!/\A\./, readdir($dh));
+	$dh = undef;
+	my (%dirs, %other, %want_gz);
+	my $path_info = $env->{PATH_INFO};
+	push @entries, '..' if $path_info ne '/';
+	for my $base (@entries) {
+		my $href = ascii_html(uri_escape_utf8($base));
+		my $name = ascii_html($base);
+		my @st = stat($fs_path . $base) or next; # unlikely
+		my ($gzipped, $uncompressed, $hsize);
+		my $entry = '';
+		my $mtime = $st[9];
+		if (-d _) {
+			$href .= '/';
+			$name .= '/';
+			$hsize = '-';
+			$dirs{"$base\0$mtime"} = \$entry;
+		} elsif (-f _) {
+			$other{"$base\0$mtime"} = \$entry;
+			if ($base !~ /\.gz\z/i) {
+				$want_gz{"$base.gz\0$mtime"} = undef;
+			}
+			$hsize = human_size($st[7]);
+		} else {
+			next;
+		}
+		# 54 = 80 - (SP length(strftime(%Y-%m-%d %k:%M)) SP human_size)
+		$hsize = sprintf('% 8s', $hsize);
+		my $pad = 54 - length($name);
+		$pad = 1 if $pad <= 0;
+		$entry .= qq(<a\nhref="$href">$name</a>) . (' ' x $pad);
+		$mtime = strftime('%Y-%m-%d %k:%M', gmtime($mtime));
+		$entry .= $mtime . $hsize;
+	}
+
+	# filter out '.gz' files as long as the mtime matches the
+	# uncompressed version
+	delete(@other{keys %want_gz});
+	@entries = ((map { ${$dirs{$_}} } sort keys %dirs),
+			(map { ${$other{$_}} } sort keys %other));
+
+	my $path_info_html = ascii_html($path_info);
+	my $body = "<html><head><title>Index of $path_info_html</title>" .
+		${$self->{style}} .
+		"</head><body><pre>Index of $path_info_html</pre><hr><pre>\n";
+	$body .= join("\n", @entries) . "</pre><hr></body></html>\n";
+	[ 200, [ qw(Content-Type text/html
+			Content-Length), bytes::length($body) ], [ $body ] ]
+}
+
+sub call { # PSGI app endpoint
+	my ($self, $env) = @_;
+	return r(405) if $env->{REQUEST_METHOD} !~ /\A(?:GET|HEAD)\z/;
+	my $path_info = $env->{PATH_INFO};
+	return r(403) if index($path_info, "\0") >= 0;
+	my (@parts) = split(m!/+!, $path_info, -1);
+	return r(403) if grep(/\A(?:\.\.)\z/, @parts) || $parts[0] ne '';
+
+	my $fs_path = join('/', $self->{docroot}, @parts);
+	return dir_response($self, $env, $fs_path) if $parts[-1] eq '';
+
+	my $res = response($env, [], $fs_path);
+	$res->[0] == 404 && -d $fs_path ? redirect_slash($env) : $res;
+}
 
 1;
diff --git a/t/www_static.t b/t/www_static.t
new file mode 100644
index 00000000..5f2e3380
--- /dev/null
+++ b/t/www_static.t
@@ -0,0 +1,96 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use PublicInbox::TestCommon;
+my ($tmpdir, $for_destroy) = tmpdir();
+my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape);
+require_mods(@mods);
+use_ok $_ foreach @mods;
+use_ok 'PublicInbox::WwwStatic';
+
+my $app = sub {
+	my $ws = PublicInbox::WwwStatic->new(docroot => $tmpdir, @_);
+	sub { $ws->call(shift) };
+};
+
+test_psgi($app->(), sub {
+	my $cb = shift;
+	my $res = $cb->(GET('/'));
+	is($res->code, 404, '404 on "/" by default');
+	open my $fh, '>', "$tmpdir/index.html" or die;
+	print $fh 'hi' or die;
+	close $fh or die;
+	$res = $cb->(GET('/'));
+	is($res->code, 200, '200 with index.html');
+	is($res->content, 'hi', 'default index.html returned');
+	$res = $cb->(HEAD('/'));
+	is($res->code, 200, '200 on HEAD /');
+	is($res->content, '', 'no content');
+	is($res->header('Content-Length'), '2', 'content-length set');
+	like($res->header('Content-Type'), qr!^text/html\b!,
+		'content-type is html');
+});
+
+test_psgi($app->(autoindex => 1, index => []), sub {
+	my $cb = shift;
+	my $res = $cb->(GET('/'));
+	my $updir = 'href="../">../</a>';
+	is($res->code, 200, '200 with autoindex default');
+	my $ls = $res->content;
+	like($ls, qr/index\.html/, 'got listing with index.html');
+	ok(index($ls, $updir) < 0, 'no updir at /');
+	mkdir("$tmpdir/dir") or die;
+	rename("$tmpdir/index.html", "$tmpdir/dir/index.html") or die;
+
+	$res = $cb->(GET('/dir/'));
+	is($res->code, 200, '200 with autoindex for dir/');
+	$ls = $res->content;
+	ok(index($ls, $updir) > 0, 'updir at /dir/');
+
+	for my $up (qw(/../ .. /dir/.. /dir/../)) {
+		is($cb->(GET($up))->code, 403, "`$up' traversal rejected");
+	}
+
+	$res = $cb->(GET('/dir'));
+	is($res->code, 302, '302 w/o slash');
+	like($res->header('Location'), qr!://[^/]+/dir/\z!,
+		'redirected w/ slash');
+
+	rename("$tmpdir/dir/index.html", "$tmpdir/dir/foo") or die;
+	link("$tmpdir/dir/foo", "$tmpdir/dir/foo.gz") or die;
+	$res = $cb->(GET('/dir/'));
+	unlike($res->content, qr/>foo\.gz</,
+		'.gz file hidden if mtime matches uncompressed');
+	like($res->content, qr/>foo</, 'uncompressed foo shown');
+
+	$res = $cb->(GET('/dir/foo/bar'));
+	is($res->code, 404, 'using file as dir fails');
+
+	unlink("$tmpdir/dir/foo") or die;
+	$res = $cb->(GET('/dir/'));
+	like($res->content, qr/>foo\.gz</,
+		'.gz shown when no uncompressed version exists');
+
+	open my $fh, '>', "$tmpdir/dir/foo" or die;
+	print $fh "uncompressed\n" or die;
+	close $fh or die;
+	utime(0, 0, "$tmpdir/dir/foo") or die;
+	$res = $cb->(GET('/dir/'));
+	my $html = $res->content;
+	like($html, qr/>foo</, 'uncompressed foo shown');
+	like($html, qr/>foo\.gz</, 'gzipped foo shown on mtime mismatch');
+
+	$res = $cb->(GET('/dir/foo'));
+	is($res->content, "uncompressed\n",
+		'got uncompressed on mtime mismatch');
+
+	utime(0, 0, "$tmpdir/dir/foo.gz") or die;
+	my $get = GET('/dir/foo');
+	$get->header('Accept-Encoding' => 'gzip');
+	$res = $cb->($get);
+	is($res->content, "hi", 'got compressed on mtime match');
+});
+
+done_testing();


* Re: [PATCH 1/6] wwwstatic: implement Last-Modified and If-Modified-Since
  2020-01-01 10:38 ` [PATCH 1/6] wwwstatic: implement Last-Modified and If-Modified-Since Eric Wong
@ 2020-01-01 19:07   ` Eric Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Wong @ 2020-01-01 19:07 UTC (permalink / raw)
  To: meta

Eric Wong <e@80x24.org> wrote:
> We're already static files for cgit, and will serve more
               ^- "serving"

> static files, soon.

