unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* [PATCH 0/6] daemon: reduce fragmentation via preload
@ 2020-03-19  8:32 Eric Wong
  2020-03-19  8:32 ` [PATCH 1/6] www: update ->preload for newer modules Eric Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-19  8:32 UTC (permalink / raw)
  To: meta

For long-lived daemons, perform immortal allocations as early as
possible to reduce the likelyhood of heap fragmentation due to
mixed-lifetime allocations happening once the process is fully
loaded and serving requests, since per-request allocations
should all be short-lived.

On a side note, I'm wondering if WWW should just preload by
default.  I'm not sure if anybody uses public-inbox.cgi (or
should be using it :P).  It's not like we don't ship
public-inbox-httpd; and any PSGI implementation could be used
for smaller inboxes (or powerful-enough hardware).

Eric Wong (6):
  www: update ->preload for newer modules
  wwwlisting: favor "use" over require
  wwwlisting: avoid lazy loading JSON module
  www: avoid `state' usage to perform allocations up-front
  daemon: do more immortal allocations up front
  viewdiff: favor `qr' to precompile regexps

 lib/PublicInbox/NNTPD.pm      |  4 +++
 lib/PublicInbox/SolverGit.pm  | 13 +++++----
 lib/PublicInbox/ViewDiff.pm   | 53 +++++++++++++++++++----------------
 lib/PublicInbox/WWW.pm        | 29 ++++++++++++++-----
 lib/PublicInbox/WwwListing.pm | 30 +++++++++-----------
 t/www_listing.t               |  4 +--
 6 files changed, 78 insertions(+), 55 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] www: update ->preload for newer modules
  2020-03-19  8:32 [PATCH 0/6] daemon: reduce fragmentation via preload Eric Wong
@ 2020-03-19  8:32 ` Eric Wong
  2020-03-19  8:32 ` [PATCH 2/6] wwwlisting: favor "use" over require Eric Wong
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-19  8:32 UTC (permalink / raw)
  To: meta

We'll also avoid explicitly loading standard library modules
like POSIX and Digest::SHA, here; instead we load our own
modules and let those load whatever non-PublicInbox:: modules
they need.
---
 lib/PublicInbox/WWW.pm | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 1e7d3c1e..534ee028 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -19,7 +19,6 @@ use PublicInbox::Config;
 use PublicInbox::Hval;
 use URI::Escape qw(uri_unescape);
 use PublicInbox::MID qw(mid_escape);
-require PublicInbox::Git;
 use PublicInbox::GitHTTPBackend;
 use PublicInbox::UserContent;
 use PublicInbox::WwwStatic qw(r path_info_raw);
@@ -136,18 +135,21 @@ sub call {
 # for CoW-friendliness, MOOOOO!
 sub preload {
 	my ($self) = @_;
+	require PublicInbox::ExtMsg;
 	require PublicInbox::Feed;
 	require PublicInbox::View;
 	require PublicInbox::SearchThread;
 	require PublicInbox::MIME;
-	require Digest::SHA;
-	require POSIX;
+	require PublicInbox::Mbox;
+	require PublicInbox::ViewVCS;
+	require PublicInbox::WwwText;
+	require PublicInbox::WwwAttach;
 	eval {
 		require PublicInbox::Search;
 		PublicInbox::Search::load_xapian();
 	};
 	foreach (qw(PublicInbox::SearchView
-			PublicInbox::Mbox IO::Compress::Gzip
+			PublicInbox::MboxGz
 			PublicInbox::NewsWWW)) {
 		eval "require $_;";
 	}

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] wwwlisting: favor "use" over require
  2020-03-19  8:32 [PATCH 0/6] daemon: reduce fragmentation via preload Eric Wong
  2020-03-19  8:32 ` [PATCH 1/6] www: update ->preload for newer modules Eric Wong
@ 2020-03-19  8:32 ` Eric Wong
  2020-03-19  8:32 ` [PATCH 3/6] wwwlisting: avoid lazy loading JSON module Eric Wong
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-19  8:32 UTC (permalink / raw)
  To: meta

"use" is also evaluated earlier than "require", so it is
favorable for compile-only checking.
---
 lib/PublicInbox/WwwListing.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index c063fca6..a8aecaf7 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -12,8 +12,8 @@ use PublicInbox::View;
 use PublicInbox::Inbox;
 use bytes ();
 use HTTP::Date qw(time2str);
-require Digest::SHA;
-require File::Spec;
+use Digest::SHA ();
+use File::Spec ();
 *try_cat = \&PublicInbox::Inbox::try_cat;
 
 sub list_all_i {

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] wwwlisting: avoid lazy loading JSON module
  2020-03-19  8:32 [PATCH 0/6] daemon: reduce fragmentation via preload Eric Wong
  2020-03-19  8:32 ` [PATCH 1/6] www: update ->preload for newer modules Eric Wong
  2020-03-19  8:32 ` [PATCH 2/6] wwwlisting: favor "use" over require Eric Wong
@ 2020-03-19  8:32 ` Eric Wong
  2020-03-21  1:10   ` [PATCH 0/2] wwwlisting: fixup warnings :x Eric Wong
  2020-03-19  8:32 ` [PATCH 4/6] www: avoid `state' usage to perform allocations up-front Eric Wong
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2020-03-19  8:32 UTC (permalink / raw)
  To: meta

We already lazy-load WwwListing for the CGI script, and
hiding another layer of lazy-loading makes things difficult
to do WWW->preload.

We want long-lived processes to do all long-lived allocations up
front to avoid fragmentation in the allocator, but we'll still
support short-lived processes by lazy-loading individual modules
in the PublicInbox::* namespace.

Mixing up allocation lifetimes (e.g. doing immortal allocations
while a large amount of space is taken by short-lived objects)
will cause fragmentation in any allocator which favors large
contiguous regions for performance reasons.  This includes any
malloc implementation which relies on sbrk() for the primary
heap, including glibc malloc.
---
 lib/PublicInbox/WwwListing.pm | 26 ++++++++++++--------------
 t/www_listing.t               |  4 ++--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index a8aecaf7..33cb0ace 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -10,11 +10,19 @@ use PublicInbox::Hval qw(ascii_html prurl);
 use PublicInbox::Linkify;
 use PublicInbox::View;
 use PublicInbox::Inbox;
-use bytes ();
+use bytes (); # bytes::length
 use HTTP::Date qw(time2str);
 use Digest::SHA ();
 use File::Spec ();
 *try_cat = \&PublicInbox::Inbox::try_cat;
+our $json;
+if (eval { require IO::Compress::Gzip }) {
+	for my $mod (qw(JSON::MaybeXS JSON JSON::PP)) {
+		eval "require $mod" or next;
+		# ->ascii encodes non-ASCII to "\uXXXX"
+		$json = $mod->new->ascii(1);
+	}
+}
 
 sub list_all_i {
 	my ($ibx, $arg) = @_;
@@ -121,16 +129,6 @@ sub html ($$) {
 	[ $code, $h, [ $out ] ];
 }
 
-my $json;
-sub _json () {
-	for my $mod (qw(JSON::MaybeXS JSON JSON::PP)) {
-		eval "require $mod" or next;
-		# ->ascii encodes non-ASCII to "\uXXXX"
-		return $mod->new->ascii(1);
-	}
-	die;
-}
-
 sub fingerprint ($) {
 	my ($git) = @_;
 	# TODO: convert to qspawn for fairness when there's
@@ -201,7 +199,8 @@ sub manifest_add ($$;$$) {
 # manifest.js.gz
 sub js ($$) {
 	my ($env, $list) = @_;
-	eval { require IO::Compress::Gzip } or return [ 404, [], [] ];
+	# $json won't be defined if IO::Compress::Gzip is missing
+	$json or return [ 404, [], [] ];
 
 	my $manifest = { -abs2urlpath => {}, -mtime => 0 };
 	for my $ibx (@$list) {
@@ -221,8 +220,7 @@ sub js ($$) {
 		$repo->{reference} = $abs2urlpath->{$abs};
 	}
 	my $out;
-	IO::Compress::Gzip::gzip(\(($json ||= _json())->encode($manifest)) =>
-				 \$out);
+	IO::Compress::Gzip::gzip(\($json->encode($manifest)) => \$out);
 	$manifest = undef;
 	[ 200, [ qw(Content-Type application/gzip),
 		 'Last-Modified', time2str($mtime),
diff --git a/t/www_listing.t b/t/www_listing.t
index 5168e16a..39c19577 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -9,8 +9,8 @@ use PublicInbox::TestCommon;
 require_mods(qw(URI::Escape Plack::Builder Digest::SHA
 		IO::Compress::Gzip IO::Uncompress::Gunzip HTTP::Tiny));
 require PublicInbox::WwwListing;
-my $json = eval { PublicInbox::WwwListing::_json() };
-plan skip_all => "JSON module missing: $@" if $@;
+my $json = $PublicInbox::WwwListing::json or
+	plan skip_all => "JSON module missing";
 
 use_ok 'PublicInbox::Git';
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] www: avoid `state' usage to perform allocations up-front
  2020-03-19  8:32 [PATCH 0/6] daemon: reduce fragmentation via preload Eric Wong
                   ` (2 preceding siblings ...)
  2020-03-19  8:32 ` [PATCH 3/6] wwwlisting: avoid lazy loading JSON module Eric Wong
@ 2020-03-19  8:32 ` Eric Wong
  2020-03-19  8:32 ` [PATCH 5/6] daemon: do more immortal allocations up front Eric Wong
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-19  8:32 UTC (permalink / raw)
  To: meta

We want WWW->preload to get as many immortal allocations done
as possible, and the `state' feature from Perl 5.10 prevents that.
---
 lib/PublicInbox/SolverGit.pm | 13 +++++++------
 lib/PublicInbox/ViewDiff.pm  |  6 +++---
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 34669dbe..f881e16e 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -34,6 +34,12 @@ my $OID_MIN = 7;
 # work fairly.  Other PSGI servers may have trouble, though.
 my $MAX_PATCH = 9999;
 
+my $LF = qr!\r?\n!;
+my $ANY = qr![^\r\n]+!;
+my $MODE = '100644|120000|100755';
+my $FN = qr!(?:("?[^/\n]+/[^\r\n]+)|/dev/null)!;
+my %BAD_COMPONENT = ('' => 1, '.' => 1, '..' => 1);
+
 # di = diff info / a hashref with information about a diff ($di):
 # {
 #	oid_a => abbreviated pre-image oid,
@@ -110,10 +116,6 @@ sub extract_diff ($$) {
 		$s =~ s/\r\n/\n/sg;
 	}
 
-	state $LF = qr!\r?\n!;
-	state $ANY = qr![^\r\n]+!;
-	state $MODE = '100644|120000|100755';
-	state $FN = qr!(?:("?[^/\n]+/[^\r\n]+)|/dev/null)!;
 
 	$s =~ m!( # $1 start header lines we save for debugging:
 
@@ -174,8 +176,7 @@ sub extract_diff ($$) {
 
 	# get rid of path-traversal attempts and junk patches:
 	# it's junk at best, an attack attempt at worse:
-	state $bad_component = { map { $_ => 1 } ('', '.', '..') };
-	foreach (@a, @b) { return if $bad_component->{$_} }
+	foreach (@a, @b) { return if $BAD_COMPONENT{$_} }
 
 	$di->{path_a} = join('/', @a) if @a;
 	$di->{path_b} = join('/', @b);
diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index 0f5c0e4e..57a1b5d6 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -20,6 +20,9 @@ sub UNSAFE () { "^A-Za-z0-9\-\._~/" }
 
 my $OID_NULL = '0{7,40}';
 my $OID_BLOB = '[a-f0-9]{7,40}';
+my $LF = qr!\n!;
+my $ANY = qr![^\n]!;
+my $FN = qr!(?:"?[^/\n]+/[^\n]+|/dev/null)!;
 
 # cf. git diff.c :: get_compact_summary
 my $DIFFSTAT_COMMENT = qr/\((?:new|gone|(?:(?:new|mode) [\+\-][lx]))\)/;
@@ -170,9 +173,6 @@ sub diff_before_or_after ($$$) {
 # callers must do CRLF => LF conversion before calling this
 sub flush_diff ($$$) {
 	my ($dst, $ctx, $cur) = @_;
-	state $LF = qr!\n!;
-	state $ANY = qr![^\n]!;
-	state $FN = qr!(?:"?[^/\n]+/[^\n]+|/dev/null)!;
 
 	my @top = split(/(
 		(?:	# begin header stuff, don't capture filenames, here,

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] daemon: do more immortal allocations up front
  2020-03-19  8:32 [PATCH 0/6] daemon: reduce fragmentation via preload Eric Wong
                   ` (3 preceding siblings ...)
  2020-03-19  8:32 ` [PATCH 4/6] www: avoid `state' usage to perform allocations up-front Eric Wong
@ 2020-03-19  8:32 ` Eric Wong
  2020-03-19  8:32 ` [PATCH 6/6] viewdiff: favor `qr' to precompile regexps Eric Wong
  2020-04-21  8:52 ` Encode preloading Eric Wong
  6 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-19  8:32 UTC (permalink / raw)
  To: meta

Doing immortal allocations late can cause those allocations
to end up in places where it fragments the heap.  So do more
things up front for long-lived daemons.
---
 lib/PublicInbox/NNTPD.pm |  4 ++++
 lib/PublicInbox/WWW.pm   | 21 +++++++++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/NNTPD.pm b/lib/PublicInbox/NNTPD.pm
index 7a917169..451f4d41 100644
--- a/lib/PublicInbox/NNTPD.pm
+++ b/lib/PublicInbox/NNTPD.pm
@@ -45,6 +45,10 @@ sub refresh_groups () {
 			# Only valid if msgmap and search works
 			$new->{$ngname} = $ng;
 			push @list, $ng;
+
+			# preload to avoid fragmentation:
+			$ng->description;
+			$ng->base_url;
 		}
 	});
 	@list =	sort { $a->{newsgroup} cmp $b->{newsgroup} } @list;
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 534ee028..2434f2f5 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -132,7 +132,9 @@ sub call {
 	}
 }
 
-# for CoW-friendliness, MOOOOO!
+# for CoW-friendliness, MOOOOO!  Even for single-process setups,
+# we want to get all immortal allocations done early to avoid heap
+# fragmentation since common allocators favor a large contiguous heap.
 sub preload {
 	my ($self) = @_;
 	require PublicInbox::ExtMsg;
@@ -148,18 +150,29 @@ sub preload {
 		require PublicInbox::Search;
 		PublicInbox::Search::load_xapian();
 	};
-	foreach (qw(PublicInbox::SearchView
-			PublicInbox::MboxGz
-			PublicInbox::NewsWWW)) {
+	foreach (qw(PublicInbox::SearchView PublicInbox::MboxGz)) {
 		eval "require $_;";
 	}
 	if (ref($self)) {
+		my $pi_config = $self->{pi_config};
+		if (defined($pi_config->{'publicinbox.cgitrc'})) {
+			$pi_config->limiter('-cgit');
+		}
 		$self->cgit;
 		$self->stylesheets_prepare($_) for ('', '../', '../../');
 		$self->www_listing;
+		$self->news_www;
+		$pi_config->each_inbox(\&preload_inbox);
 	}
 }
 
+sub preload_inbox {
+	my $ibx = shift;
+	$ibx->cloneurl;
+	$ibx->description;
+	$ibx->base_url;
+}
+
 # private functions below
 
 sub r404 {

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] viewdiff: favor `qr' to precompile regexps
  2020-03-19  8:32 [PATCH 0/6] daemon: reduce fragmentation via preload Eric Wong
                   ` (4 preceding siblings ...)
  2020-03-19  8:32 ` [PATCH 5/6] daemon: do more immortal allocations up front Eric Wong
@ 2020-03-19  8:32 ` Eric Wong
  2020-04-21  8:52 ` Encode preloading Eric Wong
  6 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-19  8:32 UTC (permalink / raw)
  To: meta

We can also avoid `o' regexp modifier, since it isn't
recommended by Perl upstream, anymore (although we don't
have any bugs or unintended behavior because of it).
---
 lib/PublicInbox/ViewDiff.pm | 47 ++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index 57a1b5d6..d22c80b9 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -25,7 +25,23 @@ my $ANY = qr![^\n]!;
 my $FN = qr!(?:"?[^/\n]+/[^\n]+|/dev/null)!;
 
 # cf. git diff.c :: get_compact_summary
-my $DIFFSTAT_COMMENT = qr/\((?:new|gone|(?:(?:new|mode) [\+\-][lx]))\)/;
+my $DIFFSTAT_COMMENT =
+	qr/(?: *\((?:new|gone|(?:(?:new|mode) [\+\-][lx]))\))? *\z/s;
+my $NULL_TO_BLOB = qr/^(index $OID_NULL\.\.)($OID_BLOB)\b/ms;
+my $BLOB_TO_NULL = qr/^index ($OID_BLOB)(\.\.$OID_NULL)\b/ms;
+my $BLOB_TO_BLOB = qr/^index ($OID_BLOB)\.\.($OID_BLOB)/ms;
+my $EXTRACT_DIFFS = qr/(
+		(?:	# begin header stuff, don't capture filenames, here,
+			# but instead wait for the --- and +++ lines.
+			(?:^diff\x20--git\x20$FN\x20$FN$LF)
+
+			# old mode || new mode || copy|rename|deleted|...
+			(?:^[a-z]$ANY+$LF)*
+		)? # end of optional stuff, everything below is required
+		^index\x20($OID_BLOB)\.\.($OID_BLOB)$ANY*$LF
+		^---\x20($FN)$LF
+		^\+{3}\x20($FN)$LF)/msx;
+my $IS_OID = qr/\A$OID_BLOB\z/s;
 
 # link to line numbers in blobs
 sub diff_hunk ($$$$) {
@@ -62,7 +78,7 @@ sub anchor0 ($$$$) {
 	# So only do best-effort handling of renames for common cases;
 	# which works well in practice. If projects put "=>", or trailing
 	# spaces in filenames, oh well :P
-	$fn =~ s/(?: *$DIFFSTAT_COMMENT)? *\z//so;
+	$fn =~ s/$DIFFSTAT_COMMENT//;
 	$fn =~ s/{(?:.+) => (.+)}/$1/ or $fn =~ s/.* => (.+)/$1/;
 	$fn = git_unquote($fn);
 
@@ -132,18 +148,17 @@ sub diff_header ($$$$) {
 
 	# no need to capture oid_a and oid_b on add/delete,
 	# we just linkify OIDs directly via s///e in conditional
-	if (($$x =~ s/^(index $OID_NULL\.\.)($OID_BLOB)\b/
-			$1 . oid($dctx, $spfx, $2)/emos) ||
-		($$x =~ s/^index ($OID_BLOB)(\.\.$OID_NULL)\b/
-			'index ' . oid($dctx, $spfx, $1) . $2/emos)) {
-	} elsif ($$x =~ /^index ($OID_BLOB)\.\.($OID_BLOB)/mos) {
+	if (($$x =~ s/$NULL_TO_BLOB/$1 . oid($dctx, $spfx, $2)/e) ||
+		($$x =~ s/$BLOB_TO_NULL/
+			'index ' . oid($dctx, $spfx, $1) . $2/e)) {
+	} elsif ($$x =~ $BLOB_TO_BLOB) {
 		# modification-only, not add/delete:
 		# linkify hunk headers later using oid_a and oid_b
 		@$dctx{qw(oid_a oid_b)} = ($1, $2);
 	} else {
 		warn "BUG? <$$x> had no ^index line";
 	}
-	$$x =~ s!^diff --git!anchor1($ctx, $pb) // 'diff --git'!emos;
+	$$x =~ s!^diff --git!anchor1($ctx, $pb) // 'diff --git'!ems;
 	$$dst .= qq(<span\nclass="head">);
 	$$dst .= $$x;
 	$$dst .= '</span>';
@@ -174,17 +189,7 @@ sub diff_before_or_after ($$$) {
 sub flush_diff ($$$) {
 	my ($dst, $ctx, $cur) = @_;
 
-	my @top = split(/(
-		(?:	# begin header stuff, don't capture filenames, here,
-			# but instead wait for the --- and +++ lines.
-			(?:^diff\x20--git\x20$FN\x20$FN$LF)
-
-			# old mode || new mode || copy|rename|deleted|...
-			(?:^[a-z]$ANY+$LF)*
-		)? # end of optional stuff, everything below is required
-		^index\x20($OID_BLOB)\.\.($OID_BLOB)$ANY*$LF
-		^---\x20($FN)$LF
-		^\+{3}\x20($FN)$LF)/smxo, $$cur);
+	my @top = split($EXTRACT_DIFFS, $$cur);
 	$$cur = undef;
 
 	my $linkify = $ctx->{-linkify};
@@ -192,8 +197,8 @@ sub flush_diff ($$$) {
 
 	while (defined(my $x = shift @top)) {
 		if (scalar(@top) >= 4 &&
-				$top[1] =~ /\A$OID_BLOB\z/os &&
-				$top[0] =~ /\A$OID_BLOB\z/os) {
+				$top[1] =~ $IS_OID &&
+				$top[0] =~ $IS_OID) {
 			$dctx = diff_header($dst, \$x, $ctx, \@top);
 		} elsif ($dctx) {
 			my $after = '';

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 0/2] wwwlisting: fixup warnings :x
  2020-03-19  8:32 ` [PATCH 3/6] wwwlisting: avoid lazy loading JSON module Eric Wong
@ 2020-03-21  1:10   ` Eric Wong
  2020-03-21  1:10     ` [PATCH 1/2] wwwlisting: use first successfully loaded JSON module Eric Wong
  2020-03-21  1:10     ` [PATCH 2/2] t/www_listing: avoid 'once' warnings Eric Wong
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-21  1:10 UTC (permalink / raw)
  To: meta

I made tests run so quickly that I missed some warnings :x

Eric Wong (2):
  wwwlisting: use first successfully loaded JSON module
  t/www_listing: avoid 'once' warnings

 lib/PublicInbox/WwwListing.pm | 2 +-
 t/www_listing.t               | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] wwwlisting: use first successfully loaded JSON module
  2020-03-21  1:10   ` [PATCH 0/2] wwwlisting: fixup warnings :x Eric Wong
@ 2020-03-21  1:10     ` Eric Wong
  2020-03-21  1:10     ` [PATCH 2/2] t/www_listing: avoid 'once' warnings Eric Wong
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-21  1:10 UTC (permalink / raw)
  To: meta

And not the last...

I only noticed this since JSON::PP::Boolean was spewing
redefinition warnings via overload.pm

Fixes: 8fb8fc52420ef669 ("wwwlisting: avoid lazy loading JSON module")
---
 lib/PublicInbox/WwwListing.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index 33cb0ace..42a0c0d8 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -20,7 +20,7 @@ if (eval { require IO::Compress::Gzip }) {
 	for my $mod (qw(JSON::MaybeXS JSON JSON::PP)) {
 		eval "require $mod" or next;
 		# ->ascii encodes non-ASCII to "\uXXXX"
-		$json = $mod->new->ascii(1);
+		$json = $mod->new->ascii(1) and last;
 	}
 }
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] t/www_listing: avoid 'once' warnings
  2020-03-21  1:10   ` [PATCH 0/2] wwwlisting: fixup warnings :x Eric Wong
  2020-03-21  1:10     ` [PATCH 1/2] wwwlisting: use first successfully loaded JSON module Eric Wong
@ 2020-03-21  1:10     ` Eric Wong
  2020-03-21  5:24       ` [PATCH v2] " Eric Wong
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Wong @ 2020-03-21  1:10 UTC (permalink / raw)
  To: meta

We reach into the WwwListing package directly to retrieve
that JSON encoder/decoder object, and we can't rely on `use'
since WwwListing loading may fail if Plack is missing.
---
 t/www_listing.t | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/www_listing.t b/t/www_listing.t
index 39c19577..1b29dd88 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -8,7 +8,7 @@ use PublicInbox::Spawn qw(which);
 use PublicInbox::TestCommon;
 require_mods(qw(URI::Escape Plack::Builder Digest::SHA
 		IO::Compress::Gzip IO::Uncompress::Gunzip HTTP::Tiny));
-require PublicInbox::WwwListing;
+use PublicInbox::WwwListing;
 my $json = $PublicInbox::WwwListing::json or
 	plan skip_all => "JSON module missing";
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2] t/www_listing: avoid 'once' warnings
  2020-03-21  1:10     ` [PATCH 2/2] t/www_listing: avoid 'once' warnings Eric Wong
@ 2020-03-21  5:24       ` Eric Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-03-21  5:24 UTC (permalink / raw)
  To: meta

We reach into the WwwListing package directly to retrieve
that JSON encoder/decoder object, and we can't rely on `use'
since WwwListing loading may fail if Plack is missing.
---
 *sigh* v1 was wrong :x

 t/www_listing.t | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/www_listing.t b/t/www_listing.t
index 39c19577..9230329c 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -9,8 +9,10 @@ use PublicInbox::TestCommon;
 require_mods(qw(URI::Escape Plack::Builder Digest::SHA
 		IO::Compress::Gzip IO::Uncompress::Gunzip HTTP::Tiny));
 require PublicInbox::WwwListing;
-my $json = $PublicInbox::WwwListing::json or
-	plan skip_all => "JSON module missing";
+my $json = do {
+	no warnings 'once';
+	$PublicInbox::WwwListing::json;
+} or plan skip_all => "JSON module missing";
 
 use_ok 'PublicInbox::Git';
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Encode preloading
  2020-03-19  8:32 [PATCH 0/6] daemon: reduce fragmentation via preload Eric Wong
                   ` (5 preceding siblings ...)
  2020-03-19  8:32 ` [PATCH 6/6] viewdiff: favor `qr' to precompile regexps Eric Wong
@ 2020-04-21  8:52 ` Eric Wong
  2020-05-08  1:59   ` [PATCH] www: preload: load all encodings at startup Eric Wong
  6 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2020-04-21  8:52 UTC (permalink / raw)
  To: meta

[was: Subject: [PATCH 0/6] daemon: reduce fragmentation via preload

Eric Wong <e@yhbt.net> wrote:
> For long-lived daemons, perform immortal allocations as early as
> possible to reduce the likelyhood of heap fragmentation due to
> mixed-lifetime allocations happening once the process is fully
> loaded and serving requests, since per-request allocations
> should all be short-lived.

Encode also loads lazily...

I thought about having public-inbox-{index,mda,watch} track a
list of modules for WWW to preload, but that can't predict the
future.

There's no telling what emails will show up in the future which
trigger autoloading, so I think the only way to avoid
fragmentation is to preload everything which would get
autoloaded... Ugh, that even means Encode::EBCDIC and
Encode::Symbol (for dingbats!)

> On a side note, I'm wondering if WWW should just preload by
> default.  I'm not sure if anybody uses public-inbox.cgi (or
> should be using it :P).  It's not like we don't ship
> public-inbox-httpd; and any PSGI implementation could be used
> for smaller inboxes (or powerful-enough hardware).

Yeah, likely to happen.  Trying to optimize a CGI script for
startup time while depending on git, SQLite, and Xapian just
doesn't seem worth it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] www: preload: load all encodings at startup
  2020-04-21  8:52 ` Encode preloading Eric Wong
@ 2020-05-08  1:59   ` Eric Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Wong @ 2020-05-08  1:59 UTC (permalink / raw)
  To: meta

Eric Wong <e@yhbt.net> wrote:
> Eric Wong <e@yhbt.net> wrote:
> > For long-lived daemons, perform immortal allocations as early as
> > possible to reduce the likelyhood of heap fragmentation due to
> > mixed-lifetime allocations happening once the process is fully
> > loaded and serving requests, since per-request allocations
> > should all be short-lived.
> 
> Encode also loads lazily...

-----------8<---------
Subject: [PATCH] www: preload: load all encodings at startup

Encode lazy-loads encodings on an as-needed basis.  This is
great for short-lived programs, but leads to fragmentation in
long-lived daemons where immortal allocations can get
interleaved with short-lived, per-request allocations.

Since we have no idea which encodings will be needed when
there's a constant flow of incoming mail, just preload
everything available at startup.
---
 lib/PublicInbox/WWW.pm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 275e509f..3a428218 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -141,6 +141,12 @@ sub call {
 # fragmentation since common allocators favor a large contiguous heap.
 sub preload {
 	my ($self) = @_;
+
+	# populate caches used by Encode internally, since emails
+	# may show up with any encoding.
+	require Encode;
+	Encode::find_encoding($_) for Encode->encodings(':all');
+
 	require PublicInbox::ExtMsg;
 	require PublicInbox::Feed;
 	require PublicInbox::View;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-05-08  1:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-03-19  8:32 [PATCH 0/6] daemon: reduce fragmentation via preload Eric Wong
2020-03-19  8:32 ` [PATCH 1/6] www: update ->preload for newer modules Eric Wong
2020-03-19  8:32 ` [PATCH 2/6] wwwlisting: favor "use" over require Eric Wong
2020-03-19  8:32 ` [PATCH 3/6] wwwlisting: avoid lazy loading JSON module Eric Wong
2020-03-21  1:10   ` [PATCH 0/2] wwwlisting: fixup warnings :x Eric Wong
2020-03-21  1:10     ` [PATCH 1/2] wwwlisting: use first successfully loaded JSON module Eric Wong
2020-03-21  1:10     ` [PATCH 2/2] t/www_listing: avoid 'once' warnings Eric Wong
2020-03-21  5:24       ` [PATCH v2] " Eric Wong
2020-03-19  8:32 ` [PATCH 4/6] www: avoid `state' usage to perform allocations up-front Eric Wong
2020-03-19  8:32 ` [PATCH 5/6] daemon: do more immortal allocations up front Eric Wong
2020-03-19  8:32 ` [PATCH 6/6] viewdiff: favor `qr' to precompile regexps Eric Wong
2020-04-21  8:52 ` Encode preloading Eric Wong
2020-05-08  1:59   ` [PATCH] www: preload: load all encodings at startup Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).