unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* httpd memory usage?
@ 2021-09-04 23:53 Eric Wong
  2021-09-27  7:10 ` Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eric Wong @ 2021-09-04 23:53 UTC (permalink / raw)
  To: meta

Is it high for anybody else?  I'm not seeing it for small
instances (e.g. the instance that's powering
public-inbox.org/{git,meta} is fine, but with lots of entries my
lore mirror @ https://yhbt.net/lore/ the main sbrk heap seems to
be growing slowly without bounds.

And thats with MALLOC_MMAP_THRESHOLD_=131072 set to avoid
sbrk for large allocations (glibc 2.28 Debian buster).

I think I'll need to finish porting mwrap over to Perl/XS
(from Ruby/C)...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: httpd memory usage?
  2021-09-04 23:53 httpd memory usage? Eric Wong
@ 2021-09-27  7:10 ` Eric Wong
  2021-10-04  0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
  2021-11-04  0:17 ` httpd memory usage? Eric Wong
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-09-27  7:10 UTC (permalink / raw)
  To: meta

Eric Wong <e@80x24.org> wrote:
> Is it high for anybody else?  I'm not seeing it for small
> instances (e.g. the instance that's powering
> public-inbox.org/{git,meta} is fine, but with lots of entries my
> lore mirror @ https://yhbt.net/lore/ the main sbrk heap seems to
> be growing slowly without bounds.

I suspect it's long-lived SQLite connections and associated
caches/etc (and having short-lived allocations mixed in with
long-term ones).

Anyways, I'm testing the below on yhbt.net/lore, but
there's other SQLite knobs worth investigating...
Preloading all DBs isn't realistic, either, due to the
potential number of inboxes.

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 1d5fc708..8fcf6e81 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -43,7 +43,7 @@ sub cleanup_task () {
 		for my $git (@{$ibx->{-repo_objs}}) {
 			$again = 1 if $git->cleanup;
 		}
-		check_inodes($ibx);
+		delete @$ibx{qw(over mm)};
 		$next->{"$ibx"} = $ibx if $again;
 	}
 	$CLEANUP = $next;

> I think I'll need to finish porting mwrap over to Perl/XS
> (from Ruby/C)...

Yeah, still on my TODO list.  But right now lei IMAP(S) on
high-latency links is a terrible...

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 0/2] www: fix ref cycles when threading extindex
  2021-09-04 23:53 httpd memory usage? Eric Wong
  2021-09-27  7:10 ` Eric Wong
@ 2021-10-04  0:07 ` Eric Wong
  2021-10-04  0:07   ` [PATCH 1/2] t/thread-cycle: make Email::Simple optional Eric Wong
                     ` (2 more replies)
  2021-11-04  0:17 ` httpd memory usage? Eric Wong
  2 siblings, 3 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-04  0:07 UTC (permalink / raw)
  To: meta

I finally got Devel::Mwrap::PSGI working and fixed a
long-standing reference cycle.  AFAIK, it only affects
extindex users, not v2, and definitely not v1.

Eric Wong (2):
  t/thread-cycle: make Email::Simple optional
  www: fix ref cycle from threading w/ extindex

 lib/PublicInbox/SearchThread.pm | 104 +++++++++++++++++---------------
 t/thread-cycle.t                |  50 ++++++++++-----
 2 files changed, 88 insertions(+), 66 deletions(-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] t/thread-cycle: make Email::Simple optional
  2021-10-04  0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
@ 2021-10-04  0:07   ` Eric Wong
  2021-10-04  0:07   ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong
  2021-10-04 22:51   ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-04  0:07 UTC (permalink / raw)
  To: meta

We only use it if Mail::Thread is available, and often it's not.
---
 t/thread-cycle.t | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/t/thread-cycle.t b/t/thread-cycle.t
index 613c142e65cd..4b47c01c37c1 100644
--- a/t/thread-cycle.t
+++ b/t/thread-cycle.t
@@ -1,19 +1,16 @@
 # Copyright (C) 2016-2021 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-use strict;
-use warnings;
-use Test::More;
-use PublicInbox::TestCommon;
-require_mods 'Email::Simple';
+use strict; use v5.10.1; use PublicInbox::TestCommon;
 use_ok('PublicInbox::SearchThread');
 my $mt = eval {
 	require Mail::Thread;
 	no warnings 'once';
 	$Mail::Thread::nosubject = 1;
 	$Mail::Thread::noprune = 1;
+	require Email::Simple; # required by Mail::Thread (via Email::Abstract)
 };
 
-sub make_objs {
+my $make_objs = sub {
 	my @simples;
 	my $n = 0;
 	my @msgs = map {
@@ -21,17 +18,19 @@ sub make_objs {
 		$msg->{ds} ||= ++$n;
 		$msg->{references} =~ s/\s+/ /sg if $msg->{references};
 		$msg->{blob} = '0'x40; # any dummy value will do, here
-		my $simple = Email::Simple->create(header => [
-			'Message-ID' => "<$msg->{mid}>",
-			'References' => $msg->{references},
-		]);
-		push @simples, $simple;
+		if ($mt) {
+			my $simple = Email::Simple->create(header => [
+				'Message-ID' => "<$msg->{mid}>",
+				'References' => $msg->{references},
+			]);
+			push @simples, $simple;
+		}
 		bless $msg, 'PublicInbox::Smsg'
 	} @_;
 	(\@simples, \@msgs);
-}
+};
 
-my ($simples, $smsgs) = make_objs(
+my ($simples, $smsgs) = $make_objs->(
 # data from t/testbox-6 in Mail::Thread 2.55:
 	{ mid => '20021124145312.GA1759@nlin.net' },
 	{ mid => 'slrnau448m.7l4.markj+0111@cloaked.freeserve.co.uk',
@@ -79,13 +78,13 @@ my @backwards = (
 	{ mid => 8, references => '' }
 );
 
-($simples, $smsgs) = make_objs(@backwards);
+($simples, $smsgs) = $make_objs->(@backwards);
 my $backward = thread_to_s($smsgs);
 SKIP: {
 	skip 'Mail::Thread missing', 1 unless $mt;
 	check_mt($backward, $simples, 'matches Mail::Thread backwards');
 }
-($simples, $smsgs) = make_objs(reverse @backwards);
+($simples, $smsgs) = $make_objs->(reverse @backwards);
 my $forward = thread_to_s($smsgs);
 unless ('Mail::Thread sorts by Date') {
 	SKIP: {

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] www: fix ref cycle from threading w/ extindex
  2021-10-04  0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
  2021-10-04  0:07   ` [PATCH 1/2] t/thread-cycle: make Email::Simple optional Eric Wong
@ 2021-10-04  0:07   ` Eric Wong
  2021-10-04 22:51   ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-04  0:07 UTC (permalink / raw)
  To: meta

Unlike v1 inboxes (which don't accept duplicate Message-IDs at
all), and v2 inboxes (which generate a new Message-ID for
duplicates), extindex must accept duplicate Message-IDs as-is.

This was fine for storage, but prevented the reference-cycle
mechanism of our message threading display algorithm from working
reliably.  It could no longer delete the ->{parent} field from
clobbered entries in the %id_table.

So we now take into account reused Message-IDs and never clobber
entries in %id_table.  Instead, we mark reused Message-IDs as
"imposters" and special-case them by injecting them as children
after all other threading is complete.

This cycle was noticed using a pre-release of Devel::Mwrap::PSGI:
  https://80x24.org/mwrap-perl.git
---
 lib/PublicInbox/SearchThread.pm | 104 +++++++++++++++++---------------
 t/thread-cycle.t                |  21 ++++++-
 2 files changed, 74 insertions(+), 51 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 8fb3a030aa72..507f25baab0e 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -24,70 +24,74 @@ use PublicInbox::MID qw($MID_EXTRACT);
 
 sub thread {
 	my ($msgs, $ordersub, $ctx) = @_;
+	my (%id_table, @imposters);
+	keys(%id_table) = scalar @$msgs; # pre-size
 
-	# A. put all current $msgs (non-ghosts) into %id_table
-	my %id_table = map {;
+	# A. put all current non-imposter $msgs (non-ghosts) into %id_table
+	# (imposters are messages with reused Message-IDs)
+	# Sadly, we sort here anyways since the fill-in-the-blanks References:
+	# can be shakier if somebody used In-Reply-To with multiple, disparate
+	# messages.  So, take the client Date: into account since we can't
+	# always determine ordering when somebody uses multiple In-Reply-To.
+	my @kids = sort { $a->{ds} <=> $b->{ds} } grep {
 		# this delete saves around 4K across 1K messages
 		# TODO: move this to a more appropriate place, breaks tests
 		# if we do it during psgi_cull
 		delete $_->{num};
 
-		$_->{mid} => PublicInbox::SearchThread::Msg::cast($_);
+		PublicInbox::SearchThread::Msg::cast($_);
+		if (exists $id_table{$_->{mid}}) {
+			$_->{children} = [];
+			push @imposters, $_; # we'll deal with them later
+			undef;
+		} else {
+			$id_table{$_->{mid}} = $_;
+			defined($_->{references});
+		}
 	} @$msgs;
+	for my $smsg (@kids) {
+		# This loop exists to help fill in gaps left from missing
+		# messages.  It is not needed in a perfect world where
+		# everything is perfectly referenced, only the last ref
+		# matters.
+		my $prev;
+		for my $ref ($smsg->{references} =~ m/$MID_EXTRACT/go) {
+			# Find a Container object for the given Message-ID
+			my $cont = $id_table{$ref} //=
+				PublicInbox::SearchThread::Msg::ghost($ref);
+
+			# Link the References field's Containers together in
+			# the order implied by the References header
+			#
+			# * If they are already linked don't change the
+			#   existing links
+			# * Do not add a link if adding that link would
+			#   introduce a loop...
+			if ($prev &&
+				!$cont->{parent} &&  # already linked
+				!$cont->has_descendent($prev) # would loop
+			   ) {
+				$prev->add_child($cont);
+			}
+			$prev = $cont;
+		}
 
-	# Sadly, we sort here anyways since the fill-in-the-blanks References:
-	# can be shakier if somebody used In-Reply-To with multiple, disparate
-	# messages.  So, take the client Date: into account since we can't
-	# always determine ordering when somebody uses multiple In-Reply-To.
-	# We'll trust the client Date: header here instead of the Received:
-	# time since this is for display (and not retrieval)
-	_set_parent(\%id_table, $_) for sort { $a->{ds} <=> $b->{ds} } @$msgs;
+		# C. Set the parent of this message to be the last element in
+		# References.
+		if (defined $prev && !$smsg->has_descendent($prev)) {
+			$prev->add_child($smsg);
+		}
+	}
 	my $ibx = $ctx->{ibx};
-	my $rootset = [ grep {
+	my $rootset = [ grep { # n.b.: delete prevents cyclic refs
 			!delete($_->{parent}) && $_->visible($ibx)
 		} values %id_table ];
 	$rootset = $ordersub->($rootset);
 	$_->order_children($ordersub, $ctx) for @$rootset;
-	$rootset;
-}
 
-sub _set_parent ($$) {
-	my ($id_table, $this) = @_;
-
-	# B. For each element in the message's References field:
-	defined(my $refs = $this->{references}) or return;
-
-	# This loop exists to help fill in gaps left from missing
-	# messages.  It is not needed in a perfect world where
-	# everything is perfectly referenced, only the last ref
-	# matters.
-	my $prev;
-	foreach my $ref ($refs =~ m/$MID_EXTRACT/go) {
-		# Find a Container object for the given Message-ID
-		my $cont = $id_table->{$ref} //=
-			PublicInbox::SearchThread::Msg::ghost($ref);
-
-		# Link the References field's Containers together in
-		# the order implied by the References header
-		#
-		# * If they are already linked don't change the
-		#   existing links
-		# * Do not add a link if adding that link would
-		#   introduce a loop...
-		if ($prev &&
-			!$cont->{parent} &&  # already linked
-			!$cont->has_descendent($prev) # would loop
-		   ) {
-			$prev->add_child($cont);
-		}
-		$prev = $cont;
-	}
-
-	# C. Set the parent of this message to be the last element in
-	# References.
-	if (defined $prev && !$this->has_descendent($prev)) { # would loop
-		$prev->add_child($this);
-	}
+	# parent imposter messages with reused Message-IDs
+	unshift(@{$id_table{$_->{mid}}->{children}}, $_) for @imposters;
+	$rootset;
 }
 
 package PublicInbox::SearchThread::Msg;
diff --git a/t/thread-cycle.t b/t/thread-cycle.t
index 4b47c01c37c1..e89b18464a5f 100644
--- a/t/thread-cycle.t
+++ b/t/thread-cycle.t
@@ -96,7 +96,26 @@ if ('sorting by Date') {
 	is("\n".$backward, "\n".$forward, 'forward and backward matches');
 }
 
-done_testing();
+SKIP: {
+	require_mods 'Devel::Cycle', 1;
+	Devel::Cycle->import('find_cycle');
+	my @dup = (
+		{ mid => 5, references => '<6>' },
+		{ mid => 5, references => '<6> <1>' },
+	);
+	open my $fh, '+>', \(my $out = '') or xbail "open: $!";
+	(undef, $smsgs) = $make_objs->(@dup);
+	eval 'package EmptyInbox; sub smsg_by_mid { undef }';
+	my $ctx = { ibx => bless {}, 'EmptyInbox' };
+	my $rootset = PublicInbox::SearchThread::thread($smsgs, sub {
+		[ sort { $a->{mid} cmp $b->{mid} } @{$_[0]} ] }, $ctx);
+	my $oldout = select $fh;
+	find_cycle($rootset);
+	select $oldout;
+	is($out, '', 'nothing from find_cycle');
+} # Devel::Cycle check
+
+done_testing;
 
 sub thread_to_s {
 	my ($msgs) = @_;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] www: fix ref cycles when threading extindex
  2021-10-04  0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
  2021-10-04  0:07   ` [PATCH 1/2] t/thread-cycle: make Email::Simple optional Eric Wong
  2021-10-04  0:07   ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong
@ 2021-10-04 22:51   ` Eric Wong
  2021-10-05 11:33     ` Encode.pm leak Eric Wong
  2 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2021-10-04 22:51 UTC (permalink / raw)
  To: meta

That was most of it.  https://bugs.debian.org/995738 has some more.
I suspect there's other leaks, too :<

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Encode.pm leak
  2021-10-04 22:51   ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
@ 2021-10-05 11:33     ` Eric Wong
  2021-10-12 10:59       ` Encode.pm leak in v2.87..v3.12 Eric Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2021-10-05 11:33 UTC (permalink / raw)
  To: meta

The leak's been there for a while, but I couldn't find
other reports of it:

   https://rt.cpan.org/Public/Bug/Display.html?id=139622

Anybody else want to try this and see which versions of Encode
or Perl hit this leak?  Encode is bundled with Perl, but at
least Debian also provides an optional standalone package, too.

In the meantime, I may need to put in my own workaround for it
PublicInbox::Eml because distros are slow moving...

----8<----
#!perl -w
use v5.10.1;
use Encode qw(decode);
say "Encode::VERSION=$Encode::VERSION";
my $blob = "\xef\xbf\xbd" x 1000;
my $nr = $ENV{NR} || 10000; # higher => more memory
my $warn = $ENV{FB_WARN};
for (0..$nr) {
	if ($warn) {
		decode('us-ascii', $blob, Encode::FB_WARN);
	} else {
		eval { decode('us-ascii', $blob, Encode::FB_CROAK) };
	}
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Encode.pm leak in v2.87..v3.12
  2021-10-05 11:33     ` Encode.pm leak Eric Wong
@ 2021-10-12 10:59       ` Eric Wong
  2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2021-10-12 10:59 UTC (permalink / raw)
  To: meta

Eric Wong <e@80x24.org> wrote:
> The leak's been there for a while, but I couldn't find
> other reports of it:
> 
>    https://rt.cpan.org/Public/Bug/Display.html?id=139622

So this got fixed in Encode 3.13, and 3.15 (3.14 segfaulted :P).
Perl v5.36 will have 3.15 (or possibly a newer version).  This
has been a leak since 2016 (2.87), so there might be lot of
people affected...

CentOS 7.x has Encode 2.51 so it's probably not affected by
this particular bug...

> In the meantime, I may need to put in my own workaround for it
> PublicInbox::Eml because distros are slow moving...

It's tracked for Debian, at least https://bugs.debian.org/995804

Our "eval { $eml->body_str }" usage is kinda dumb, so maybe it
could be done better w/o needing eval at all...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/7] workaround Encode leak, several test fixes
  2021-10-12 10:59       ` Encode.pm leak in v2.87..v3.12 Eric Wong
@ 2021-10-13 10:16         ` Eric Wong
  2021-10-13 10:16           ` [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok Eric Wong
                             ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw)
  To: meta

Eric Wong (7):
  xt/perf-msgview: drop unnecessary use_ok
  test_common: hoist out tail_f sub
  t/www_listing: require opt-in for grokmirror tests
  eml: avoid Encode 2.87..3.12 leak
  t/lei-mirror: avoid reading ~/.public-inbox/config in test
  t/git: avoid "once" warning for async_warn
  t/nntpd-tls: change diag() to like() assertion

 lib/PublicInbox/Eml.pm        | 25 +++++++++-----
 lib/PublicInbox/TestCommon.pm | 63 ++++++++++++++++++++---------------
 t/git.t                       |  3 +-
 t/lei-mirror.t                |  1 +
 t/nntpd-tls.t                 |  3 +-
 t/www_listing.t               | 20 +++++++----
 xt/perf-msgview.t             |  1 -
 7 files changed, 70 insertions(+), 46 deletions(-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok
  2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
@ 2021-10-13 10:16           ` Eric Wong
  2021-10-13 10:16           ` [PATCH 2/7] test_common: hoist out tail_f sub Eric Wong
                             ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw)
  To: meta

require_mods covers it, and we're not testing Plack itself.
---
 xt/perf-msgview.t | 1 -
 1 file changed, 1 deletion(-)

diff --git a/xt/perf-msgview.t b/xt/perf-msgview.t
index bc73fcce..cf550c1a 100644
--- a/xt/perf-msgview.t
+++ b/xt/perf-msgview.t
@@ -21,7 +21,6 @@ if (require_git(2.19, 1)) {
 "git <2.19, cat-file lacks --unordered, locality suffers\n";
 }
 require_mods qw(Plack::Util);
-use_ok 'Plack::Util';
 my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'name' });
 my $git = $ibx->git;
 my $fh = $blob ? undef : $git->popen(@cat);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/7] test_common: hoist out tail_f sub
  2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
  2021-10-13 10:16           ` [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok Eric Wong
@ 2021-10-13 10:16           ` Eric Wong
  2021-10-13 10:16           ` [PATCH 3/7] t/www_listing: require opt-in for grokmirror tests Eric Wong
                             ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw)
  To: meta

We'll be reusing this in more places.  While we're at it, allow
it to tail all run_script() users, including lei() in TestCommon.
---
 lib/PublicInbox/TestCommon.pm | 63 ++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index cd706e0e..57f1db95 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -12,13 +12,14 @@ use IO::Socket::INET;
 use File::Spec;
 our @EXPORT;
 my $lei_loud = $ENV{TEST_LEI_ERR_LOUD};
+my $tail_cmd = $ENV{TAIL};
 our ($lei_opt, $lei_out, $lei_err, $lei_cwdfh);
 BEGIN {
 	@EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
 		run_script start_script key2sub xsys xsys_e xqx eml_load tick
 		have_xapian_compact json_utf8 setup_public_inboxes create_inbox
 		tcp_host_port test_lei lei lei_ok $lei_out $lei_err $lei_opt
-		test_httpd xbail require_cmd is_xdeeply);
+		test_httpd xbail require_cmd is_xdeeply tail_f);
 	require Test::More;
 	my @methods = grep(!/\W/, @Test::More::EXPORT);
 	eval(join('', map { "*$_=\\&Test::More::$_;" } @methods));
@@ -280,11 +281,20 @@ sub run_script ($;$$) {
 	my $sub = $run_mode == 0 ? undef : key2sub($key);
 	my $fhref = [];
 	my $spawn_opt = {};
+	my @tail_paths;
 	for my $fd (0..2) {
 		my $redir = $opt->{$fd};
 		my $ref = ref($redir);
 		if ($ref eq 'SCALAR') {
-			open my $fh, '+>', undef or die "open: $!";
+			my $fh;
+			if ($tail_cmd && $ENV{TAIL_ALL} && $fd > 0) {
+				require File::Temp;
+				$fh = File::Temp->new("fd.$fd-XXXX", TMPDIR=>1);
+				push @tail_paths, $fh->filename;
+			} else {
+				open $fh, '+>', undef;
+			}
+			$fh or xbail $!;
 			$fhref->[$fd] = $fh;
 			$spawn_opt->{$fd} = $fh;
 			next if $fd > 0;
@@ -297,6 +307,7 @@ sub run_script ($;$$) {
 			die "unable to deal with $ref $redir";
 		}
 	}
+	my $tail = @tail_paths ? tail_f(@tail_paths) : undef;
 	if ($key =~ /-(index|convert|extindex|convert|xcpdb)\z/) {
 		unshift @argv, '--no-fsync';
 	}
@@ -337,6 +348,7 @@ sub run_script ($;$$) {
 		umask($umask);
 	}
 
+	{ local $?; undef $tail };
 	# slurp the redirects back into user-supplied strings
 	for my $fd (1..2) {
 		my $fh = $fhref->[$fd] or next;
@@ -355,13 +367,13 @@ sub tick (;$) {
 	1;
 }
 
-sub wait_for_tail ($;$) {
+sub wait_for_tail {
 	my ($tail_pid, $want) = @_;
-	my $wait = 2;
+	my $wait = 2; # "tail -F" sleeps 1.0s at-a-time w/o inotify/kevent
 	if ($^O eq 'linux') { # GNU tail may use inotify
 		state $tail_has_inotify;
-		return tick if $want < 0 && $tail_has_inotify;
-		my $end = time + $wait;
+		return tick if !$want && $tail_has_inotify; # before TERM
+		my $end = time + $wait; # wait for startup:
 		my @ino;
 		do {
 			@ino = grep {
@@ -410,13 +422,23 @@ sub xqx {
 	wantarray ? split(/^/m, $out) : $out;
 }
 
+sub tail_f (@) {
+	$tail_cmd or return; # "tail -F" or "tail -f"
+	for (@_) { open(my $fh, '>>', $_) or die $! };
+	my $cmd = [ split(/ /, $tail_cmd), @_ ];
+	require PublicInbox::Spawn;
+	my $pid = PublicInbox::Spawn::spawn($cmd, undef, { 1 => 2 });
+	wait_for_tail($pid, scalar @_);
+	PublicInboxTestProcess->new($pid, \&wait_for_tail);
+}
+
 sub start_script {
 	my ($cmd, $env, $opt) = @_;
 	my ($key, @argv) = @$cmd;
 	my $run_mode = $ENV{TEST_RUN_MODE} // $opt->{run_mode} // 2;
 	my $sub = $run_mode == 0 ? undef : key2sub($key);
-	my $tail_pid;
-	if (my $tail_cmd = $ENV{TAIL}) {
+	my $tail;
+	if ($tail_cmd) {
 		my @paths;
 		for (@argv) {
 			next unless /\A--std(?:err|out)=(.+)\z/;
@@ -434,17 +456,7 @@ sub start_script {
 				}
 			}
 		}
-		if (@paths) {
-			$tail_pid = fork // die "fork: $!";
-			if ($tail_pid == 0) {
-				# make sure files exist, first
-				open my $fh, '>>', $_ for @paths;
-				open(STDOUT, '>&STDERR') or die "1>&2: $!";
-				exec(split(' ', $tail_cmd), @paths);
-				die "$tail_cmd failed: $!";
-			}
-			wait_for_tail($tail_pid, scalar @paths);
-		}
+		$tail = tail_f(@paths);
 	}
 	my $pid = fork // die "fork: $!\n";
 	if ($pid == 0) {
@@ -480,7 +492,9 @@ sub start_script {
 			die "FAIL: ",join(' ', $key, @argv), ": $!\n";
 		}
 	}
-	PublicInboxTestProcess->new($pid, $tail_pid);
+	my $td = PublicInboxTestProcess->new($pid);
+	$td->{-extra} = $tail;
+	$td;
 }
 
 # favor lei() or lei_ok() over $lei for new code
@@ -735,8 +749,8 @@ use strict;
 sub CLONE_SKIP { 1 }
 
 sub new {
-	my ($klass, $pid, $tail_pid) = @_;
-	bless { pid => $pid, tail_pid => $tail_pid, owner => $$ }, $klass;
+	my ($cls, $pid, $cb) = @_;
+	bless { pid => $pid, cb => $cb, owner => $$ }, $cls;
 }
 
 sub kill {
@@ -747,6 +761,7 @@ sub kill {
 sub join {
 	my ($self, $sig) = @_;
 	my $pid = delete $self->{pid} or return;
+	$self->{cb}->() if defined $self->{cb};
 	CORE::kill($sig, $pid) if defined $sig;
 	my $ret = waitpid($pid, 0) // die "waitpid($pid): $!";
 	$ret == $pid or die "waitpid($pid) != $ret";
@@ -755,10 +770,6 @@ sub join {
 sub DESTROY {
 	my ($self) = @_;
 	return if $self->{owner} != $$;
-	if (my $tail_pid = delete $self->{tail_pid}) {
-		PublicInbox::TestCommon::wait_for_tail($tail_pid, -1);
-		CORE::kill('TERM', $tail_pid);
-	}
 	$self->join('TERM');
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/7] t/www_listing: require opt-in for grokmirror tests
  2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
  2021-10-13 10:16           ` [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok Eric Wong
  2021-10-13 10:16           ` [PATCH 2/7] test_common: hoist out tail_f sub Eric Wong
@ 2021-10-13 10:16           ` Eric Wong
  2021-10-13 10:16           ` [PATCH 4/7] eml: avoid Encode 2.87..3.12 leak Eric Wong
                             ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw)
  To: meta

grokmirror 2.x seems to idle in several places for 5s at-a-time,
causing t/www_listing.t to take longer than "make check-run" on
a 4-core system when run without grokmirror.  So make it
optional but add some test knobs to allow tailing the log
output so I can see what's going on.
---
 t/www_listing.t | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/t/www_listing.t b/t/www_listing.t
index eb77969b..c556a2d7 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -130,27 +130,31 @@ SKIP: {
 	tiny_test($json, $host, $port, 1);
 	undef $sock;
 
+	skip 'TEST_GROK unset', 12 unless $ENV{TEST_GROK};
 	my $grok_pull = require_cmd('grok-pull', 1) or
 		skip('grok-pull not available', 12);
 	my ($grok_version) = (xqx([$grok_pull, "--version"])
 			=~ /(\d+)\.(?:\d+)(?:\.(\d+))?/);
 	$grok_version >= 2 or
 		skip('grok-pull v2 or later not available', 12);
+	my $grok_loglevel = $ENV{TEST_GROK_LOGLEVEL} // 'info';
 
 	ok(mkdir("$tmpdir/mirror"), 'prepare grok mirror dest');
-	open $fh, '>', "$tmpdir/repos.conf" or die;
-	print $fh <<"" or die;
+	my $tail = tail_f("$tmpdir/grok.log");
+	open $fh, '>', "$tmpdir/repos.conf" or xbail $!;
+	print $fh <<"" or xbail $!;
 [core]
 toplevel = $tmpdir/mirror
 manifest = $tmpdir/local-manifest.js.gz
+log = $tmpdir/grok.log
+loglevel = $grok_loglevel
 [remote]
 site = http://$host:$port
 manifest = \${site}/manifest.js.gz
 [pull]
 [fsck]
 
-	close $fh or die;
-
+	close $fh or xbail $!;
 	xsys($grok_pull, '-c', "$tmpdir/repos.conf");
 	is($? >> 8, 0, 'grok-pull exit code as expected');
 	for (qw(alt bare v2/git/0.git v2/git/1.git v2/git/2.git)) {
@@ -159,18 +163,20 @@ manifest = \${site}/manifest.js.gz
 
 	# support per-inbox manifests, handy for v2:
 	# /$INBOX/v2/manifest.js.gz
-	open $fh, '>', "$tmpdir/per-inbox.conf" or die;
-	print $fh <<"" or die;
+	open $fh, '>', "$tmpdir/per-inbox.conf" or xbail $!;
+	print $fh <<"" or xbail $!;
 [core]
 toplevel = $tmpdir/per-inbox
 manifest = $tmpdir/per-inbox-manifest.js.gz
+log = $tmpdir/grok.log
+loglevel = $grok_loglevel
 [remote]
 site = http://$host:$port
 manifest = \${site}/v2/manifest.js.gz
 [pull]
 [fsck]
 
-	close $fh or die;
+	close $fh or xbail $!;
 	ok(mkdir("$tmpdir/per-inbox"), 'prepare single-v2-inbox mirror');
 	xsys($grok_pull, '-c', "$tmpdir/per-inbox.conf");
 	is($? >> 8, 0, 'grok-pull exit code as expected');

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/7] eml: avoid Encode 2.87..3.12 leak
  2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
                             ` (2 preceding siblings ...)
  2021-10-13 10:16           ` [PATCH 3/7] t/www_listing: require opt-in for grokmirror tests Eric Wong
@ 2021-10-13 10:16           ` Eric Wong
  2021-10-13 10:16           ` [PATCH 5/7] t/lei-mirror: avoid reading ~/.public-inbox/config in test Eric Wong
                             ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw)
  To: meta

Encode::FB_CROAK leaks memory in old versions of Encode:
<https://rt.cpan.org/Public/Bug/Display.html?id=139622>

Since I expect there's still many users on old systems and old
Perls, we can use "$SIG{__WARN__} = \&croak" here with
Encode::FB_WARN to emulate Encode::FB_CROAK behavior.
---
 lib/PublicInbox/Eml.pm | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 0867a016..69c26932 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -28,7 +28,7 @@ package PublicInbox::Eml;
 use strict;
 use v5.10.1;
 use Carp qw(croak);
-use Encode qw(find_encoding decode encode); # stdlib
+use Encode qw(find_encoding); # stdlib
 use Text::Wrap qw(wrap); # stdlib, we need Perl 5.6+ for $huge
 use MIME::Base64 3.05; # Perl 5.10.0 / 5.9.2
 use MIME::QuotedPrint 3.05; # ditto
@@ -334,9 +334,14 @@ sub body_set {
 
 sub body_str_set {
 	my ($self, $body_str) = @_;
-	my $charset = ct($self)->{attributes}->{charset} or
+	my $cs = ct($self)->{attributes}->{charset} //
 		croak('body_str was given, but no charset is defined');
-	body_set($self, \(encode($charset, $body_str, Encode::FB_CROAK)));
+	my $enc = find_encoding($cs) // croak "unknown encoding `$cs'";
+	$body_str = do {
+		local $SIG{__WARN__} = \&croak;
+		$enc->encode($body_str, Encode::FB_WARN);
+	};
+	body_set($self, \$body_str);
 }
 
 sub content_type { scalar header($_[0], 'Content-Type') }
@@ -452,15 +457,17 @@ sub body {
 sub body_str {
 	my ($self) = @_;
 	my $ct = ct($self);
-	my $charset = $ct->{attributes}->{charset};
-	if (!$charset) {
-		if ($STR_TYPE{$ct->{type}} && $STR_SUBTYPE{$ct->{subtype}}) {
+	my $cs = $ct->{attributes}->{charset} // do {
+		($STR_TYPE{$ct->{type}} && $STR_SUBTYPE{$ct->{subtype}}) and
 			return body($self);
-		}
 		croak("can't get body as a string for ",
 			join("\n\t", header_raw($self, 'Content-Type')));
-	}
-	decode($charset, body($self), Encode::FB_CROAK);
+	};
+	my $enc = find_encoding($cs) or croak "unknown encoding `$cs'";
+	my $tmp = body($self);
+	# workaround https://rt.cpan.org/Public/Bug/Display.html?id=139622
+	local $SIG{__WARN__} = \&croak;
+	$enc->decode($tmp, Encode::FB_WARN);
 }
 
 sub as_string {

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/7] t/lei-mirror: avoid reading ~/.public-inbox/config in test
  2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
                             ` (3 preceding siblings ...)
  2021-10-13 10:16           ` [PATCH 4/7] eml: avoid Encode 2.87..3.12 leak Eric Wong
@ 2021-10-13 10:16           ` Eric Wong
  2021-10-13 10:16           ` [PATCH 6/7] t/git: avoid "once" warning for async_warn Eric Wong
  2021-10-13 10:16           ` [PATCH 7/7] t/nntpd-tls: change diag() to like() assertion Eric Wong
  6 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw)
  To: meta

Oops, we shouldn't attempt to read a users' actual HOME when
running -index, since mine has a bunch of invalid entries in
there.
---
 t/lei-mirror.t | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/lei-mirror.t b/t/lei-mirror.t
index b449e0b4..294eb654 100644
--- a/t/lei-mirror.t
+++ b/t/lei-mirror.t
@@ -167,6 +167,7 @@ SKIP: {
 	my $after = [ glob("$d/t1/*") ];
 	is_deeply($before, $after, 'no new files created');
 
+	local $ENV{HOME} = $tmpdir;
 	ok(run_script([qw(-index -Lbasic), "$d/t1"]), 'index v1');
 	ok(run_script([qw(-index -Lbasic), "$d/t2"]), 'index v2');
 	my $f = "$d/t1/public-inbox/msgmap.sqlite3";

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/7] t/git: avoid "once" warning for async_warn
  2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
                             ` (4 preceding siblings ...)
  2021-10-13 10:16           ` [PATCH 5/7] t/lei-mirror: avoid reading ~/.public-inbox/config in test Eric Wong
@ 2021-10-13 10:16           ` Eric Wong
  2021-10-13 10:16           ` [PATCH 7/7] t/nntpd-tls: change diag() to like() assertion Eric Wong
  6 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw)
  To: meta

No point in testing use_ok when we have no outside dependencies
nor exports in this case.
---
 t/git.t | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/t/git.t b/t/git.t
index 8a020211..fa541f41 100644
--- a/t/git.t
+++ b/t/git.t
@@ -6,8 +6,7 @@ use PublicInbox::TestCommon;
 my ($dir, $for_destroy) = tmpdir();
 use PublicInbox::Import;
 use POSIX qw(strftime);
-
-use_ok 'PublicInbox::Git';
+use PublicInbox::Git;
 
 {
 	PublicInbox::Import::init_bare($dir);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/7] t/nntpd-tls: change diag() to like() assertion
  2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
                             ` (5 preceding siblings ...)
  2021-10-13 10:16           ` [PATCH 6/7] t/git: avoid "once" warning for async_warn Eric Wong
@ 2021-10-13 10:16           ` Eric Wong
  6 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw)
  To: meta

This test wasn't finished when I initially wrote it :x
---
 t/nntpd-tls.t | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/nntpd-tls.t b/t/nntpd-tls.t
index d81d1e13..2a76867a 100644
--- a/t/nntpd-tls.t
+++ b/t/nntpd-tls.t
@@ -151,7 +151,8 @@ for my $args (
 			\'STARTTLS not used by default';
 		ok(!lei(qw(ls-mail-source -c nntp.starttls=true),
 			"nntp://$starttls_addr"), 'STARTTLS verify fails');
-		diag $lei_err;
+		like $lei_err, qr/STARTTLS requested/,
+			'STARTTLS noted in stderr';
 	});
 
 	SKIP: {

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: httpd memory usage?
  2021-09-04 23:53 httpd memory usage? Eric Wong
  2021-09-27  7:10 ` Eric Wong
  2021-10-04  0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
@ 2021-11-04  0:17 ` Eric Wong
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2021-11-04  0:17 UTC (permalink / raw)
  To: meta

Eric Wong <e@80x24.org> wrote:
> Is it high for anybody else?  I'm not seeing it for small
> instances (e.g. the instance that's powering
> public-inbox.org/{git,meta} is fine, but with lots of entries my
> lore mirror @ https://yhbt.net/lore/ the main sbrk heap seems to
> be growing slowly without bounds.

After the Encode leak workaround fix
<https://public-inbox.org/meta/20211016232301.20463-1-e@80x24.org/>
and a bunch of scratchpad reductions, it should be at an
all time low.

64-bit instances should be below 80 MB unless you're serving
gigantic messages/attachments (that are usually spam).

> I think I'll need to finish porting mwrap over to Perl/XS
> (from Ruby/C)...

It's working well enough: https://80x24.org/mwrap-perl.git
(still unreleased, though, might take a few more years :P)

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-11-04  0:17 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-04 23:53 httpd memory usage? Eric Wong
2021-09-27  7:10 ` Eric Wong
2021-10-04  0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
2021-10-04  0:07   ` [PATCH 1/2] t/thread-cycle: make Email::Simple optional Eric Wong
2021-10-04  0:07   ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong
2021-10-04 22:51   ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
2021-10-05 11:33     ` Encode.pm leak Eric Wong
2021-10-12 10:59       ` Encode.pm leak in v2.87..v3.12 Eric Wong
2021-10-13 10:16         ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong
2021-10-13 10:16           ` [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok Eric Wong
2021-10-13 10:16           ` [PATCH 2/7] test_common: hoist out tail_f sub Eric Wong
2021-10-13 10:16           ` [PATCH 3/7] t/www_listing: require opt-in for grokmirror tests Eric Wong
2021-10-13 10:16           ` [PATCH 4/7] eml: avoid Encode 2.87..3.12 leak Eric Wong
2021-10-13 10:16           ` [PATCH 5/7] t/lei-mirror: avoid reading ~/.public-inbox/config in test Eric Wong
2021-10-13 10:16           ` [PATCH 6/7] t/git: avoid "once" warning for async_warn Eric Wong
2021-10-13 10:16           ` [PATCH 7/7] t/nntpd-tls: change diag() to like() assertion Eric Wong
2021-11-04  0:17 ` httpd memory usage? Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).