* httpd memory usage? @ 2021-09-04 23:53 Eric Wong 2021-09-27 7:10 ` Eric Wong ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Eric Wong @ 2021-09-04 23:53 UTC (permalink / raw) To: meta Is it high for anybody else? I'm not seeing it for small instances (e.g. the instance that's powering public-inbox.org/{git,meta} is fine, but with lots of entries my lore mirror @ https://yhbt.net/lore/ the main sbrk heap seems to be growing slowly without bounds. And thats with MALLOC_MMAP_THRESHOLD_=131072 set to avoid sbrk for large allocations (glibc 2.28 Debian buster). I think I'll need to finish porting mwrap over to Perl/XS (from Ruby/C)... ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: httpd memory usage? 2021-09-04 23:53 httpd memory usage? Eric Wong @ 2021-09-27 7:10 ` Eric Wong 2021-10-04 0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong 2021-11-04 0:17 ` httpd memory usage? Eric Wong 2 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-09-27 7:10 UTC (permalink / raw) To: meta Eric Wong <e@80x24.org> wrote: > Is it high for anybody else? I'm not seeing it for small > instances (e.g. the instance that's powering > public-inbox.org/{git,meta} is fine, but with lots of entries my > lore mirror @ https://yhbt.net/lore/ the main sbrk heap seems to > be growing slowly without bounds. I suspect it's long-lived SQLite connections and associated caches/etc (and having short-lived allocations mixed in with long-term ones). Anyways, I'm testing the below on yhbt.net/lore, but there's other SQLite knobs worth investigating... Preloading all DBs isn't realistic, either, due to the potential number of inboxes. diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm index 1d5fc708..8fcf6e81 100644 --- a/lib/PublicInbox/Inbox.pm +++ b/lib/PublicInbox/Inbox.pm @@ -43,7 +43,7 @@ sub cleanup_task () { for my $git (@{$ibx->{-repo_objs}}) { $again = 1 if $git->cleanup; } - check_inodes($ibx); + delete @$ibx{qw(over mm)}; $next->{"$ibx"} = $ibx if $again; } $CLEANUP = $next; > I think I'll need to finish porting mwrap over to Perl/XS > (from Ruby/C)... Yeah, still on my TODO list. But right now lei IMAP(S) on high-latency links is a terrible... ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 0/2] www: fix ref cycles when threading extindex 2021-09-04 23:53 httpd memory usage? Eric Wong 2021-09-27 7:10 ` Eric Wong @ 2021-10-04 0:07 ` Eric Wong 2021-10-04 0:07 ` [PATCH 1/2] t/thread-cycle: make Email::Simple optional Eric Wong ` (2 more replies) 2021-11-04 0:17 ` httpd memory usage? Eric Wong 2 siblings, 3 replies; 17+ messages in thread From: Eric Wong @ 2021-10-04 0:07 UTC (permalink / raw) To: meta I finally got Devel::Mwrap::PSGI working and fixed a long-standing reference cycle. AFAIK, it only affects extindex users, not v2, and definitely not v1. Eric Wong (2): t/thread-cycle: make Email::Simple optional www: fix ref cycle from threading w/ extindex lib/PublicInbox/SearchThread.pm | 104 +++++++++++++++++--------------- t/thread-cycle.t | 50 ++++++++++----- 2 files changed, 88 insertions(+), 66 deletions(-) ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] t/thread-cycle: make Email::Simple optional 2021-10-04 0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong @ 2021-10-04 0:07 ` Eric Wong 2021-10-04 0:07 ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong 2021-10-04 22:51 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong 2 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-04 0:07 UTC (permalink / raw) To: meta We only use it if Mail::Thread is available, and often it's not. --- t/thread-cycle.t | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/t/thread-cycle.t b/t/thread-cycle.t index 613c142e65cd..4b47c01c37c1 100644 --- a/t/thread-cycle.t +++ b/t/thread-cycle.t @@ -1,19 +1,16 @@ # Copyright (C) 2016-2021 all contributors <meta@public-inbox.org> # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> -use strict; -use warnings; -use Test::More; -use PublicInbox::TestCommon; -require_mods 'Email::Simple'; +use strict; use v5.10.1; use PublicInbox::TestCommon; use_ok('PublicInbox::SearchThread'); my $mt = eval { require Mail::Thread; no warnings 'once'; $Mail::Thread::nosubject = 1; $Mail::Thread::noprune = 1; + require Email::Simple; # required by Mail::Thread (via Email::Abstract) }; -sub make_objs { +my $make_objs = sub { my @simples; my $n = 0; my @msgs = map { @@ -21,17 +18,19 @@ sub make_objs { $msg->{ds} ||= ++$n; $msg->{references} =~ s/\s+/ /sg if $msg->{references}; $msg->{blob} = '0'x40; # any dummy value will do, here - my $simple = Email::Simple->create(header => [ - 'Message-ID' => "<$msg->{mid}>", - 'References' => $msg->{references}, - ]); - push @simples, $simple; + if ($mt) { + my $simple = Email::Simple->create(header => [ + 'Message-ID' => "<$msg->{mid}>", + 'References' => $msg->{references}, + ]); + push @simples, $simple; + } bless $msg, 'PublicInbox::Smsg' } @_; (\@simples, \@msgs); -} +}; -my ($simples, $smsgs) = make_objs( +my ($simples, $smsgs) = $make_objs->( # data from t/testbox-6 in Mail::Thread 2.55: { mid => '20021124145312.GA1759@nlin.net' }, { mid => 'slrnau448m.7l4.markj+0111@cloaked.freeserve.co.uk', @@ -79,13 +78,13 @@ my @backwards = ( { mid => 8, references => '' } ); -($simples, $smsgs) = make_objs(@backwards); +($simples, $smsgs) = $make_objs->(@backwards); my $backward = thread_to_s($smsgs); SKIP: { skip 'Mail::Thread missing', 1 unless $mt; check_mt($backward, $simples, 'matches Mail::Thread backwards'); } -($simples, $smsgs) = make_objs(reverse @backwards); +($simples, $smsgs) = $make_objs->(reverse @backwards); my $forward = thread_to_s($smsgs); unless ('Mail::Thread sorts by Date') { SKIP: { ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/2] www: fix ref cycle from threading w/ extindex 2021-10-04 0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong 2021-10-04 0:07 ` [PATCH 1/2] t/thread-cycle: make Email::Simple optional Eric Wong @ 2021-10-04 0:07 ` Eric Wong 2021-10-04 22:51 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong 2 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-04 0:07 UTC (permalink / raw) To: meta Unlike v1 inboxes (which don't accept duplicate Message-IDs at all), and v2 inboxes (which generate a new Message-ID for duplicates), extindex must accept duplicate Message-IDs as-is. This was fine for storage, but prevented the reference-cycle mechanism of our message threading display algorithm from working reliably. It could no longer delete the ->{parent} field from clobbered entries in the %id_table. So we now take into account reused Message-IDs and never clobber entries in %id_table. Instead, we mark reused Message-IDs as "imposters" and special-case them by injecting them as children after all other threading is complete. This cycle was noticed using a pre-release of Devel::Mwrap::PSGI: https://80x24.org/mwrap-perl.git --- lib/PublicInbox/SearchThread.pm | 104 +++++++++++++++++--------------- t/thread-cycle.t | 21 ++++++- 2 files changed, 74 insertions(+), 51 deletions(-) diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm index 8fb3a030aa72..507f25baab0e 100644 --- a/lib/PublicInbox/SearchThread.pm +++ b/lib/PublicInbox/SearchThread.pm @@ -24,70 +24,74 @@ use PublicInbox::MID qw($MID_EXTRACT); sub thread { my ($msgs, $ordersub, $ctx) = @_; + my (%id_table, @imposters); + keys(%id_table) = scalar @$msgs; # pre-size - # A. put all current $msgs (non-ghosts) into %id_table - my %id_table = map {; + # A. put all current non-imposter $msgs (non-ghosts) into %id_table + # (imposters are messages with reused Message-IDs) + # Sadly, we sort here anyways since the fill-in-the-blanks References: + # can be shakier if somebody used In-Reply-To with multiple, disparate + # messages. So, take the client Date: into account since we can't + # always determine ordering when somebody uses multiple In-Reply-To. + my @kids = sort { $a->{ds} <=> $b->{ds} } grep { # this delete saves around 4K across 1K messages # TODO: move this to a more appropriate place, breaks tests # if we do it during psgi_cull delete $_->{num}; - $_->{mid} => PublicInbox::SearchThread::Msg::cast($_); + PublicInbox::SearchThread::Msg::cast($_); + if (exists $id_table{$_->{mid}}) { + $_->{children} = []; + push @imposters, $_; # we'll deal with them later + undef; + } else { + $id_table{$_->{mid}} = $_; + defined($_->{references}); + } } @$msgs; + for my $smsg (@kids) { + # This loop exists to help fill in gaps left from missing + # messages. It is not needed in a perfect world where + # everything is perfectly referenced, only the last ref + # matters. + my $prev; + for my $ref ($smsg->{references} =~ m/$MID_EXTRACT/go) { + # Find a Container object for the given Message-ID + my $cont = $id_table{$ref} //= + PublicInbox::SearchThread::Msg::ghost($ref); + + # Link the References field's Containers together in + # the order implied by the References header + # + # * If they are already linked don't change the + # existing links + # * Do not add a link if adding that link would + # introduce a loop... + if ($prev && + !$cont->{parent} && # already linked + !$cont->has_descendent($prev) # would loop + ) { + $prev->add_child($cont); + } + $prev = $cont; + } - # Sadly, we sort here anyways since the fill-in-the-blanks References: - # can be shakier if somebody used In-Reply-To with multiple, disparate - # messages. So, take the client Date: into account since we can't - # always determine ordering when somebody uses multiple In-Reply-To. - # We'll trust the client Date: header here instead of the Received: - # time since this is for display (and not retrieval) - _set_parent(\%id_table, $_) for sort { $a->{ds} <=> $b->{ds} } @$msgs; + # C. Set the parent of this message to be the last element in + # References. + if (defined $prev && !$smsg->has_descendent($prev)) { + $prev->add_child($smsg); + } + } my $ibx = $ctx->{ibx}; - my $rootset = [ grep { + my $rootset = [ grep { # n.b.: delete prevents cyclic refs !delete($_->{parent}) && $_->visible($ibx) } values %id_table ]; $rootset = $ordersub->($rootset); $_->order_children($ordersub, $ctx) for @$rootset; - $rootset; -} -sub _set_parent ($$) { - my ($id_table, $this) = @_; - - # B. For each element in the message's References field: - defined(my $refs = $this->{references}) or return; - - # This loop exists to help fill in gaps left from missing - # messages. It is not needed in a perfect world where - # everything is perfectly referenced, only the last ref - # matters. - my $prev; - foreach my $ref ($refs =~ m/$MID_EXTRACT/go) { - # Find a Container object for the given Message-ID - my $cont = $id_table->{$ref} //= - PublicInbox::SearchThread::Msg::ghost($ref); - - # Link the References field's Containers together in - # the order implied by the References header - # - # * If they are already linked don't change the - # existing links - # * Do not add a link if adding that link would - # introduce a loop... - if ($prev && - !$cont->{parent} && # already linked - !$cont->has_descendent($prev) # would loop - ) { - $prev->add_child($cont); - } - $prev = $cont; - } - - # C. Set the parent of this message to be the last element in - # References. - if (defined $prev && !$this->has_descendent($prev)) { # would loop - $prev->add_child($this); - } + # parent imposter messages with reused Message-IDs + unshift(@{$id_table{$_->{mid}}->{children}}, $_) for @imposters; + $rootset; } package PublicInbox::SearchThread::Msg; diff --git a/t/thread-cycle.t b/t/thread-cycle.t index 4b47c01c37c1..e89b18464a5f 100644 --- a/t/thread-cycle.t +++ b/t/thread-cycle.t @@ -96,7 +96,26 @@ if ('sorting by Date') { is("\n".$backward, "\n".$forward, 'forward and backward matches'); } -done_testing(); +SKIP: { + require_mods 'Devel::Cycle', 1; + Devel::Cycle->import('find_cycle'); + my @dup = ( + { mid => 5, references => '<6>' }, + { mid => 5, references => '<6> <1>' }, + ); + open my $fh, '+>', \(my $out = '') or xbail "open: $!"; + (undef, $smsgs) = $make_objs->(@dup); + eval 'package EmptyInbox; sub smsg_by_mid { undef }'; + my $ctx = { ibx => bless {}, 'EmptyInbox' }; + my $rootset = PublicInbox::SearchThread::thread($smsgs, sub { + [ sort { $a->{mid} cmp $b->{mid} } @{$_[0]} ] }, $ctx); + my $oldout = select $fh; + find_cycle($rootset); + select $oldout; + is($out, '', 'nothing from find_cycle'); +} # Devel::Cycle check + +done_testing; sub thread_to_s { my ($msgs) = @_; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] www: fix ref cycles when threading extindex 2021-10-04 0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong 2021-10-04 0:07 ` [PATCH 1/2] t/thread-cycle: make Email::Simple optional Eric Wong 2021-10-04 0:07 ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong @ 2021-10-04 22:51 ` Eric Wong 2021-10-05 11:33 ` Encode.pm leak Eric Wong 2 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2021-10-04 22:51 UTC (permalink / raw) To: meta That was most of it. https://bugs.debian.org/995738 has some more. I suspect there's other leaks, too :< ^ permalink raw reply [flat|nested] 17+ messages in thread
* Encode.pm leak 2021-10-04 22:51 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong @ 2021-10-05 11:33 ` Eric Wong 2021-10-12 10:59 ` Encode.pm leak in v2.87..v3.12 Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2021-10-05 11:33 UTC (permalink / raw) To: meta The leak's been there for a while, but I couldn't find other reports of it: https://rt.cpan.org/Public/Bug/Display.html?id=139622 Anybody else want to try this and see which versions of Encode or Perl hit this leak? Encode is bundled with Perl, but at least Debian also provides an optional standalone package, too. In the meantime, I may need to put in my own workaround for it PublicInbox::Eml because distros are slow moving... ----8<---- #!perl -w use v5.10.1; use Encode qw(decode); say "Encode::VERSION=$Encode::VERSION"; my $blob = "\xef\xbf\xbd" x 1000; my $nr = $ENV{NR} || 10000; # higher => more memory my $warn = $ENV{FB_WARN}; for (0..$nr) { if ($warn) { decode('us-ascii', $blob, Encode::FB_WARN); } else { eval { decode('us-ascii', $blob, Encode::FB_CROAK) }; } } ^ permalink raw reply [flat|nested] 17+ messages in thread
* Encode.pm leak in v2.87..v3.12 2021-10-05 11:33 ` Encode.pm leak Eric Wong @ 2021-10-12 10:59 ` Eric Wong 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2021-10-12 10:59 UTC (permalink / raw) To: meta Eric Wong <e@80x24.org> wrote: > The leak's been there for a while, but I couldn't find > other reports of it: > > https://rt.cpan.org/Public/Bug/Display.html?id=139622 So this got fixed in Encode 3.13, and 3.15 (3.14 segfaulted :P). Perl v5.36 will have 3.15 (or possibly a newer version). This has been a leak since 2016 (2.87), so there might be lot of people affected... CentOS 7.x has Encode 2.51 so it's probably not affected by this particular bug... > In the meantime, I may need to put in my own workaround for it > PublicInbox::Eml because distros are slow moving... It's tracked for Debian, at least https://bugs.debian.org/995804 Our "eval { $eml->body_str }" usage is kinda dumb, so maybe it could be done better w/o needing eval at all... ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 0/7] workaround Encode leak, several test fixes 2021-10-12 10:59 ` Encode.pm leak in v2.87..v3.12 Eric Wong @ 2021-10-13 10:16 ` Eric Wong 2021-10-13 10:16 ` [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok Eric Wong ` (6 more replies) 0 siblings, 7 replies; 17+ messages in thread From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw) To: meta Eric Wong (7): xt/perf-msgview: drop unnecessary use_ok test_common: hoist out tail_f sub t/www_listing: require opt-in for grokmirror tests eml: avoid Encode 2.87..3.12 leak t/lei-mirror: avoid reading ~/.public-inbox/config in test t/git: avoid "once" warning for async_warn t/nntpd-tls: change diag() to like() assertion lib/PublicInbox/Eml.pm | 25 +++++++++----- lib/PublicInbox/TestCommon.pm | 63 ++++++++++++++++++++--------------- t/git.t | 3 +- t/lei-mirror.t | 1 + t/nntpd-tls.t | 3 +- t/www_listing.t | 20 +++++++---- xt/perf-msgview.t | 1 - 7 files changed, 70 insertions(+), 46 deletions(-) ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong @ 2021-10-13 10:16 ` Eric Wong 2021-10-13 10:16 ` [PATCH 2/7] test_common: hoist out tail_f sub Eric Wong ` (5 subsequent siblings) 6 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw) To: meta require_mods covers it, and we're not testing Plack itself. --- xt/perf-msgview.t | 1 - 1 file changed, 1 deletion(-) diff --git a/xt/perf-msgview.t b/xt/perf-msgview.t index bc73fcce..cf550c1a 100644 --- a/xt/perf-msgview.t +++ b/xt/perf-msgview.t @@ -21,7 +21,6 @@ if (require_git(2.19, 1)) { "git <2.19, cat-file lacks --unordered, locality suffers\n"; } require_mods qw(Plack::Util); -use_ok 'Plack::Util'; my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'name' }); my $git = $ibx->git; my $fh = $blob ? undef : $git->popen(@cat); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/7] test_common: hoist out tail_f sub 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong 2021-10-13 10:16 ` [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok Eric Wong @ 2021-10-13 10:16 ` Eric Wong 2021-10-13 10:16 ` [PATCH 3/7] t/www_listing: require opt-in for grokmirror tests Eric Wong ` (4 subsequent siblings) 6 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw) To: meta We'll be reusing this in more places. While we're at it, allow it to tail all run_script() users, including lei() in TestCommon. --- lib/PublicInbox/TestCommon.pm | 63 ++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 26 deletions(-) diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm index cd706e0e..57f1db95 100644 --- a/lib/PublicInbox/TestCommon.pm +++ b/lib/PublicInbox/TestCommon.pm @@ -12,13 +12,14 @@ use IO::Socket::INET; use File::Spec; our @EXPORT; my $lei_loud = $ENV{TEST_LEI_ERR_LOUD}; +my $tail_cmd = $ENV{TAIL}; our ($lei_opt, $lei_out, $lei_err, $lei_cwdfh); BEGIN { @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods run_script start_script key2sub xsys xsys_e xqx eml_load tick have_xapian_compact json_utf8 setup_public_inboxes create_inbox tcp_host_port test_lei lei lei_ok $lei_out $lei_err $lei_opt - test_httpd xbail require_cmd is_xdeeply); + test_httpd xbail require_cmd is_xdeeply tail_f); require Test::More; my @methods = grep(!/\W/, @Test::More::EXPORT); eval(join('', map { "*$_=\\&Test::More::$_;" } @methods)); @@ -280,11 +281,20 @@ sub run_script ($;$$) { my $sub = $run_mode == 0 ? undef : key2sub($key); my $fhref = []; my $spawn_opt = {}; + my @tail_paths; for my $fd (0..2) { my $redir = $opt->{$fd}; my $ref = ref($redir); if ($ref eq 'SCALAR') { - open my $fh, '+>', undef or die "open: $!"; + my $fh; + if ($tail_cmd && $ENV{TAIL_ALL} && $fd > 0) { + require File::Temp; + $fh = File::Temp->new("fd.$fd-XXXX", TMPDIR=>1); + push @tail_paths, $fh->filename; + } else { + open $fh, '+>', undef; + } + $fh or xbail $!; $fhref->[$fd] = $fh; $spawn_opt->{$fd} = $fh; next if $fd > 0; @@ -297,6 +307,7 @@ sub run_script ($;$$) { die "unable to deal with $ref $redir"; } } + my $tail = @tail_paths ? tail_f(@tail_paths) : undef; if ($key =~ /-(index|convert|extindex|convert|xcpdb)\z/) { unshift @argv, '--no-fsync'; } @@ -337,6 +348,7 @@ sub run_script ($;$$) { umask($umask); } + { local $?; undef $tail }; # slurp the redirects back into user-supplied strings for my $fd (1..2) { my $fh = $fhref->[$fd] or next; @@ -355,13 +367,13 @@ sub tick (;$) { 1; } -sub wait_for_tail ($;$) { +sub wait_for_tail { my ($tail_pid, $want) = @_; - my $wait = 2; + my $wait = 2; # "tail -F" sleeps 1.0s at-a-time w/o inotify/kevent if ($^O eq 'linux') { # GNU tail may use inotify state $tail_has_inotify; - return tick if $want < 0 && $tail_has_inotify; - my $end = time + $wait; + return tick if !$want && $tail_has_inotify; # before TERM + my $end = time + $wait; # wait for startup: my @ino; do { @ino = grep { @@ -410,13 +422,23 @@ sub xqx { wantarray ? split(/^/m, $out) : $out; } +sub tail_f (@) { + $tail_cmd or return; # "tail -F" or "tail -f" + for (@_) { open(my $fh, '>>', $_) or die $! }; + my $cmd = [ split(/ /, $tail_cmd), @_ ]; + require PublicInbox::Spawn; + my $pid = PublicInbox::Spawn::spawn($cmd, undef, { 1 => 2 }); + wait_for_tail($pid, scalar @_); + PublicInboxTestProcess->new($pid, \&wait_for_tail); +} + sub start_script { my ($cmd, $env, $opt) = @_; my ($key, @argv) = @$cmd; my $run_mode = $ENV{TEST_RUN_MODE} // $opt->{run_mode} // 2; my $sub = $run_mode == 0 ? undef : key2sub($key); - my $tail_pid; - if (my $tail_cmd = $ENV{TAIL}) { + my $tail; + if ($tail_cmd) { my @paths; for (@argv) { next unless /\A--std(?:err|out)=(.+)\z/; @@ -434,17 +456,7 @@ sub start_script { } } } - if (@paths) { - $tail_pid = fork // die "fork: $!"; - if ($tail_pid == 0) { - # make sure files exist, first - open my $fh, '>>', $_ for @paths; - open(STDOUT, '>&STDERR') or die "1>&2: $!"; - exec(split(' ', $tail_cmd), @paths); - die "$tail_cmd failed: $!"; - } - wait_for_tail($tail_pid, scalar @paths); - } + $tail = tail_f(@paths); } my $pid = fork // die "fork: $!\n"; if ($pid == 0) { @@ -480,7 +492,9 @@ sub start_script { die "FAIL: ",join(' ', $key, @argv), ": $!\n"; } } - PublicInboxTestProcess->new($pid, $tail_pid); + my $td = PublicInboxTestProcess->new($pid); + $td->{-extra} = $tail; + $td; } # favor lei() or lei_ok() over $lei for new code @@ -735,8 +749,8 @@ use strict; sub CLONE_SKIP { 1 } sub new { - my ($klass, $pid, $tail_pid) = @_; - bless { pid => $pid, tail_pid => $tail_pid, owner => $$ }, $klass; + my ($cls, $pid, $cb) = @_; + bless { pid => $pid, cb => $cb, owner => $$ }, $cls; } sub kill { @@ -747,6 +761,7 @@ sub kill { sub join { my ($self, $sig) = @_; my $pid = delete $self->{pid} or return; + $self->{cb}->() if defined $self->{cb}; CORE::kill($sig, $pid) if defined $sig; my $ret = waitpid($pid, 0) // die "waitpid($pid): $!"; $ret == $pid or die "waitpid($pid) != $ret"; @@ -755,10 +770,6 @@ sub join { sub DESTROY { my ($self) = @_; return if $self->{owner} != $$; - if (my $tail_pid = delete $self->{tail_pid}) { - PublicInbox::TestCommon::wait_for_tail($tail_pid, -1); - CORE::kill('TERM', $tail_pid); - } $self->join('TERM'); } ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 3/7] t/www_listing: require opt-in for grokmirror tests 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong 2021-10-13 10:16 ` [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok Eric Wong 2021-10-13 10:16 ` [PATCH 2/7] test_common: hoist out tail_f sub Eric Wong @ 2021-10-13 10:16 ` Eric Wong 2021-10-13 10:16 ` [PATCH 4/7] eml: avoid Encode 2.87..3.12 leak Eric Wong ` (3 subsequent siblings) 6 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw) To: meta grokmirror 2.x seems to idle in several places for 5s at-a-time, causing t/www_listing.t to take longer than "make check-run" on a 4-core system when run without grokmirror. So make it optional but add some test knobs to allow tailing the log output so I can see what's going on. --- t/www_listing.t | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/t/www_listing.t b/t/www_listing.t index eb77969b..c556a2d7 100644 --- a/t/www_listing.t +++ b/t/www_listing.t @@ -130,27 +130,31 @@ SKIP: { tiny_test($json, $host, $port, 1); undef $sock; + skip 'TEST_GROK unset', 12 unless $ENV{TEST_GROK}; my $grok_pull = require_cmd('grok-pull', 1) or skip('grok-pull not available', 12); my ($grok_version) = (xqx([$grok_pull, "--version"]) =~ /(\d+)\.(?:\d+)(?:\.(\d+))?/); $grok_version >= 2 or skip('grok-pull v2 or later not available', 12); + my $grok_loglevel = $ENV{TEST_GROK_LOGLEVEL} // 'info'; ok(mkdir("$tmpdir/mirror"), 'prepare grok mirror dest'); - open $fh, '>', "$tmpdir/repos.conf" or die; - print $fh <<"" or die; + my $tail = tail_f("$tmpdir/grok.log"); + open $fh, '>', "$tmpdir/repos.conf" or xbail $!; + print $fh <<"" or xbail $!; [core] toplevel = $tmpdir/mirror manifest = $tmpdir/local-manifest.js.gz +log = $tmpdir/grok.log +loglevel = $grok_loglevel [remote] site = http://$host:$port manifest = \${site}/manifest.js.gz [pull] [fsck] - close $fh or die; - + close $fh or xbail $!; xsys($grok_pull, '-c', "$tmpdir/repos.conf"); is($? >> 8, 0, 'grok-pull exit code as expected'); for (qw(alt bare v2/git/0.git v2/git/1.git v2/git/2.git)) { @@ -159,18 +163,20 @@ manifest = \${site}/manifest.js.gz # support per-inbox manifests, handy for v2: # /$INBOX/v2/manifest.js.gz - open $fh, '>', "$tmpdir/per-inbox.conf" or die; - print $fh <<"" or die; + open $fh, '>', "$tmpdir/per-inbox.conf" or xbail $!; + print $fh <<"" or xbail $!; [core] toplevel = $tmpdir/per-inbox manifest = $tmpdir/per-inbox-manifest.js.gz +log = $tmpdir/grok.log +loglevel = $grok_loglevel [remote] site = http://$host:$port manifest = \${site}/v2/manifest.js.gz [pull] [fsck] - close $fh or die; + close $fh or xbail $!; ok(mkdir("$tmpdir/per-inbox"), 'prepare single-v2-inbox mirror'); xsys($grok_pull, '-c', "$tmpdir/per-inbox.conf"); is($? >> 8, 0, 'grok-pull exit code as expected'); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 4/7] eml: avoid Encode 2.87..3.12 leak 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong ` (2 preceding siblings ...) 2021-10-13 10:16 ` [PATCH 3/7] t/www_listing: require opt-in for grokmirror tests Eric Wong @ 2021-10-13 10:16 ` Eric Wong 2021-10-13 10:16 ` [PATCH 5/7] t/lei-mirror: avoid reading ~/.public-inbox/config in test Eric Wong ` (2 subsequent siblings) 6 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw) To: meta Encode::FB_CROAK leaks memory in old versions of Encode: <https://rt.cpan.org/Public/Bug/Display.html?id=139622> Since I expect there's still many users on old systems and old Perls, we can use "$SIG{__WARN__} = \&croak" here with Encode::FB_WARN to emulate Encode::FB_CROAK behavior. --- lib/PublicInbox/Eml.pm | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm index 0867a016..69c26932 100644 --- a/lib/PublicInbox/Eml.pm +++ b/lib/PublicInbox/Eml.pm @@ -28,7 +28,7 @@ package PublicInbox::Eml; use strict; use v5.10.1; use Carp qw(croak); -use Encode qw(find_encoding decode encode); # stdlib +use Encode qw(find_encoding); # stdlib use Text::Wrap qw(wrap); # stdlib, we need Perl 5.6+ for $huge use MIME::Base64 3.05; # Perl 5.10.0 / 5.9.2 use MIME::QuotedPrint 3.05; # ditto @@ -334,9 +334,14 @@ sub body_set { sub body_str_set { my ($self, $body_str) = @_; - my $charset = ct($self)->{attributes}->{charset} or + my $cs = ct($self)->{attributes}->{charset} // croak('body_str was given, but no charset is defined'); - body_set($self, \(encode($charset, $body_str, Encode::FB_CROAK))); + my $enc = find_encoding($cs) // croak "unknown encoding `$cs'"; + $body_str = do { + local $SIG{__WARN__} = \&croak; + $enc->encode($body_str, Encode::FB_WARN); + }; + body_set($self, \$body_str); } sub content_type { scalar header($_[0], 'Content-Type') } @@ -452,15 +457,17 @@ sub body { sub body_str { my ($self) = @_; my $ct = ct($self); - my $charset = $ct->{attributes}->{charset}; - if (!$charset) { - if ($STR_TYPE{$ct->{type}} && $STR_SUBTYPE{$ct->{subtype}}) { + my $cs = $ct->{attributes}->{charset} // do { + ($STR_TYPE{$ct->{type}} && $STR_SUBTYPE{$ct->{subtype}}) and return body($self); - } croak("can't get body as a string for ", join("\n\t", header_raw($self, 'Content-Type'))); - } - decode($charset, body($self), Encode::FB_CROAK); + }; + my $enc = find_encoding($cs) or croak "unknown encoding `$cs'"; + my $tmp = body($self); + # workaround https://rt.cpan.org/Public/Bug/Display.html?id=139622 + local $SIG{__WARN__} = \&croak; + $enc->decode($tmp, Encode::FB_WARN); } sub as_string { ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 5/7] t/lei-mirror: avoid reading ~/.public-inbox/config in test 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong ` (3 preceding siblings ...) 2021-10-13 10:16 ` [PATCH 4/7] eml: avoid Encode 2.87..3.12 leak Eric Wong @ 2021-10-13 10:16 ` Eric Wong 2021-10-13 10:16 ` [PATCH 6/7] t/git: avoid "once" warning for async_warn Eric Wong 2021-10-13 10:16 ` [PATCH 7/7] t/nntpd-tls: change diag() to like() assertion Eric Wong 6 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw) To: meta Oops, we shouldn't attempt to read a users' actual HOME when running -index, since mine has a bunch of invalid entries in there. --- t/lei-mirror.t | 1 + 1 file changed, 1 insertion(+) diff --git a/t/lei-mirror.t b/t/lei-mirror.t index b449e0b4..294eb654 100644 --- a/t/lei-mirror.t +++ b/t/lei-mirror.t @@ -167,6 +167,7 @@ SKIP: { my $after = [ glob("$d/t1/*") ]; is_deeply($before, $after, 'no new files created'); + local $ENV{HOME} = $tmpdir; ok(run_script([qw(-index -Lbasic), "$d/t1"]), 'index v1'); ok(run_script([qw(-index -Lbasic), "$d/t2"]), 'index v2'); my $f = "$d/t1/public-inbox/msgmap.sqlite3"; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 6/7] t/git: avoid "once" warning for async_warn 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong ` (4 preceding siblings ...) 2021-10-13 10:16 ` [PATCH 5/7] t/lei-mirror: avoid reading ~/.public-inbox/config in test Eric Wong @ 2021-10-13 10:16 ` Eric Wong 2021-10-13 10:16 ` [PATCH 7/7] t/nntpd-tls: change diag() to like() assertion Eric Wong 6 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw) To: meta No point in testing use_ok when we have no outside dependencies nor exports in this case. --- t/git.t | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/t/git.t b/t/git.t index 8a020211..fa541f41 100644 --- a/t/git.t +++ b/t/git.t @@ -6,8 +6,7 @@ use PublicInbox::TestCommon; my ($dir, $for_destroy) = tmpdir(); use PublicInbox::Import; use POSIX qw(strftime); - -use_ok 'PublicInbox::Git'; +use PublicInbox::Git; { PublicInbox::Import::init_bare($dir); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 7/7] t/nntpd-tls: change diag() to like() assertion 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong ` (5 preceding siblings ...) 2021-10-13 10:16 ` [PATCH 6/7] t/git: avoid "once" warning for async_warn Eric Wong @ 2021-10-13 10:16 ` Eric Wong 6 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-10-13 10:16 UTC (permalink / raw) To: meta This test wasn't finished when I initially wrote it :x --- t/nntpd-tls.t | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/t/nntpd-tls.t b/t/nntpd-tls.t index d81d1e13..2a76867a 100644 --- a/t/nntpd-tls.t +++ b/t/nntpd-tls.t @@ -151,7 +151,8 @@ for my $args ( \'STARTTLS not used by default'; ok(!lei(qw(ls-mail-source -c nntp.starttls=true), "nntp://$starttls_addr"), 'STARTTLS verify fails'); - diag $lei_err; + like $lei_err, qr/STARTTLS requested/, + 'STARTTLS noted in stderr'; }); SKIP: { ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: httpd memory usage? 2021-09-04 23:53 httpd memory usage? Eric Wong 2021-09-27 7:10 ` Eric Wong 2021-10-04 0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong @ 2021-11-04 0:17 ` Eric Wong 2 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2021-11-04 0:17 UTC (permalink / raw) To: meta Eric Wong <e@80x24.org> wrote: > Is it high for anybody else? I'm not seeing it for small > instances (e.g. the instance that's powering > public-inbox.org/{git,meta} is fine, but with lots of entries my > lore mirror @ https://yhbt.net/lore/ the main sbrk heap seems to > be growing slowly without bounds. After the Encode leak workaround fix <https://public-inbox.org/meta/20211016232301.20463-1-e@80x24.org/> and a bunch of scratchpad reductions, it should be at an all time low. 64-bit instances should be below 80 MB unless you're serving gigantic messages/attachments (that are usually spam). > I think I'll need to finish porting mwrap over to Perl/XS > (from Ruby/C)... It's working well enough: https://80x24.org/mwrap-perl.git (still unreleased, though, might take a few more years :P) ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2021-11-04 0:17 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-09-04 23:53 httpd memory usage? Eric Wong 2021-09-27 7:10 ` Eric Wong 2021-10-04 0:07 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong 2021-10-04 0:07 ` [PATCH 1/2] t/thread-cycle: make Email::Simple optional Eric Wong 2021-10-04 0:07 ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong 2021-10-04 22:51 ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong 2021-10-05 11:33 ` Encode.pm leak Eric Wong 2021-10-12 10:59 ` Encode.pm leak in v2.87..v3.12 Eric Wong 2021-10-13 10:16 ` [PATCH 0/7] workaround Encode leak, several test fixes Eric Wong 2021-10-13 10:16 ` [PATCH 1/7] xt/perf-msgview: drop unnecessary use_ok Eric Wong 2021-10-13 10:16 ` [PATCH 2/7] test_common: hoist out tail_f sub Eric Wong 2021-10-13 10:16 ` [PATCH 3/7] t/www_listing: require opt-in for grokmirror tests Eric Wong 2021-10-13 10:16 ` [PATCH 4/7] eml: avoid Encode 2.87..3.12 leak Eric Wong 2021-10-13 10:16 ` [PATCH 5/7] t/lei-mirror: avoid reading ~/.public-inbox/config in test Eric Wong 2021-10-13 10:16 ` [PATCH 6/7] t/git: avoid "once" warning for async_warn Eric Wong 2021-10-13 10:16 ` [PATCH 7/7] t/nntpd-tls: change diag() to like() assertion Eric Wong 2021-11-04 0:17 ` httpd memory usage? Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).