* how's memory use? May 2020 edition
@ 2020-05-12 8:37 Eric Wong
2020-05-14 20:57 ` Konstantin Ryabitsev
0 siblings, 1 reply; 3+ messages in thread
From: Eric Wong @ 2020-05-12 8:37 UTC (permalink / raw)
To: meta
Hey all, if possible; I'd like to know the memory use of your
daemons (particularly -httpd), relevant pmap(1) (or equivalent)
output, and version of public-inbox in use.
I'm primarily a GNU/Linux user, so much of the following is
glibc-specific. If you use another malloc (or libc) I'd also be
interested to know. I expect my changes to be sympathetic to
the way all reasonable malloc implementations work.
Over the past few months, my processes have been using less RAM.
With 1.5.0 on 64-bit systems, I don't see -httpd stay at or
above ~80MB RSS, or ~50MB on 32-bit systems. It might spike for
giant messages, but the change to preload encodings[1] seems
to let malloc trim the top of the heap more consistently.
The biggest message in an inbox is still a factor, and I use
this to find the largest blob in a git repo:
git cat-file --batch-check --batch-all-objects --unordered | \
awk '$2 == "blob" && $3 > max { max = $3; oid = $1 } END {print oid, max}'
It's usually spam, which won't get served if
"public-inbox-learn spam"-ed away.
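If more than the single largest blob is of interest, a variant along these lines might help. This is only a sketch: `top_blobs` is a hypothetical helper name, and it assumes the default `--batch-check` output format of "<oid> <type> <size>" on each line.

```shell
# top_blobs [N]: read `git cat-file --batch-check` lines
# ("<oid> <type> <size>") on stdin and print the N largest blobs
# (default 5) as "<size> <oid>", biggest first.
top_blobs() {
    awk '$2 == "blob" { print $3, $1 }' | sort -rn | head -n "${1:-5}"
}

# Usage, from inside a git repository:
#   git cat-file --batch-check --batch-all-objects --unordered | top_blobs 5
```

Sorting the whole list is slower than tracking a single maximum in awk, but it shows whether one outlier or a whole cluster of large messages is involved.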
Linux-based systems with `procps' installed can use pmap to show
anonymous mappings (not sure about other OSes):
pmap $PID | grep -w anon
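To get a single number out of that listing, something like the following sketch could sum the size column. `anon_total_kb` is a hypothetical helper name, and it assumes the procps pmap format where the second column is a size with a trailing "K". Note these are mapping sizes, not RSS.

```shell
# anon_total_kb: read pmap(1) output on stdin, sum the size column of
# the "[ anon ]" lines, and print the total in KiB.
anon_total_kb() {
    grep -w anon |
        awk '{ sub(/K$/, "", $2); total += $2 } END { print total + 0 }'
}

# Usage:
#   pmap "$PID" | anon_total_kb
```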
On a "beefy" 64-bit workstation running -httpd, there's only
one giant anonymous region (and several smaller ones probably
not used by malloc):
00055df38140000 63540K rw--- [ anon ]
Above is for the process which hosts http://czquwvybam4bgbro.onion/
On a lesser VM (still 64-bit) which hosts http://hjrcffqmbrq6wope.onion/,
the heap is split since the lack of space caused sbrk(2) to fail
and forced malloc to use mmap(2) to create a new (sub) heap:
00005575d3d4a000 30616K rw--- [ anon ]
00005575d5b30000 13852K rw--- [ anon ]
glibc malloc defaults to a sliding window for mmap, so messages
which are beyond that window won't risk fragmenting the main
heap for their in-memory representation.
For the curious, the Linux mallopt(3) manpage also documents
environment variables which can be used to set a fixed mmap
window, trim threshold, and several other malloc knobs.
However, one of my goals is to get things working as well as
possible out-of-the-box so users won't need to fiddle with
knobs :>
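For anyone who does want to experiment anyway, a wrapper along these lines is one way to try fixed thresholds. It is only a sketch: `with_fixed_malloc` is a hypothetical name and the values are illustrative rather than recommendations. The two variables are the ones mallopt(3) documents for M_MMAP_THRESHOLD and M_TRIM_THRESHOLD; setting either one disables glibc's dynamic adjustment of the mmap threshold.

```shell
# with_fixed_malloc CMD [ARGS...]: run CMD with fixed glibc malloc
# thresholds taken from the environment (see mallopt(3)).
# 131072 bytes (128 KiB) is only an example value.
with_fixed_malloc() {
    env MALLOC_MMAP_THRESHOLD_=131072 \
        MALLOC_TRIM_THRESHOLD_=131072 \
        "$@"
}

# Usage (daemon arguments omitted):
#   with_fixed_malloc public-inbox-httpd
```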
-nntpd uses significantly less memory than -httpd since it:
1) doesn't split MIME parts
2) doesn't decode quoted-printable or base64
3) doesn't do character set conversions
STARTTLS or NNTPS for OpenSSL requires a significant amount of
per-socket memory, though I'm not sure how many NNTP readers
there are and if they use TLS.
[1] - https://public-inbox.org/meta/20200508015901.GA27432@dcvr/
("www: preload: load all encodings at startup")
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: how's memory use? May 2020 edition
2020-05-12 8:37 how's memory use? May 2020 edition Eric Wong
@ 2020-05-14 20:57 ` Konstantin Ryabitsev
2020-05-15 5:23 ` Eric Wong
0 siblings, 1 reply; 3+ messages in thread
From: Konstantin Ryabitsev @ 2020-05-14 20:57 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On Tue, May 12, 2020 at 08:37:34AM +0000, Eric Wong wrote:
> Hey all, if possible; I'd like to know the memory use of your
> daemons (particularly -httpd), relevant pmap(1) (or equivalent)
> output, and version of public-inbox in use.
This is on lore.kernel.org. We upgraded to 1.5.0 yesterday, so this is
only after a day of running, but this usually covers a lot of traffic.
We run with -W4, hence 4 different outputs:
# pgrep -f public-inbox-httpd | xargs pmap | grep anon
0000000002093000 23568K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
0000000002093000 23568K rw--- [ anon ]
0000000003797000 235060K rw--- [ anon ]
00007f981a8fd000 2736K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
0000000002093000 23568K rw--- [ anon ]
0000000003797000 216568K rw--- [ anon ]
00007f98196cc000 4724K rw--- [ anon ]
00007f9819b69000 4876K rw--- [ anon ]
00007f981a02c000 2736K rw--- [ anon ]
00007f981aeb5000 3496K rw--- [ anon ]
00007f981b350000 1628K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
0000000002093000 23568K rw--- [ anon ]
0000000003797000 241724K rw--- [ anon ]
00007f981a8e2000 2736K rw--- [ anon ]
00007f981b16f000 1964K rw--- [ anon ]
00007f981b35a000 1300K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
0000000002093000 23568K rw--- [ anon ]
0000000003797000 202632K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
Best,
-K
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: how's memory use? May 2020 edition
2020-05-14 20:57 ` Konstantin Ryabitsev
@ 2020-05-15 5:23 ` Eric Wong
0 siblings, 0 replies; 3+ messages in thread
From: Eric Wong @ 2020-05-15 5:23 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, May 12, 2020 at 08:37:34AM +0000, Eric Wong wrote:
> > Hey all, if possible; I'd like to know the memory use of your
> > daemons (particularly -httpd), relevant pmap(1) (or equivalent)
> > output, and version of public-inbox in use.
>
> This is on lore.kernel.org. We upgraded to 1.5.0 yesterday, so this is
> only after a day of running, but this usually covers a lot of traffic.
> We run with -W4, hence 4 different outputs:
Thanks for the info! I forgot to ask in the original post, but
having an idea of active connections per worker might be useful
too. I rarely see more than a few dozen, myself.
> # pgrep -f public-inbox-httpd | xargs pmap | grep anon
Would've been easier for humans to read the output if each
process were individually broken out, but I can figure it out
from addresses below :>
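A loop along these lines would break the output out per process. It is a sketch only: `anon_by_pid` is a hypothetical helper name and the header format is arbitrary.

```shell
# anon_by_pid PID...: print a header line for each PID, followed by
# that process's anonymous mappings as reported by pmap(1).
anon_by_pid() {
    for pid in "$@"; do
        printf '== %s ==\n' "$pid"
        # "|| true" keeps going if a process has exited or has no match
        pmap "$pid" | grep -w anon || true
    done
}

# Usage:
#   anon_by_pid $(pgrep -f public-inbox-httpd)
```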
> 0000000002093000 23568K rw--- [ anon ]
OK, that looks like the heap of the master.
> 00007f981c6dc000 84K rw--- [ anon ]
> 00007f981d2c5000 4K rw--- [ anon ]
> 00007f982802f000 20K rw--- [ anon ]
> 00007f982824c000 16K rw--- [ anon ]
> 00007f982865c000 184K rw--- [ anon ]
> 00007f9828da8000 8K rw--- [ anon ]
> 00007f9828fc2000 8K rw--- [ anon ]
> 00007f9829351000 4K rw--- [ anon ]
> 00007f982953f000 160K rw--- [ anon ]
> 00007f9829572000 4K rw--- [ anon ]
> 00007f9829575000 4K rw--- [ anon ]
> 00007fffddbe2000 8K r-x-- [ anon ]
> ffffffffff600000 4K r-x-- [ anon ]
Probably something used by glibc internally, or maybe
SQLite or Xapian. The good thing is the above mappings now
get shared with children and are copy-on-write.
OK, onto another process:
> 0000000002093000 23568K rw--- [ anon ]
That looks inherited from the parent.
> 0000000003797000 235060K rw--- [ anon ]
Ah, so that's probably the main heap after forking (I forgot
that my 64-bit process uses -W0, so no workers in that setup).
Anyways, 250-300MB seems a lot better than things were for lore
a few months ago (closer to ~1G per worker, IIRC?).
I've still got some pure Perl ideas (and plenty with
Inline::C), though I'll probably prioritize other things first,
such as IMAP.
> 00007f981a8fd000 2736K rw--- [ anon ]
Not sure where the above comes from, but it's an odd allocation
that seems to get pulled in by most other workers, independently.
> 00007f981c6dc000 84K rw--- [ anon ]
> 00007f981d2c5000 4K rw--- [ anon ]
> 00007f982802f000 20K rw--- [ anon ]
> 00007f982824c000 16K rw--- [ anon ]
> 00007f982865c000 184K rw--- [ anon ]
> 00007f9828da8000 8K rw--- [ anon ]
> 00007f9828fc2000 8K rw--- [ anon ]
> 00007f9829351000 4K rw--- [ anon ]
> 00007f982953f000 160K rw--- [ anon ]
> 00007f9829572000 4K rw--- [ anon ]
> 00007f9829575000 4K rw--- [ anon ]
> 00007fffddbe2000 8K r-x-- [ anon ]
> ffffffffff600000 4K r-x-- [ anon ]
That all looks shared from the parent, good.
> 0000000002093000 23568K rw--- [ anon ]
> 0000000003797000 216568K rw--- [ anon ]
OK, similar to the other worker.
> 00007f98196cc000 4724K rw--- [ anon ]
> 00007f9819b69000 4876K rw--- [ anon ]
> 00007f981a02c000 2736K rw--- [ anon ]
> 00007f981aeb5000 3496K rw--- [ anon ]
> 00007f981b350000 1628K rw--- [ anon ]
Weird, going to need to source dive into other dependencies to
figure this out, but also not a lot compared to the main 200MB+
heap, either.
I've got some odd ones like those, too, and they seem to
persist...
<snip 00007f981c6dc000 - ffffffffff600000>
> 0000000002093000 23568K rw--- [ anon ]
> 0000000003797000 241724K rw--- [ anon ]
> 00007f981a8e2000 2736K rw--- [ anon ]
> 00007f981b16f000 1964K rw--- [ anon ]
> 00007f981b35a000 1300K rw--- [ anon ]
Ditto for these mysterious allocations
<snip 00007f981c6dc000 - ffffffffff600000>
> 0000000002093000 23568K rw--- [ anon ]
> 0000000003797000 202632K rw--- [ anon ]
Probably the least busy process, and no odd >1MB mappings.
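To pick out only the larger mappings when comparing workers, a filter like this sketch might do. `big_anon` is a hypothetical helper name, and it assumes the same pmap column layout as the output quoted above.

```shell
# big_anon [MIN_KB]: read pmap(1) output on stdin and print only the
# anonymous mappings of at least MIN_KB KiB (default 1024, i.e. 1MB).
big_anon() {
    # awk's "$2 + 0" uses the numeric prefix of e.g. "2736K"
    grep -w anon | awk -v min="${1:-1024}" '($2 + 0) >= (min + 0)'
}

# Usage:
#   pmap "$PID" | big_anon
```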
Anyways, things seem to be looking much better than they were in the
past. Regexp matching and split() for MIME are still a problem,
and some lists like linux-mtd have some giant multi-MB spam
that gets crawled...
Thanks again for the info!