* how's memory use? May 2020 edition
@ 2020-05-12  8:37 Eric Wong
  2020-05-14 20:57 ` Konstantin Ryabitsev
From: Eric Wong @ 2020-05-12  8:37 UTC
To: meta

Hey all, if possible; I'd like to know the memory use of your
daemons (particularly -httpd), relevant pmap(1) (or equivalent)
output, and version of public-inbox in use.

I'm primarily a GNU/Linux user, so much of the following is
glibc-specific.  If you use another malloc (or libc) I'd also be
interested to know.  I expect my changes to be sympathetic to
the way all reasonable malloc implementations work.

Over the past few months, my processes have been using less RAM.

With 1.5.0 on 64-bit systems, I don't see -httpd stay at or
above ~80MB RSS (~50MB on 32-bit systems).  It might spike for
giant messages, but the change to preload encodings[1] seems to
let malloc trim the top of the heap more consistently.

The biggest message in an inbox is still a factor, and I use
this to find the largest blob in a git repo:

    git cat-file --batch-check --batch-all-objects --unordered | \
        awk '$2 == "blob" && $3 > max { max = $3; oid = $1 } END {print oid, max}'

It's usually spam, which won't get served if
"public-inbox-learn spam"-ed away.
Linux-based systems with `procps' installed can use pmap to show
anonymous mappings (not sure about other OSes):

    pmap $PID | grep -w anon

On a "beefy" 64-bit workstation running -httpd, there's only one
giant anonymous region (and several smaller ones probably not
used by malloc):

    00055df38140000 63540K rw--- [ anon ]

Above is for the process which hosts http://czquwvybam4bgbro.onion/

On a lesser VM (still 64-bit) which hosts
http://hjrcffqmbrq6wope.onion/, the heap is split since the lack
of space caused sbrk(2) to fail and forced malloc to use mmap(2)
to create a new (sub) heap:

    00005575d3d4a000 30616K rw--- [ anon ]
    00005575d5b30000 13852K rw--- [ anon ]

glibc malloc defaults to a sliding window for mmap, so messages
which are beyond that window won't risk fragmenting the main
heap for their in-memory representation.

For the curious, the Linux mallopt(3) manpage also documents
environment variables which can be used to set a fixed mmap
window, trim threshold, and several other malloc knobs.  However,
one of my goals is to get things working as well as possible
out-of-the-box so users won't need to fiddle with knobs :>

-nntpd uses significantly less memory than -httpd since it:

1) doesn't split MIME parts
2) doesn't decode quoted-printable or base64
3) doesn't do character set conversions

STARTTLS or NNTPS for OpenSSL requires a significant amount of
per-socket memory, though I'm not sure how many NNTP readers
there are and if they use TLS.

[1] - https://public-inbox.org/meta/20200508015901.GA27432@dcvr/
      ("www: preload: load all encodings at startup")
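For anyone collecting numbers to report, the `pmap $PID | grep -w anon`
output described in the message above can be reduced to a total and a
largest region.  What follows is only a sketch: it assumes the column
layout shown in this thread (address, size in K, permissions,
"[ anon ]"), counts only writable regions, and anon_summary is a
hypothetical helper name that is not part of public-inbox.

```shell
# Sketch only: total the writable anonymous mappings in pmap(1) output
# and report the largest one (often, but not always, the main malloc heap).
# anon_summary is a hypothetical helper name, not part of public-inbox.
anon_summary() {
    awk '
        # Keep writable anonymous regions only; r-x regions are skipped.
        $3 ~ /^rw/ && $4 == "[" && $5 == "anon" {
            kb = $2 + 0          # numeric prefix of a field like "63540K"
            total += kb
            if (kb > max) { max = kb; addr = $1 }
        }
        END { printf "total=%dK largest=%dK at %s\n", total, max, addr }
    '
}
```

Usage would be along the lines of `pmap "$PID" | anon_summary`.  Mapping
sizes are address space rather than resident memory, so the totals are
an upper bound on what malloc actually has resident.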
* Re: how's memory use? May 2020 edition
  2020-05-12  8:37 how's memory use? May 2020 edition Eric Wong
@ 2020-05-14 20:57 ` Konstantin Ryabitsev
  2020-05-15  5:23   ` Eric Wong
From: Konstantin Ryabitsev @ 2020-05-14 20:57 UTC
To: Eric Wong; +Cc: meta

On Tue, May 12, 2020 at 08:37:34AM +0000, Eric Wong wrote:
> Hey all, if possible; I'd like to know the memory use of your
> daemons (particularly -httpd), relevant pmap(1) (or equivalent)
> output, and version of public-inbox in use.

This is on lore.kernel.org. We upgraded to 1.5.0 yesterday, so this is
only after a day of running, but this usually covers a lot of traffic.
We run with -W4, hence 4 different outputs:

# pgrep -f public-inbox-httpd | xargs pmap | grep anon
0000000002093000 23568K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
0000000002093000 23568K rw--- [ anon ]
0000000003797000 235060K rw--- [ anon ]
00007f981a8fd000 2736K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
0000000002093000 23568K rw--- [ anon ]
0000000003797000 216568K rw--- [ anon ]
00007f98196cc000 4724K rw--- [ anon ]
00007f9819b69000 4876K rw--- [ anon ]
00007f981a02c000 2736K rw--- [ anon ]
00007f981aeb5000 3496K rw--- [ anon ]
00007f981b350000 1628K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
0000000002093000 23568K rw--- [ anon ]
0000000003797000 241724K rw--- [ anon ]
00007f981a8e2000 2736K rw--- [ anon ]
00007f981b16f000 1964K rw--- [ anon ]
00007f981b35a000 1300K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
0000000002093000 23568K rw--- [ anon ]
0000000003797000 202632K rw--- [ anon ]
00007f981c6dc000 84K rw--- [ anon ]
00007f981d2c5000 4K rw--- [ anon ]
00007f982802f000 20K rw--- [ anon ]
00007f982824c000 16K rw--- [ anon ]
00007f982865c000 184K rw--- [ anon ]
00007f9828da8000 8K rw--- [ anon ]
00007f9828fc2000 8K rw--- [ anon ]
00007f9829351000 4K rw--- [ anon ]
00007f982953f000 160K rw--- [ anon ]
00007f9829572000 4K rw--- [ anon ]
00007f9829575000 4K rw--- [ anon ]
00007fffddbe2000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]

Best,
-K
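Because `grep anon` discards everything except the mapping lines, the
processes in the output above run together with no marker between
them.  A minimal sketch of a filter that keeps them apart, assuming
procps pmap prints a "PID:   command" header line before each
process's mappings; anon_by_pid is a hypothetical name used only for
illustration:

```shell
# Sketch only: like "grep anon", but keep the per-process header line
# that pmap(1) prints so each process stays visually separate.
# Assumes the procps header format "PID:   command".
# anon_by_pid is a hypothetical helper name.
anon_by_pid() {
    awk '
        /^[0-9]+:/ { print; next }    # per-process header line
        $4 == "[" && $5 == "anon"     # anonymous mapping line (default print)
    '
}
```

It would be used in place of the final grep, for example
`pgrep -f public-inbox-httpd | xargs pmap | anon_by_pid`.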
* Re: how's memory use? May 2020 edition
  2020-05-14 20:57 ` Konstantin Ryabitsev
@ 2020-05-15  5:23   ` Eric Wong
From: Eric Wong @ 2020-05-15  5:23 UTC
To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, May 12, 2020 at 08:37:34AM +0000, Eric Wong wrote:
> > Hey all, if possible; I'd like to know the memory use of your
> > daemons (particularly -httpd), relevant pmap(1) (or equivalent)
> > output, and version of public-inbox in use.
>
> This is on lore.kernel.org. We upgraded to 1.5.0 yesterday, so this is
> only after a day of running, but this usually covers a lot of traffic.
> We run with -W4, hence 4 different outputs:

Thanks for the info!  I forgot to ask in the original post, but
having an idea of active connections per worker might be useful
too.  I rarely see more than a few dozen, myself.

> # pgrep -f public-inbox-httpd | xargs pmap | grep anon

Would've been easier for humans to read the output if each
process were individually broken out, but I can figure it out
from addresses below :>

> 0000000002093000 23568K rw--- [ anon ]

OK, that looks like the heap of the master.

> 00007f981c6dc000 84K rw--- [ anon ]
> 00007f981d2c5000 4K rw--- [ anon ]
> 00007f982802f000 20K rw--- [ anon ]
> 00007f982824c000 16K rw--- [ anon ]
> 00007f982865c000 184K rw--- [ anon ]
> 00007f9828da8000 8K rw--- [ anon ]
> 00007f9828fc2000 8K rw--- [ anon ]
> 00007f9829351000 4K rw--- [ anon ]
> 00007f982953f000 160K rw--- [ anon ]
> 00007f9829572000 4K rw--- [ anon ]
> 00007f9829575000 4K rw--- [ anon ]
> 00007fffddbe2000 8K r-x-- [ anon ]
> ffffffffff600000 4K r-x-- [ anon ]

Probably something used by glibc internally, or maybe SQLite or
Xapian.  Good thing is the above mappings now get shared with
children and are copy-on-write.

OK, onto another process:

> 0000000002093000 23568K rw--- [ anon ]

That looks inherited from the parent.

> 0000000003797000 235060K rw--- [ anon ]

Ah, so that's probably the main heap after forking (I forget my
64-bit process uses -W0, so no workers in that setup).

Anyways, 250-300MB seems a lot better than things were for lore
a few months ago (closer to ~1G per worker, IIRC?).  I've still
got some pure Perl ideas (and plenty with Inline::C), though
I'll probably prioritize other things first, such as IMAP.

> 00007f981a8fd000 2736K rw--- [ anon ]

Not sure where the above comes from, but it's an odd allocation
that seems to get pulled in by most other workers, independently.

> 00007f981c6dc000 84K rw--- [ anon ]
> 00007f981d2c5000 4K rw--- [ anon ]
> 00007f982802f000 20K rw--- [ anon ]
> 00007f982824c000 16K rw--- [ anon ]
> 00007f982865c000 184K rw--- [ anon ]
> 00007f9828da8000 8K rw--- [ anon ]
> 00007f9828fc2000 8K rw--- [ anon ]
> 00007f9829351000 4K rw--- [ anon ]
> 00007f982953f000 160K rw--- [ anon ]
> 00007f9829572000 4K rw--- [ anon ]
> 00007f9829575000 4K rw--- [ anon ]
> 00007fffddbe2000 8K r-x-- [ anon ]
> ffffffffff600000 4K r-x-- [ anon ]

That all looks shared from the parent, good.

> 0000000002093000 23568K rw--- [ anon ]
> 0000000003797000 216568K rw--- [ anon ]

OK, similar to the other worker.

> 00007f98196cc000 4724K rw--- [ anon ]
> 00007f9819b69000 4876K rw--- [ anon ]
> 00007f981a02c000 2736K rw--- [ anon ]
> 00007f981aeb5000 3496K rw--- [ anon ]
> 00007f981b350000 1628K rw--- [ anon ]

Weird, going to need to source dive into other dependencies to
figure this out, but also not a lot compared to the main 200MB+
heap, either.  I've got some odd ones like those, too, and they
seem to persist...

<snip 00007f981c6dc000 - ffffffffff600000>

> 0000000002093000 23568K rw--- [ anon ]
> 0000000003797000 241724K rw--- [ anon ]
> 00007f981a8e2000 2736K rw--- [ anon ]
> 00007f981b16f000 1964K rw--- [ anon ]
> 00007f981b35a000 1300K rw--- [ anon ]

Ditto for these mysterious allocations

<snip 00007f981c6dc000 - ffffffffff600000>

> 0000000002093000 23568K rw--- [ anon ]
> 0000000003797000 202632K rw--- [ anon ]

Probably the least busy process, and no odd >1MB mappings.

Anyways, things seem to be looking much better than they were in
the past.  Regexp matching and split() for MIME is still a
problem, and some lists like linux-mtd have some giant multi-MB
spam that gets crawled...

Thanks again for the info!