* how's memory usage on public-inbox-httpd? @ 2018-12-01 19:44 Eric Wong 2019-06-06 19:04 ` Konstantin Ryabitsev 0 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2018-12-01 19:44 UTC (permalink / raw) To: meta; +Cc: Konstantin Ryabitsev I haven't been around much, so not working on public-inbox means fewer restarts :x On my 2-core VM, I've been noticing public-inbox-httpd memory spikes into the 500MB range, which is gross... It seems caused by slow clients and large threads/mbox downloads. The PSGI code only loads one email per-client in memory at-a-time when using -httpd; but that adds up with many clients and larger messages. I run two -httpd workers, one-per-core, but also varnish and an experimental Ruby/C reverse-buffering proxy (yahns) for HTTPS. The problem seems to be varnish isn't reading from -httpd fast enough (and I lack CPU cores), but decreasing the niceness of varnish seems to help with the problem... ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2018-12-01 19:44 how's memory usage on public-inbox-httpd? Eric Wong @ 2019-06-06 19:04 ` Konstantin Ryabitsev 2019-06-06 20:37 ` Eric Wong ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Konstantin Ryabitsev @ 2019-06-06 19:04 UTC (permalink / raw) To: Eric Wong; +Cc: meta Hello: This is an old-ish discussion, but we finally had a chance to run the httpd daemon for a long time without restarting it to add more lists, and the memory usage on it is actually surprising: $ ps -eF | grep public-inbox publici+ 17741 1 0 52667 24836 8 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log publici+ 17744 17741 0 69739 90288 9 May24 ? 00:38:43 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log publici+ 18273 1 0 52599 23832 9 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log publici+ 18275 18273 4 5016115 19713872 10 May24 ? 13:59:13 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log You'll notice that process 18275 has been running since May 24 and takes up 19GB in RSS. This is a 16-core 64-GB system, so it's not necessarily super alarming, but seems large. :) Is that normal, and if not, what can I do to help troubleshoot where it's all going? -K On Sat, Dec 01, 2018 at 07:44:29PM +0000, Eric Wong wrote: >I haven't been around much, so not working on public-inbox means >fewer restarts :x > >On my 2-core VM, I've been noticing public-inbox-httpd memory >spikes into the 500MB range, which is gross... It seems caused >by slow clients and large threads/mbox downloads. The PSGI code >only loads one email per-client in memory at-a-time when using >-httpd; but that adds up with many clients and larger messages. > >I run two -httpd workers, one-per-core, but also varnish and an >experimental Ruby/C reverse-buffering proxy (yahns) for HTTPS. > >The problem seems to be varnish isn't reading from -httpd fast >enough (and I lack CPU cores), but decreasing the niceness of >varnish seems to help with the problem... ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-06-06 19:04 ` Konstantin Ryabitsev @ 2019-06-06 20:37 ` Eric Wong 2019-06-06 21:45 ` Konstantin Ryabitsev 2019-06-06 20:54 ` Eric Wong 2019-10-16 22:10 ` Eric Wong 2 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2019-06-06 20:37 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > Hello: > > This is an old-ish discussion, but we finally had a chance to run the httpd > daemon for a long time without restarting it to add more lists, and the > memory usage on it is actually surprising: Thanks for getting back to this. > $ ps -eF | grep public-inbox > publici+ 17741 1 0 52667 24836 8 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log > publici+ 17744 17741 0 69739 90288 9 May24 ? 00:38:43 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log > publici+ 18273 1 0 52599 23832 9 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log > publici+ 18275 18273 4 5016115 19713872 10 May24 ? 13:59:13 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log > > You'll notice that process 18275 has been running since May 24 and takes up > 19GB in RSS. This is a 16-core 64-GB system, so it's not necessarily super > alarming, but seems large. :) Yes, it's large and ugly :< I don't even have 19GB and even 90MB RSS worries me. Do you have commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5? ("view: stop storing all MIME objects on large threads") That was most significant. Also, it looks like you've yet to configure the wacky coderepo+solver stuff, so that's not a culprit... Otherwise it's probably a combination of several things... httpd and nntpd both support streaming, arbitrarily large endpoints (all.mbox.gz, and /T/, /t/, /t.mbox.gz threads with thousands of messages, giant NNTP BODY/ARTICLE ranges). All those endpoints should detect backpressure from a slow client (varnish/nginx in your case) using the ->getline method. gzip (for compressed mbox) also uses a truckload of memory and I would like to add options to control zlib window sizes to reduce memory use (at the cost of less compression). nginx has these options, too, but they're not documented AFAIK. For the case of varnish/nginx or whatever's in front of it not keeping up... the old design choice of Danga::Socket (now inherited by PublicInbox::DS) made it buffer slow client data to RAM, which doesn't make sense to me... I prefer buffering to the FS (similar to nginx/varnish) to avoid malloc fragmentation and also to avoid delaying the extra kernel-to-user copy if using sendfile. By default, glibc malloc is really averse to releasing memory back to the OS, too. It's fast in benchmarks that way (until the system starts swapping and slowdowns cascade to failure). I'm also unsure about malloc fragmentation behavior at such sizes and how it hurts locality. So my preference is to avoid putting big objects into heap and let the kernel/FS deal with big buffers. httpd/nntpd both try to avoid buffering at all with the backpressure handling based on ->getline; but sometimes it's not effective enough because some big chunks still end up in heap. In any case, you can safely SIGQUIT the individual worker and it'll restart gracefully w/o dropping active connections. Also, are you only using the default of -W/--worker-process=1 on a 16-core machine?
Just checked public-inbox-httpd(8), the -W switch is documented :) You can use SIGTTIN/TTOU to increase, decrease workers w/o restarting, too. nntpd would have the same problem if people used it more; but at the moment it doesn't do gzip. I'm happy to see it's at least gotten some traffic :) > Is that normal, and if not, what can I do to help troubleshoot where it's > all going? There are definitely some problems with big threads, giant messages and gzip overhead. I was looking into a few big threads earlier this year but forgot the Message-IDs :x Do you have any stats on the number of simultaneous connections public-inbox-httpd/nginx/varnish handles (and logging of that info at peak)? (perhaps running "ss -tan" periodically)(*) Are you using the Plack::Middleware::Deflater endpoint in PSGI? Removing it and doing gzip in varnish/nginx may be a little faster since it can utilize multiple cores, but at higher IPC cost. I've gotten rid of the annoying warning for that middleware install as a result... But gzipped mboxes have the same problem, though, so adding zlib window-size options would be necessary... So I think supporting buffer-to-FS behavior in ::DS along with gzip options should alleviate much of the memory use. But immediately you can increase worker process counts to distribute the load between cores a bit... I've also tried nicing down nginx/varnish so they're prioritized by the kernel and don't bottleneck -httpd. Makes sense to me in theory but I was also making a lot of changes around the same time to reduce httpd memory use. Limiting HTTP endpoint response size isn't a real option to protect the server, IMHO, because NNTP requires supporting giant responses anyways. (*) I did "raindrops" with Ruby+C back in the day but haven't really looked at it in ages, and I don't think the IPv6 counting was accurate <https://raindrops-demo.bogomips.org/> That's -httpd on :280 ^ permalink raw reply [flat|nested] 17+ messages in thread
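To make the ->getline backpressure mechanism described above concrete, here is a minimal sketch of a pull-style PSGI response body; the class and method names below (BigMboxBody, $msgs->next, as_mbox_string) are illustrative placeholders, not actual public-inbox internals:

-------8<--------
# Minimal sketch of a PSGI body object that a streaming-aware server
# can pace: ->getline is only called again once the previous chunk
# has been written to the client, so one message is in memory at a time.
package BigMboxBody;
use strict;
use warnings;

sub new {
	my ($class, $msgs) = @_; # $msgs: some iterator over messages (placeholder)
	bless { msgs => $msgs }, $class;
}

sub getline {
	my ($self) = @_;
	my $msg = $self->{msgs}->next or return undef; # undef signals EOF
	$msg->as_mbox_string; # only this one message is held in memory
}

sub close { my ($self) = @_; delete $self->{msgs} }

1;
-------8<--------

A PSGI app would return something like [ 200, $headers, BigMboxBody->new($iter) ]; a server that honors write backpressure (as -httpd aims to) then slows the producer down to the speed of the slowest consumer instead of buffering the whole mbox in RAM.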
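As for the zlib window-size idea, a rough sketch of what a reduced-memory gzip stream could look like with Compress::Raw::Zlib follows; the WindowBits/MemLevel values are illustrative, and public-inbox does not currently expose such knobs:

-------8<--------
# Rough sketch: gzip with a smaller history window and state tables.
# WindowBits of 16+N selects a gzip (RFC 1952) wrapper; N below 15
# shrinks the per-stream window and MemLevel below 8 shrinks internal
# state, both at some cost in compression ratio.
use strict;
use warnings;
use Compress::Raw::Zlib; # exports Z_OK, Z_FINISH, etc.

my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
	-WindowBits => 16 + 12,   # 4K window instead of the default 32K
	-MemLevel => 6,
	-AppendOutput => 1,
);
die "deflateInit2 failed: $err" unless $gz;

my $out = '';
for my $chunk ('hello ', "world\n") {
	$gz->deflate($chunk, $out) == Z_OK or die 'deflate failed';
}
$gz->flush($out, Z_FINISH) == Z_OK or die 'flush failed';
print length($out), " gzipped bytes\n";
-------8<--------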
* Re: how's memory usage on public-inbox-httpd? 2019-06-06 20:37 ` Eric Wong @ 2019-06-06 21:45 ` Konstantin Ryabitsev 2019-06-06 22:10 ` Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Konstantin Ryabitsev @ 2019-06-06 21:45 UTC (permalink / raw) To: Eric Wong; +Cc: meta On Thu, Jun 06, 2019 at 08:37:52PM +0000, Eric Wong wrote: >Do you have commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5? >("view: stop storing all MIME objects on large threads") >That was most significant. Yes. We're running 743ac758 with a few cherry-picked patches on top of that (like epoch roll-over fix). >Otherwise it's probably a combination of several things... >httpd and nntpd both supports streaming, arbitrarily large >endpoints (all.mbox.gz, and /T/, /t/, /t.mbox.gz threads with >thousands of messages, giant NNTP BODY/ARTICLE ranges). > >All those endpoints should detect backpressure from a slow >client (varnish/nginx in your case) using the ->getline method. Wouldn't that spike up and down? The size I'm seeing stays pretty constant without any significant changes across requests. >Also, are you only using the default of -W/--worker-process=1 >on a 16-core machine? Just checked public-inbox-httpd(8), the >-W switch is documented :) You can use SIGTTIN/TTOU to >increase, decrease workers w/o restarting, too. D'oh, yes... though it's not been a problem yet. :) I'm not sure I want to bump that up, though, if that means we're going to have multiple 19GB-sized processes instead of one. :) >Do you have any stats on the number of simultaneous connections >public-inbox-httpd/nginx/varnish handles (and logging of that >info at peek)? (perhaps running "ss -tan" periodically)(*) We don't collect that info, but I'm not sure it's the number of concurrent connections that's the culprit, as there is no fluctuation in RSS size based on the number of responses. To answer the questions in your follow-up: It would appear to be all in anon memory. Mem_usage [1] reports: # ./Mem_usage 18275 Backed by file: Executable r-x 16668 Write/Exec (jump tables) rwx 0 RO data r-- 106908 Data rw- 232 Unreadable --- 94072 Unknown 0 Anonymous: Writable code (stack) rwx 0 Data (malloc, mmap) rw- 19988892 RO data r-- 0 Unreadable --- 0 Unknown 12 I've been looking at lsof -p of that process and I see sqlite and xapian showing up and disappearing. The lkml ones are being accessed almost all the time, but even there I see them showing up with different FD entries, so they are being closed and reopened properly. Hope this helps. -K .. [1] https://elinux.org/images/d/d3/Mem_usage ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-06-06 21:45 ` Konstantin Ryabitsev @ 2019-06-06 22:10 ` Eric Wong 2019-06-06 22:19 ` Konstantin Ryabitsev 2019-06-09 8:39 ` how's memory usage on public-inbox-httpd? Eric Wong 0 siblings, 2 replies; 17+ messages in thread From: Eric Wong @ 2019-06-06 22:10 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > On Thu, Jun 06, 2019 at 08:37:52PM +0000, Eric Wong wrote: > > Do you have commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5? > > ("view: stop storing all MIME objects on large threads") > > That was most significant. > > Yes. We're running 743ac758 with a few cherry-picked patches on top of that > (like epoch roll-over fix). > > > Otherwise it's probably a combination of several things... > > httpd and nntpd both supports streaming, arbitrarily large > > endpoints (all.mbox.gz, and /T/, /t/, /t.mbox.gz threads with > > thousands of messages, giant NNTP BODY/ARTICLE ranges). > > > > All those endpoints should detect backpressure from a slow > > client (varnish/nginx in your case) using the ->getline method. > > Wouldn't that spike up and down? The size I'm seeing stays pretty constant > without any significant changes across requests. Nope. That's the thing with glibc malloc not wanting to trim the heap for good benchmarks. You could also try starting with MALLOC_MMAP_THRESHOLD_=131072 in env (or some smaller/larger number in bytes) to force it to use mmap in more cases instead of sbrk. > > Also, are you only using the default of -W/--worker-process=1 > > on a 16-core machine? Just checked public-inbox-httpd(8), the > > -W switch is documented :) You can use SIGTTIN/TTOU to > > increase, decrease workers w/o restarting, too. > > D'oh, yes... though it's not been a problem yet. :) I'm not sure I want to > bump that up, though, if that means we're going to have multiple 19GB-sized > processes instead of one. :) You'd probably end up with several smaller processes totalling up to 19GB. In any case, killing individual workers with QUIT/INT/TERM is graceful and won't drop connections if memory use on one goes awry. > > Do you have any stats on the number of simultaneous connections > > public-inbox-httpd/nginx/varnish handles (and logging of that > > info at peek)? (perhaps running "ss -tan" periodically)(*) > > We don't collect that info, but I'm not sure it's the number of concurrent > connections that's the culprit, as there is no fluctuation in RSS size based > on the number of responses. Without concurrent connections; I can't see that happening unless there's a single message which is gigabytes in size. I'm already irked that Email::MIME requires slurping entire emails into memory; but it should not be using more than one Email::MIME object in memory-at-a-time for a single client. Anything from varnish/nginx logs can't keep up for some reason? Come to think of it, nginx proxy buffering might be redundant and even harmful if varnish is already doing it. Perhaps "proxy_buffering off" in nginx is worth trying... I use yahns instead of nginx, which does lazy buffering (but scary Ruby experimental server warning applies :x). Last I checked: nginx is either buffer-in-full-before-first-byte or no buffering at all (which is probably fine with varnish). > To answer the questions in your follow-up: > > It would appear to be all in anon memory. 
Mem_usage [1] reports: > > # ./Mem_usage 18275 > Backed by file: > Executable r-x 16668 > Write/Exec (jump tables) rwx 0 > RO data r-- 106908 > Data rw- 232 > Unreadable --- 94072 > Unknown 0 > Anonymous: > Writable code (stack) rwx 0 > Data (malloc, mmap) rw- 19988892 > RO data r-- 0 > Unreadable --- 0 > Unknown 12 > > I've been looking at lsof -p of that process and I see sqlite and xapian > showing up and disappearing. The lkml ones are being accessed almost all the > time, but even there I see them showing up with different FD entries, so > they are being closed and reopened properly. Yep, that's expected. It's to better detect DB changes in case of compact/copydatabase/xcpdb for Xapian. Might not be strictly necessary for SQLite, but maybe somebody could be running VACUUM offline; then flock-ing inbox.lock and rename-ing it into place or something (and retrying/restarting the VACUUM if out-of-date, seq_lock style). ^ permalink raw reply [flat|nested] 17+ messages in thread
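For anyone wanting to experiment with the MALLOC_MMAP_THRESHOLD_ suggestion above, a trivial wrapper could look like the following; the values and the MALLOC_TRIM_THRESHOLD_ addition are starting points for experimentation, not recommendations:

-------8<--------
#!/usr/bin/perl -w
# sketch of a wrapper: set glibc malloc tunables, then exec the daemon
use strict;
$ENV{MALLOC_MMAP_THRESHOLD_} = 131072; # mmap() allocations >= 128K
$ENV{MALLOC_TRIM_THRESHOLD_} = 131072; # return free heap to the kernel sooner
exec qw(/usr/local/bin/public-inbox-httpd -1
	/var/log/public-inbox/httpd.out.log) or die "exec: $!";
-------8<--------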
* Re: how's memory usage on public-inbox-httpd? 2019-06-06 22:10 ` Eric Wong @ 2019-06-06 22:19 ` Konstantin Ryabitsev 2019-06-06 22:29 ` Eric Wong 2019-06-09 8:39 ` how's memory usage on public-inbox-httpd? Eric Wong 1 sibling, 1 reply; 17+ messages in thread From: Konstantin Ryabitsev @ 2019-06-06 22:19 UTC (permalink / raw) To: Eric Wong; +Cc: meta On Thu, Jun 06, 2019 at 10:10:09PM +0000, Eric Wong wrote: >> > All those endpoints should detect backpressure from a slow >> > client (varnish/nginx in your case) using the ->getline method. >> >> Wouldn't that spike up and down? The size I'm seeing stays pretty constant >> without any significant changes across requests. > >Nope. That's the thing with glibc malloc not wanting to trim >the heap for good benchmarks. > >You could also try starting with MALLOC_MMAP_THRESHOLD_=131072 >in env (or some smaller/larger number in bytes) to force it to >use mmap in more cases instead of sbrk. I've restarted the process and I'm running pmap -x $PID | tail -1 on it once a minute. I'll try to collect this data for a while and see if I can notice significant increases and correlate that with access logs. From the first few minutes of running I see: Thu Jun 6 22:06:03 UTC 2019 total kB 298160 102744 96836 Thu Jun 6 22:07:03 UTC 2019 total kB 355884 154968 147664 Thu Jun 6 22:08:03 UTC 2019 total kB 355884 154980 147664 Thu Jun 6 22:09:03 UTC 2019 total kB 359976 156788 148336 Thu Jun 6 22:10:03 UTC 2019 total kB 359976 156788 148336 Thu Jun 6 22:11:03 UTC 2019 total kB 359976 156788 148336 Thu Jun 6 22:12:03 UTC 2019 total kB 365464 166612 158160 Thu Jun 6 22:13:03 UTC 2019 total kB 366884 167908 159456 Thu Jun 6 22:14:03 UTC 2019 total kB 366884 167908 159456 Thu Jun 6 22:15:03 UTC 2019 total kB 366884 167908 159456 >Without concurrent connections; I can't see that happening >unless there's a single message which is gigabytes in size. I'm >already irked that Email::MIME requires slurping entire emails >into memory; but it should not be using more than one >Email::MIME object in memory-at-a-time for a single client. > >Anything from varnish/nginx logs can't keep up for some reason? Speaking of logs, I did notice that even though we're passing -1 /var/log/public-inbox/httpd.out.log, that file stays empty. There's nntpd.out.log there, which is non-empty, so that's curious: # ls -ahl total 2.6M drwx------. 2 publicinbox publicinbox 177 Jun 6 22:05 . drwxr-xr-x. 21 root root 4.0K Jun 2 03:12 .. -rw-r--r--. 1 publicinbox publicinbox 0 Jun 6 22:05 httpd.out.log -rw-r--r--. 1 publicinbox publicinbox 422K Jun 6 22:04 nntpd.out.log -rw-r--r--. 1 publicinbox publicinbox 771K May 12 01:02 nntpd.out.log-20190512.gz -rw-r--r--. 1 publicinbox publicinbox 271K May 19 03:45 nntpd.out.log-20190519.gz -rw-r--r--. 1 publicinbox publicinbox 86K May 25 22:23 nntpd.out.log-20190526.gz -rw-r--r--. 1 publicinbox publicinbox 1.1M Jun 2 00:52 nntpd.out.log-20190602 Could it be that stdout is not being written out and is just perpetually buffered? That could explain the ever-growing size. -K ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-06-06 22:19 ` Konstantin Ryabitsev @ 2019-06-06 22:29 ` Eric Wong 2019-06-10 10:09 ` [RFC] optionally support glibc malloc_info via SIGCONT Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2019-06-06 22:29 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > On Thu, Jun 06, 2019 at 10:10:09PM +0000, Eric Wong wrote: > > > > All those endpoints should detect backpressure from a slow > > > > client (varnish/nginx in your case) using the ->getline method. > > > > > > Wouldn't that spike up and down? The size I'm seeing stays pretty constant > > > without any significant changes across requests. > > > > Nope. That's the thing with glibc malloc not wanting to trim > > the heap for good benchmarks. > > > > You could also try starting with MALLOC_MMAP_THRESHOLD_=131072 > > in env (or some smaller/larger number in bytes) to force it to > > use mmap in more cases instead of sbrk. > > I've restarted the process and I'm running pmap -x $PID | tail -1 on it once > a minute. I'll try to collect this data for a while and see if I can notice > significant increases and correlate that with access logs. From the first > few minutes of running I see: > > Thu Jun 6 22:06:03 UTC 2019 > total kB 298160 102744 96836 > Thu Jun 6 22:07:03 UTC 2019 > total kB 355884 154968 147664 > Thu Jun 6 22:08:03 UTC 2019 > total kB 355884 154980 147664 > Thu Jun 6 22:09:03 UTC 2019 > total kB 359976 156788 148336 > Thu Jun 6 22:10:03 UTC 2019 > total kB 359976 156788 148336 > Thu Jun 6 22:11:03 UTC 2019 > total kB 359976 156788 148336 > Thu Jun 6 22:12:03 UTC 2019 > total kB 365464 166612 158160 > Thu Jun 6 22:13:03 UTC 2019 > total kB 366884 167908 159456 > Thu Jun 6 22:14:03 UTC 2019 > total kB 366884 167908 159456 > Thu Jun 6 22:15:03 UTC 2019 > total kB 366884 167908 159456 Would also be good to correlate that to open sockets, too. (168M is probably normal for 64-bit, I'm still on 32-bit and it's <100M). I'm not happy with that memory use, even; but it's better than gigabytes. > > Without concurrent connections; I can't see that happening > > unless there's a single message which is gigabytes in size. I'm > > already irked that Email::MIME requires slurping entire emails > > into memory; but it should not be using more than one > > Email::MIME object in memory-at-a-time for a single client. > > > > Anything from varnish/nginx logs can't keep up for some reason? > > Speaking of logs, I did notice that even though we're passing -1 > /var/log/public-inbox/httpd.out.log, that file stays empty. There's > nntpd.out.log there, which is non-empty, so that's curious: > > # ls -ahl > total 2.6M > drwx------. 2 publicinbox publicinbox 177 Jun 6 22:05 . > drwxr-xr-x. 21 root root 4.0K Jun 2 03:12 .. > -rw-r--r--. 1 publicinbox publicinbox 0 Jun 6 22:05 httpd.out.log > -rw-r--r--. 1 publicinbox publicinbox 422K Jun 6 22:04 nntpd.out.log > -rw-r--r--. 1 publicinbox publicinbox 771K May 12 01:02 nntpd.out.log-20190512.gz > -rw-r--r--. 1 publicinbox publicinbox 271K May 19 03:45 nntpd.out.log-20190519.gz > -rw-r--r--. 1 publicinbox publicinbox 86K May 25 22:23 nntpd.out.log-20190526.gz > -rw-r--r--. 1 publicinbox publicinbox 1.1M Jun 2 00:52 nntpd.out.log-20190602 > > Could it be that stdout is not being written out and is just perpetually > buffered? That could explain the ever-growing size. There's no HTTP access logging by default.
AccessLog::Timed is commented out in examples/public-inbox.psgi; and the example uses syswrite, even. Also, PublicInbox::Daemon definitely enables autoflush on STDOUT. ^ permalink raw reply [flat|nested] 17+ messages in thread
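For reference, uncommenting that middleware boils down to something like the following in the .psgi file; the format string and the trivial $app below are only for illustration, whereas in public-inbox.psgi $app would be the PublicInbox::WWW wrapper built earlier in the file:

-------8<--------
# sketch of a .psgi enabling Plack::Middleware::AccessLog::Timed;
# stdout is what "-1 /path/to/httpd.out.log" redirects, so syswrite to it
use strict;
use warnings;
use Plack::Builder;
my $app = sub { [ 200, [ 'Content-Type' => 'text/plain' ], [ "hi\n" ] ] };
builder {
	enable 'AccessLog::Timed',
		logger => sub { syswrite(STDOUT, $_[0]) },
		format => '%t "%r" %s %b %D';
	$app;
};
-------8<--------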
* [RFC] optionally support glibc malloc_info via SIGCONT 2019-06-06 22:29 ` Eric Wong @ 2019-06-10 10:09 ` Eric Wong 0 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2019-06-10 10:09 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Eric Wong <e@80x24.org> wrote: > Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > > On Thu, Jun 06, 2019 at 10:10:09PM +0000, Eric Wong wrote: > > > > > All those endpoints should detect backpressure from a slow > > > > > client (varnish/nginx in your case) using the ->getline method. > > > > > > > > Wouldn't that spike up and down? The size I'm seeing stays pretty constant > > > > without any significant changes across requests. > > > > > > Nope. That's the thing with glibc malloc not wanting to trim > > > the heap for good benchmarks. glibc's malloc_info(3) function shows a bunch of info about malloc internal structures which can't be examined easily with other tools. Maybe this can help track down what glibc malloc is doing and can tell us how much free memory glibc is holding onto w/o giving back to the kernel. ------------8<------------- Subject: [RFC] optionally support glibc malloc_info via SIGCONT If run with PERL_INLINE_DIRECTORY for Inline::C support along with INBOX_DEBUG=malloc_info, we can allow users to opt-in to compiling extra code to support the glibc malloc_info(3) function. We'll also add SIGCONT handler to dump the malloc_info(3) output to stderr on our daemons. --- MANIFEST | 1 + lib/PublicInbox/Spawn.pm | 17 +++++++++++++++++ t/malloc_info.t | 25 +++++++++++++++++++++++++ 3 files changed, 43 insertions(+) create mode 100644 t/malloc_info.t diff --git a/MANIFEST b/MANIFEST index 5085bff..4a7f7ef 100644 --- a/MANIFEST +++ b/MANIFEST @@ -210,6 +210,7 @@ t/indexlevels-mirror.t t/init.t t/linkify.t t/main-bin/spamc +t/malloc_info.t t/mda.t t/mda_filter_rubylang.t t/mid.t diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm index 66b916d..9210f11 100644 --- a/lib/PublicInbox/Spawn.pm +++ b/lib/PublicInbox/Spawn.pm @@ -149,6 +149,23 @@ int pi_fork_exec(int in, int out, int err, } VFORK_SPAWN +# TODO: we may support other mallocs through this parameter +if (($ENV{INBOX_DEBUG} // '') =~ /\bmalloc_info\b/) { + $vfork_spawn .= <<MALLOC_DEBUG; +#include <malloc.h> + +int inbox_malloc_info(int options) +{ + int rc = malloc_info(options, stderr); + + return rc == 0 ? 
TRUE : FALSE; +} +MALLOC_DEBUG + + # dump malloc info to stderr on SIGCONT + $SIG{CONT} = sub { inbox_malloc_info(0) }; +} + my $inline_dir = $ENV{PERL_INLINE_DIRECTORY}; $vfork_spawn = undef unless defined $inline_dir && -d $inline_dir && -w _; if (defined $vfork_spawn) { diff --git a/t/malloc_info.t b/t/malloc_info.t new file mode 100644 index 0000000..352ec5c --- /dev/null +++ b/t/malloc_info.t @@ -0,0 +1,25 @@ +# Copyright (C) 2019 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> +use strict; +use warnings; +use Test::More; +use PublicInbox::Spawn (); + +if (!PublicInbox::Spawn->can('inbox_malloc_info')) { + plan skip_all => 'inbox_malloc_info not enabled'; +} + +open my $olderr, '>&', \*STDERR or die "dup stderr: $!"; +open my $tmp, '+>', undef or die "tmpfile: $!"; +open STDERR, '>&', $tmp or die "redirect stderr to \$tmp: $!"; +my @x = map { '0' x (1024 * 1024) } (1..128); +my $cb = $SIG{CONT}; +$cb->(); +@x = ('hello'); +PublicInbox::Spawn::inbox_malloc_info(0); +open STDERR, '>&', $olderr or die "restore stderr: $!"; +sysseek($tmp, 0, 0) == 0 or die "sysseek: $!"; +my @info = <$tmp>; +like($info[0], qr/</, 'output looks like XML'); + +done_testing; -- EW ^ permalink raw reply related [flat|nested] 17+ messages in thread
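If the RFC above were applied, triggering a dump from the outside might look like this; a hypothetical helper that assumes the daemon was started with INBOX_DEBUG=malloc_info and PERL_INLINE_DIRECTORY set so the Inline::C code was built:

-------8<--------
#!/usr/bin/perl -w
# hypothetical helper: ask a running daemon process for a malloc_info(3) dump
use strict;
my $pid = shift or die "usage: $0 PID\n";
kill('CONT', $pid) or die "kill CONT $pid: $!\n";
# the XML-ish report is written to that process's stderr
-------8<--------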
* Re: how's memory usage on public-inbox-httpd? 2019-06-06 22:10 ` Eric Wong 2019-06-06 22:19 ` Konstantin Ryabitsev @ 2019-06-09 8:39 ` Eric Wong 2019-06-12 17:08 ` Eric Wong 1 sibling, 1 reply; 17+ messages in thread From: Eric Wong @ 2019-06-09 8:39 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Eric Wong <e@80x24.org> wrote: > Without concurrent connections; I can't see that happening > unless there's a single message which is gigabytes in size. I'm > already irked that Email::MIME requires slurping entire emails > into memory; but it should not be using more than one > Email::MIME object in memory-at-a-time for a single client. Giant multipart messages do a lot of damage. Maybe concurrent clients hitting the same endpoints will do more damage. Largest I see in LKML is 7204747 bytes (which is frightening). That bloats to 21626795 bytes when parsed by Email::MIME. I thought it was bad enough that all Perl mail modules seem to require slurping 7M into memory... -------8<-------- use strict; use warnings; require Email::MIME; use bytes (); use Devel::Size qw(total_size); my $in = do { local $/; <STDIN> }; print 'string: ', total_size($in), ' actual: ', bytes::length($in), "\n"; print 'MIME: ', total_size(Email::MIME->new(\$in)), "\n"; -------8<-------- That shows (on amd64): string: 7204819 actual: 7204747 MIME: 21626795 Maybe you have bigger messages outside of LKML. This prints all objects >1MB in a git dir: git cat-file --buffer --batch-check --batch-all-objects \ --unordered | awk '$3 > 1048576 { print }' And I also remember you're supporting non-vger lists where HTML mail is allowed, so that can't be good for memory use at all :< Streaming MIME handling has been on the TODO for a while, at least... |* streaming Email::MIME replacement: currently we generate many | allocations/strings for headers we never look at and slurp | entire message bodies into memory. | (this is pie-in-the-sky territory...) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-06-09 8:39 ` how's memory usage on public-inbox-httpd? Eric Wong @ 2019-06-12 17:08 ` Eric Wong 0 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2019-06-12 17:08 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Eric Wong <e@80x24.org> wrote: > Maybe you have bigger messages outside of LKML. > This prints all objects >1MB in a git dir: > > git cat-file --buffer --batch-check --batch-all-objects \ > --unordered | awk '$3 > 1048576 { print }' > > And I also remember you're supporting non-vger lists where HTML > mail is allowed, so that can't be good for memory use at all :< Btw, have you had a chance to run the above to scan your repos? The 19GB RSS would interact poorly with more frequent fork+exec; such as the change to do unconditional expiry of long-lived cat-file --batch* processes: https://public-inbox.org/meta/20190601033706.18113-2-e@80x24.org/ If you run Inline::C + PERL_INLINE_DIRECTORY to use vfork instead of fork, then RSS won't affect process spawning speeds. But yeah, 19GB RSS still sucks and I'd like to get to the bottom of it. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-06-06 19:04 ` Konstantin Ryabitsev 2019-06-06 20:37 ` Eric Wong @ 2019-06-06 20:54 ` Eric Wong 2019-10-16 22:10 ` Eric Wong 2 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2019-06-06 20:54 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > Is that normal, and if not, what can I do to help troubleshoot where it's > all going? Oh, another thing is to check if any of that is from mmap-ed files or if it's all anon heap memory. The SQLite I'm using from Debian doesn't seem to mmap files (I've never had to look too hard at tuning SQLite) and I don't think libxapian uses mmap at all either (just pread). But LKML's sqlite files are huge. I know git(1) will mmap entire packs into memory. If/when public-inbox starts supporting Git::Raw (libgit2) or even Cache::FastMmap, then yeah; process sizes will look a lot bigger. ^ permalink raw reply [flat|nested] 17+ messages in thread
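One way to answer the mmap-vs-anon question above without extra tools is to tally /proc/PID/smaps directly; a rough, Linux-only sketch (the anon/file split here is simplistic: [heap], [stack] and friends are counted as anonymous):

-------8<--------
#!/usr/bin/perl -w
# rough sketch: split a process's RSS into file-backed vs anonymous mappings
use strict;
my $pid = shift or die "usage: $0 PID\n";
open my $fh, '<', "/proc/$pid/smaps" or die "open /proc/$pid/smaps: $!\n";
my ($cur, %rss) = ('anon');
while (<$fh>) {
	if (/^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$/) {
		my $path = $1;
		$cur = ($path eq '' || $path =~ /^\[/) ? 'anon' : 'file';
	} elsif (/^Rss:\s+(\d+) kB/) {
		$rss{$cur} += $1;
	}
}
printf "%-4s %9d kB\n", $_, $rss{$_} || 0 for qw(anon file);
-------8<--------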
* Re: how's memory usage on public-inbox-httpd? 2019-06-06 19:04 ` Konstantin Ryabitsev 2019-06-06 20:37 ` Eric Wong 2019-06-06 20:54 ` Eric Wong @ 2019-10-16 22:10 ` Eric Wong 2019-10-18 19:23 ` Konstantin Ryabitsev 2 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2019-10-16 22:10 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > Hello: > > This is an old-ish discussion, but we finally had a chance to run the httpd > daemon for a long time without restarting it to add more lists, and the > memory usage on it is actually surprising: > > $ ps -eF | grep public-inbox > publici+ 17741 1 0 52667 24836 8 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log > publici+ 17744 17741 0 69739 90288 9 May24 ? 00:38:43 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log > publici+ 18273 1 0 52599 23832 9 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log > publici+ 18275 18273 4 5016115 19713872 10 May24 ? 13:59:13 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log > > You'll notice that process 18275 has been running since May 24 and takes up > 19GB in RSS. This is a 16-core 64-GB system, so it's not necessarily super > alarming, but seems large. :) > > Is that normal, and if not, what can I do to help troubleshoot where it's > all going? Btw, has this gotten better since the Perl 5.16.3 workarounds? My 32-bit instance which sees the most HTTP traffic hasn't exceeded 80M per-process in a while. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-10-16 22:10 ` Eric Wong @ 2019-10-18 19:23 ` Konstantin Ryabitsev 2019-10-19 0:11 ` Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Konstantin Ryabitsev @ 2019-10-18 19:23 UTC (permalink / raw) To: Eric Wong; +Cc: meta On Wed, Oct 16, 2019 at 10:10:45PM +0000, Eric Wong wrote: >> This is an old-ish discussion, but we finally had a chance to run the >> httpd >> daemon for a long time without restarting it to add more lists, and the >> memory usage on it is actually surprising: >> >> $ ps -eF | grep public-inbox >> publici+ 17741 1 0 52667 24836 8 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log >> publici+ 17744 17741 0 69739 90288 9 May24 ? 00:38:43 /usr/bin/perl -w /usr/local/bin/public-inbox-nntpd -1 /var/log/public-inbox/nntpd.out.log >> publici+ 18273 1 0 52599 23832 9 May24 ? 00:00:00 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log >> publici+ 18275 18273 4 5016115 19713872 10 May24 ? 13:59:13 /usr/bin/perl -w /usr/local/bin/public-inbox-httpd -1 /var/log/public-inbox/httpd.out.log >> >> You'll notice that process 18275 has been running since May 24 and takes up >> 19GB in RSS. This is a 16-core 64-GB system, so it's not necessarily super >> alarming, but seems large. :) >> >> Is that normal, and if not, what can I do to help troubleshoot where it's >> all going? > >Btw, has this gotten better since the Perl 5.16.3 workarounds? > >My 32-bit instance which sees the most HTTP traffic hasn't >exceeded 80M per-process in a while. It's been definitely dramatically better. We keep adding lists to lore, so I haven't really been able to watch memory usage after a long period of daemon uptime, but it's never really gone very much above 1GB. In fact, we're downgrading lore to a smaller instance in the near future since we don't need to worry about running out of RAM any more. Best, -K ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-10-18 19:23 ` Konstantin Ryabitsev @ 2019-10-19 0:11 ` Eric Wong 2019-10-22 17:28 ` Konstantin Ryabitsev 2019-10-28 23:24 ` Eric Wong 0 siblings, 2 replies; 17+ messages in thread From: Eric Wong @ 2019-10-19 0:11 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > On Wed, Oct 16, 2019 at 10:10:45PM +0000, Eric Wong wrote: > > Btw, has this gotten better since the Perl 5.16.3 workarounds? > > > > My 32-bit instance which sees the most HTTP traffic hasn't > > exceeded 80M per-process in a while. > > It's been definitely dramatically better. We keep adding lists to lore, so I > haven't really been able to watch memory usage after a long period of daemon > uptime, but it's never really gone very much above 1GB. In fact, we're > downgrading lore to a smaller instance in the near future since we don't > need to worry about running out of RAM any more. Cool, but 1GB is still an order of magnitude worse than what I'd expect :< I remember Email::MIME had huge explosions with some 30MB+ spam messages: https://public-inbox.org/meta/20190609083918.gfr2kurah7f2hysx@dcvr/ (maybe gmime can help) Depending on your storage speed/latency, more RAM can still help significantly with Xapian. The NVME stuff has amazing numbers, but my mobos are too old and I'm still stuck on SATA 2. Is nntpd better? That only uses Email::Simple and not MIME; so less explosions. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-10-19 0:11 ` Eric Wong @ 2019-10-22 17:28 ` Konstantin Ryabitsev 2019-10-22 19:11 ` Eric Wong 2019-10-28 23:24 ` Eric Wong 1 sibling, 1 reply; 17+ messages in thread From: Konstantin Ryabitsev @ 2019-10-22 17:28 UTC (permalink / raw) To: Eric Wong; +Cc: meta On Sat, Oct 19, 2019 at 12:11:44AM +0000, Eric Wong wrote: >> It's been definitely dramatically better. We keep adding lists to >> lore, so I >> haven't really been able to watch memory usage after a long period of daemon >> uptime, but it's never really gone very much above 1GB. In fact, we're >> downgrading lore to a smaller instance in the near future since we don't >> need to worry about running out of RAM any more. > >Cool, but 1GB is still an order of magnitude worse that what >I'd expect :< I remember Email::MIME had huge explosions with >some 30MB+ spam messages: > https://public-inbox.org/meta/20190609083918.gfr2kurah7f2hysx@dcvr/ > (maybe gmime can help) > >Depending on your storage speed/latency, more RAM can still help >significantly with Xapian. The NVME stuff has amazing numbers, >but my mobos are too old and I'm still stuck on SATA 2. My goal is to rework lore.kernel.org significantly. Currently, it's a single system hosted at AWS that both receives mail and serves the archives, but I would actually like to split it into two: - the archiver that just generates git repositories but serves no traffic (probably running directly on mail.kernel.org). - several front-ends that replicate repositories from the archiver and provide http/nntp access, probably reusing mirrors.edge.kernel.org nodes that run from us-west, us-east, eu-west and ap-east. That should provide both redundancy and better geographic availability of the service. This requires some testing first to ensure that grokmirror hooks and reindexing works reliably for replicated repo collections. >Is nntpd better? That only uses Email::Simple and not MIME; >so less explosions. The number of people using nntp is several orders of magnitude lower, so I'm not sure it's a good metric for anything. Best, -K ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-10-22 17:28 ` Konstantin Ryabitsev @ 2019-10-22 19:11 ` Eric Wong 0 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2019-10-22 19:11 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote: > On Sat, Oct 19, 2019 at 12:11:44AM +0000, Eric Wong wrote: > >> It's been definitely dramatically better. We keep adding lists to > >> lore, so I > >> haven't really been able to watch memory usage after a long period of daemon > >> uptime, but it's never really gone very much above 1GB. In fact, we're > >> downgrading lore to a smaller instance in the near future since we don't > >> need to worry about running out of RAM any more. > > > >Cool, but 1GB is still an order of magnitude worse that what > >I'd expect :< I remember Email::MIME had huge explosions with > >some 30MB+ spam messages: > > https://public-inbox.org/meta/20190609083918.gfr2kurah7f2hysx@dcvr/ > > (maybe gmime can help) > > > >Depending on your storage speed/latency, more RAM can still help > >significantly with Xapian. The NVME stuff has amazing numbers, > >but my mobos are too old and I'm still stuck on SATA 2. > > My goal is to rework lore.kernel.org significantly. Currently, it's a > single system hosted at AWS that both receives mail and serves the > archives, but I would actually like to split it into two: > > - the archiver that just generates git repositories but serves no > traffic (probably running directly on mail.kernel.org). > - several front-ends that replicate repositories from the archiver and > provide http/nntp access, probably reusing mirrors.edge.kernel.org > nodes that run from us-west, us-east, eu-west and ap-east. > > That should provide both redundancy and better geographic availability > of the service. This requires some testing first to ensure that > grokmirror hooks and reindexing works reliably for replicated repo > collections. Yeah, I noticed some mirror indexing bugs at: https://public-inbox.org/meta/20191016211415.GA6084@dcvr/ But patches 4/3 and 5/3 seem to be doing everything right and I expect the series to be merged soon. I'm also planning to dogfood a mirror of lore off some of my .onions, soon. > >Is nntpd better? That only uses Email::Simple and not MIME; > >so less explosions. > > The number of people using nntp is several orders of magnitude lower, so > I'm not sure it's a good metric for anything. Hopefully nntp usage goes up over time. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: how's memory usage on public-inbox-httpd? 2019-10-19 0:11 ` Eric Wong 2019-10-22 17:28 ` Konstantin Ryabitsev @ 2019-10-28 23:24 ` Eric Wong 1 sibling, 0 replies; 17+ messages in thread From: Eric Wong @ 2019-10-28 23:24 UTC (permalink / raw) To: meta Eric Wong <e@80x24.org> wrote: > Cool, but 1GB is still an order of magnitude worse that what > I'd expect :< I remember Email::MIME had huge explosions with > some 30MB+ spam messages: > https://public-inbox.org/meta/20190609083918.gfr2kurah7f2hysx@dcvr/ > (maybe gmime can help) Fwiw, I'm working on porting a scripting-language-aware malloc tracker I wrote for another scripting language over to Perl/XS which can track down which line of Perl called a particular malloc() statement. ...And reading perlguts/perlxstut manpages again :x ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2019-10-28 23:24 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-12-01 19:44 how's memory usage on public-inbox-httpd? Eric Wong 2019-06-06 19:04 ` Konstantin Ryabitsev 2019-06-06 20:37 ` Eric Wong 2019-06-06 21:45 ` Konstantin Ryabitsev 2019-06-06 22:10 ` Eric Wong 2019-06-06 22:19 ` Konstantin Ryabitsev 2019-06-06 22:29 ` Eric Wong 2019-06-10 10:09 ` [RFC] optionally support glibc malloc_info via SIGCONT Eric Wong 2019-06-09 8:39 ` how's memory usage on public-inbox-httpd? Eric Wong 2019-06-12 17:08 ` Eric Wong 2019-06-06 20:54 ` Eric Wong 2019-10-16 22:10 ` Eric Wong 2019-10-18 19:23 ` Konstantin Ryabitsev 2019-10-19 0:11 ` Eric Wong 2019-10-22 17:28 ` Konstantin Ryabitsev 2019-10-22 19:11 ` Eric Wong 2019-10-28 23:24 ` Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).