From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#43389: 28.0.50; Emacs memory leaks
Date: Fri, 30 Oct 2020 10:00:29 +0200
Message-ID: <83imasb0te.fsf@gnu.org>
References: <87r1r5428d.fsf@web.de> <874kmcvlbj.fsf@mail.trevorbentley.com>
Cc: 43389@debbugs.gnu.org
To: Trevor Bentley
In-Reply-To: <874kmcvlbj.fsf@mail.trevorbentley.com> (message from Trevor Bentley on Thu, 29 Oct 2020 21:17:20 +0100)
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

> From: Trevor Bentley
> Date: Thu, 29 Oct 2020 21:17:20 +0100
>
> It doesn't start leaking until it has been active for 2-3 days.
> It might depends on other factors, such as suspending or losing
> network connectivity. Once the leak triggers, it grows at a rate
> of about 1MB every few seconds. My machine has 32GB, so it gets
> pretty far before I notice and kill it. I'm not sure if there is a
> limit.
>
> I built emacs with debug symbols and dumped some strace logs last
> time it happened. This is from the "native-comp" branch, since
> it's the only one I had built with debug symbols: GNU Emacs
> 28.0.50, commit feed53f8b5da0e58cce412cd41a52883dba6c1be. I see
> the same with the version installed from my package manager (Arch,
> GNU Emacs 27.1), and the strace log looks about the same, though
> without symbols.
>
> I waited until it was actively leaking, and then ran the following
> command to print a stack trace whenever the heap is extended with
> brk():
>
> $ sudo strace -p $PID -k -r --trace="?brk" --signal="SIGTERM"
>
> The findings: this particular leak is triggered in libgnutls. I
> get large batches of the following (truncated) stack trace

Thanks. This trace doesn't show how many bytes were allocated, does
it? Without that it is hard to judge whether these GnuTLS calls could
be the culprit.
Because the full trace shows other calls to malloc, for example this:

> /usr/lib/libc-2.32.so(brk+0xb) [0xf6e7b]
> /usr/lib/libc-2.32.so(__sbrk+0x84) [0xf6f54]
> /usr/lib/libc-2.32.so(__default_morecore+0xd) [0x8d80d]
> /usr/lib/libc-2.32.so(sysmalloc+0x372) [0x890e2]
> /usr/lib/libc-2.32.so(_int_malloc+0xd9e) [0x8ad6e]
> /usr/lib/libc-2.32.so(_int_memalign+0x3f) [0x8b01f]
> /usr/lib/libc-2.32.so(_mid_memalign+0x13c) [0x8c12c]
> /home/trevor/applications/opt/bin/emacs-28.0.50(lisp_align_malloc+0x2e) [0x2364ee]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fcons+0x65) [0x237f74]
> /home/trevor/applications/opt/bin/emacs-28.0.50(store_in_alist+0x5f) [0x5c9a3]
> /home/trevor/applications/opt/bin/emacs-28.0.50(gui_report_frame_params+0x46a) [0x607f1]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fframe_parameters+0x499) [0x5d88b]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fframe_parameter+0x381) [0x5dc9c]
> /home/trevor/applications/opt/bin/emacs-28.0.50(eval_sub+0x7a7) [0x26f964]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fif+0x1f) [0x26b590]
> /home/trevor/applications/opt/bin/emacs-28.0.50(eval_sub+0x38b) [0x26f548]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Feval+0x7a) [0x26ef45]
> /home/trevor/applications/opt/bin/emacs-28.0.50(funcall_subr+0x257) [0x271463]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Ffuncall+0x192) [0x270fe9]
> /home/trevor/applications/opt/bin/emacs-28.0.50(internal_condition_case_n+0xa1) [0x26d81a]
> /home/trevor/applications/opt/bin/emacs-28.0.50(safe__call+0x211) [0x73943]
> /home/trevor/applications/opt/bin/emacs-28.0.50(safe__call1+0xba) [0x73b47]
> /home/trevor/applications/opt/bin/emacs-28.0.50(safe__eval+0x35) [0x73bd7]
> /home/trevor/applications/opt/bin/emacs-28.0.50(display_mode_element+0xe32) [0xb5515]

This seems to indicate some mode-line element that uses :eval, but
without knowing what it does it is hard to say anything more specific.
I also see this:

> /home/trevor/applications/opt/bin/emacs-28.0.50(_start+0x2e) [0x4598e]
> 2.870962 brk(0x55f5ed9a4000) = 0x55f5ed9a4000
> /usr/lib/libc-2.32.so(brk+0xb) [0xf6e7b]
> /usr/lib/libc-2.32.so(__sbrk+0x84) [0xf6f54]
> /usr/lib/libc-2.32.so(__default_morecore+0xd) [0x8d80d]
> /usr/lib/libc-2.32.so(sysmalloc+0x372) [0x890e2]
> /usr/lib/libc-2.32.so(_int_malloc+0xd9e) [0x8ad6e]
> /usr/lib/libc-2.32.so(_int_memalign+0x3f) [0x8b01f]
> /usr/lib/libc-2.32.so(_mid_memalign+0x13c) [0x8c12c]
> /home/trevor/applications/opt/bin/emacs-28.0.50(lisp_align_malloc+0x2e) [0x2364ee]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fcons+0x65) [0x237f74]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fmake_list+0x4f) [0x238544]
> /home/trevor/applications/opt/bin/emacs-28.0.50(concat+0x5c3) [0x2792f6]
> /home/trevor/applications/opt/bin/emacs-28.0.50(Fcopy_sequence+0x16a) [0x278d2a]
> /home/trevor/applications/opt/bin/emacs-28.0.50(timer_check+0x33) [0x1b79dd]
> /home/trevor/applications/opt/bin/emacs-28.0.50(readable_events+0x1a) [0x1b5d00]
> /home/trevor/applications/opt/bin/emacs-28.0.50(get_input_pending+0x2f) [0x1bcf3a]
> /home/trevor/applications/opt/bin/emacs-28.0.50(detect_input_pending_run_timers+0x2e) [0x1c4eb1]
> /home/trevor/applications/opt/bin/emacs-28.0.50(wait_reading_process_output+0x14ec) [0x2de0c0]
> /home/trevor/applications/opt/bin/emacs-28.0.50(sit_for+0x211) [0x53e78]
> /home/trevor/applications/opt/bin/emacs-28.0.50(read_char+0x1019) [0x1b3f62]

This indicates some timer that runs; again, without knowing which
timer and what it does, it is hard to proceed.

Etc. etc. -- the bottom line is that I think we need to know how many
bytes are allocated in each call to make some progress. It would be
even more useful if we could somehow know which of the allocated
buffers are free'd soon and which aren't. That's because Emacs calls
memory allocation functions _a_lot_, and it is completely normal to
see a lot of these calls.
What we need is to find allocations that don't get free'd, and whose
byte counts come close to explaining the rate of 1MB every few
seconds. So these calls need to be filtered somehow, otherwise we
will not see the forest for the gazillion trees.

> I'm not sure if gnutls is giving back buffers that emacs is
> supposed to free, or if the leak is entirely contained within
> gnutls, but something in that path is hanging on to a lot of
> allocations indefinitely.

The GnuTLS functions we call in emacs_gnutls_read are:

  gnutls_record_recv
  emacs_gnutls_handle_error

The latter is only called if there's an error, so I'm guessing it is
not part of your trace. And the former doesn't say in its
documentation that Emacs should free any buffers after calling it, so
I'm not sure how Emacs could be the culprit here.

If GnuTLS is the culprit (and as explained above, this is not certain
at this point), perhaps upgrading to a newer GnuTLS version or
reporting this to GnuTLS developers would allow some progress.
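
As a rough illustration of the byte-count point, here is a minimal
Python sketch that reads strace output of the kind quoted in this
message and reports how far the program break moved on each brk()
call. The names brk_growth and summarize are made up for this
example. It assumes the number before "brk(" is the -r relative
timestamp, taken here as seconds since the previous traced call, and
it only measures heap growth, not the size of any individual
malloc request.

```python
import re

# Matches a successful brk() line from `strace -r`, for example:
#   2.870962 brk(0x55f5ed9a4000) = 0x55f5ed9a4000
# Leading "> " quote prefixes are tolerated; stack-frame lines do not match.
BRK_RE = re.compile(
    r"^\s*(?:>\s*)*(?P<dt>\d+\.\d+)\s+brk\((?:0x[0-9a-fA-F]+|0|NULL)\)"
    r"\s*=\s*(?P<ret>0x[0-9a-fA-F]+)"
)


def brk_growth(lines):
    """Yield (relative_seconds, bytes_grown) for each brk() after the first.

    The first brk() seen only establishes the baseline program break,
    so it produces no record of its own.
    """
    prev_break = None
    for line in lines:
        m = BRK_RE.match(line)
        if m is None:
            continue  # stack frame or unrelated line
        new_break = int(m.group("ret"), 16)
        if prev_break is not None:
            yield float(m.group("dt")), new_break - prev_break
        prev_break = new_break


def summarize(lines):
    """Total heap growth and average growth rate over the parsed calls."""
    records = list(brk_growth(lines))
    total_bytes = sum(grown for _, grown in records)
    total_secs = sum(dt for dt, _ in records)
    rate = total_bytes / total_secs if total_secs > 0 else 0.0
    return {
        "calls": len(records),
        "bytes": total_bytes,
        "seconds": total_secs,
        "bytes_per_second": rate,
    }
```

Feeding it a saved log, e.g. summarize(open("strace.log")) with a
hypothetical file name, gives a figure that can be compared against
the reported 1MB every few seconds. It still cannot tell which
allocations are later free'd, so it narrows the search rather than
identifying the culprit.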