From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail From: "ISHIKAWA,chiaki" Newsgroups: gmane.emacs.bugs Subject: bug#39413: 26.2; Emacs gets hung Date: Wed, 29 Apr 2020 06:36:46 +0900 Message-ID: <119c5df5-080e-c45b-6498-2a2bc0f8273b@yk.rim.or.jp> References: <85r1ynam0x.fsf@gmail.com> <44e850a0-c8dc-4aec-5706-c52db33697a1@ubin.jp> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="ciao.gmane.io:159.69.161.202"; logging-data="33070"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 Cc: chiaki-ishikawa-thunderbird-account , Noam Postavsky To: 39413@debbugs.gnu.org Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Tue Apr 28 23:39:59 2020 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jTXxO-0008QS-Kp for geb-bug-gnu-emacs@m.gmane-mx.org; Tue, 28 Apr 2020 23:39:58 +0200 Original-Received: from localhost ([::1]:52118 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jTXxN-0002El-IX for geb-bug-gnu-emacs@m.gmane-mx.org; Tue, 28 Apr 2020 17:39:57 -0400 Original-Received: from eggs.gnu.org ([209.51.188.92]:50022) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jTXwa-0001JV-RS for bug-gnu-emacs@gnu.org; Tue, 28 Apr 2020 17:39:24 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.90_1) (envelope-from ) id 1jTXuY-00086T-BQ for bug-gnu-emacs@gnu.org; Tue, 28 Apr 2020 17:39:08 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:58480) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jTXuX-00086O-Uu for bug-gnu-emacs@gnu.org; Tue, 28 Apr 2020 17:37:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jTXuX-0005ao-Rs for bug-gnu-emacs@gnu.org; Tue, 28 Apr 2020 17:37:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: "ISHIKAWA,chiaki" Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Tue, 28 Apr 2020 21:37:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 39413 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: unreproducible Original-Received: via spool by 39413-submit@debbugs.gnu.org id=B39413.158810981321482 (code B ref 39413); Tue, 28 Apr 2020 21:37:01 +0000 Original-Received: (at 39413) by debbugs.gnu.org; 28 Apr 2020 21:36:53 +0000 Original-Received: from localhost ([127.0.0.1]:41793 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jTXuP-0005aQ-3l for submit@debbugs.gnu.org; Tue, 28 Apr 2020 17:36:53 -0400 Original-Received: from mail05.siriuscloud.jp ([219.118.72.5]:53272) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jTXuM-0005aC-85 for 39413@debbugs.gnu.org; Tue, 28 Apr 2020 17:36:51 -0400 Original-Received: from [192.168.0.5] (M106073032161.v4.enabler.ne.jp [106.73.32.161]) (Authenticated sender: ishikawa@yk.rim.or.jp) by access05.SiriusCloud.jp (Postfix) with ESMTPA id 49BZjB4S4kz1XMWSb; Wed, 29 Apr 2020 06:36:46 +0900 (JST) Authentication-Results: access05.SiriusCloud.jp; dkim=none reason="no signature"; dkim-adsp=none (unprotected policy); dkim-atps=neutral In-Reply-To: <44e850a0-c8dc-4aec-5706-c52db33697a1@ubin.jp> Content-Language: en-US X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-Received-From: 209.51.188.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:179252 Archived-At: Hi, I am back to this strange guru meditation of emacs. Emacs would eventually begin working, but the silent period where no key or mouse event is acknowledged is rather longish. Several weeks after I installed GNU Emacs 28.0.50 (that is what |emacs-version| variable says. I installed from git repo.) , I began noticing the occasional long pause (no response) again. That is noticeable when I was running a mozilla thunderbird build script in *shell* buffer. (M-X shell RETURN) Obviously, the large log may cause some resource issues, but, say the year before, I might have gotten the warning about memory running low (I can't recall the exact warning. It asked something about "Am I sure to do the operation?", like for example, when I invoked |erase-buffer| to the very large *shell* buffer that contains many log lines. Emacs responded something about dire situation and if I was really invoking the function, and not by mistake, etc. However, it never got seemingly hung for like 10-15 seconds. Yesterday, this long pause happened again, and so I attached gdb to the running emacs process and the following is what I found. Basically, sweep_conses() takes unusually long time. (It is possible that by the time I noticed the long hung and attached gdb, the major portion of the problematic code had already been executed.) This is the first stac ktrace when I attach gdb to emacs. So it looks that it was trying to free cons cells. #0  lisp_align_free (block=) at alloc.c:1266 #1  0x000055bc512500f2 in sweep_conses () at alloc.c:6907 #2  0x000055bc51255b27 in gc_sweep () at alloc.c:7134 #3  garbage_collect () at alloc.c:6105 #4  0x000055bc51255ed1 in maybe_garbage_collect () at alloc.c:5968 #5  0x000055bc51271785 in maybe_gc () at lisp.h:5068 #6  Ffuncall (nargs=4, args=args@entry=0x7fff92901a70) at eval.c:2778 #7  0x000055bc512a5838 in exec_byte_code     (bytestr=, vector=, maxdepth=, args_template=, nargs=, args=) at bytecode.c:633 #8  0x000055bc51271707 in Ffuncall (nargs=4, args=args@entry=0x7fff92901f38) at eval.c:2808     ... omitted ... I invoked "finish" command to gdb. (A very handy command, indeed.) finish Run till exit from #0  lisp_align_free (block=) at alloc.c:1266 0x000055bc512500f2 in sweep_conses () at alloc.c:6907 6907              lisp_align_free (cblk);  where #0  0x000055bc512500f2 in sweep_conses () at alloc.c:6907 #1  0x000055bc51255b27 in gc_sweep () at alloc.c:7134 #2  garbage_collect () at alloc.c:6105 #3  0x000055bc51255ed1 in maybe_garbage_collect () at alloc.c:5968     ... omitted ... Another finish Run till exit from #0  0x000055bc512500f2 in sweep_conses () at alloc.c:6907 gc_sweep () at alloc.c:7135 7135      sweep_floats (); (gdb) sweep_conses() took rather a long time. Maybe five seconds or so. (Could be longer almost 10 seconds...) Another finish.  sweep_floats (); (gdb) finish Run till exit from #0  gc_sweep () at alloc.c:7135 garbage_collect () at alloc.c:7141 7141      check_string_bytes (!noninteractive); (gdb) sweep_float() ends quickly. Another finish. finish Run till exit from #0  garbage_collect () at alloc.c:7141 0x000055bc51271785 in maybe_gc () at lisp.h:5068 5068        maybe_garbage_collect (); (gdb) check_string_bytes finished quickly. finish Run till exit from #0  0x000055bc51271785 in maybe_gc () at lisp.h:5068 deliver_input_available_signal (sig=29) at keyboard.c:7198 7198      deliver_process_signal (sig, handle_input_available_signal); (gdb) Quit (gdb) where The above one took a fraction of a second. It was noticeable. finish Run till exit from #0  deliver_input_available_signal (sig=29) at keyboard.c:7198 (gdb) finish Run till exit from #0  0x000055bc51271785 in maybe_gc () at lisp.h:5068 5068        maybe_garbage_collect (); (gdb) immediate return Well, actually after a few more finish (which completed immediately), emacs was responsive again. I had a few similar experiences yesterday. My observation: So my take on this is sweep_conses() takes a rather long time and not letting emacs to take any input. (And there could be other functions before it.) Question.: Has there been any change in emacs's alloc.c code in the last year or two?  (Even an seemingly innocuous change might have introduced a very bad CPU cache behavior that makes sweep_conses() to perform badly. Or maybe it is now allowed to run without time-constraint?) I suspect that since the apparent long hung occurred during the time when |*shell*| buffer was accumulating many lines from a largish running build process, maybe the string allocation was not catching up and causing excessive GC. In that case, the ordering of reclaiming functions could be arranged to find string first? (Addition to the buffer of lines read from a child process is basically an insertion at the end of the buffer of strings, correct?) OR maybe I am hitting the age old adagio of "GC considered harmful". [1] Simply my virtual space is too large to be handled efficiently.  I remember reading the paper the first time it appeared, but I never thought it would hit me on my home PC. (The PC at the time had 64KB-128KB of memory). I am running emacs under Debian GNU/linux inside VirtualBox which is assigned 16GB memory. My suggestion: I would think sweep_conses() may want to quit early, if possible, after obtaining certain amount of space (or for a casual GC invoked periodically, may want to quit after one second or something like that. That is, we may want to set a elapsed time limit.) In a really tight situation, we can invoke the reclaiming functions with a flag telling them to use as much CPU/Elapsed time to reclaim as many octets as possible, but for a periodic GC, it might make sense to return half-heartedly. Oh, wait, have I set a variable to control periodic GC and forgot about it? I checked. I don't think I tweaked the following variables. C-H k output. gc-cons-threshold is a variable defined in ‘C source code’. Its value is 800000 gc-cons-percentage is a variable defined in ‘C source code’. Its value is 0.1 As of now in a different incarnation of emacs process (this is a differnt process from the one which I attached gdb.), gc-elapsed is a variable defined in ‘C source code’. Its value is 68.3795433 Hope this helps us shedding some light on the guru meditation of emacs probably due to an unusually longish GC. TIA Chiaki PS: Given that this may have something to do with the paging activity of the emacs process, the linux version change (from 4 to 5) may have affected the paging behavior of the program, but it is hard to say.  I now see this linux instance is using 293M SWAP.  Compiling mozilla thunderbird source runs |make -j6| and g++-9 compiles CPP source files which are very large and so they exert heavy memory pressure. A bad memory access pattern of emacs, not friendly to locality of access, would add to page faults. Except, |xosview| program does not show any paging activity. I am not sure if |xosview| is correct or not. [1] "Address/memory management for a gigantic LISP environment or,      GC considered harmful",  Jon L. White,  LFP '80: Proceedings of the 1980 ACM conference on LISP and functional programming, August 1980 Pages 119–127 https://doi.org/10.1145/800087.802797 https://dl.acm.org/doi/10.1145/1317203.1317206 On 2020/02/23 2:53, chiaki-ishikawa-thunderbird-account wrote: > Thank you. I will taken note of "finish" and see if I can observe the > return from alloc.c code > if I get to see the next obvious hung. > > Chiaki > > On 2020/02/22 2:27, Noam Postavsky wrote: >> tags 39413 + unreproducible >> quit >> >> chiaki-ishikawa-thunderbird-account writes: >> >>> I printed stack backtrace and Emacs seemed to be in memory allocator. >>> Then I let it continue. >>> Still the emacs screen was locked up. >>> So I interrupted the execution, and printed stack backtrace. >>> Again Emacs was in memory allocator. >>> Eventually I gave up and killed emacs. >>> >>> It seems that garbage collection or >>> routine in alloc.c was looping. >> That's not enough evidence to show that the garbage collection was >> looping.  There could be some higher level loop that does a lot of >> allocation, so that if you stop at some random point you would have a >> high probability of stopping in the alloc.c.  If you catch this in gdb >> again, try running the 'finish' a few times to see if it can leave the >> alloc.c code. >> >>> This is not a repeatable bug. I have no idea how to reproduce this. It >>> occurs every now and then. > > >