From: chiaki-ishikawa-thunderbird-account <chiaki.ishikawa@ubin.jp>
To: Eli Zaretskii <eliz@gnu.org>
Cc: "chiaki.ishikawa@ubin.jp" <chiaki.ishikawa@ubin.jp>,
39413@debbugs.gnu.org, Noam Postavsky <npostavs@gmail.com>
Subject: bug#39413: 26.2; Emacs gets hung
Date: Wed, 20 May 2020 13:31:29 +0900 [thread overview]
Message-ID: <e6a3ec42-1c21-6ab7-527a-d717b1e7a8a7@ubin.jp> (raw)
In-Reply-To: <0b9cd846-b9f5-8351-5f45-5614bbc644c0@ubin.jp>
[-- Attachment #1: Type: text/plain, Size: 12624 bytes --]
Hi,
I am back.
I have observed another long hung.
This time, I did verify that it is during GC.
The hung lasted close to 8 minutes give or take.
My comment inline.
On 2020/04/30 11:45, chiaki-ishikawa-thunderbird-account wrote:
> Hi,
>
> Thank you for your tips.
>
> My comment inline.:
>
> On 2020/04/29 15:55, Eli Zaretskii wrote:
>>> From: "ISHIKAWA,chiaki" <ishikawa@yk.rim.or.jp>
>>> Date: Wed, 29 Apr 2020 06:36:46 +0900
>>> Cc: chiaki-ishikawa-thunderbird-account <chiaki.ishikawa@ubin.jp>,
>>> Noam Postavsky <npostavs@gmail.com>
>>>
>>> I am back to this strange guru meditation of emacs.
>>> Emacs would eventually begin working, but the silent period where no
>>> key
>>> or mouse event is acknowledged is rather longish.
>>>
>>> Several weeks after I installed GNU Emacs 28.0.50 (that is what
>>> |emacs-version| variable says. I installed from git repo.) , I began
>>> noticing the occasional long pause (no response) again.
>> I suggest to begin by establishing whether these pauses are caused by
>> GC. To that end, set garbage-collection-messages to a non-nil value,
>> and see if Emacs displays a message about GC in progress when these
>> pauses occur. Also, try to tell how frequent are these pauses and
>> much time each pause takes.
>>
>> If you see that every time you experience a pause there's GC running,
>> we can then proceed to investigating whether the frequency of GC and
>> the time it takes on your system is something abnormal, and why.
> I have set garbage-collection-messages to t.
>
> Let me see what happens when the next apparent long guru meditation
> happens.
>
Since the last message, I did not get a very long hung until this
morning. Maybe 5-10 second long GC only. Those I can tolerate.
I feel "what?" after maybe a five second hung and then realize something
is holding up emacs, and then the hung is over. I tell myself GC was
rather long, but that is it.
But this morning I get a rather long hung, or GC, twice.
Attaching gdb to the emacs process then after a 10 second or so of the
first hung, and noticed that sweep_cons() took three seconds to finish
after I entered "finish" command to gdb. That certainly felt long during
gdb session.
So that was an initial clue that something is wrong.
(Now I suspect some kind of memory area corruption. Before this happened
I loaded a several large log files each about 20MB, and when I tried to
erase a rather large *shell* buffer that contained the log from a large
compilation job, this garbage collection kicked in.)
Anyway, I checked and found gc-elapsed showed a number in 400-500
seconds range. I really wished I wrote down the exact number.
>
>
>>> So my take on this is sweep_conses() takes a rather long time and not
>>> letting emacs to take any input. (And there could be other functions
>>> before it.)
>> The question is how long is "long time". We need some quantitative
>> measure of this before we can tell whether it's normal or not.
> I agree. It is all relative.
>
I agree "long" is a relative term.
But after the somewhat long pause of maybe 20-30 seconds, I get another
really long pause which prompted me to write this mail.
The second long hung this morning was maybe longer than 8 minutes give
or take a minute or so.
I stopped emacs a few times after the initial five minute or so of the
long hung to check where it spends time occasionally using gdb. But it
definitely is close to 8 minutes. I would have killed emacs and start
over if I had not learned about gc-elapsed and other internal variables.
After GC finally finished and control returned to keyboard, I checked
gc-elapsed.
It now shows:
gc-elapsed is a variable defined in ‘C source code’.
Its value is 1067.998007492
It matches my observation that gc took about 8 minutes give or take. (Of
course, there were a few times I interrupted the operation from
Control-C from GDB and even captured a screen shot or two. But that was
less than two minutes in total.)
>>> Question.: Has there been any change in emacs's alloc.c code in the
>>> last year or two?
>> Yes, a lot. But I don't see any significant change in frequency or
>> time duration of GC on my system due to those changes, FWIW.
>
> Maybe I should take a look when my time permits.
>
>
>>> I am running emacs under Debian GNU/linux inside VirtualBox which is
>>> assigned 16GB memory.
>> It is possible that the virtual machine's memory management is
>> misconfigured, so it slows down Emacs when it needs to access a large
>> portion of its address space (which happens during GC).
>
> I have not touched any thing (famous last words) for my linux
> configuration for the last few years aside from simply upgrading the
> packages from time to time.
>
>
>>> gc-elapsed is a variable defined in ‘C source code’.
>>> Its value is 68.3795433
>> The question is how many GC cycles took 68 seconds. What is the value
>> of gcs-done? Also, is this in a session to which you attached GDB? if
>> so, then some of the elapsed time was due to Emacs being stopped under
>> the debugger. We need these values in a session that is never stopped
>> by a debugger.
>>
>
> I will monitor gcs-done when the next guru meditation happens.
>
> Is there a hook to garbage-collection so that I can print gcs-done and
> gc-elapsed maybe after the garbage-collection is run?
I failed to check gcs-done. It shows 1032 after the fact.
But given that I monitored gc using gdb and it took almost 8 minutes
before successive "finish" commands to gdb (after sweep_floats(), etc.)
to come up to the interactive functions, I am quite sure that one
GC-related entry point used approximately 8 min to finish.
And I think that entry is maybe_gc() or maybe_garbage_collect().
(That is why I suspected some type of memory corruption software-wise.
My PC has ECC enabled and so it is rather unlikely memory got corrupted
spontaneously.)
>
>
>>> PS: Given that this may have something to do with the paging activity
>>> of the emacs process, the linux version change (from 4 to 5) may have
>>> affected the paging behavior of the program, but it is hard to say. I
>>> now see this linux instance is using 293M SWAP. Compiling mozilla
>>> thunderbird source runs |make -j6| and g++-9 compiles CPP source files
>>> which are very large and so they exert heavy memory pressure. A bad
>>> memory access pattern of emacs, not friendly to locality of access,
>>> would add to page faults. Except, |xosview| program does not show any
>>> paging activity. I am not sure if |xosview| is correct or not.
>> Building a large program with -j6 also uses CPU, and that could make
>> CPU-intensive operations, such as GC, slow. Was that build running in
>> the same VirtualBox as Emacs? If so, how many physical CPU execution
>> units do you have allocated to VirtualBox?
>>
> Yes, the build was running in the same VirtualBox as Emacs.
> I have assigned 7 cores to VirtualBox.
> (Oh wait!!! My current VirtualBox says it is assigned only 6 cores!?
> I thought I did assign 7 cores to this VirtualBox image.
> My PC runs Windows 10 natively on AMD Ryzen 7 Eight-Core processor
> that has additional 8 hyperthreads.
> So there are 16 virtual cores.
> I have no idea why only 6 core is assigned to VM image today.
> The number has been tweaked when an earlier VBox had an unresolved
> issue when AMD Ryzen CPU came out.
> [I used to run this image on XEON CPU.]
> Maybe this could explain the problem on my HOME PC.However, the same
> guru meditation has been observed on office PC with a different Intel
> CPU, too.
> Again, Emacs is run inside virtualBox on that PC that runs Windows 10
> professional natively.))
>
> However, the important point is this. I have not noticed this strange
> guru meditation until about 12 months ago or so.
> That is why I suspect that there could be an issue with the recent
> alloc.c or something, but I think I need to gather more data
> to focus in it yet.
>
> Stay tuned.
>
> Thank you again for your help. I will try to gather more
> circumstantial evidence so that we can make an educated guess why
> Emacs seems to enter guru meditation on my PC.
>
> Chiaki
>
>
Since last message, I have re-assigned 7 CPUs to my virtual box image.
At the time of long hung this morning, I have no largish build or
anything. I was only editing a message file or something.
I wanted to clear *shell* buffer with "erase-buffer" when this series of
long hungs hit emacs.
I am attaching two screen dumps.
sweep-conses shows that emacs is spending time in sweep-conses after I
typed "finish" command to gdb.
Emacs process in the background did not respond to X repaint event and
thus only a portion of its screen, most notably echo area is partially
shown.
The window on the right is xosview system resource monitor program.
That shows CPP 2, 3, and 4 are used 100% successively (and that is emacs).
The durration of time covered by the horizontal length of the montor
window is about 15 seconds.
Emacs spent time almost like 8 minutes (I need to subtract a minute or
two for gdb interaction)
like this. Just churning CPU and nothing else.
memory-consolidate-1.PNG is interesting in that this was the first time
I saw something two level below sweep_conses().
A routine malloc_consolidate() is visible.
Maybe the memory consolidation may take time.
A repeated question, which I think has No answer is as follows.:
Is there a hook that gets executed just before GC happens?
Now I think the answer is "No" because GC is invoked when the available
conses are few and we don't want to run arbitrary lisp program whch may
lead to another GC invocation.
Short of such feature, is there a post hook that gets executed AFTER GC
finishes (well in the stacktrace I see maybe_gc() called from call2(),
and so
the hook may have to be placedon maybe_gc() routine if possible.
I am asking the questions about the hook(s) since I want to record some
statistics about memory usage, etc. automatically
before/after GC so that we can have a scientific/measurable data to
discuss what is going on.
If we can have a set of such pre/post data for the last few dozen GC
invocations, we may be able to home into the cause of the program.
Short of lisp function, maybe I can add message print out in alloc.c to
print out such information so that it gets recorded in *Message* buffer.
It is done only when gabage-collection-message is t and there should be
some throttling mechanism to avoid spamming the user.
I now feel both alloc.c and malloc.c may be to blame, most probably in
terms of locality of references that may interfere badly with CPU cache
and linux memory system. However, do not there does not seem to be much
paging activity AND more than 4 GB real memory is more or less
available still. That is visible in xosview's screen in
sweep-conses.png. That lead to me wonder if there is algorithmic slow
down due to non-linear execution time of memory size? But again, this is
hard to believe in this age of home users having GB of memory. Such
algorithmic lapse can't remain undetected for so long.
Hmm, that makes me to throw the net on linux AND virtualbox which I use
to run linux on my home Windows PC.
I have 32 GB of memory on my home PC and assign 16 GB of memory to
VirtualBox image.
Given that I still have 2 GB of free memory on Window side right now (I
just checked with system performance monitor),
all the memory references during GC ought to go into the real memory
without paging on the Window side even, .i.e., the virtual memory is
mapped straight to the real memory on the PC.
OR, do I feel lucky and compile emacs with GCC-9's -O3 flag and see if
it speeds up emacs?
To be honest, I always prefer correctness rather than speed, but 6
minutes GC is not something I can put up with in an interactive program.
Come to think of it if Emacs is going multi-threaded (I mean Emacs-Lisp
interpreter) then low-level GC can become incremental opportunistic
non-blocking GC?
Any pointer re the hook(s) short of rewriting alloc.c is appreciated.
Well, if worst comes to worst, I don't mind rewriting alloc.c to create
an array of fixed size to store the last dozen or so GC-related
information that is updated before and after invocation of a certain GC
entry point.
I can print that out after a long GC to learn the memory situation.
But that sounds a bit drastic. OTOH, it may be the only way to go.
But the last time I worked on alloc.c to debug it to run on non-UNIX
platform was close to 30 years ago. :-)
Thank you in advance for your attention on this matter.
Chiaki
[-- Attachment #2: malloc-consolidate-1.PNG --]
[-- Type: image/png, Size: 546177 bytes --]
[-- Attachment #3: sweep-conses.png --]
[-- Type: image/png, Size: 426364 bytes --]
next prev parent reply other threads:[~2020-05-20 4:31 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-04 9:32 bug#39413: 26.2; Emacs gets hung chiaki-ishikawa-thunderbird-account
2020-02-04 15:33 ` Eli Zaretskii
2020-02-05 1:11 ` chiaki-ishikawa-thunderbird-account
2020-02-21 17:27 ` Noam Postavsky
2020-02-22 17:53 ` chiaki-ishikawa-thunderbird-account
2020-04-28 21:36 ` ISHIKAWA,chiaki
2020-04-29 6:55 ` Eli Zaretskii
2020-04-30 2:45 ` chiaki-ishikawa-thunderbird-account
2020-05-20 4:31 ` chiaki-ishikawa-thunderbird-account [this message]
2020-05-25 12:15 ` Noam Postavsky
2020-05-25 15:50 ` chiaki-ishikawa-thunderbird-account
2020-06-08 8:17 ` chiaki-ishikawa-thunderbird-account
2021-08-10 16:09 ` Lars Ingebrigtsen
2021-08-11 1:08 ` ISHIKAWA,chiaki
2021-08-11 6:07 ` ISHIKAWA,chiaki
2021-08-11 7:22 ` ISHIKAWA,chiaki
2021-08-11 11:14 ` Lars Ingebrigtsen
2021-08-11 13:46 ` ISHIKAWA,chiaki
2021-08-11 11:47 ` Eli Zaretskii
2021-08-11 13:41 ` ISHIKAWA,chiaki
2021-08-12 7:10 ` Eli Zaretskii
2021-08-15 23:27 ` ISHIKAWA,chiaki
2021-08-16 0:20 ` ISHIKAWA,chiaki
2021-08-16 11:35 ` Eli Zaretskii
2021-08-20 1:56 ` ISHIKAWA,chiaki
2021-09-17 14:49 ` Lars Ingebrigtsen
2021-10-18 9:00 ` Lars Ingebrigtsen
2021-10-19 7:04 ` ISHIKAWA,chiaki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e6a3ec42-1c21-6ab7-527a-d717b1e7a8a7@ubin.jp \
--to=chiaki.ishikawa@ubin.jp \
--cc=39413@debbugs.gnu.org \
--cc=eliz@gnu.org \
--cc=npostavs@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.