all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: chiaki-ishikawa-thunderbird-account <chiaki.ishikawa@ubin.jp>
To: Eli Zaretskii <eliz@gnu.org>
Cc: "chiaki.ishikawa@ubin.jp" <chiaki.ishikawa@ubin.jp>,
	39413@debbugs.gnu.org, Noam Postavsky <npostavs@gmail.com>
Subject: bug#39413: 26.2; Emacs gets hung
Date: Wed, 20 May 2020 13:31:29 +0900	[thread overview]
Message-ID: <e6a3ec42-1c21-6ab7-527a-d717b1e7a8a7@ubin.jp> (raw)
In-Reply-To: <0b9cd846-b9f5-8351-5f45-5614bbc644c0@ubin.jp>

[-- Attachment #1: Type: text/plain, Size: 12624 bytes --]

Hi,

I am back.

I have observed another long hung.

This time, I did verify that it is during GC.

The hung lasted close to 8 minutes give or take.

My comment inline.

On 2020/04/30 11:45, chiaki-ishikawa-thunderbird-account wrote:
> Hi,
>
> Thank you for your tips.
>
> My comment inline.:
>
> On 2020/04/29 15:55, Eli Zaretskii wrote:
>>> From: "ISHIKAWA,chiaki" <ishikawa@yk.rim.or.jp>
>>> Date: Wed, 29 Apr 2020 06:36:46 +0900
>>> Cc: chiaki-ishikawa-thunderbird-account <chiaki.ishikawa@ubin.jp>,
>>>   Noam Postavsky <npostavs@gmail.com>
>>>
>>> I am back to this strange guru meditation of emacs.
>>> Emacs would eventually begin working, but the silent period where no 
>>> key
>>> or mouse event is acknowledged is rather longish.
>>>
>>> Several weeks after I installed GNU Emacs 28.0.50 (that is what
>>> |emacs-version| variable says. I installed from git repo.) , I began
>>> noticing the occasional long pause (no response) again.
>> I suggest to begin by establishing whether these pauses are caused by
>> GC.  To that end, set garbage-collection-messages to a non-nil value,
>> and see if Emacs displays a message about GC in progress when these
>> pauses occur.  Also, try to tell how frequent are these pauses and
>> much time each pause takes.
>>
>> If you see that every time you experience a pause there's GC running,
>> we can then proceed to investigating whether the frequency of GC and
>> the time it takes on your system is something abnormal, and why.
> I have set garbage-collection-messages to t.
>
> Let me see what happens when the next apparent long guru meditation 
> happens.
>
Since the last message, I did not get a very long hung  until this 
morning. Maybe 5-10 second long GC only. Those I can tolerate.
I feel "what?" after maybe a five second hung and then realize something 
is holding up emacs, and then the hung is over. I tell myself GC was 
rather long, but that is it.

But this morning I get a rather long hung, or GC, twice.

Attaching gdb to the emacs process then after a 10 second or so of the 
first hung, and noticed that sweep_cons() took three seconds to finish 
after I entered "finish" command to gdb. That certainly felt long during 
gdb session.

So that was an initial clue that something is wrong.

(Now I suspect some kind of memory area corruption. Before this happened 
I loaded a several large log files each about 20MB, and when I tried to
erase a rather large *shell* buffer that contained the log from a large 
compilation job, this garbage collection kicked in.)

Anyway, I checked and found gc-elapsed showed a number in 400-500 
seconds range. I really wished I wrote down the exact number.

>
>
>>> So my take on this is sweep_conses() takes a rather long time and not
>>> letting emacs to take any input. (And there could be other functions
>>> before it.)
>> The question is how long is "long time".  We need some quantitative
>> measure of this before we can tell whether it's normal or not.
> I agree. It  is all relative.
>
I agree "long" is a relative term.

But after the somewhat long pause of maybe 20-30 seconds, I get another 
really long pause which prompted me to write this mail.

The second long hung this morning was maybe longer than 8 minutes give 
or take a minute or so.

I stopped emacs  a few times after the initial five minute or so of the 
long hung to check where it spends time occasionally using gdb. But it 
definitely is close to 8 minutes. I would have killed emacs and start 
over if I had not learned about gc-elapsed and other internal variables.

After GC finally finished and control returned to keyboard, I checked 
gc-elapsed.
It now shows:
gc-elapsed is a variable defined in ‘C source code’.
Its value is 1067.998007492

It matches my observation that gc took about 8 minutes give or take. (Of 
course, there were a few times I interrupted the operation from 
Control-C from GDB and even captured a screen shot or two. But that was 
less than two minutes in total.)


>>> Question.: Has there been any change in emacs's alloc.c code in the
>>> last year or two?
>> Yes, a lot.  But I don't see any significant change in frequency or
>> time duration of GC on my system due to those changes, FWIW.
>
> Maybe I should take a look when my time permits.
>
>
>>> I am running emacs under Debian GNU/linux inside VirtualBox which is
>>> assigned 16GB memory.
>> It is possible that the virtual machine's memory management is
>> misconfigured, so it slows down Emacs when it needs to access a large
>> portion of its address space (which happens during GC).
>
> I have not touched any thing (famous last words) for my linux 
> configuration for the last few years aside from simply upgrading the 
> packages from time to time.
>
>
>>> gc-elapsed is a variable defined in ‘C source code’.
>>> Its value is 68.3795433
>> The question is how many GC cycles took 68 seconds.  What is the value
>> of gcs-done?  Also, is this in a session to which you attached GDB? if
>> so, then some of the elapsed time was due to Emacs being stopped under
>> the debugger.  We need these values in a session that is never stopped
>> by a debugger.
>>
>
> I will monitor gcs-done when the next guru meditation happens.
>
> Is there a hook to garbage-collection so that I can print gcs-done and 
> gc-elapsed maybe after the garbage-collection is run?

I failed to check gcs-done. It shows 1032 after the fact.

But given that I monitored gc using gdb and it took almost 8 minutes 
before successive "finish" commands to gdb (after sweep_floats(), etc.)
to come up to the interactive functions, I am quite sure that one 
GC-related entry point used approximately 8 min to finish.
And I think that entry is maybe_gc() or maybe_garbage_collect().

(That is why I suspected some type of memory corruption software-wise. 
My PC has ECC enabled and so it is rather unlikely memory got corrupted 
spontaneously.)


>
>
>>> PS: Given that this may have something to do with the paging activity
>>> of the emacs process, the linux version change (from 4 to 5) may have
>>> affected the paging behavior of the program, but it is hard to say.  I
>>> now see this linux instance is using 293M SWAP.  Compiling mozilla
>>> thunderbird source runs |make -j6| and g++-9 compiles CPP source files
>>> which are very large and so they exert heavy memory pressure. A bad
>>> memory access pattern of emacs, not friendly to locality of access,
>>> would add to page faults. Except, |xosview| program does not show any
>>> paging activity. I am not sure if |xosview| is correct or not.
>> Building a large program with -j6 also uses CPU, and that could make
>> CPU-intensive operations, such as GC, slow.  Was that build running in
>> the same VirtualBox as Emacs?  If so, how many physical CPU execution
>> units do you have allocated to VirtualBox?
>>
> Yes, the build was running in the same VirtualBox as Emacs.
> I have assigned 7 cores to VirtualBox.
> (Oh wait!!! My current VirtualBox says it is assigned only 6 cores!?
> I thought I did assign 7 cores to this VirtualBox image.
> My PC runs Windows 10 natively on AMD Ryzen 7 Eight-Core processor 
> that has additional 8 hyperthreads.
> So there are 16 virtual cores.
> I have no idea why only 6 core is assigned to VM image today.
> The number has been tweaked when an earlier VBox had an unresolved 
> issue when AMD Ryzen CPU came out.
> [I used to run this image on XEON CPU.]
> Maybe this could explain the problem on my HOME PC.However, the same 
> guru meditation has been observed on office PC with a different Intel 
> CPU, too.
> Again, Emacs is run inside virtualBox on that PC that runs Windows 10 
> professional natively.))
>
> However, the important point is this. I have not noticed this strange 
> guru meditation until about 12 months ago or so.
> That is why I suspect that there could be an issue with the recent 
> alloc.c or something, but I think I need to gather more data
> to focus in it yet.
>
> Stay tuned.
>
> Thank you again for your help. I will try to gather more 
> circumstantial evidence so that we can make an educated guess why 
> Emacs seems to enter guru meditation on my PC.
>
> Chiaki
>
>
Since last message, I have re-assigned 7 CPUs to my virtual box image.

At the time of long hung this morning, I have no largish build or 
anything. I was only editing a message file or something.
I wanted to clear *shell* buffer with "erase-buffer" when this series of 
long hungs hit emacs.

I am attaching two screen dumps.
sweep-conses shows that emacs is spending time in sweep-conses after I 
typed "finish" command to gdb.
Emacs process in the background did not respond to X repaint event and 
thus only a portion of its screen, most notably echo area is partially 
shown.
The window on the right is xosview system resource monitor program.
That shows CPP 2, 3, and 4 are used 100% successively (and that is emacs).
The durration of time covered by the horizontal length of the montor 
window is about 15 seconds.
Emacs spent time almost like 8 minutes (I need to subtract a minute or 
two for gdb interaction)
like this. Just churning CPU and nothing else.

memory-consolidate-1.PNG is interesting in that this was the first time 
I saw something two level below sweep_conses().
A routine malloc_consolidate() is visible.
Maybe the memory consolidation may take time.

A repeated question, which I think has No answer is as follows.:
Is there a hook that gets executed just before GC happens?

Now I think the answer is "No" because GC is invoked when the available 
conses are few and we don't want to run arbitrary lisp program whch may 
lead to another GC invocation.

Short of such feature, is there a post hook that gets executed AFTER GC 
finishes (well in the stacktrace I see maybe_gc() called from call2(), 
and so
the hook may have to be placedon maybe_gc() routine if possible.

I am asking the questions about the hook(s) since I want to record some 
statistics about memory usage, etc. automatically
before/after GC so that we can have a scientific/measurable data to 
discuss what is going on.
If we can have a set of such pre/post data for the last few dozen GC 
invocations, we may be able to home into the cause of the program.
Short of lisp function, maybe I can add message print out in alloc.c to 
print out such information so that it gets recorded in *Message* buffer.
It is done only when gabage-collection-message is t and there should be 
some throttling mechanism to avoid spamming the user.


I now feel both alloc.c and malloc.c may be to blame, most probably in 
terms of locality of references that may interfere badly with CPU cache 
and linux memory system. However, do not there does not seem to be much 
paging activity AND more than 4 GB real memory is more or less 
available  still. That is visible in xosview's screen in 
sweep-conses.png. That lead to me wonder if there is algorithmic slow 
down due to non-linear execution time of memory size? But again, this is 
hard to believe in this age of home users having GB of memory. Such 
algorithmic lapse can't remain undetected for so long.

Hmm, that makes me to throw the net on linux AND virtualbox which I use 
to run linux on my home Windows PC.
I have 32 GB of memory on my home PC and assign 16 GB of memory to 
VirtualBox image.
Given that I still have 2 GB of free memory on Window side right now (I 
just checked with system performance monitor),
all the memory references during GC ought to go into the real memory 
without paging on the Window side even, .i.e., the virtual memory is 
mapped straight to the real memory on the PC.

OR, do I feel lucky and compile emacs with GCC-9's -O3 flag and see if 
it speeds up emacs?
To be honest, I always prefer correctness rather than speed, but 6 
minutes GC is not something I can put up with in an interactive program.
Come to think of it if Emacs is going multi-threaded (I mean Emacs-Lisp 
interpreter) then low-level GC can become incremental opportunistic 
non-blocking GC?

Any pointer re the hook(s) short of rewriting alloc.c is appreciated. 
Well, if worst comes to worst, I don't mind rewriting alloc.c to create 
an array of fixed size to store the last dozen or so GC-related 
information that is updated before and after invocation of a certain GC 
entry point.
I can print that out after a long GC to learn the memory situation.
But that sounds a bit drastic. OTOH, it may be the only way to go.
But the last time I worked on alloc.c to debug it to run on non-UNIX 
platform was close to 30 years ago. :-)

Thank you in advance for your attention on this matter.

Chiaki



[-- Attachment #2: malloc-consolidate-1.PNG --]
[-- Type: image/png, Size: 546177 bytes --]

[-- Attachment #3: sweep-conses.png --]
[-- Type: image/png, Size: 426364 bytes --]

  reply	other threads:[~2020-05-20  4:31 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-04  9:32 bug#39413: 26.2; Emacs gets hung chiaki-ishikawa-thunderbird-account
2020-02-04 15:33 ` Eli Zaretskii
2020-02-05  1:11   ` chiaki-ishikawa-thunderbird-account
2020-02-21 17:27 ` Noam Postavsky
2020-02-22 17:53   ` chiaki-ishikawa-thunderbird-account
2020-04-28 21:36     ` ISHIKAWA,chiaki
2020-04-29  6:55       ` Eli Zaretskii
2020-04-30  2:45         ` chiaki-ishikawa-thunderbird-account
2020-05-20  4:31           ` chiaki-ishikawa-thunderbird-account [this message]
2020-05-25 12:15             ` Noam Postavsky
2020-05-25 15:50               ` chiaki-ishikawa-thunderbird-account
2020-06-08  8:17                 ` chiaki-ishikawa-thunderbird-account
2021-08-10 16:09                 ` Lars Ingebrigtsen
2021-08-11  1:08                   ` ISHIKAWA,chiaki
2021-08-11  6:07                     ` ISHIKAWA,chiaki
2021-08-11  7:22                       ` ISHIKAWA,chiaki
2021-08-11 11:14                       ` Lars Ingebrigtsen
2021-08-11 13:46                         ` ISHIKAWA,chiaki
2021-08-11 11:47                       ` Eli Zaretskii
2021-08-11 13:41                         ` ISHIKAWA,chiaki
2021-08-12  7:10                           ` Eli Zaretskii
2021-08-15 23:27                             ` ISHIKAWA,chiaki
2021-08-16  0:20                               ` ISHIKAWA,chiaki
2021-08-16 11:35                               ` Eli Zaretskii
2021-08-20  1:56                                 ` ISHIKAWA,chiaki
2021-09-17 14:49                                   ` Lars Ingebrigtsen
2021-10-18  9:00                                     ` Lars Ingebrigtsen
2021-10-19  7:04                                       ` ISHIKAWA,chiaki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e6a3ec42-1c21-6ab7-527a-d717b1e7a8a7@ubin.jp \
    --to=chiaki.ishikawa@ubin.jp \
    --cc=39413@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=npostavs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.