unofficial mirror of bug-gnu-emacs@gnu.org 
From: "Kévin Le Gouguec" <kevin.legouguec@gmail.com>
To: Andrea Corallo <akrl@sdf.org>
Cc: edouard.debry@gmail.com, 45705@debbugs.gnu.org
Subject: bug#45705: [feature/native-comp] Excessive memory consumption on windows 10
Date: Sat, 09 Jan 2021 18:26:46 +0100
Message-ID: <87im86c97t.fsf@gmail.com>
In-Reply-To: <xjfy2h29tgd.fsf@sdf.org> (Andrea Corallo's message of "Sat, 09 Jan 2021 12:37:54 +0000")

[-- Attachment #1: Type: text/plain, Size: 2626 bytes --]

Andrea Corallo <akrl@sdf.org> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Andrea Corallo <akrl@sdf.org>
>>> 
>>> In June we changed the way we store immediate objects in the shared and
>>> this makes the compilation way lighter on the GCC side (both in time and
>>> memory).  I've no precise data on this other than the experimental
>>> observation that compiling all Elisp files in Emacs on 32bit systems is
>>> not anymore an issue.  This IIUC implies that the memory footprint for
>>> each compilation is always < 2GB.
>>
>> You assume that the compilations are all done serially?  AFAIK, most
>> people build Emacs with "make -jN", so parallel compilation is an
>> important use case.
>
>> I guess we will have to collect the information about that, if you say
>> we don't have it now.
>
> I'm adding in CC Kevin, IIRC for bug#41077 he used a nice setup to
> produce quite accurate results on memory footprint during the
> compilation process.  Perhaps he has time and he's so kind to gather
> some data on the current state, that would be extremely helpful.

See also bug#41194#20 and bug#41194#28 where I outlined how the
improvements reduced compilation time and memory usage.

I've dusted off my 32-bit laptop; unfortunately the fan sounds like it's
in need of… something (probably exorcism, given the noise).

Until I figure that out, here are the (very hacky) scripts I used to
measure and plot the RAM usage, in case someone else wants to take some
measurements:

- ./monitor.sh $PID finds the most RAM-consuming process among $PID and
  its children, and logs its memory usage (VSZ and RSS) and its
  command-line.

  (Logs are collected every 10 seconds; this interval probably needs to
  be shortened on faster machines.)

- ./plot.py uses matplotlib to make graphs out of these measurements; it
  attempts to replace the command line with the less-verbose diagnostics
  from "make".

- My workflow was to start an emacs session, run M-x compile RET make,
  then ./monitor.sh $PID_OF_EMACS_SESSION.

  (PARENT_RE in plot.py should match the command line of this parent
   session; its RAM consumption is then labeled as "noise floor" on the
   graph.

   That label serves no real purpose and should be removed: monitor.sh
   should instead filter the parent session out of the monitored PIDs,
   with some error handling for the lack of child processes once
   compilation is finished.)

- There are some hardcoded values to tweak at the bottom of plot.py,
  e.g. how long a child process must last to get a label on the graph.
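
For anyone who wants to amend the monitoring side as suggested above
(ignore the parent session, cope with the absence of children), here is
a rough Python sketch of the selection step only.  It is not a port of
monitor.sh: the names ProcRow, parse_ps, descendants and heaviest_child
are made up for illustration, "most RAM-consuming" is assumed to mean
"largest RSS", and the input is assumed to be the output of
"ps -e -o pid= -o ppid= -o vsz= -o rss= -o args=".

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class ProcRow(NamedTuple):
    """One line of `ps` output (hypothetical helper type)."""
    pid: int
    ppid: int
    vsz_kib: int
    rss_kib: int
    args: str


def parse_ps(output: str) -> list[ProcRow]:
    """Parse `ps -e -o pid= -o ppid= -o vsz= -o rss= -o args=` output."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        pid, ppid, vsz, rss = (int(f) for f in fields[:4])
        rows.append(ProcRow(pid, ppid, vsz, rss, fields[4]))
    return rows


def descendants(rows: list[ProcRow], root_pid: int) -> set[int]:
    """PIDs of every process below root_pid, excluding root_pid itself."""
    children: dict[int, list[int]] = {}
    for row in rows:
        children.setdefault(row.ppid, []).append(row.pid)

    found: set[int] = set()
    stack = [root_pid]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)

    found.discard(root_pid)
    return found


def heaviest_child(rows: list[ProcRow], root_pid: int) -> Optional[ProcRow]:
    """Descendant of root_pid with the largest RSS, or None if there is none."""
    pids = descendants(rows, root_pid)
    candidates = [row for row in rows if row.pid in pids]
    if not candidates:
        return None  # compilation finished: nothing left to log
    return max(candidates, key=lambda row: row.rss_kib)
```

A caller would feed parse_ps() the text captured from ps, and stop (or
skip the sample) whenever heaviest_child() returns None.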


[-- Attachment #2: monitor.sh --]
[-- Type: application/x-shellscript, Size: 350 bytes --]

[-- Attachment #3: plot.py --]
[-- Type: text/x-python, Size: 5200 bytes --]

#!/usr/bin/env python3

from datetime import datetime, timedelta
from pathlib import Path
import re

import matplotlib
from matplotlib import pyplot
from matplotlib.dates import DateFormatter, HourLocator, MinuteLocator
from matplotlib.ticker import EngFormatter


MONITOR_RE = re.compile('\n'.join((
    '(?P<time>.+)',
    r' *(?P<seconds>\d+) +(?P<vsz>\d+) +(?P<rss>\d+) +(?P<args>.+)',
    ' *(?P<memheader>.+)',
    'Mem: *(?P<memvalues>.+)',
    'Swap: *(?P<swapvalues>.+)',
    ''
)), flags=re.MULTILINE)


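def list_snapshots(monitor_log):
    """Parse the text of monitor.log into a list of snapshot dicts.

    Sizes are logged in KiB and converted to bytes here.
    """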
    snapshots = []

    for match in MONITOR_RE.finditer(monitor_log):
        md = match.groupdict()

        memkeys = md['memheader'].split()
        memvalues = md['memvalues'].split()
        swapvalues = md['swapvalues'].split()

        snapshot = {
            'time': datetime.strptime(md['time'], '%Y-%m-%d-%H:%M:%S'),
            'uptime': int(md['seconds']),
            'vsz': int(md['vsz'])*1024,
            'rss': int(md['rss'])*1024,
            'process': md['args'],
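            'mem': {memkeys[i]: int(val)*1024 for i, val in enumerate(memvalues)},
            # The "Swap:" line has fewer columns than "Mem:", but they line
            # up with the first header columns, so memkeys can be reused.
            'swap': {memkeys[i]: int(val)*1024 for i, val in enumerate(swapvalues)}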
        }

        snapshots.append(snapshot)

    return snapshots


LOADDEFS_RE = re.compile(
    r'--eval \(setq generated-autoload-file'
    r' \(expand-file-name \(unmsys--file-name "([^"]+)"\)\)\)'
    r' -f batch-update-autoloads'
)

SEMANTIC_RE = re.compile(
    r'-l semantic/(?:wisent|bovine)/grammar -f (?:wisent|bovine)-batch-make-parser'
    r' -o (.+) .+\.[wb]y'
)

ELCELN_RE = re.compile(
    r'\.\./src/(?:bootstrap-)?emacs -batch --no-site-file --no-site-lisp'
    r' --eval \(setq load-prefer-newer t\) -l comp'
    r'(?: -f byte-compile-refresh-preloaded)?'
    r' -f batch-byte-native-compile-for-bootstrap'
    r' (.+\.el)'
)

SHORTENED_NAMES = {
    LOADDEFS_RE: 'GEN',
    SEMANTIC_RE: 'GEN',
    ELCELN_RE: 'ELC+ELN'
}

QUAIL_TIT_RE = re.compile(
    r'-l titdic-cnv -f batch-titdic-convert'
    r' -dir \./\.\./lisp/leim/quail CXTERM-DIC/(.+)\.tit'
)

QUAIL_MISC_RE = re.compile(
    r'-l titdic-cnv -f batch-miscdic-convert'
    r' -dir \./\.\./lisp/leim/quail MISC-DIC/(.+\.(html|map|cin|cns|b5))'
)

QUAIL_JA_RE = re.compile(
    r'-l ja-dic-cnv -f batch-skkdic-convert'
)

PARENT_RE = re.compile(
    r'$^'                       # Matches nothing; adjust to match parent process.
)

TRANSFORMED_NAMES = {
    QUAIL_TIT_RE: lambda m: f'GEN ../lisp/leim/quail/{m.group(1)}.el',
    QUAIL_MISC_RE: lambda m: f'GEN from {m.group(1)}',
    QUAIL_JA_RE: lambda _: 'GEN ../lisp/leim/ja-dic/ja-dic.el',
    PARENT_RE: lambda _: '(noise floor)'
}

def shorten(process):
    """Replace a command line with a short label resembling "make" diagnostics."""
    for r, name in SHORTENED_NAMES.items():
        match = r.search(process)
        if match is not None:
            return f'{name} {match.group(1)}'

    for r, transform in TRANSFORMED_NAMES.items():
        match = r.search(process)
        if match is not None:
            return transform(match)

    if len(process) > 40:
        return f'{process[:20]}…{process[-20:]}'
    return process


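def list_processes(snapshots):
    """Group consecutive snapshots of the same command line.

    Returns a list of (process, start, duration) tuples.
    """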
    t0 = snapshots[0]['time']
    current_process = snapshots[0]['process']
    current_process_start = t0

    processes = []

    for s in snapshots[1:]:
        if s['process'] == current_process:
            continue

        s_start = s['time']
        processes.append((
            current_process, current_process_start, s_start-current_process_start
        ))

        current_process = s['process']
        current_process_start = s_start

    processes.append((
        current_process,
        current_process_start,
        snapshots[-1]['time']-current_process_start
    ))

    return processes


snapshots = list_snapshots(Path('monitor.log').read_text())
if not snapshots:
    raise SystemExit('monitor.log: no record matched MONITOR_RE')

xs = tuple(s['time'] for s in snapshots)
vsz = tuple(s['vsz'] for s in snapshots)
rss = tuple(s['rss'] for s in snapshots)
memavail = tuple(s['mem']['available'] for s in snapshots)
swapused = tuple(s['swap']['used'] for s in snapshots)

matplotlib.use('TkAgg')
fig, axes = pyplot.subplots(figsize=(128, 9.6))
axes.plot(xs, vsz, label='VSZ (process)')
axes.plot(xs, rss, label='RSS (process)')
axes.plot(xs, memavail, label='available memory (system)', linewidth=0.5)
axes.plot(xs, swapused, label='used swap (system)')

axes.set_xlim(snapshots[0]['time'], snapshots[-1]['time'])
axes.xaxis.set_major_formatter(DateFormatter('%H:%M'))
axes.xaxis.set_major_locator(HourLocator())
axes.xaxis.set_minor_locator(MinuteLocator(tuple(5*i for i in range(1, 12))))
axes.xaxis.set_label_text('Hours')
axes.set_ylim(0)
axes.yaxis.set_major_formatter(EngFormatter(unit='B'))
axes.legend()

for p, start, duration in list_processes(snapshots):
    if duration < timedelta(minutes=2):
        continue
    pyplot.text(start, 1e9, shorten(p), rotation=45)
    pyplot.plot(
        (start, start+duration), (1e9, 1e9),
        marker='|', linewidth=0.5, linestyle='--',
        color='black', alpha=0.8
    )

pyplot.savefig('monitor.pdf')
pyplot.show()
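
Since monitor.sh is not reproduced inline here, the layout of
monitor.log has to be inferred from MONITOR_RE alone.  The sketch below
shows what one record presumably looks like: a timestamp line, a line
with an integer number of seconds (presumably the elapsed time of the
process), VSZ, RSS and the command line, then what looks like the
header, "Mem:" and "Swap:" lines of "free".  All numbers in SAMPLE_LOG
are made up, and the parse() helper is only an illustration that mirrors
list_snapshots.

```python
import re
from datetime import datetime

# Same pattern as MONITOR_RE in plot.py.
MONITOR_RE = re.compile('\n'.join((
    '(?P<time>.+)',
    r' *(?P<seconds>\d+) +(?P<vsz>\d+) +(?P<rss>\d+) +(?P<args>.+)',
    ' *(?P<memheader>.+)',
    'Mem: *(?P<memvalues>.+)',
    'Swap: *(?P<swapvalues>.+)',
    ''
)), flags=re.MULTILINE)

# Made-up values for illustration only; sizes assumed to be in KiB.
SAMPLE_LOG = """\
2021-01-09-17:26:46
   120 512000 204800 ../src/emacs -batch -l comp foo.el
              total        used        free      shared  buff/cache   available
Mem:        2048000      900000      300000       50000      848000     1000000
Swap:       1000000       10000      990000
2021-01-09-17:26:56
   130 520000 210000 ../src/emacs -batch -l comp foo.el
              total        used        free      shared  buff/cache   available
Mem:        2048000      910000      290000       50000      848000      990000
Swap:       1000000       12000      988000
"""


def parse(log):
    """Turn the log text into a list of dicts, converting KiB to bytes."""
    records = []
    for m in MONITOR_RE.finditer(log):
        keys = m.group('memheader').split()
        mem = [int(v) * 1024 for v in m.group('memvalues').split()]
        swap = [int(v) * 1024 for v in m.group('swapvalues').split()]
        records.append({
            'time': datetime.strptime(m.group('time'), '%Y-%m-%d-%H:%M:%S'),
            'uptime': int(m.group('seconds')),
            'vsz': int(m.group('vsz')) * 1024,
            'rss': int(m.group('rss')) * 1024,
            'process': m.group('args'),
            'mem': dict(zip(keys, mem)),
            # zip() stops after the three columns present on the Swap line.
            'swap': dict(zip(keys, swap)),
        })
    return records


records = parse(SAMPLE_LOG)
for r in records:
    print(r['time'], r['rss'], r['mem']['available'], r['swap']['used'])
```

Note that each record must end with a newline for the pattern to match,
so a log truncated in the middle of a record simply loses that record.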
