unofficial mirror of bug-gnu-emacs@gnu.org 
From: "Kévin Le Gouguec" <kevin.legouguec@gmail.com>
To: Andrea Corallo <akrl@sdf.org>
Cc: 41077-done@debbugs.gnu.org
Subject: bug#41077: [feature/native-comp] virtual memory exhausted
Date: Sun, 10 May 2020 16:26:21 +0200
Message-ID: <877dxjn9wy.fsf@gmail.com>
In-Reply-To: <xjfwo5o256j.fsf@sdf.org> (Andrea Corallo's message of "Wed, 06 May 2020 20:12:52 +0000")

[-- Attachment #1: Type: text/plain, Size: 816 bytes --]

Hi Andrea!

Thank you for implementing this blacklist; it turns out that
char-fold.el was the only file my laptop could not handle[1], though as
you'll see org.el was a strong contender.

It took 3 days, but make -j1 successfully ran to completion on commit
92cf4bb.  For comparison, I have just built commit 9d8fc3a on master
from scratch, and it took exactly 1 hour 35 minutes.

As soon as I've figured out how to use the elisp-benchmarks package,
I'll post some figures; is there a specific place (bug number,
emacs-devel thread) where you usually collect such feedback?

I've attached some more graphs on the compilation process, as well as
the source material[2] (measurement script, measurements, and plotting
script).

The overall picture (only files which took more than 10 minutes to
compile are labeled):


[-- Attachment #2: monitor.pdf --]
[-- Type: application/pdf, Size: 239983 bytes --]

[-- Attachment #3: Type: text/plain, Size: 93 bytes --]


Some "areas of detail" (only files which took more than 5 minutes to
compile are labeled):


[-- Attachment #4: monitor-swap.pdf --]
[-- Type: application/pdf, Size: 37251 bytes --]

[-- Attachment #5: monitor-gnus.pdf --]
[-- Type: application/pdf, Size: 37737 bytes --]

[-- Attachment #6: monitor-org.pdf --]
[-- Type: application/pdf, Size: 33116 bytes --]

[-- Attachment #7: Type: text/plain, Size: 1292 bytes --]


Some comments:

- As could be predicted from your previous measurements, org.el was a
  beast, but unlike with char-fold.el, the little guy (my laptop) pulled
  through 🙌

- It's hard to be sure since my measurements were so imprecise[3], but
  AFAICT the compilation process for a single file seems to follow a
  memory usage pattern of "slow rise - spike - drop - spike".  See
  e.g. files.el, isearch.el, simple.el, subr.el, window.el, info.el,
  package.el, erc.el, gnus-sum.el, org.el, python.el.
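Given the caveat about imprecise measurements, one rough way to look for the "spike" parts of that pattern is to scan a single process's RSS samples for points that stand well above both neighbours. The sketch below is hypothetical: `rss_series` and `find_spikes` are not part of plot.py, the snapshot dicts only mimic the shape that `list_snapshots()` returns, and the 1.5 threshold is an arbitrary assumption.

```python
def rss_series(snapshots, process):
    """Collect one process's RSS samples (bytes), in time order.

    Hypothetical helper: `snapshots` are dicts with 'process' and 'rss'
    keys, shaped like the output of list_snapshots() in plot.py.
    """
    return [s['rss'] for s in snapshots if s['process'] == process]


def find_spikes(rss, ratio=1.5):
    """Return indices of samples at least `ratio` times both neighbours.

    The first and last samples are never reported, since each lacks a
    neighbour on one side. The default ratio is an arbitrary assumption.
    """
    return [
        i for i in range(1, len(rss) - 1)
        if rss[i] >= ratio * rss[i - 1] and rss[i] >= ratio * rss[i + 1]
    ]


# Synthetic data, not real measurements: slow rise, spike, drop, spike.
snapshots = [{'process': 'ELC+ELN org.el', 'rss': v}
             for v in (100, 110, 120, 400, 150, 160, 500, 170)]
series = rss_series(snapshots, 'ELC+ELN org.el')
print(find_spikes(series))  # [3, 6]
```

With 30-second sampling, a spike shorter than one interval simply never appears in the series, so this can only find the spikes the monitor happened to catch.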


Thank you for taking the time to guide me through compiling this branch.
I know that reducing the memory footprint of native compilation is
probably not your main focus right now, but I figured it would be
interesting to provide some orders of magnitude.


[1] $ git diff
    diff --git a/lisp/emacs-lisp/comp.el b/lisp/emacs-lisp/comp.el
    index 60b41f95bd..ff3c42a178 100644
    --- a/lisp/emacs-lisp/comp.el
    +++ b/lisp/emacs-lisp/comp.el
    @@ -85,7 +85,7 @@ comp-always-compile
       :group 'comp)

     (defcustom comp-bootstrap-black-list
    -  '("^leim/")
    +  '("^leim/" "^char-fold")
       "List of regexps to exclude files from native compilation during bootstrap.
     Skip if any is matching."
       :type 'list
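The docstring's rule, "Skip if any is matching", can be sketched in Python as follows. This is only an analogue for illustration, not comp.el's actual code. It assumes file names are given relative to the lisp/ directory, which is what the anchored patterns suggest. `is_blacklisted` is a hypothetical name.

```python
import re

# The two patterns from the patched comp-bootstrap-black-list above.
BLACK_LIST = [r'^leim/', r'^char-fold']


def is_blacklisted(name, patterns=BLACK_LIST):
    """Return True if any pattern matches `name` ("Skip if any is matching").

    Assumption: `name` is relative to the lisp/ directory. re.search, like
    Emacs's string-match, can match anywhere in the string; the ^ anchors
    pin both patterns to the start of the name.
    """
    return any(re.search(p, name) for p in patterns)


for name in ('leim/quail/cyrillic.el', 'char-fold.el', 'org/org.el', 'files.el'):
    print(name, is_blacklisted(name))
```

For simple patterns like these, Emacs regexps and Python's `re` behave the same. More elaborate Emacs regexp syntax would not carry over directly.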

[2] Measurement script:


[-- Attachment #8: monitor.sh --]
[-- Type: application/x-shellscript, Size: 350 bytes --]

[-- Attachment #9: Type: text/plain, Size: 20 bytes --]


    Measurements:


[-- Attachment #10: monitor.log.tgz --]
[-- Type: application/x-compressed-tar, Size: 284423 bytes --]

[-- Attachment #11: Type: text/plain, Size: 45 bytes --]


    Plotting script (requires matplotlib):


[-- Attachment #12: plot.py --]
[-- Type: text/x-python, Size: 5057 bytes --]

#!/usr/bin/env python3

from datetime import datetime, timedelta
from pathlib import Path
import re

import matplotlib
from matplotlib import pyplot
from matplotlib.dates import DateFormatter, HourLocator#, MinuteLocator
from matplotlib.ticker import EngFormatter


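# One snapshot in monitor.log: a timestamp line, the top line of ps output
# (seconds, VSZ and RSS in KiB, command line), then the header, "Mem:" and
# "Swap:" lines printed by free(1).
MONITOR_RE = re.compile('\n'.join((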
    '(?P<time>.+)',
    r' *(?P<seconds>\d+) +(?P<vsz>\d+) +(?P<rss>\d+) +(?P<args>.+)',
    ' *(?P<memheader>.+)',
    'Mem: *(?P<memvalues>.+)',
    'Swap: *(?P<swapvalues>.+)',
    ''
)), flags=re.MULTILINE)


def list_snapshots(monitor_log):
    """Parse monitor.log into a list of dicts, converting sizes from KiB to bytes."""
    snapshots = []

    for match in MONITOR_RE.finditer(monitor_log):
        md = match.groupdict()

        memkeys = md['memheader'].split()
        memvalues = md['memvalues'].split()
        swapvalues = md['swapvalues'].split()

        snapshot = {
            'time': datetime.strptime(md['time'], '%Y-%m-%d-%H:%M:%S'),
            'uptime': int(md['seconds']),
            'vsz': int(md['vsz'])*1024,
            'rss': int(md['rss'])*1024,
            'process': md['args'],
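            'mem': {key: int(val)*1024 for key, val in zip(memkeys, memvalues)},
            # free(1) prints a single header line, so the "Swap:" columns
            # (total, used, free) reuse the leading "Mem:" keys.
            'swap': {key: int(val)*1024 for key, val in zip(memkeys, swapvalues)}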
        }

        snapshots.append(snapshot)

    return snapshots


LOADDEFS_RE = re.compile(
    r'--eval \(setq generated-autoload-file'
    r' \(expand-file-name \(unmsys--file-name "([^"]+)"\)\)\)'
    r' -f batch-update-autoloads'
)

SEMANTIC_RE = re.compile(
    r'-l semantic/(?:wisent|bovine)/grammar -f (?:wisent|bovine)-batch-make-parser'
    r' -o (.+) .+\.[wb]y'
)

ELCELN_RE = re.compile(
    r'\.\./src/(?:bootstrap-)?emacs -batch --no-site-file --no-site-lisp'
    r' --eval \(setq load-prefer-newer t\) -l comp'
    r'(?: -f byte-compile-refresh-preloaded)?'
    r' -f batch-byte-native-compile-for-bootstrap'
    r' (.+\.el)'
)

SHORTENED_NAMES = {
    LOADDEFS_RE: 'GEN',
    SEMANTIC_RE: 'GEN',
    ELCELN_RE: 'ELC+ELN'
}

QUAIL_TIT_RE = re.compile(
    r'-l titdic-cnv -f batch-titdic-convert'
    r' -dir \./\.\./lisp/leim/quail CXTERM-DIC/(.+)\.tit'
)

QUAIL_MISC_RE = re.compile(
    r'-l titdic-cnv -f batch-miscdic-convert'
    r' -dir \./\.\./lisp/leim/quail MISC-DIC/(.+\.(html|map|cin|cns|b5))'
)

QUAIL_JA_RE = re.compile(
    r'-l ja-dic-cnv -f batch-skkdic-convert'
)

TRANSFORMED_NAMES = {
    QUAIL_TIT_RE: lambda m: f'GEN ../lisp/leim/quail/{m.group(1)}.el',
    QUAIL_MISC_RE: lambda m: f'GEN from {m.group(1)}',
    QUAIL_JA_RE: lambda m: f'GEN ../lisp/leim/ja-dic/ja-dic.el'
}

def shorten(process):
    """Turn a full command line into a short plot label, e.g. 'ELC+ELN <file>.el'."""
    for r, name in SHORTENED_NAMES.items():
        match = r.search(process)
        if match is not None:
            return f'{name} {match.group(1)}'

    for r, transform in TRANSFORMED_NAMES.items():
        match = r.search(process)
        if match is not None:
            return transform(match)

    if len(process) > 40:
        return f'{process[:20]}…{process[-20:]}'
    return process


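def list_processes(snapshots):
    """Map each command line to the (start, duration) of its run of consecutive snapshots.

    If the same command line reappears later, the later run overwrites the earlier one.
    """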
    t0 = snapshots[0]['time']
    current_process = snapshots[0]['process']
    current_process_start = t0

    processes = {}

    for s in snapshots[1:]:
        if s['process'] == current_process:
            continue

        s_start = s['time']
        processes[current_process] = (
            current_process_start, s_start-current_process_start
        )

        current_process = s['process']
        current_process_start = s_start

    processes[current_process] = (
        current_process_start,
        snapshots[-1]['time']-current_process_start
    )

    return processes


snapshots = list_snapshots(Path('monitor.log').read_text())

xs = tuple(s['time'] for s in snapshots)
vsz = tuple(s['vsz'] for s in snapshots)
rss = tuple(s['rss'] for s in snapshots)
memavail = tuple(s['mem']['available'] for s in snapshots)
swapused = tuple(s['swap']['used'] for s in snapshots)

matplotlib.use('TkAgg')
fig, axes = pyplot.subplots(figsize=(128, 9.6))
axes.plot(xs, vsz, label='VSZ (process)')
axes.plot(xs, rss, label='RSS (process)')
axes.plot(xs, memavail, label='available memory (system)', linewidth=0.5)
axes.plot(xs, swapused, label='used swap (system)')

axes.set_xlim(snapshots[0]['time'], snapshots[-1]['time'])
axes.xaxis.set_major_formatter(DateFormatter('%H:%M'))
axes.xaxis.set_major_locator(HourLocator())
# axes.xaxis.set_minor_locator(MinuteLocator(tuple(5*i for i in range(1, 12))))
axes.xaxis.set_label_text('Hours')
axes.set_ylim(0)
axes.yaxis.set_major_formatter(EngFormatter(unit='B'))
axes.legend()

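# Label every process that stayed on top for at least 10 minutes; labels and
# duration bars are drawn at a fixed height of 1e9 bytes (1 GB).
for (p, (start, duration)) in list_processes(snapshots).items():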
    if duration < timedelta(minutes=10):
        continue
    pyplot.text(start, 1e9, shorten(p), rotation=45)
    pyplot.plot(
        (start, start+duration), (1e9, 1e9),
        marker='|', linewidth=0.5, linestyle='--',
        color='black', alpha=0.8
    )

# pyplot.savefig('monitor.pdf')
pyplot.show()

[-- Attachment #13: Type: text/plain, Size: 491 bytes --]


[3] I didn't instrument the build process; as can be seen in the
    scripts, I just relied on:

        ps --sort=-vsz p $pids | head -n 1

    $pids includes the Emacs session from which I was running make; since
    that session's VSZ was larger than that of most build subprocesses, it
    masked basically everything other than the ELC+ELN processes.
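The masking effect follows from keeping only the single largest-VSZ line. The sketch below is a rough Python analogue with made-up numbers. `top_by_vsz` is a hypothetical helper, not part of monitor.sh.

```python
def top_by_vsz(procs):
    """Return the (vsz_kib, command) pair with the largest VSZ.

    Rough analogue of keeping only the first line of
    `ps --sort=-vsz p $pids`.
    """
    return max(procs, key=lambda p: p[0])


# Made-up numbers for illustration only; the "..." elides arguments.
session = (1_200_000, 'emacs (interactive session running make)')
autoloads = (350_000, 'emacs -batch ... -f batch-update-autoloads')
elc_eln = (2_500_000, 'emacs -batch ... -f batch-byte-native-compile-for-bootstrap org.el')

print(top_by_vsz([session, autoloads])[1])  # the session hides the smaller job
print(top_by_vsz([session, elc_eln])[1])    # a large ELC+ELN job shows through
```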

    Also, since the sampling interval was 30 seconds, many interesting
    short-lived patterns may not have been recorded.
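A toy illustration of why a 30-second interval can miss short spikes follows. The per-second trace is synthetic, not taken from monitor.log. `sample` is a hypothetical helper that keeps every n-th point, as a fixed-interval poller would.

```python
def sample(series, step):
    """Keep every `step`-th point, like a monitor polling at a fixed interval."""
    return series[::step]


# Synthetic per-second RSS trace in MiB (not real data): a 200 MiB baseline
# over 100 s, with a 10-second spike to 900 MiB from t=35 s to t=44 s.
trace = [200] * 100
for t in range(35, 45):
    trace[t] = 900

print(max(trace))              # 900
print(max(sample(trace, 30)))  # 200: samples at t=0, 30, 60, 90 miss the spike
```

Any spike shorter than the polling interval can fall entirely between two samples.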

    Overall these scripts are an exercise in How Not To Collect Data 😒.

Thread overview: 13+ messages
2020-05-04 15:08 bug#41077: [feature/native-comp] Segfaults when compiling ELC+ELN Kévin Le Gouguec
2020-05-04 16:31 ` Andrea Corallo
2020-05-04 20:57   ` Andrea Corallo
2020-05-04 21:05   ` Kévin Le Gouguec
2020-05-04 21:15     ` Andrea Corallo
2020-05-06 14:15       ` bug#41077: [feature/native-comp] virtual memory exhausted (was: bug#41077: [feature/native-comp] Segfaults when compiling ELC+ELN) Kévin Le Gouguec
     [not found]         ` <xjfftcd5chc.fsf@sdf.org>
2020-05-06 20:12           ` bug#41077: [feature/native-comp] virtual memory exhausted Andrea Corallo
2020-05-10 14:26             ` Kévin Le Gouguec [this message]
2020-05-10 15:02               ` Andrea Corallo
2020-05-10 22:04                 ` Kévin Le Gouguec
2020-05-10 22:17                   ` Andrea Corallo
2020-05-11  9:12                     ` Kévin Le Gouguec
2020-05-11 10:00                       ` Andrea Corallo

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
