From: "Kévin Le Gouguec" <kevin.legouguec@gmail.com>
To: Andrea Corallo <akrl@sdf.org>
Cc: 41077-done@debbugs.gnu.org
Subject: bug#41077: [feature/native-comp] virtual memory exhausted
Date: Sun, 10 May 2020 16:26:21 +0200 [thread overview]
Message-ID: <877dxjn9wy.fsf@gmail.com> (raw)
In-Reply-To: <xjfwo5o256j.fsf@sdf.org> (Andrea Corallo's message of "Wed, 06 May 2020 20:12:52 +0000")
Hi Andrea!
Thank you for implementing this blacklist; it turns out that
char-fold.el was the only file my laptop could not handle[1], though as
you'll see org.el was a strong contender.
It took 3 days, but make -j1 successfully ran to completion on commit
92cf4bb. In comparison, I've just compiled commit 9d8fc3a on master
from scratch, and it took exactly 1 hour 35 minutes.
As soon as I've figured out how to use the elisp-benchmarks package,
I'll post some figures; is there a specific place (bug number,
emacs-devel thread) where you usually collect such feedback?
I've attached some more graphs on the compilation process, as well as
the source material[2] (measurement script, measurements, and plotting
script).
The overall picture (only files which took more than 10 minutes to
compile are labeled):
[-- Attachment #2: monitor.pdf --]
[-- Type: application/pdf, Size: 239983 bytes --]
Some "areas of detail" (only files which took more than 5 minutes to
compile are labeled):
[-- Attachment #4: monitor-swap.pdf --]
[-- Type: application/pdf, Size: 37251 bytes --]
[-- Attachment #5: monitor-gnus.pdf --]
[-- Type: application/pdf, Size: 37737 bytes --]
[-- Attachment #6: monitor-org.pdf --]
[-- Type: application/pdf, Size: 33116 bytes --]
Some comments:
- As could be predicted from your previous measurements, org.el was a
beast, but unlike with char-fold.el the little guy pulled through 🙌
- It's hard to be sure since my measurements were so imprecise[3], but
AFAICT the compilation process for a single file seems to follow a
memory usage pattern of "slow rise - spike - drop - spike". See
e.g. files.el, isearch.el, simple.el, subr.el, window.el, info.el,
package.el, erc.el, gnus-sum.el, org.el, python.el.
Thank you for taking the time to guide me through compiling this branch.
I know that reducing the memory footprint of native compilation is
probably not your main focus right now, but I figured it would be
interesting to provide some orders of magnitude.
[1] $ git diff
diff --git a/lisp/emacs-lisp/comp.el b/lisp/emacs-lisp/comp.el
index 60b41f95bd..ff3c42a178 100644
--- a/lisp/emacs-lisp/comp.el
+++ b/lisp/emacs-lisp/comp.el
@@ -85,7 +85,7 @@ comp-always-compile
:group 'comp)
(defcustom comp-bootstrap-black-list
- '("^leim/")
+ '("^leim/" "^char-fold")
"List of regexps to exclude files from native compilation during bootstrap.
Skip if any is matching."
:type 'list
[2] Measurement script:
[-- Attachment #8: monitor.sh --]
[-- Type: application/x-shellscript, Size: 350 bytes --]
Measurements:
[-- Attachment #10: monitor.log.tgz --]
[-- Type: application/x-compressed-tar, Size: 284423 bytes --]
Plotting script (requires matplotlib):
[-- Attachment #12: plot.py --]
[-- Type: text/x-python, Size: 5057 bytes --]
#!/usr/bin/env python3
from datetime import datetime, timedelta
from pathlib import Path
import re
import matplotlib
from matplotlib import pyplot
from matplotlib.dates import DateFormatter, HourLocator#, MinuteLocator
from matplotlib.ticker import EngFormatter
MONITOR_RE = re.compile('\n'.join((
'(?P<time>.+)',
r' *(?P<seconds>\d+) +(?P<vsz>\d+) +(?P<rss>\d+) +(?P<args>.+)',
' *(?P<memheader>.+)',
'Mem: *(?P<memvalues>.+)',
'Swap: *(?P<swapvalues>.+)',
''
)), flags=re.MULTILINE)
def list_snapshots(monitor_log):
snapshots = []
for match in MONITOR_RE.finditer(monitor_log):
md = match.groupdict()
memkeys = md['memheader'].split()
memvalues = md['memvalues'].split()
swapvalues = md['swapvalues'].split()
snapshot = {
'time': datetime.strptime(md['time'], '%Y-%m-%d-%H:%M:%S'),
'uptime': int(md['seconds']),
'vsz': int(md['vsz'])*1024,
'rss': int(md['rss'])*1024,
'process': md['args'],
'mem': {memkeys[i]: int(val)*1024 for i, val in enumerate(memvalues)},
'swap': {memkeys[i]: int(val)*1024 for i, val in enumerate(swapvalues)}
}
snapshots.append(snapshot)
return snapshots
LOADDEFS_RE = re.compile(
r'--eval \(setq generated-autoload-file'
r' \(expand-file-name \(unmsys--file-name "([^"]+)"\)\)\)'
r' -f batch-update-autoloads'
)
SEMANTIC_RE = re.compile(
r'-l semantic/(?:wisent|bovine)/grammar -f (?:wisent|bovine)-batch-make-parser'
r' -o (.+) .+\.[wb]y'
)
ELCELN_RE = re.compile(
r'\.\./src/(?:bootstrap-)?emacs -batch --no-site-file --no-site-lisp'
r' --eval \(setq load-prefer-newer t\) -l comp'
r'(?: -f byte-compile-refresh-preloaded)?'
r' -f batch-byte-native-compile-for-bootstrap'
r' (.+\.el)'
)
SHORTENED_NAMES = {
LOADDEFS_RE: 'GEN',
SEMANTIC_RE: 'GEN',
ELCELN_RE: 'ELC+ELN'
}
QUAIL_TIT_RE = re.compile(
r'-l titdic-cnv -f batch-titdic-convert'
r' -dir \./\.\./lisp/leim/quail CXTERM-DIC/(.+)\.tit'
)
QUAIL_MISC_RE = re.compile(
r'-l titdic-cnv -f batch-miscdic-convert'
r' -dir \./\.\./lisp/leim/quail MISC-DIC/(.+\.(html|map|cin|cns|b5))'
)
QUAIL_JA_RE = re.compile(
r'-l ja-dic-cnv -f batch-skkdic-convert'
)
TRANSFORMED_NAMES = {
QUAIL_TIT_RE: lambda m: f'GEN ../lisp/leim/quail/{m.group(1)}.el',
QUAIL_MISC_RE: lambda m: f'GEN from {m.group(1)}',
QUAIL_JA_RE: lambda m: 'GEN ../lisp/leim/ja-dic/ja-dic.el'
}
def shorten(process):
for r, name in SHORTENED_NAMES.items():
match = r.search(process)
if match is not None:
return f'{name} {match.group(1)}'
for r, transform in TRANSFORMED_NAMES.items():
match = r.search(process)
if match is not None:
return transform(match)
if len(process) > 40:
return f'{process[:20]}…{process[-20:]}'
return process
def list_processes(snapshots):
t0 = snapshots[0]['time']
current_process = snapshots[0]['process']
current_process_start = t0
processes = {}
for s in snapshots[1:]:
if s['process'] == current_process:
continue
s_start = s['time']
processes[current_process] = (
current_process_start, s_start-current_process_start
)
current_process = s['process']
current_process_start = s_start
processes[current_process] = (
current_process_start,
snapshots[-1]['time']-current_process_start
)
return processes
snapshots = list_snapshots(Path('monitor.log').read_text())
xs = tuple(s['time'] for s in snapshots)
vsz = tuple(s['vsz'] for s in snapshots)
rss = tuple(s['rss'] for s in snapshots)
memavail = tuple(s['mem']['available'] for s in snapshots)
swapused = tuple(s['swap']['used'] for s in snapshots)
matplotlib.use('TkAgg')
fig, axes = pyplot.subplots(figsize=(128, 9.6))
axes.plot(xs, vsz, label='VSZ (process)')
axes.plot(xs, rss, label='RSS (process)')
axes.plot(xs, memavail, label='available memory (system)', linewidth=0.5)
axes.plot(xs, swapused, label='used swap (system)')
axes.set_xlim(snapshots[0]['time'], snapshots[-1]['time'])
axes.xaxis.set_major_formatter(DateFormatter('%H:%M'))
axes.xaxis.set_major_locator(HourLocator())
# axes.xaxis.set_minor_locator(MinuteLocator(tuple(5*i for i in range(1, 12))))
axes.xaxis.set_label_text('Hours')
axes.set_ylim(0)
axes.yaxis.set_major_formatter(EngFormatter(unit='B'))
axes.legend()
for (p, (start, duration)) in list_processes(snapshots).items():
if duration < timedelta(minutes=10):
continue
pyplot.text(start, 1e9, shorten(p), rotation=45)
pyplot.plot(
(start, start+duration), (1e9, 1e9),
marker='|', linewidth=0.5, linestyle='--',
color='black', alpha=0.8
)
# pyplot.savefig('monitor.pdf')
pyplot.show()
[3] I didn't instrument the build process; as can be seen in the
scripts, I just relied on:
ps --sort=-vsz p $pids | head -n 1
$pids includes the emacs session in which I was running make; this
session was masking basically everything other than the ELC+ELN
processes.
Also, since the sampling interval was 30 seconds, a lot of interesting
patterns may not have been recorded.
Overall these scripts are an exercise in How Not To Collect Data 😒.
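For reference, here is a minimal, self-contained sketch (all numbers
invented) of what one snapshot in monitor.log looks like; the pattern
is copied from MONITOR_RE in plot.py above:

```python
import re
from datetime import datetime

# Same pattern as MONITOR_RE in plot.py: a timestamp line, one ps
# output line (etimes/vsz/rss/args), then the header and the Mem:/Swap:
# lines from free(1).
MONITOR_RE = re.compile('\n'.join((
    '(?P<time>.+)',
    r' *(?P<seconds>\d+) +(?P<vsz>\d+) +(?P<rss>\d+) +(?P<args>.+)',
    ' *(?P<memheader>.+)',
    'Mem: *(?P<memvalues>.+)',
    'Swap: *(?P<swapvalues>.+)',
    ''
)), flags=re.MULTILINE)

# An invented snapshot, shaped like one record of monitor.log.
sample = (
    "2020-05-07-12:00:00\n"
    "  1234 1048576 524288 ../src/bootstrap-emacs -batch -l comp\n"
    "        total  used  free  shared  buff/cache  available\n"
    "Mem:  8000000  4000000  1000000  200000  3000000  3500000\n"
    "Swap: 2000000  100000  1900000\n"
)

m = MONITOR_RE.search(sample)
assert m is not None
# ps reports KiB, hence the *1024 scaling in list_snapshots.
print(int(m.group('vsz')) * 1024)  # VSZ in bytes: 1073741824
print(datetime.strptime(m.group('time'), '%Y-%m-%d-%H:%M:%S').year)
```

Note that list_snapshots zips the Mem: header keys against the Swap:
values too; that works because free's swap line only has total/used/free,
which are the first three header columns.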
Thread overview: 13+ messages
2020-05-04 15:08 bug#41077: [feature/native-comp] Segfaults when compiling ELC+ELN Kévin Le Gouguec
2020-05-04 16:31 ` Andrea Corallo
2020-05-04 20:57 ` Andrea Corallo
2020-05-04 21:05 ` Kévin Le Gouguec
2020-05-04 21:15 ` Andrea Corallo
2020-05-06 14:15 ` bug#41077: [feature/native-comp] virtual memory exhausted (was: bug#41077: [feature/native-comp] Segfaults when compiling ELC+ELN) Kévin Le Gouguec
[not found] ` <xjfftcd5chc.fsf@sdf.org>
2020-05-06 20:12 ` bug#41077: [feature/native-comp] virtual memory exhausted Andrea Corallo
2020-05-10 14:26 ` Kévin Le Gouguec [this message]
2020-05-10 15:02 ` Andrea Corallo
2020-05-10 22:04 ` Kévin Le Gouguec
2020-05-10 22:17 ` Andrea Corallo
2020-05-11 9:12 ` Kévin Le Gouguec
2020-05-11 10:00 ` Andrea Corallo