unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
@ 2020-06-07 19:16 Andrea Corallo
  2020-06-07 19:41 ` Pip Cet
  0 siblings, 1 reply; 14+ messages in thread
From: Andrea Corallo @ 2020-06-07 19:16 UTC (permalink / raw)
  To: 41755; +Cc: Nicolas Bértolo, Paul Eggert

I'm experiencing non reproducible GC related crashes building
feature/native-comp.

Both back-traces I've got looks similar:

#0  0x00000000004dfe4c in symbol_marked_p (s=0xb4f0) at pdumper.h:149
#1  mark_object (arg=<optimized out>) at alloc.c:6731
#2  0x0000000000552390 in traverse_intervals_noorder (tree=0xffffffffe070,
    function=0x4e0fe0 <mark_interval_tree_1>, arg=0x0) at intervals.c:234
#3  0x00000000004e0060 in mark_object (arg=<optimized out>) at alloc.c:6784
#4  0x00000000004e08ec in mark_memory (end=<optimized out>, start=<optimized out>)
    at alloc.c:4860
#5  mark_stack (bottom=<optimized out>, end=end@entry=0xffffffff71d0 "")
    at alloc.c:5071
#6  0x000000000055fd48 in mark_one_thread (thread=0x903b80 <main_thread>)
    at thread.c:630
#7  mark_threads_callback (ignore=<optimized out>) at thread.c:661
#8  0x00000000004e1238 in garbage_collect () at alloc.c:6101
#9  0x00000000004ff874 in maybe_gc () at lisp.h:5090
#10 eval_sub (form=form@entry=XIL(0xcefe63)) at eval.c:2243
#11 0x0000000000500108 in Fwhile (args=<optimized out>) at eval.c:1013
...

During: ./temacs --batch  -l loadup --temacs=pbootstrap

Not sure why but this looks easier to reproduce on aarch64 (even if most
of the times is still bootstraping clean).

IIUC in this case we are trying to access (struct Lisp_Symbol *) 0xb4f0
but the memory cannot be accessed.  The address looks quite odd to me
and infact checking with 'info proc mappings' the lowest mapped memory
seams to start at 0x400000.

So far I was not able to reproduce on X86_64 (where I've rr).

This may not be related to feature/native-comp but to one of the recent
GC changes and the stack marking strategy.

I suspect Nicolas may be observing the same issue on Windows
https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-06/msg00320.html

I'll keep on looking into this.

Regards

  Andrea

In GNU Emacs 28.0.50 (build 1, aarch64-unknown-linux-gnu)
 of 2020-06-06 built on
Repository revision: 3d576c784b3fa01b4d6b33a4172351b7c3a61660
Repository branch: HEAD
System Description: Ubuntu 18.04.3 LTS

Recent messages:
Grep finished with 2 matches found
Current locus from *grep*
Mark set [2 times]
Reverting buffer ‘Makefile’.
Mark set [2 times]
Mark saved where search started
Mark set [4 times]
File GTAGS not found. Run 'gtags'? (y or n) y
Quit [2 times]
No match

Configured using:
 'configure --with-nativecomp --with-x-toolkit=no --with-xpm=ifavailable
 --with-jpeg=ifavailable --with-png=ifavailable --with-gif=ifavailable
 --with-tiff=ifavailable --with-gnutls=ifavailable --with-nativecomp
 --prefix=/home/koral'

Configured features:
SOUND NOTIFY INOTIFY ZLIB MODULES THREADS PDUMPER GMP

Important settings:
  value of $LC_ALL: en_US.utf8
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Debugger

Minor modes in effect:
  global-magit-file-mode: t
  magit-auto-revert-mode: t
  global-git-commit-mode: t
  shell-dirtrack-mode: t
  recentf-mode: t
  savehist-mode: t
  global-ede-mode: t
  beacon-mode: t
  global-git-gutter-mode: t
  global-whitespace-mode: t
  delete-selection-mode: t
  async-bytecomp-package-mode: t
  helm--remap-mouse-mode: t
  show-paren-mode: t
  global-undo-tree-mode: t
  undo-tree-mode: t
  winner-mode: t
  global-auto-revert-mode: t
  display-time-mode: t
  which-key-mode: t
  projectile-mode: t
  global-company-mode: t
  company-mode: t
  global-flycheck-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
~/.emacs.d/lisp/verilog-mode hides /home/koral/share/emacs/28.0.50/lisp/progmodes/verilog-mode

Features:
(shadow sort mail-extr emacsbug sendmail find-dired make-mode help-fns
radix-tree git-rebase vc-annotate vc vc-dispatcher two-column iso-transl
semantic/db-file data-debug cedet-files ede/locate semantic/lex-spp
vc-git bug-reference macrostep-c cmacexp macrostep hideshow misearch
multi-isearch magit-extras magit-bookmark magit-submodule magit-obsolete
magit-blame magit-stash magit-reflog magit-bisect magit-push magit-pull
magit-fetch magit-clone magit-remote magit-commit magit-sequence
magit-notes magit-worktree magit-tag magit-merge magit-branch
magit-reset magit-files magit-refs magit-status magit magit-repos
magit-apply magit-wip magit-log magit-diff smerge-mode diff-mode
magit-core magit-autorevert magit-margin magit-transient magit-process
git-commit log-edit message rmc puny rfc822 mml mml-sec epa derived epg
epg-config mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader pcvs-util add-log with-editor ffap tramp
tramp-loaddefs trampver tramp-integration files-x tramp-compat shell
parse-time iso8601 ls-lisp ivy colir ivy-overlay cus-start cus-load
ede/emacs semantic/db semantic/util-modes semantic/util semantic
semantic/tag semantic/lex semantic/fw mode-local ede/dired mule-util
recentf tree-widget helm-x-files helm-sys term/xterm xterm paredit gnus
nnheader gnus-util rmail rmail-loaddefs rfc2047 rfc2045 ietf-drums
mail-utils mm-util mail-prsvr wdired dired dired-loaddefs
sanityinc-tomorrow-eighties-theme color-theme-sanityinc-tomorrow color
elisp-slime-nav ob-lisp org ob ob-tangle ob-ref ob-lob ob-table ob-exp
org-macro org-footnote org-src ob-comint org-pcomplete pcomplete
org-list org-faces org-entities time-date noutline outline org-version
ob-emacs-lisp ob-core ob-eval org-table ol org-keys org-compat org-macs
org-loaddefs cal-menu calendar cal-loaddefs server savehist ede/speedbar
ede/files ede ede/detect ede/base ede/auto ede/source eieio-base
eieio-speedbar speedbar ezimage dframe eieio-custom wid-edit cedet
beacon elisp-depend google-translate google-translate-default-ui
google-translate-core-ui google-translate-core google-translate-tk
google-translate-backend git-gutter helm-gtags pulse which-func imenu
whitespace bash-completion vlf vlf-base vlf-tune flex-mode jison-mode
bison-mode cc-mode cc-guess cc-menus cc-cmds cc-styles cc-align cc-fonts
cc-engine cc-vars cc-defs git f s helm-git-grep delsel magit-mode
magit-git magit-utils crm magit-section transient helm-swoop ido
helm-command helm-elisp helm-eval edebug backtrace helm-for-files
helm-bookmark helm-adaptive helm-info bookmark text-property-search pp
helm-external helm-net xml url url-proxy url-privacy url-expand
url-methods url-history url-cookie url-domsuf url-util mailcap helm-mode
helm-files helm-buffers helm-occur helm-tags helm-locate helm-grep
helm-regexp format-spec helm-utils helm-help helm-types helm-config
helm-easymenu async-bytecomp helm helm-source eieio-compat
helm-multi-match helm-lib async paren undo-tree diff winner windmove
autorevert filenotify ibuf-macs time image which-key advice projectile
grep compile ibuf-ext ibuffer ibuffer-loaddefs thingatpt company-oddmuse
company-keywords company-etags etags fileloop generator xref project
company-gtags company-dabbrev-code company-dabbrev company-files
company-capf company-cmake company-xcode company-clang company-semantic
company-eclim company-template company-bbdb company pcase flycheck
find-func dash gcmh comp rx cl-extra help-mode k-utils gud easy-mmode
comint regexp-opt ansi-color ring edmacro kmacro slime-autoloads info
tool-bar package easymenu browse-url url-handlers url-parse auth-source
cl-seq eieio eieio-core cl-macs eieio-loaddefs password-cache json
subr-x map url-vars seq byte-opt gv bytecomp byte-compile cconv
cl-loaddefs cl-lib tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch
timer select mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads inotify multi-tty make-network-process emacs)

Memory information:
((conses 16 2475879 106891)
 (symbols 48 40051 2)
 (strings 32 181600 47924)
 (string-bytes 1 9741313)
 (vectors 16 77416)
 (vector-slots 8 1228210 168271)
 (floats 8 780 7731)
 (intervals 56 158344 397)
 (buffers 992 56))





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-07 19:16 bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase Andrea Corallo
@ 2020-06-07 19:41 ` Pip Cet
  2020-06-07 19:57   ` Nicolas Bértolo
  0 siblings, 1 reply; 14+ messages in thread
From: Pip Cet @ 2020-06-07 19:41 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: Nicolas Bértolo, Paul Eggert, 41755

On Sun, Jun 7, 2020 at 7:26 PM Andrea Corallo <akrl@sdf.org> wrote:
> I'm experiencing non reproducible GC related crashes building
> feature/native-comp.

Does it happen for non-optimized builds? Also, what symbol is at
Lisp_Object value 0xb4f0 (i.e. iQwhatever == 405)?

> Both back-traces I've got looks similar:
>
> #0  0x00000000004dfe4c in symbol_marked_p (s=0xb4f0) at pdumper.h:149
> #1  mark_object (arg=<optimized out>) at alloc.c:6731
> #2  0x0000000000552390 in traverse_intervals_noorder (tree=0xffffffffe070,
>     function=0x4e0fe0 <mark_interval_tree_1>, arg=0x0) at intervals.c:234
> #3  0x00000000004e0060 in mark_object (arg=<optimized out>) at alloc.c:6784
> #4  0x00000000004e08ec in mark_memory (end=<optimized out>, start=<optimized out>)
>     at alloc.c:4860
> #5  mark_stack (bottom=<optimized out>, end=end@entry=0xffffffff71d0 "")
>     at alloc.c:5071
> #6  0x000000000055fd48 in mark_one_thread (thread=0x903b80 <main_thread>)
>     at thread.c:630
> #7  mark_threads_callback (ignore=<optimized out>) at thread.c:661
> #8  0x00000000004e1238 in garbage_collect () at alloc.c:6101
> #9  0x00000000004ff874 in maybe_gc () at lisp.h:5090
> #10 eval_sub (form=form@entry=XIL(0xcefe63)) at eval.c:2243
> #11 0x0000000000500108 in Fwhile (args=<optimized out>) at eval.c:1013
> ...
> So far I was not able to reproduce on X86_64 (where I've rr).

Please let us know if you manage to.

> This may not be related to feature/native-comp but to one of the recent
> GC changes and the stack marking strategy.

Or to the live_*_p changes...





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-07 19:41 ` Pip Cet
@ 2020-06-07 19:57   ` Nicolas Bértolo
  2020-06-07 20:18     ` Pip Cet
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Bértolo @ 2020-06-07 19:57 UTC (permalink / raw)
  To: Pip Cet; +Cc: Paul Eggert, 41755, Andrea Corallo

Hi,

I can confirm that what I found was this issue.

> Does it happen for non-optimized builds? Also, what symbol is at
> Lisp_Object value 0xb4f0 (i.e. iQwhatever == 405)?

I haven't been able to reproduce it in non-optimized builds.

What I understand so far is that the GC begins marking the stack of the main
thread and it takes some data in the stack as a pointer to valid Lisp data. It
starts following all the pointers and it eventually SIGSEGVs. I have seen it
crash trying to read symbols, conses and strings.





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-07 19:57   ` Nicolas Bértolo
@ 2020-06-07 20:18     ` Pip Cet
  2020-06-07 23:09       ` Nicolas Bértolo
  0 siblings, 1 reply; 14+ messages in thread
From: Pip Cet @ 2020-06-07 20:18 UTC (permalink / raw)
  To: Nicolas Bértolo; +Cc: Paul Eggert, 41755, Andrea Corallo

On Sun, Jun 7, 2020 at 7:58 PM Nicolas Bértolo <nicolasbertolo@gmail.com> wrote:
> I can confirm that what I found was this issue.
>
> > Does it happen for non-optimized builds? Also, what symbol is at
> > Lisp_Object value 0xb4f0 (i.e. iQwhatever == 405)?
>
> I haven't been able to reproduce it in non-optimized builds.

But you still have last_marked in your build, right? That would be a
good starting point to find out which object was marked and what was
actually on the stack there...

> What I understand so far is that the GC begins marking the stack of the main
> thread and it takes some data in the stack as a pointer to valid Lisp data.

That's my understanding as well. In Andrea's case, it looks like
something was marked as though it were a symbol, but it was actually
pointing back to the stack...

> It
> starts following all the pointers and it eventually SIGSEGVs. I have seen it
> crash trying to read symbols, conses and strings.

Is it always a symbol that's found on the stack by mark_maybe_*, though?





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-07 20:18     ` Pip Cet
@ 2020-06-07 23:09       ` Nicolas Bértolo
  2020-06-08  3:39         ` Nicolas Bértolo
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Bértolo @ 2020-06-07 23:09 UTC (permalink / raw)
  To: Pip Cet; +Cc: Paul Eggert, 41755, Andrea Corallo

> But you still have last_marked in your build, right? That would be a
> good starting point to find out which object was marked and what was
> actually on the stack there...

Yes, I'll take a look there. BTW, adding

#pragma GCC optimize ("-O0")

to the top of alloc.c does not prevent it from crashing, so debugging could be
easier.

> Is it always a symbol that's found on the stack by mark_maybe_*, though?

No, in this case it is a cons.

0x0000000400115a1a in cons_marked_p (c=0xfffffffc00, c@entry=0xbf07b0)
at alloc.c:3899
3899      return pdumper_object_p (c)
(gdb) bt
#0  0x0000000400115a1a in cons_marked_p (c=0xfffffffc00,
c@entry=0xbf07b0) at alloc.c:3899
#1  0x000000040011a567 in mark_object (arg=XIL(0xbf0890)) at alloc.c:6775
#2  0x00000004001125d9 in mark_interval_tree_1 (i=0x464a9b3,
dummy=0x0) at alloc.c:1468
#3  0x000000040018fde4 in traverse_intervals_noorder
(tree=tree@entry=0x464a9b3, function=function@entry=0x4001125b0
<mark_interval_tree_1>, arg=arg@entry=0x0) at intervals.c:234
#4  0x0000000400112619 in mark_interval_tree (i=0x464a9b3) at alloc.c:1477
#5  0x000000040011a2d4 in mark_object (arg=XIL(0x454c0b0)) at alloc.c:6629
#6  0x000000040011a5b2 in mark_object (arg=XIL(0x40061cf60),
arg@entry=XIL(0x4d14553)) at alloc.c:6786
#7  0x00000004001171dd in mark_maybe_pointer (p=p@entry=0x4d14553) at
alloc.c:4804
#8  0x0000000400117253 in mark_memory (start=0xbf0b30,
start@entry=0xbff990, end=0xbff990, end@entry=0xbf0b30) at
alloc.c:4854
#9  0x00000004001172b0 in mark_stack (bottom=0xbff990 "",
end=end@entry=0xbf0b30 "0\f\277") at alloc.c:5073
#10 0x00000004001a0a71 in mark_one_thread (thread=0x400559500
<main_thread>) at thread.c:630
#11 mark_threads_callback (ignore=ignore@entry=0x0) at thread.c:661
#12 0x00000004001172fe in flush_stack_call_func1
(func=func@entry=0x4001a0a30 <mark_threads_callback>,
arg=arg@entry=0x0) at alloc.c:5114
#13 0x00000004001a1c9c in flush_stack_call_func (arg=0x0,
func=0x4001a0a30 <mark_threads_callback>) at lisp.h:3825
#14 mark_threads () at thread.c:668
#15 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-07 23:09       ` Nicolas Bértolo
@ 2020-06-08  3:39         ` Nicolas Bértolo
  2020-06-08  6:29           ` Eli Zaretskii
  2020-06-08  6:41           ` Pip Cet
  0 siblings, 2 replies; 14+ messages in thread
From: Nicolas Bértolo @ 2020-06-08  3:39 UTC (permalink / raw)
  To: Pip Cet; +Cc: Paul Eggert, 41755, Andrea Corallo

[-- Attachment #1: Type: text/plain, Size: 3208 bytes --]

I think I found the bug. Summary: A heap-based cons points to a
stack-based string. The bug was introduced by:

e38678b268c * Reduce the number of files probed when finding a lisp file.

The function load_charset_map_from_file () creates two AUTO_STRINGs for the
suffixes it needs. It puts this into a stack-based cons that it passes as the
"suffixes" argument of openp ().

This function then constructs the list of "extended suffixes" that associates
each suffix with the directory that needs to be inserted in the middle of the
path, if any. This list is allocated in the heap.

When the GC kicks in, the lists are still lying around in memory but the
stack-based strings don't exist anymore.

The attached patch fixes this case. There may be similar cases in other parts of
the code. I'll take a look.

Nico.


El dom., 7 jun. 2020 a las 20:09, Nicolas Bértolo
(<nicolasbertolo@gmail.com>) escribió:
>
> > But you still have last_marked in your build, right? That would be a
> > good starting point to find out which object was marked and what was
> > actually on the stack there...
>
> Yes, I'll take a look there. BTW, adding
>
> #pragma GCC optimize ("-O0")
>
> to the top of alloc.c does not prevent it from crashing, so debugging could be
> easier.
>
> > Is it always a symbol that's found on the stack by mark_maybe_*, though?
>
> No, in this case it is a cons.
>
> 0x0000000400115a1a in cons_marked_p (c=0xfffffffc00, c@entry=0xbf07b0)
> at alloc.c:3899
> 3899      return pdumper_object_p (c)
> (gdb) bt
> #0  0x0000000400115a1a in cons_marked_p (c=0xfffffffc00,
> c@entry=0xbf07b0) at alloc.c:3899
> #1  0x000000040011a567 in mark_object (arg=XIL(0xbf0890)) at alloc.c:6775
> #2  0x00000004001125d9 in mark_interval_tree_1 (i=0x464a9b3,
> dummy=0x0) at alloc.c:1468
> #3  0x000000040018fde4 in traverse_intervals_noorder
> (tree=tree@entry=0x464a9b3, function=function@entry=0x4001125b0
> <mark_interval_tree_1>, arg=arg@entry=0x0) at intervals.c:234
> #4  0x0000000400112619 in mark_interval_tree (i=0x464a9b3) at alloc.c:1477
> #5  0x000000040011a2d4 in mark_object (arg=XIL(0x454c0b0)) at alloc.c:6629
> #6  0x000000040011a5b2 in mark_object (arg=XIL(0x40061cf60),
> arg@entry=XIL(0x4d14553)) at alloc.c:6786
> #7  0x00000004001171dd in mark_maybe_pointer (p=p@entry=0x4d14553) at
> alloc.c:4804
> #8  0x0000000400117253 in mark_memory (start=0xbf0b30,
> start@entry=0xbff990, end=0xbff990, end@entry=0xbf0b30) at
> alloc.c:4854
> #9  0x00000004001172b0 in mark_stack (bottom=0xbff990 "",
> end=end@entry=0xbf0b30 "0\f\277") at alloc.c:5073
> #10 0x00000004001a0a71 in mark_one_thread (thread=0x400559500
> <main_thread>) at thread.c:630
> #11 mark_threads_callback (ignore=ignore@entry=0x0) at thread.c:661
> #12 0x00000004001172fe in flush_stack_call_func1
> (func=func@entry=0x4001a0a30 <mark_threads_callback>,
> arg=arg@entry=0x0) at alloc.c:5114
> #13 0x00000004001a1c9c in flush_stack_call_func (arg=0x0,
> func=0x4001a0a30 <mark_threads_callback>) at lisp.h:3825
> #14 mark_threads () at thread.c:668
> #15 0x0000000000000000 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)

[-- Attachment #2: 0001-Avoid-stack-based-strings-when-calling-openp-.-Fixes.patch --]
[-- Type: application/octet-stream, Size: 1242 bytes --]

From 2febde53d6ed070fa033754511795b5cec5367c4 Mon Sep 17 00:00:00 2001
From: Nicolas Bertolo <nicolasbertolo@gmail.com>
Date: Mon, 8 Jun 2020 00:35:30 -0300
Subject: [PATCH] Avoid stack-based strings when calling openp(). Fixes #41755.

The openp() function builds a heap based list from the suffixes. If we
pass stack-based strings we'll get a heap-cons pointing to a
stack-based string, which will explode during GC.

* src/charset.c (load_charset_map_from_file): Allocate suffixes in the
heap.
---
 src/charset.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/charset.c b/src/charset.c
index 8635aad3ed6..d9ceaea8eb2 100644
--- a/src/charset.c
+++ b/src/charset.c
@@ -480,9 +480,9 @@ load_charset_map_from_file (struct charset *charset, Lisp_Object mapfile,
   FILE *fp;
   struct charset_map_entries *head, *entries;
   int n_entries;
-  AUTO_STRING (map, ".map");
-  AUTO_STRING (txt, ".txt");
-  AUTO_LIST2 (suffixes, map, txt);
+  Lisp_Object map = build_string (".map");
+  Lisp_Object txt = build_string (".txt");
+  Lisp_Object suffixes = list2 (map, txt);
   ptrdiff_t count = SPECPDL_INDEX ();
   record_unwind_protect_nothing ();
   specbind (Qfile_name_handler_alist, Qnil);
-- 
2.25.1.windows.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-08  3:39         ` Nicolas Bértolo
@ 2020-06-08  6:29           ` Eli Zaretskii
  2020-06-08 18:24             ` Nicolas Bértolo
  2020-06-08  6:41           ` Pip Cet
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2020-06-08  6:29 UTC (permalink / raw)
  To: 41755, nicolasbertolo, pipcet; +Cc: eggert, akrl

On June 8, 2020 6:39:34 AM GMT+03:00, "Nicolas Bértolo" <nicolasbertolo@gmail.com> wrote:
> I think I found the bug. Summary: A heap-based cons points to a
> stack-based string. The bug was introduced by:
> 
> e38678b268c * Reduce the number of files probed when finding a lisp
> file.
> 
> The function load_charset_map_from_file () creates two AUTO_STRINGs
> for the
> suffixes it needs. It puts this into a stack-based cons that it passes
> as the
> "suffixes" argument of openp ().
> 
> This function then constructs the list of "extended suffixes" that
> associates
> each suffix with the directory that needs to be inserted in the middle
> of the
> path, if any. This list is allocated in the heap.
> 
> When the GC kicks in, the lists are still lying around in memory but
> the
> stack-based strings don't exist anymore.
> 
> The attached patch fixes this case. There may be similar cases in
> other parts of
> the code. I'll take a look.
> 
> Nico.
> 
> 
> El dom., 7 jun. 2020 a las 20:09, Nicolas Bértolo
> (<nicolasbertolo@gmail.com>) escribió:
> >
> > > But you still have last_marked in your build, right? That would be
> a
> > > good starting point to find out which object was marked and what
> was
> > > actually on the stack there...
> >
> > Yes, I'll take a look there. BTW, adding
> >
> > #pragma GCC optimize ("-O0")
> >
> > to the top of alloc.c does not prevent it from crashing, so
> debugging could be
> > easier.
> >
> > > Is it always a symbol that's found on the stack by mark_maybe_*,
> though?
> >
> > No, in this case it is a cons.
> >
> > 0x0000000400115a1a in cons_marked_p (c=0xfffffffc00,
> c@entry=0xbf07b0)
> > at alloc.c:3899
> > 3899      return pdumper_object_p (c)
> > (gdb) bt
> > #0  0x0000000400115a1a in cons_marked_p (c=0xfffffffc00,
> > c@entry=0xbf07b0) at alloc.c:3899
> > #1  0x000000040011a567 in mark_object (arg=XIL(0xbf0890)) at
> alloc.c:6775
> > #2  0x00000004001125d9 in mark_interval_tree_1 (i=0x464a9b3,
> > dummy=0x0) at alloc.c:1468
> > #3  0x000000040018fde4 in traverse_intervals_noorder
> > (tree=tree@entry=0x464a9b3, function=function@entry=0x4001125b0
> > <mark_interval_tree_1>, arg=arg@entry=0x0) at intervals.c:234
> > #4  0x0000000400112619 in mark_interval_tree (i=0x464a9b3) at
> alloc.c:1477
> > #5  0x000000040011a2d4 in mark_object (arg=XIL(0x454c0b0)) at
> alloc.c:6629
> > #6  0x000000040011a5b2 in mark_object (arg=XIL(0x40061cf60),
> > arg@entry=XIL(0x4d14553)) at alloc.c:6786
> > #7  0x00000004001171dd in mark_maybe_pointer (p=p@entry=0x4d14553)
> at
> > alloc.c:4804
> > #8  0x0000000400117253 in mark_memory (start=0xbf0b30,
> > start@entry=0xbff990, end=0xbff990, end@entry=0xbf0b30) at
> > alloc.c:4854
> > #9  0x00000004001172b0 in mark_stack (bottom=0xbff990 "",
> > end=end@entry=0xbf0b30 "0\f\277") at alloc.c:5073
> > #10 0x00000004001a0a71 in mark_one_thread (thread=0x400559500
> > <main_thread>) at thread.c:630
> > #11 mark_threads_callback (ignore=ignore@entry=0x0) at thread.c:661
> > #12 0x00000004001172fe in flush_stack_call_func1
> > (func=func@entry=0x4001a0a30 <mark_threads_callback>,
> > arg=arg@entry=0x0) at alloc.c:5114
> > #13 0x00000004001a1c9c in flush_stack_call_func (arg=0x0,
> > func=0x4001a0a30 <mark_threads_callback>) at lisp.h:3825
> > #14 mark_threads () at thread.c:668
> > #15 0x0000000000000000 in ?? ()
> > Backtrace stopped: previous frame inner to this frame (corrupt
> stack?)

Thanks, but are you sure the fix should be in load_charset_map_from_file and not in the new code added to openp in the offending commit?  Why does the additional code in openp need to cons its list off the heap?

IOW, it sounds like those additions now made openp unsafe for callers who manipulate stack-based Lisp objects.  The old openp was AFAIK careful enough not to impose any constraints like that on its callers.





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-08  3:39         ` Nicolas Bértolo
  2020-06-08  6:29           ` Eli Zaretskii
@ 2020-06-08  6:41           ` Pip Cet
  2020-06-08 18:51             ` Nicolas Bértolo
  1 sibling, 1 reply; 14+ messages in thread
From: Pip Cet @ 2020-06-08  6:41 UTC (permalink / raw)
  To: Nicolas Bértolo; +Cc: Paul Eggert, 41755, Andrea Corallo

On Mon, Jun 8, 2020 at 3:39 AM Nicolas Bértolo <nicolasbertolo@gmail.com> wrote:
> I think I found the bug. Summary: A heap-based cons points to a
> stack-based string. The bug was introduced by:

Good catch!

I'm wondering what we could do to make such bugs easier to find...

> The attached patch fixes this case. There may be similar cases in other parts of
> the code. I'll take a look.

Would GC_CHECK_MARKED_OBJECTS have caught this?





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-08  6:29           ` Eli Zaretskii
@ 2020-06-08 18:24             ` Nicolas Bértolo
  0 siblings, 0 replies; 14+ messages in thread
From: Nicolas Bértolo @ 2020-06-08 18:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41755, pipcet, akrl

> Thanks, but are you sure the fix should be in load_charset_map_from_file and
> not in the new code added to openp in the offending commit? Why does the
> additional code in openp need to cons its list off the heap?

Good idea. It should be easy to rework the openp to avoid putting the suffixes
in a heap based list.

What openp() does now is turn a list of suffixes into a list of suffixes and
associated directories that need to be added to the file path.

For example, it turns this:

(".eln" ".elc" ".elc.gz" ".el" ".el.gz")

 into this:

((nil . ".eln")
 (comp-native-path-postfix . ".eln")
 (nil . ".elc")
 (nil . ".elc.gz")
 (nil . ".el")
 (nil . ".el.gz"))

I did it this way because then iterating over it can be done using a
FOR_EACH_TAIL macro. I'll change this to use two lists, one for the suffixes and
another one for the directories. The code for iterating them will be just a
little more complicated, but it should be safe to pass stack-based suffixes.





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-08  6:41           ` Pip Cet
@ 2020-06-08 18:51             ` Nicolas Bértolo
  2020-06-08 19:05               ` Pip Cet
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Bértolo @ 2020-06-08 18:51 UTC (permalink / raw)
  To: Pip Cet; +Cc: Paul Eggert, 41755, Andrea Corallo

> I'm wondering what we could do to make such bugs easier to find...

We could add a canary to stack based strings and conses. Then while
marking if we
come across a stack based string or cons we check that the canary is intact. If
it is not, then we can be sure that the memory has been written over.

Something like this:

struct Stack_String
{
  struct Lisp_String string;
  uint64_t canary = 0x12341234;
};

> Would GC_CHECK_MARKED_OBJECTS have caught this?

As far as I can see, during a GC we can't know if a stack-based string
is still alive.





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-08 18:51             ` Nicolas Bértolo
@ 2020-06-08 19:05               ` Pip Cet
  2020-06-09 14:20                 ` Nicolas Bértolo
  0 siblings, 1 reply; 14+ messages in thread
From: Pip Cet @ 2020-06-08 19:05 UTC (permalink / raw)
  To: Nicolas Bértolo; +Cc: Paul Eggert, 41755, Andrea Corallo

Nicolas Bértolo <nicolasbertolo@gmail.com> writes:

>> I'm wondering what we could do to make such bugs easier to find...
>
> We could add a canary to stack based strings and conses. Then while
> marking if we
> come across a stack based string or cons we check that the canary is
> intact. If
> it is not, then we can be sure that the memory has been written over.

I believe we should never be marking stack-based objects. If we do
that's a GC bug.

Code like

  AUTO_STRING (s, "foo");
  Lisp_Object c = Fcons (s, s);
  garbage_collect ();
  ...
  Fsetcar (c, Qnil);
  Fsetcdr (c, Qnil);

shouldn't work. I hope it doesn't :-) (With GC_CHECK_MARKED_OBJECTS, it
should abort; without, it would leave the mark bit of s set, so the
"..." code would presumably crash).

> Something like this:
>
> struct Stack_String
> {
>   struct Lisp_String string;
>   uint64_t canary = 0x12341234;
> };
>
>> Would GC_CHECK_MARKED_OBJECTS have caught this?
>
> As far as I can see, during a GC we can't know if a stack-based string
> is still alive.

But we can know whether a string is stack-based or not; if it is, we
shouldn't be marking it, so we can abort in that case...





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-08 19:05               ` Pip Cet
@ 2020-06-09 14:20                 ` Nicolas Bértolo
  2020-06-10 12:53                   ` Andrea Corallo
  0 siblings, 1 reply; 14+ messages in thread
From: Nicolas Bértolo @ 2020-06-09 14:20 UTC (permalink / raw)
  To: 41755; +Cc: Paul Eggert, Pip Cet, Andrea Corallo

[-- Attachment #1: Type: text/plain, Size: 328 bytes --]

Hi,

Here's a new patch that copies the suffixes before building the heap-based list
in openp. I know this is not the solution I proposed, but I couldn't adapt the
code without increasing its complexity way too much for my liking. If you think
this is not an appropriate solution I will come up with another one.

Thanks, Nico.

[-- Attachment #2: 0001-Copy-suffixes-passed-to-openp-to-avoid-GC-crashes.-F.patch --]
[-- Type: application/octet-stream, Size: 2209 bytes --]

From 4b4fd526abe124c8a74bfa11209dd53c3a564cc7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicol=C3=A1s=20B=C3=A9rtolo?= <nicolasbertolo@gmail.com>
Date: Mon, 8 Jun 2020 22:01:25 -0300
Subject: [PATCH] Copy suffixes passed to 'openp' to avoid GC crashes. Fixes
 bug#41755

In openp_add_middle_dir_to_suffixes we build a heap-based list from
the passed suffixes.  It is crucial that we don't create a heap-based
cons that points to a stack-based list.

* src/lread.c (openp_add_middle_dir_to_suffixes): Copy suffixes when
building a list of middle-dirs and suffixes.
---
 src/lread.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/lread.c b/src/lread.c
index c127d32eb17..65d84462c4c 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -1635,21 +1635,27 @@ openp_add_middle_dir_to_suffixes (Lisp_Object suffixes)
   Lisp_Object extended_suf = Qnil;
   FOR_EACH_TAIL_SAFE (tail)
     {
-#ifdef HAVE_NATIVE_COMP
+      /*  suffixes may be a stack-based cons pointing to stack-based
+          strings.  We must copy the suffix if we are putting it into
+          a heap-based cons to avoid a dangling reference.  This would
+          lead to crashes during the GC.  */
       CHECK_STRING_CAR (tail);
       char * suf = SSDATA (XCAR (tail));
+      Lisp_Object copied_suffix = build_string (suf);
+#ifdef HAVE_NATIVE_COMP
       if (strcmp (NATIVE_ELISP_SUFFIX, suf) == 0)
         {
           CHECK_STRING (Vcomp_native_path_postfix);
           /* Here we add them in the opposite order so that nreverse
              corrects it.  */
-          extended_suf = Fcons (Fcons (Qnil, XCAR (tail)), extended_suf);
-          extended_suf = Fcons (Fcons (Vcomp_native_path_postfix, XCAR (tail)),
+          extended_suf = Fcons (Fcons (Qnil, copied_suffix), extended_suf);
+          extended_suf = Fcons (Fcons (Vcomp_native_path_postfix,
+                                       copied_suffix),
                                 extended_suf);
         }
       else
 #endif
-	extended_suf = Fcons (Fcons (Qnil, XCAR (tail)), extended_suf);
+	extended_suf = Fcons (Fcons (Qnil, copied_suffix), extended_suf);
     }
 
   suffixes = Fnreverse (extended_suf);
-- 
2.25.1.windows.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-09 14:20                 ` Nicolas Bértolo
@ 2020-06-10 12:53                   ` Andrea Corallo
  2020-06-27 14:39                     ` Andrea Corallo
  0 siblings, 1 reply; 14+ messages in thread
From: Andrea Corallo @ 2020-06-10 12:53 UTC (permalink / raw)
  To: Nicolas Bértolo; +Cc: Paul Eggert, 41755, Pip Cet

Nicolas Bértolo <nicolasbertolo@gmail.com> writes:

> Hi,
>
> Here's a new patch that copies the suffixes before building the heap-based list
> in openp. I know this is not the solution I proposed, but I couldn't adapt the
> code without increasing its complexity way too much for my liking. If you think
> this is not an appropriate solution I will come up with another one.
>
> Thanks, Nico.

Hi Nico,

I pushed this so the branch is stable to work on for everybody and
there's no rush if you want to improve your solution in the meanwhile.

Thanks

  Andrea

-- 
akrl@sdf.org





^ permalink raw reply	[flat|nested] 14+ messages in thread

* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase
  2020-06-10 12:53                   ` Andrea Corallo
@ 2020-06-27 14:39                     ` Andrea Corallo
  0 siblings, 0 replies; 14+ messages in thread
From: Andrea Corallo @ 2020-06-27 14:39 UTC (permalink / raw)
  To: 41755-done

I'm closing this bug as the issue originating the crash is fixed.

  Andrea





^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-27 14:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-07 19:16 bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase Andrea Corallo
2020-06-07 19:41 ` Pip Cet
2020-06-07 19:57   ` Nicolas Bértolo
2020-06-07 20:18     ` Pip Cet
2020-06-07 23:09       ` Nicolas Bértolo
2020-06-08  3:39         ` Nicolas Bértolo
2020-06-08  6:29           ` Eli Zaretskii
2020-06-08 18:24             ` Nicolas Bértolo
2020-06-08  6:41           ` Pip Cet
2020-06-08 18:51             ` Nicolas Bértolo
2020-06-08 19:05               ` Pip Cet
2020-06-09 14:20                 ` Nicolas Bértolo
2020-06-10 12:53                   ` Andrea Corallo
2020-06-27 14:39                     ` Andrea Corallo

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).