* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase @ 2020-06-07 19:16 Andrea Corallo 2020-06-07 19:41 ` Pip Cet 0 siblings, 1 reply; 14+ messages in thread From: Andrea Corallo @ 2020-06-07 19:16 UTC (permalink / raw) To: 41755; +Cc: Nicolas Bértolo, Paul Eggert I'm experiencing non reproducible GC related crashes building feature/native-comp. Both back-traces I've got looks similar: #0 0x00000000004dfe4c in symbol_marked_p (s=0xb4f0) at pdumper.h:149 #1 mark_object (arg=<optimized out>) at alloc.c:6731 #2 0x0000000000552390 in traverse_intervals_noorder (tree=0xffffffffe070, function=0x4e0fe0 <mark_interval_tree_1>, arg=0x0) at intervals.c:234 #3 0x00000000004e0060 in mark_object (arg=<optimized out>) at alloc.c:6784 #4 0x00000000004e08ec in mark_memory (end=<optimized out>, start=<optimized out>) at alloc.c:4860 #5 mark_stack (bottom=<optimized out>, end=end@entry=0xffffffff71d0 "") at alloc.c:5071 #6 0x000000000055fd48 in mark_one_thread (thread=0x903b80 <main_thread>) at thread.c:630 #7 mark_threads_callback (ignore=<optimized out>) at thread.c:661 #8 0x00000000004e1238 in garbage_collect () at alloc.c:6101 #9 0x00000000004ff874 in maybe_gc () at lisp.h:5090 #10 eval_sub (form=form@entry=XIL(0xcefe63)) at eval.c:2243 #11 0x0000000000500108 in Fwhile (args=<optimized out>) at eval.c:1013 ... During: ./temacs --batch -l loadup --temacs=pbootstrap Not sure why but this looks easier to reproduce on aarch64 (even if most of the times is still bootstraping clean). IIUC in this case we are trying to access (struct Lisp_Symbol *) 0xb4f0 but the memory cannot be accessed. The address looks quite odd to me and infact checking with 'info proc mappings' the lowest mapped memory seams to start at 0x400000. So far I was not able to reproduce on X86_64 (where I've rr). This may not be related to feature/native-comp but to one of the recent GC changes and the stack marking strategy. I suspect Nicolas may be observing the same issue on Windows https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-06/msg00320.html I'll keep on looking into this. Regards Andrea In GNU Emacs 28.0.50 (build 1, aarch64-unknown-linux-gnu) of 2020-06-06 built on Repository revision: 3d576c784b3fa01b4d6b33a4172351b7c3a61660 Repository branch: HEAD System Description: Ubuntu 18.04.3 LTS Recent messages: Grep finished with 2 matches found Current locus from *grep* Mark set [2 times] Reverting buffer ‘Makefile’. Mark set [2 times] Mark saved where search started Mark set [4 times] File GTAGS not found. Run 'gtags'? (y or n) y Quit [2 times] No match Configured using: 'configure --with-nativecomp --with-x-toolkit=no --with-xpm=ifavailable --with-jpeg=ifavailable --with-png=ifavailable --with-gif=ifavailable --with-tiff=ifavailable --with-gnutls=ifavailable --with-nativecomp --prefix=/home/koral' Configured features: SOUND NOTIFY INOTIFY ZLIB MODULES THREADS PDUMPER GMP Important settings: value of $LC_ALL: en_US.utf8 value of $LANG: en_US.UTF-8 locale-coding-system: utf-8-unix Major mode: Debugger Minor modes in effect: global-magit-file-mode: t magit-auto-revert-mode: t global-git-commit-mode: t shell-dirtrack-mode: t recentf-mode: t savehist-mode: t global-ede-mode: t beacon-mode: t global-git-gutter-mode: t global-whitespace-mode: t delete-selection-mode: t async-bytecomp-package-mode: t helm--remap-mouse-mode: t show-paren-mode: t global-undo-tree-mode: t undo-tree-mode: t winner-mode: t global-auto-revert-mode: t display-time-mode: t which-key-mode: t projectile-mode: t global-company-mode: t company-mode: t global-flycheck-mode: t tooltip-mode: t global-eldoc-mode: t electric-indent-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t column-number-mode: t line-number-mode: t transient-mark-mode: t Load-path shadows: ~/.emacs.d/lisp/verilog-mode hides /home/koral/share/emacs/28.0.50/lisp/progmodes/verilog-mode Features: (shadow sort mail-extr emacsbug sendmail find-dired make-mode help-fns radix-tree git-rebase vc-annotate vc vc-dispatcher two-column iso-transl semantic/db-file data-debug cedet-files ede/locate semantic/lex-spp vc-git bug-reference macrostep-c cmacexp macrostep hideshow misearch multi-isearch magit-extras magit-bookmark magit-submodule magit-obsolete magit-blame magit-stash magit-reflog magit-bisect magit-push magit-pull magit-fetch magit-clone magit-remote magit-commit magit-sequence magit-notes magit-worktree magit-tag magit-merge magit-branch magit-reset magit-files magit-refs magit-status magit magit-repos magit-apply magit-wip magit-log magit-diff smerge-mode diff-mode magit-core magit-autorevert magit-margin magit-transient magit-process git-commit log-edit message rmc puny rfc822 mml mml-sec epa derived epg epg-config mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader pcvs-util add-log with-editor ffap tramp tramp-loaddefs trampver tramp-integration files-x tramp-compat shell parse-time iso8601 ls-lisp ivy colir ivy-overlay cus-start cus-load ede/emacs semantic/db semantic/util-modes semantic/util semantic semantic/tag semantic/lex semantic/fw mode-local ede/dired mule-util recentf tree-widget helm-x-files helm-sys term/xterm xterm paredit gnus nnheader gnus-util rmail rmail-loaddefs rfc2047 rfc2045 ietf-drums mail-utils mm-util mail-prsvr wdired dired dired-loaddefs sanityinc-tomorrow-eighties-theme color-theme-sanityinc-tomorrow color elisp-slime-nav ob-lisp org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-footnote org-src ob-comint org-pcomplete pcomplete org-list org-faces org-entities time-date noutline outline org-version ob-emacs-lisp ob-core ob-eval org-table ol org-keys org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs server savehist ede/speedbar ede/files ede ede/detect ede/base ede/auto ede/source eieio-base eieio-speedbar speedbar ezimage dframe eieio-custom wid-edit cedet beacon elisp-depend google-translate google-translate-default-ui google-translate-core-ui google-translate-core google-translate-tk google-translate-backend git-gutter helm-gtags pulse which-func imenu whitespace bash-completion vlf vlf-base vlf-tune flex-mode jison-mode bison-mode cc-mode cc-guess cc-menus cc-cmds cc-styles cc-align cc-fonts cc-engine cc-vars cc-defs git f s helm-git-grep delsel magit-mode magit-git magit-utils crm magit-section transient helm-swoop ido helm-command helm-elisp helm-eval edebug backtrace helm-for-files helm-bookmark helm-adaptive helm-info bookmark text-property-search pp helm-external helm-net xml url url-proxy url-privacy url-expand url-methods url-history url-cookie url-domsuf url-util mailcap helm-mode helm-files helm-buffers helm-occur helm-tags helm-locate helm-grep helm-regexp format-spec helm-utils helm-help helm-types helm-config helm-easymenu async-bytecomp helm helm-source eieio-compat helm-multi-match helm-lib async paren undo-tree diff winner windmove autorevert filenotify ibuf-macs time image which-key advice projectile grep compile ibuf-ext ibuffer ibuffer-loaddefs thingatpt company-oddmuse company-keywords company-etags etags fileloop generator xref project company-gtags company-dabbrev-code company-dabbrev company-files company-capf company-cmake company-xcode company-clang company-semantic company-eclim company-template company-bbdb company pcase flycheck find-func dash gcmh comp rx cl-extra help-mode k-utils gud easy-mmode comint regexp-opt ansi-color ring edmacro kmacro slime-autoloads info tool-bar package easymenu browse-url url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs password-cache json subr-x map url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type tabulated-list replace newcomment text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer select mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite charscript charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote threads inotify multi-tty make-network-process emacs) Memory information: ((conses 16 2475879 106891) (symbols 48 40051 2) (strings 32 181600 47924) (string-bytes 1 9741313) (vectors 16 77416) (vector-slots 8 1228210 168271) (floats 8 780 7731) (intervals 56 158344 397) (buffers 992 56)) ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-07 19:16 bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase Andrea Corallo @ 2020-06-07 19:41 ` Pip Cet 2020-06-07 19:57 ` Nicolas Bértolo 0 siblings, 1 reply; 14+ messages in thread From: Pip Cet @ 2020-06-07 19:41 UTC (permalink / raw) To: Andrea Corallo; +Cc: Nicolas Bértolo, Paul Eggert, 41755 On Sun, Jun 7, 2020 at 7:26 PM Andrea Corallo <akrl@sdf.org> wrote: > I'm experiencing non reproducible GC related crashes building > feature/native-comp. Does it happen for non-optimized builds? Also, what symbol is at Lisp_Object value 0xb4f0 (i.e. iQwhatever == 405)? > Both back-traces I've got looks similar: > > #0 0x00000000004dfe4c in symbol_marked_p (s=0xb4f0) at pdumper.h:149 > #1 mark_object (arg=<optimized out>) at alloc.c:6731 > #2 0x0000000000552390 in traverse_intervals_noorder (tree=0xffffffffe070, > function=0x4e0fe0 <mark_interval_tree_1>, arg=0x0) at intervals.c:234 > #3 0x00000000004e0060 in mark_object (arg=<optimized out>) at alloc.c:6784 > #4 0x00000000004e08ec in mark_memory (end=<optimized out>, start=<optimized out>) > at alloc.c:4860 > #5 mark_stack (bottom=<optimized out>, end=end@entry=0xffffffff71d0 "") > at alloc.c:5071 > #6 0x000000000055fd48 in mark_one_thread (thread=0x903b80 <main_thread>) > at thread.c:630 > #7 mark_threads_callback (ignore=<optimized out>) at thread.c:661 > #8 0x00000000004e1238 in garbage_collect () at alloc.c:6101 > #9 0x00000000004ff874 in maybe_gc () at lisp.h:5090 > #10 eval_sub (form=form@entry=XIL(0xcefe63)) at eval.c:2243 > #11 0x0000000000500108 in Fwhile (args=<optimized out>) at eval.c:1013 > ... > So far I was not able to reproduce on X86_64 (where I've rr). Please let us know if you manage to. > This may not be related to feature/native-comp but to one of the recent > GC changes and the stack marking strategy. Or to the live_*_p changes... ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-07 19:41 ` Pip Cet @ 2020-06-07 19:57 ` Nicolas Bértolo 2020-06-07 20:18 ` Pip Cet 0 siblings, 1 reply; 14+ messages in thread From: Nicolas Bértolo @ 2020-06-07 19:57 UTC (permalink / raw) To: Pip Cet; +Cc: Paul Eggert, 41755, Andrea Corallo Hi, I can confirm that what I found was this issue. > Does it happen for non-optimized builds? Also, what symbol is at > Lisp_Object value 0xb4f0 (i.e. iQwhatever == 405)? I haven't been able to reproduce it in non-optimized builds. What I understand so far is that the GC begins marking the stack of the main thread and it takes some data in the stack as a pointer to valid Lisp data. It starts following all the pointers and it eventually SIGSEGVs. I have seen it crash trying to read symbols, conses and strings. ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-07 19:57 ` Nicolas Bértolo @ 2020-06-07 20:18 ` Pip Cet 2020-06-07 23:09 ` Nicolas Bértolo 0 siblings, 1 reply; 14+ messages in thread From: Pip Cet @ 2020-06-07 20:18 UTC (permalink / raw) To: Nicolas Bértolo; +Cc: Paul Eggert, 41755, Andrea Corallo On Sun, Jun 7, 2020 at 7:58 PM Nicolas Bértolo <nicolasbertolo@gmail.com> wrote: > I can confirm that what I found was this issue. > > > Does it happen for non-optimized builds? Also, what symbol is at > > Lisp_Object value 0xb4f0 (i.e. iQwhatever == 405)? > > I haven't been able to reproduce it in non-optimized builds. But you still have last_marked in your build, right? That would be a good starting point to find out which object was marked and what was actually on the stack there... > What I understand so far is that the GC begins marking the stack of the main > thread and it takes some data in the stack as a pointer to valid Lisp data. That's my understanding as well. In Andrea's case, it looks like something was marked as though it were a symbol, but it was actually pointing back to the stack... > It > starts following all the pointers and it eventually SIGSEGVs. I have seen it > crash trying to read symbols, conses and strings. Is it always a symbol that's found on the stack by mark_maybe_*, though? ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-07 20:18 ` Pip Cet @ 2020-06-07 23:09 ` Nicolas Bértolo 2020-06-08 3:39 ` Nicolas Bértolo 0 siblings, 1 reply; 14+ messages in thread From: Nicolas Bértolo @ 2020-06-07 23:09 UTC (permalink / raw) To: Pip Cet; +Cc: Paul Eggert, 41755, Andrea Corallo > But you still have last_marked in your build, right? That would be a > good starting point to find out which object was marked and what was > actually on the stack there... Yes, I'll take a look there. BTW, adding #pragma GCC optimize ("-O0") to the top of alloc.c does not prevent it from crashing, so debugging could be easier. > Is it always a symbol that's found on the stack by mark_maybe_*, though? No, in this case it is a cons. 0x0000000400115a1a in cons_marked_p (c=0xfffffffc00, c@entry=0xbf07b0) at alloc.c:3899 3899 return pdumper_object_p (c) (gdb) bt #0 0x0000000400115a1a in cons_marked_p (c=0xfffffffc00, c@entry=0xbf07b0) at alloc.c:3899 #1 0x000000040011a567 in mark_object (arg=XIL(0xbf0890)) at alloc.c:6775 #2 0x00000004001125d9 in mark_interval_tree_1 (i=0x464a9b3, dummy=0x0) at alloc.c:1468 #3 0x000000040018fde4 in traverse_intervals_noorder (tree=tree@entry=0x464a9b3, function=function@entry=0x4001125b0 <mark_interval_tree_1>, arg=arg@entry=0x0) at intervals.c:234 #4 0x0000000400112619 in mark_interval_tree (i=0x464a9b3) at alloc.c:1477 #5 0x000000040011a2d4 in mark_object (arg=XIL(0x454c0b0)) at alloc.c:6629 #6 0x000000040011a5b2 in mark_object (arg=XIL(0x40061cf60), arg@entry=XIL(0x4d14553)) at alloc.c:6786 #7 0x00000004001171dd in mark_maybe_pointer (p=p@entry=0x4d14553) at alloc.c:4804 #8 0x0000000400117253 in mark_memory (start=0xbf0b30, start@entry=0xbff990, end=0xbff990, end@entry=0xbf0b30) at alloc.c:4854 #9 0x00000004001172b0 in mark_stack (bottom=0xbff990 "", end=end@entry=0xbf0b30 "0\f\277") at alloc.c:5073 #10 0x00000004001a0a71 in mark_one_thread (thread=0x400559500 <main_thread>) at thread.c:630 #11 mark_threads_callback (ignore=ignore@entry=0x0) at thread.c:661 #12 0x00000004001172fe in flush_stack_call_func1 (func=func@entry=0x4001a0a30 <mark_threads_callback>, arg=arg@entry=0x0) at alloc.c:5114 #13 0x00000004001a1c9c in flush_stack_call_func (arg=0x0, func=0x4001a0a30 <mark_threads_callback>) at lisp.h:3825 #14 mark_threads () at thread.c:668 #15 0x0000000000000000 in ?? () Backtrace stopped: previous frame inner to this frame (corrupt stack?) ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-07 23:09 ` Nicolas Bértolo @ 2020-06-08 3:39 ` Nicolas Bértolo 2020-06-08 6:29 ` Eli Zaretskii 2020-06-08 6:41 ` Pip Cet 0 siblings, 2 replies; 14+ messages in thread From: Nicolas Bértolo @ 2020-06-08 3:39 UTC (permalink / raw) To: Pip Cet; +Cc: Paul Eggert, 41755, Andrea Corallo [-- Attachment #1: Type: text/plain, Size: 3208 bytes --] I think I found the bug. Summary: A heap-based cons points to a stack-based string. The bug was introduced by: e38678b268c * Reduce the number of files probed when finding a lisp file. The function load_charset_map_from_file () creates two AUTO_STRINGs for the suffixes it needs. It puts this into a stack-based cons that it passes as the "suffixes" argument of openp (). This function then constructs the list of "extended suffixes" that associates each suffix with the directory that needs to be inserted in the middle of the path, if any. This list is allocated in the heap. When the GC kicks in, the lists are still lying around in memory but the stack-based strings don't exist anymore. The attached patch fixes this case. There may be similar cases in other parts of the code. I'll take a look. Nico. El dom., 7 jun. 2020 a las 20:09, Nicolas Bértolo (<nicolasbertolo@gmail.com>) escribió: > > > But you still have last_marked in your build, right? That would be a > > good starting point to find out which object was marked and what was > > actually on the stack there... > > Yes, I'll take a look there. BTW, adding > > #pragma GCC optimize ("-O0") > > to the top of alloc.c does not prevent it from crashing, so debugging could be > easier. > > > Is it always a symbol that's found on the stack by mark_maybe_*, though? > > No, in this case it is a cons. > > 0x0000000400115a1a in cons_marked_p (c=0xfffffffc00, c@entry=0xbf07b0) > at alloc.c:3899 > 3899 return pdumper_object_p (c) > (gdb) bt > #0 0x0000000400115a1a in cons_marked_p (c=0xfffffffc00, > c@entry=0xbf07b0) at alloc.c:3899 > #1 0x000000040011a567 in mark_object (arg=XIL(0xbf0890)) at alloc.c:6775 > #2 0x00000004001125d9 in mark_interval_tree_1 (i=0x464a9b3, > dummy=0x0) at alloc.c:1468 > #3 0x000000040018fde4 in traverse_intervals_noorder > (tree=tree@entry=0x464a9b3, function=function@entry=0x4001125b0 > <mark_interval_tree_1>, arg=arg@entry=0x0) at intervals.c:234 > #4 0x0000000400112619 in mark_interval_tree (i=0x464a9b3) at alloc.c:1477 > #5 0x000000040011a2d4 in mark_object (arg=XIL(0x454c0b0)) at alloc.c:6629 > #6 0x000000040011a5b2 in mark_object (arg=XIL(0x40061cf60), > arg@entry=XIL(0x4d14553)) at alloc.c:6786 > #7 0x00000004001171dd in mark_maybe_pointer (p=p@entry=0x4d14553) at > alloc.c:4804 > #8 0x0000000400117253 in mark_memory (start=0xbf0b30, > start@entry=0xbff990, end=0xbff990, end@entry=0xbf0b30) at > alloc.c:4854 > #9 0x00000004001172b0 in mark_stack (bottom=0xbff990 "", > end=end@entry=0xbf0b30 "0\f\277") at alloc.c:5073 > #10 0x00000004001a0a71 in mark_one_thread (thread=0x400559500 > <main_thread>) at thread.c:630 > #11 mark_threads_callback (ignore=ignore@entry=0x0) at thread.c:661 > #12 0x00000004001172fe in flush_stack_call_func1 > (func=func@entry=0x4001a0a30 <mark_threads_callback>, > arg=arg@entry=0x0) at alloc.c:5114 > #13 0x00000004001a1c9c in flush_stack_call_func (arg=0x0, > func=0x4001a0a30 <mark_threads_callback>) at lisp.h:3825 > #14 mark_threads () at thread.c:668 > #15 0x0000000000000000 in ?? () > Backtrace stopped: previous frame inner to this frame (corrupt stack?) [-- Attachment #2: 0001-Avoid-stack-based-strings-when-calling-openp-.-Fixes.patch --] [-- Type: application/octet-stream, Size: 1242 bytes --] From 2febde53d6ed070fa033754511795b5cec5367c4 Mon Sep 17 00:00:00 2001 From: Nicolas Bertolo <nicolasbertolo@gmail.com> Date: Mon, 8 Jun 2020 00:35:30 -0300 Subject: [PATCH] Avoid stack-based strings when calling openp(). Fixes #41755. The openp() function builds a heap based list from the suffixes. If we pass stack-based strings we'll get a heap-cons pointing to a stack-based string, which will explode during GC. * src/charset.c (load_charset_map_from_file): Allocate suffixes in the heap. --- src/charset.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/charset.c b/src/charset.c index 8635aad3ed6..d9ceaea8eb2 100644 --- a/src/charset.c +++ b/src/charset.c @@ -480,9 +480,9 @@ load_charset_map_from_file (struct charset *charset, Lisp_Object mapfile, FILE *fp; struct charset_map_entries *head, *entries; int n_entries; - AUTO_STRING (map, ".map"); - AUTO_STRING (txt, ".txt"); - AUTO_LIST2 (suffixes, map, txt); + Lisp_Object map = build_string (".map"); + Lisp_Object txt = build_string (".txt"); + Lisp_Object suffixes = list2 (map, txt); ptrdiff_t count = SPECPDL_INDEX (); record_unwind_protect_nothing (); specbind (Qfile_name_handler_alist, Qnil); -- 2.25.1.windows.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-08 3:39 ` Nicolas Bértolo @ 2020-06-08 6:29 ` Eli Zaretskii 2020-06-08 18:24 ` Nicolas Bértolo 2020-06-08 6:41 ` Pip Cet 1 sibling, 1 reply; 14+ messages in thread From: Eli Zaretskii @ 2020-06-08 6:29 UTC (permalink / raw) To: 41755, nicolasbertolo, pipcet; +Cc: eggert, akrl On June 8, 2020 6:39:34 AM GMT+03:00, "Nicolas Bértolo" <nicolasbertolo@gmail.com> wrote: > I think I found the bug. Summary: A heap-based cons points to a > stack-based string. The bug was introduced by: > > e38678b268c * Reduce the number of files probed when finding a lisp > file. > > The function load_charset_map_from_file () creates two AUTO_STRINGs > for the > suffixes it needs. It puts this into a stack-based cons that it passes > as the > "suffixes" argument of openp (). > > This function then constructs the list of "extended suffixes" that > associates > each suffix with the directory that needs to be inserted in the middle > of the > path, if any. This list is allocated in the heap. > > When the GC kicks in, the lists are still lying around in memory but > the > stack-based strings don't exist anymore. > > The attached patch fixes this case. There may be similar cases in > other parts of > the code. I'll take a look. > > Nico. > > > El dom., 7 jun. 2020 a las 20:09, Nicolas Bértolo > (<nicolasbertolo@gmail.com>) escribió: > > > > > But you still have last_marked in your build, right? That would be > a > > > good starting point to find out which object was marked and what > was > > > actually on the stack there... > > > > Yes, I'll take a look there. BTW, adding > > > > #pragma GCC optimize ("-O0") > > > > to the top of alloc.c does not prevent it from crashing, so > debugging could be > > easier. > > > > > Is it always a symbol that's found on the stack by mark_maybe_*, > though? > > > > No, in this case it is a cons. > > > > 0x0000000400115a1a in cons_marked_p (c=0xfffffffc00, > c@entry=0xbf07b0) > > at alloc.c:3899 > > 3899 return pdumper_object_p (c) > > (gdb) bt > > #0 0x0000000400115a1a in cons_marked_p (c=0xfffffffc00, > > c@entry=0xbf07b0) at alloc.c:3899 > > #1 0x000000040011a567 in mark_object (arg=XIL(0xbf0890)) at > alloc.c:6775 > > #2 0x00000004001125d9 in mark_interval_tree_1 (i=0x464a9b3, > > dummy=0x0) at alloc.c:1468 > > #3 0x000000040018fde4 in traverse_intervals_noorder > > (tree=tree@entry=0x464a9b3, function=function@entry=0x4001125b0 > > <mark_interval_tree_1>, arg=arg@entry=0x0) at intervals.c:234 > > #4 0x0000000400112619 in mark_interval_tree (i=0x464a9b3) at > alloc.c:1477 > > #5 0x000000040011a2d4 in mark_object (arg=XIL(0x454c0b0)) at > alloc.c:6629 > > #6 0x000000040011a5b2 in mark_object (arg=XIL(0x40061cf60), > > arg@entry=XIL(0x4d14553)) at alloc.c:6786 > > #7 0x00000004001171dd in mark_maybe_pointer (p=p@entry=0x4d14553) > at > > alloc.c:4804 > > #8 0x0000000400117253 in mark_memory (start=0xbf0b30, > > start@entry=0xbff990, end=0xbff990, end@entry=0xbf0b30) at > > alloc.c:4854 > > #9 0x00000004001172b0 in mark_stack (bottom=0xbff990 "", > > end=end@entry=0xbf0b30 "0\f\277") at alloc.c:5073 > > #10 0x00000004001a0a71 in mark_one_thread (thread=0x400559500 > > <main_thread>) at thread.c:630 > > #11 mark_threads_callback (ignore=ignore@entry=0x0) at thread.c:661 > > #12 0x00000004001172fe in flush_stack_call_func1 > > (func=func@entry=0x4001a0a30 <mark_threads_callback>, > > arg=arg@entry=0x0) at alloc.c:5114 > > #13 0x00000004001a1c9c in flush_stack_call_func (arg=0x0, > > func=0x4001a0a30 <mark_threads_callback>) at lisp.h:3825 > > #14 mark_threads () at thread.c:668 > > #15 0x0000000000000000 in ?? () > > Backtrace stopped: previous frame inner to this frame (corrupt > stack?) Thanks, but are you sure the fix should be in load_charset_map_from_file and not in the new code added to openp in the offending commit? Why does the additional code in openp need to cons its list off the heap? IOW, it sounds like those additions now made openp unsafe for callers who manipulate stack-based Lisp objects. The old openp was AFAIK careful enough not to impose any constraints like that on its callers. ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-08 6:29 ` Eli Zaretskii @ 2020-06-08 18:24 ` Nicolas Bértolo 0 siblings, 0 replies; 14+ messages in thread From: Nicolas Bértolo @ 2020-06-08 18:24 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, 41755, pipcet, akrl > Thanks, but are you sure the fix should be in load_charset_map_from_file and > not in the new code added to openp in the offending commit? Why does the > additional code in openp need to cons its list off the heap? Good idea. It should be easy to rework the openp to avoid putting the suffixes in a heap based list. What openp() does now is turn a list of suffixes into a list of suffixes and associated directories that need to be added to the file path. For example, it turns this: (".eln" ".elc" ".elc.gz" ".el" ".el.gz") into this: ((nil . ".eln") (comp-native-path-postfix . ".eln") (nil . ".elc") (nil . ".elc.gz") (nil . ".el") (nil . ".el.gz")) I did it this way because then iterating over it can be done using a FOR_EACH_TAIL macro. I'll change this to use two lists, one for the suffixes and another one for the directories. The code for iterating them will be just a little more complicated, but it should be safe to pass stack-based suffixes. ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-08 3:39 ` Nicolas Bértolo 2020-06-08 6:29 ` Eli Zaretskii @ 2020-06-08 6:41 ` Pip Cet 2020-06-08 18:51 ` Nicolas Bértolo 1 sibling, 1 reply; 14+ messages in thread From: Pip Cet @ 2020-06-08 6:41 UTC (permalink / raw) To: Nicolas Bértolo; +Cc: Paul Eggert, 41755, Andrea Corallo On Mon, Jun 8, 2020 at 3:39 AM Nicolas Bértolo <nicolasbertolo@gmail.com> wrote: > I think I found the bug. Summary: A heap-based cons points to a > stack-based string. The bug was introduced by: Good catch! I'm wondering what we could do to make such bugs easier to find... > The attached patch fixes this case. There may be similar cases in other parts of > the code. I'll take a look. Would GC_CHECK_MARKED_OBJECTS have caught this? ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-08 6:41 ` Pip Cet @ 2020-06-08 18:51 ` Nicolas Bértolo 2020-06-08 19:05 ` Pip Cet 0 siblings, 1 reply; 14+ messages in thread From: Nicolas Bértolo @ 2020-06-08 18:51 UTC (permalink / raw) To: Pip Cet; +Cc: Paul Eggert, 41755, Andrea Corallo > I'm wondering what we could do to make such bugs easier to find... We could add a canary to stack based strings and conses. Then while marking if we come across a stack based string or cons we check that the canary is intact. If it is not, then we can be sure that the memory has been written over. Something like this: struct Stack_String { struct Lisp_String string; uint64_t canary = 0x12341234; }; > Would GC_CHECK_MARKED_OBJECTS have caught this? As far as I can see, during a GC we can't know if a stack-based string is still alive. ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-08 18:51 ` Nicolas Bértolo @ 2020-06-08 19:05 ` Pip Cet 2020-06-09 14:20 ` Nicolas Bértolo 0 siblings, 1 reply; 14+ messages in thread From: Pip Cet @ 2020-06-08 19:05 UTC (permalink / raw) To: Nicolas Bértolo; +Cc: Paul Eggert, 41755, Andrea Corallo Nicolas Bértolo <nicolasbertolo@gmail.com> writes: >> I'm wondering what we could do to make such bugs easier to find... > > We could add a canary to stack based strings and conses. Then while > marking if we > come across a stack based string or cons we check that the canary is > intact. If > it is not, then we can be sure that the memory has been written over. I believe we should never be marking stack-based objects. If we do that's a GC bug. Code like AUTO_STRING (s, "foo"); Lisp_Object c = Fcons (s, s); garbage_collect (); ... Fsetcar (c, Qnil); Fsetcdr (c, Qnil); shouldn't work. I hope it doesn't :-) (With GC_CHECK_MARKED_OBJECTS, it should abort; without, it would leave the mark bit of s set, so the "..." code would presumably crash). > Something like this: > > struct Stack_String > { > struct Lisp_String string; > uint64_t canary = 0x12341234; > }; > >> Would GC_CHECK_MARKED_OBJECTS have caught this? > > As far as I can see, during a GC we can't know if a stack-based string > is still alive. But we can know whether a string is stack-based or not; if it is, we shouldn't be marking it, so we can abort in that case... ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-08 19:05 ` Pip Cet @ 2020-06-09 14:20 ` Nicolas Bértolo 2020-06-10 12:53 ` Andrea Corallo 0 siblings, 1 reply; 14+ messages in thread From: Nicolas Bértolo @ 2020-06-09 14:20 UTC (permalink / raw) To: 41755; +Cc: Paul Eggert, Pip Cet, Andrea Corallo [-- Attachment #1: Type: text/plain, Size: 328 bytes --] Hi, Here's a new patch that copies the suffixes before building the heap-based list in openp. I know this is not the solution I proposed, but I couldn't adapt the code without increasing its complexity way too much for my liking. If you think this is not an appropriate solution I will come up with another one. Thanks, Nico. [-- Attachment #2: 0001-Copy-suffixes-passed-to-openp-to-avoid-GC-crashes.-F.patch --] [-- Type: application/octet-stream, Size: 2209 bytes --] From 4b4fd526abe124c8a74bfa11209dd53c3a564cc7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nicol=C3=A1s=20B=C3=A9rtolo?= <nicolasbertolo@gmail.com> Date: Mon, 8 Jun 2020 22:01:25 -0300 Subject: [PATCH] Copy suffixes passed to 'openp' to avoid GC crashes. Fixes bug#41755 In openp_add_middle_dir_to_suffixes we build a heap-based list from the passed suffixes. It is crucial that we don't create a heap-based cons that points to a stack-based list. * src/lread.c (openp_add_middle_dir_to_suffixes): Copy suffixes when building a list of middle-dirs and suffixes. --- src/lread.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/src/lread.c b/src/lread.c index c127d32eb17..65d84462c4c 100644 --- a/src/lread.c +++ b/src/lread.c @@ -1635,21 +1635,27 @@ openp_add_middle_dir_to_suffixes (Lisp_Object suffixes) Lisp_Object extended_suf = Qnil; FOR_EACH_TAIL_SAFE (tail) { -#ifdef HAVE_NATIVE_COMP + /* suffixes may be a stack-based cons pointing to stack-based + strings. We must copy the suffix if we are putting it into + a heap-based cons to avoid a dangling reference. This would + lead to crashes during the GC. */ CHECK_STRING_CAR (tail); char * suf = SSDATA (XCAR (tail)); + Lisp_Object copied_suffix = build_string (suf); +#ifdef HAVE_NATIVE_COMP if (strcmp (NATIVE_ELISP_SUFFIX, suf) == 0) { CHECK_STRING (Vcomp_native_path_postfix); /* Here we add them in the opposite order so that nreverse corrects it. */ - extended_suf = Fcons (Fcons (Qnil, XCAR (tail)), extended_suf); - extended_suf = Fcons (Fcons (Vcomp_native_path_postfix, XCAR (tail)), + extended_suf = Fcons (Fcons (Qnil, copied_suffix), extended_suf); + extended_suf = Fcons (Fcons (Vcomp_native_path_postfix, + copied_suffix), extended_suf); } else #endif - extended_suf = Fcons (Fcons (Qnil, XCAR (tail)), extended_suf); + extended_suf = Fcons (Fcons (Qnil, copied_suffix), extended_suf); } suffixes = Fnreverse (extended_suf); -- 2.25.1.windows.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-09 14:20 ` Nicolas Bértolo @ 2020-06-10 12:53 ` Andrea Corallo 2020-06-27 14:39 ` Andrea Corallo 0 siblings, 1 reply; 14+ messages in thread From: Andrea Corallo @ 2020-06-10 12:53 UTC (permalink / raw) To: Nicolas Bértolo; +Cc: Paul Eggert, 41755, Pip Cet Nicolas Bértolo <nicolasbertolo@gmail.com> writes: > Hi, > > Here's a new patch that copies the suffixes before building the heap-based list > in openp. I know this is not the solution I proposed, but I couldn't adapt the > code without increasing its complexity way too much for my liking. If you think > this is not an appropriate solution I will come up with another one. > > Thanks, Nico. Hi Nico, I pushed this so the branch is stable to work on for everybody and there's no rush if you want to improve your solution in the meanwhile. Thanks Andrea -- akrl@sdf.org ^ permalink raw reply [flat|nested] 14+ messages in thread
* bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase 2020-06-10 12:53 ` Andrea Corallo @ 2020-06-27 14:39 ` Andrea Corallo 0 siblings, 0 replies; 14+ messages in thread From: Andrea Corallo @ 2020-06-27 14:39 UTC (permalink / raw) To: 41755-done I'm closing this bug as the issue originating the crash is fixed. Andrea ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-06-27 14:39 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-06-07 19:16 bug#41755: feature/native-comp (master?): temacs crash in GC during mark phase Andrea Corallo 2020-06-07 19:41 ` Pip Cet 2020-06-07 19:57 ` Nicolas Bértolo 2020-06-07 20:18 ` Pip Cet 2020-06-07 23:09 ` Nicolas Bértolo 2020-06-08 3:39 ` Nicolas Bértolo 2020-06-08 6:29 ` Eli Zaretskii 2020-06-08 18:24 ` Nicolas Bértolo 2020-06-08 6:41 ` Pip Cet 2020-06-08 18:51 ` Nicolas Bértolo 2020-06-08 19:05 ` Pip Cet 2020-06-09 14:20 ` Nicolas Bértolo 2020-06-10 12:53 ` Andrea Corallo 2020-06-27 14:39 ` Andrea Corallo
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).