* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
@ 2020-05-16 10:33 Eli Zaretskii
  2020-05-16 16:33 ` Paul Eggert
  2020-05-17 10:56 ` Pip Cet
  0 siblings, 2 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-16 10:33 UTC
To: 41321

I don't have a reproducible recipe, unfortunately. What happens is
that Emacs aborts a short time after reverting a buffer (reverted
because the file it is visiting was changed on disk).

So far, I've seen this in a C Mode buffer reverted because "git pull"
brought a modified version, and in an Info mode buffer reverted
because the manual was rebuilt after the Texinfo sources were
modified. In the latter case I captured a backtrace, see below.

The problem seems to involve invalid markers, perhaps markers that were
unchained and put on the free list (witness the PVEC_FREE object that
caused the abort in the backtrace below, where Emacs seems to be
trying to display an error message about an invalid marker).

I don't think I saw such problems in Emacs 27.0.90, so I walked
through all the changes since then till the 27.0.91 release, but didn't
see anything that could explain the problem.

Needless to say, this is a serious problem, so I'd like to ask
everyone to please run the latest pretest under a debugger and report
any similar problems with all the details they can provide.

Here's the backtrace and some additional information from the session where it happened last: Thread 1 hit Breakpoint 3, 0x77c36bb3 in msvcrt!abort () from C:\WINDOWS\system32\msvcrt.dll (gdb) bt #0 0x77c36bb3 in msvcrt!abort () from C:\WINDOWS\system32\msvcrt.dll #1 0x011cfdd8 in emacs_abort () at w32fns.c:10893 #2 0x01175f3a in print_vectorlike (obj=<optimized out>, printcharfun=XIL(0x30), escapeflag=escapeflag@entry=true, buf=buf@entry=0x82f07a "") at print.c:1830 #3 0x01172055 in print_object (obj=<optimized out>, printcharfun=XIL(0x30), escapeflag=true) at print.c:2148 #4 0x01172f04 in print (obj=<optimized out>, printcharfun=<optimized out>, escapeflag=<optimized out>, escapeflag@entry=true) at print.c:1147 #5 0x01173355 in Fprin1 (object=XIL(0xa00000001c9866d8), printcharfun=<optimized out>) at print.c:653 #6 0x0117483b in print_error_message (data=<optimized out>, stream=<optimized out>, context=<optimized out>, caller=<optimized out>) at print.c:979 #7 0x010c13c5 in Fcommand_error_default_function ( data=XIL(0xc000000000ff92e0), context=XIL(0x80000000058e9118), sys_signal=XIL(0x5d72548)) at keyboard.c:1029 #8 0x0114fb99 in funcall_subr (subr=<optimized out>, numargs=<optimized out>, numargs@entry=3, args=<optimized out>, args@entry=0x82f498) at eval.c:2872 #9 0x0114d9fd in Ffuncall (nargs=4, args=0x82f490) at eval.c:2794 #10 0x0114dca3 in Fapply (nargs=<optimized out>, args=<optimized out>) at eval.c:2424 #11 0x0114d9fd in Ffuncall (nargs=3, args=args@entry=0x82f590) at eval.c:2794 #12 0x0118eaf7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=3, args=<optimized out>, args@entry=0x82f888) at bytecode.c:633 #13 0x0115125f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=3, arg_vector=arg_vector@entry=0x82f888) at eval.c:2989 #14 0x0114d953 in Ffuncall (nargs=nargs@entry=4, args=args@entry=0x82f880) at eval.c:2808 #15 0x01151d29 in call3 
(fn=XIL(0xa000000005e00b20), arg1=XIL(0xc000000000ff92e0), arg2=XIL(0x80000000058e9118), arg3=XIL(0x5d72548)) at eval.c:2668 #16 0x010c5020 in cmd_error_internal (data=XIL(0xc000000000ff92e0), context=context@entry=0x82f92e "") at keyboard.c:984 #17 0x010c51e6 in cmd_error (data=XIL(0xc000000000ff92e0)) at keyboard.c:953 #18 0x0114c952 in internal_condition_case ( bfun=bfun@entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90), hfun=hfun@entry=0x10c5049 <cmd_error>) at eval.c:1351 #19 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091 #20 0x0114c8a6 in internal_catch (tag=XIL(0xdfb0), func=func@entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116 #21 0x010bdb5d in command_loop () at keyboard.c:1070 #22 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714 #23 0x010c4f0c in Frecursive_edit () at keyboard.c:786 #24 0x0124a4a4 in main (argc=<optimized out>, argv=<optimized out>) at emacs.c:2054 Lisp Backtrace: "command-error-default-function" (0x82f498) "apply" (0x82f598) 0x5e00b20 PVEC_COMPILED (gdb) fr 2 #2 0x01175f3a in print_vectorlike (obj=<optimized out>, printcharfun=XIL(0x30), escapeflag=escapeflag@entry=true, buf=buf@entry=0x82f07a "") at print.c:1830 1830 emacs_abort (); (gdb) fr 7 #7 0x010c13c5 in Fcommand_error_default_function ( data=XIL(0xc000000000ff92e0), context=XIL(0x80000000058e9118), sys_signal=XIL(0x5d72548)) at keyboard.c:1029 1029 print_error_message (data, Qt, SSDATA (context), signal); (gdb) p data $1 = XIL(0xc000000000ff92e0) (gdb) xtype Lisp_Cons (gdb) xcar $2 = 0xfd80 (gdb) xtype Lisp_Symbol (gdb) xsym xsymbol xsymname (gdb) xsymbol $3 = (struct Lisp_Symbol *) 0x15d9f60 <lispsym+64896> "wrong-type-argument" (gdb) p data $4 = XIL(0xc000000000ff92e0) (gdb) xcdr $5 = 0xc000000000ff9300 (gdb) xtype Lisp_Cons (gdb) xcar $6 = 0x9810 (gdb) xtype Lisp_Symbol (gdb) xsymbol $7 = (struct Lisp_Symbol *) 0x15d39f0 <lispsym+38928> "markerp" (gdb) p data $8 = XIL(0xc000000000ff92e0) (gdb) xcdr $9 = 0xc000000000ff9300 (gdb) xcdr $10 = 
0xc000000000ff9310 (gdb) xtype Lisp_Cons (gdb) xcar $11 = 0xa00000001c9866d8 (gdb) xtype Lisp_Vectorlike PVEC_FREE (gdb) fr 17 #17 0x010c51e6 in cmd_error (data=XIL(0xc000000000ff92e0)) at keyboard.c:953 953 cmd_error_internal (data, macroerror); In GNU Emacs 27.0.91 (build 1, i686-pc-mingw32) of 2020-04-18 built on HOME-C4E4A596F7 Windowing system distributor 'Microsoft Corp.', version 5.1.2600 System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600) Recent messages: For information about GNU Emacs and the GNU system, type C-h C-a. Configured using: 'configure --prefix=/d/usr --with-wide-int --with-modules 'CFLAGS=-O2 -gdwarf-4 -g3'' Configured features: XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY W32NOTIFY ACL GNUTLS LIBXML2 HARFBUZZ ZLIB TOOLKIT_SCROLL_BARS MODULES THREADS JSON PDUMPER LCMS2 GMP Important settings: value of $LANG: ENU locale-coding-system: cp1255 Major mode: Lisp Interaction Minor modes in effect: tooltip-mode: t global-eldoc-mode: t eldoc-mode: t electric-indent-mode: t mouse-wheel-mode: t tool-bar-mode: t menu-bar-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t line-number-mode: t transient-mark-mode: t Load-path shadows: None found. 
Features: (shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg epg-config gnus-util rmail rmail-loaddefs text-property-search time-date subr-x seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite charscript charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty make-network-process emacs) Memory information: ((conses 16 50536 10936) (symbols 48 7172 1) (strings 16 18837 2268) (string-bytes 1 532938) (vectors 16 9527) (vector-slots 8 127687 7318) (floats 8 21 170) (intervals 40 254 84) (buffers 888 11)) ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Paul Eggert @ 2020-05-16 16:33 UTC
To: Eli Zaretskii; +Cc: 41321

I fooled around a bit with emacs-27 on Ubuntu 18.04.4 (compiled in 32-bit mode
--with-wide-int) and couldn't reproduce it. I'll keep trying.

Could you give more details about the failures you observed? That might help
attempts at reproducing. How did you revert your info buffer - was it by typing
"M-x revert-buffer"? Are you using auto-revert-mode? That sort of thing.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-16 16:47 UTC
To: Paul Eggert; +Cc: 41321

> Cc: 41321@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 16 May 2020 09:33:35 -0700
>
> I fooled around a bit with emacs-27 on Ubuntu 18.04.4 (compiled in 32-bit mode
> --with-wide-int) and couldn't reproduce it. I'll keep trying.

Yes, I didn't succeed in reproducing it on purpose, either. Not sure
why, maybe there's some other factor that is at work, e.g. how many
markers there are in the buffer.

> Could you give more details about the failures you observed? That might help
> attempts at reproducing. How did you revert your info buffer - was it by typing
> "M-x revert-buffer"? Are you using auto-revert-mode? That sort of thing.

Just "M-x revert-buffer RET" followed by 'y'. I don't use
auto-revert-mode.

In the Git case, I would usually switch to a buffer visiting the file,
perhaps via "M-.", and Emacs would ask me whether to re-read the file
into its buffer, I'd say yes, and then I see an error about a bad
marker; the next command would abort Emacs.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Pip Cet @ 2020-05-17 10:56 UTC
To: Eli Zaretskii; +Cc: 41321

On Sat, May 16, 2020 at 10:34 AM Eli Zaretskii <eliz@gnu.org> wrote:
> So far, I've seen this in a C Mode buffer reverted because "git pull"
> brought a modified version, and in an Info mode buffer reverted
> because the manual was rebuilt after the Texinfo sources were
> modified. In the latter case I captured a backtrace, see below.
>
> The problem seems to involve invalid markers, perhaps markers that were
> unchained and put on the free list

Even unchained markers shouldn't be put on the free list as long as
they're still reachable, so I suspect the problem is more likely that
a marker was freed while it was still reachable.

> (witness the PVEC_FREE object that
> caused the abort in the backtrace below, where Emacs seems to be
> trying to display an error message about an invalid marker).

What I would do next is run with a breakpoint on wrong_type_argument
(if that's impossible, change the code in CHECK_MARKER to abort upon
encountering a PVEC_FREE vector) to see where the reference to the
freed pseudovector came from. An undo list, maybe?

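The second suggestion in the message above, making the marker check abort as soon as it meets a freed pseudovector, can be sketched as follows. This is a simplified stand-alone model and not the Emacs source: `struct object`, the reduced `pvec_type` enumeration, `classify_marker`, and `check_marker` are hypothetical stand-ins for the real `CHECK_MARKER`, `PVEC_FREE`, and `wrong_type_argument` machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced set of pseudovector type codes.  */
enum pvec_type { PVEC_NORMAL_VECTOR, PVEC_FREE, PVEC_MARKER, PVEC_OVERLAY };

/* Hypothetical object header; real Emacs packs the type code into the
   vector header's size word instead of a separate field.  */
struct object { enum pvec_type type; };

enum check_result { CHECK_OK, CHECK_WRONG_TYPE, CHECK_FREED };

/* Classify OBJ: a live marker, some other live object, or an object
   that has already been returned to the free list.  */
enum check_result
classify_marker (const struct object *obj)
{
  if (obj->type == PVEC_MARKER)
    return CHECK_OK;
  if (obj->type == PVEC_FREE)
    return CHECK_FREED;
  return CHECK_WRONG_TYPE;
}

/* Stricter check: abort immediately on a freed object so the debugger
   stops at the first use of the stale reference.  Return false for an
   ordinary type mismatch, which the caller would signal as an error.  */
bool
check_marker (const struct object *obj)
{
  enum check_result result = classify_marker (obj);
  if (result == CHECK_FREED)
    abort ();
  return result == CHECK_OK;
}
```

The point of aborting here rather than signaling is that the backtrace then shows the code that still held the stale reference, instead of the later code that tries to print the error message.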
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-17 15:28 UTC
To: Pip Cet; +Cc: 41321

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 17 May 2020 10:56:28 +0000
> Cc: 41321@debbugs.gnu.org
>
> What I would do next is run with a breakpoint on wrong_type_argument
> (if that's impossible, change the code in CHECK_MARKER to abort upon
> encountering a PVEC_FREE vector) to see where the reference to the
> freed pseudovector came from. An undo list, maybe?

I'm already running with such a breakpoint, let's how it will catch
something.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-17 15:57 UTC
To: pipcet; +Cc: 41321

> Date: Sun, 17 May 2020 18:28:04 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org
>
> I'm already running with such a breakpoint, let's how it will catch
> something.                                        ^^^

Should have been "hope". Sorry.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-22 7:22 UTC
To: pipcet, Stefan Monnier; +Cc: 41321

> Date: Sun, 17 May 2020 18:57:53 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org
>
> > Date: Sun, 17 May 2020 18:28:04 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: 41321@debbugs.gnu.org
> >
> > I'm already running with such a breakpoint, let's how it will catch
> > something.                                        ^^^
>
> Should have been "hope". Sorry.

It happened again, and now insert-file-contents wasn't involved, so I
guess it's off the hook. The command which triggered the problem was
self-insert-command, as shown in the backtrace below.

The problem seems to be with handling overlays when buffer text
changes. The backtrace below, as well as some tinkering with values
of relevant variables, indicate that the buffer has two overlays, both
of which point to invalid memory. The crash happens here:

  /* Now run the before-change-functions if any. */
  if (!NILP (Vbefore_change_functions))
    {
      rvoe_arg.location = &Vbefore_change_functions;
      rvoe_arg.errorp = 1;

      PRESERVE_VALUE;
      PRESERVE_START_END;

      /* Mark before-change-functions to be reset to nil in case of error. */
      record_unwind_protect_ptr (reset_var_on_error, &rvoe_arg);

      /* Actually run the hook functions. */
      CALLN (Frun_hook_with_args, Qbefore_change_functions,
             FETCH_START, FETCH_END);

      /* There was no error: unarm the reset_on_error. */
      rvoe_arg.errorp = 0;
    }

  if (buffer_has_overlays ())
    {
      PRESERVE_VALUE;
      report_overlay_modification (FETCH_START, FETCH_END, 0,  <<<<<<<<<<<<
                                   FETCH_START, FETCH_END, Qnil);
    }

FETCH_END calls marker-position, and that segfaults because the marker
points to invalid memory, which was probably unmapped from the process
address space (so I guess this is w32-specific, as GNU systems don't
really return memory to the system). The start_marker is also
invalid, it's just that FETCH_END is called first.

Since the previous call to before-change-functions already used the
same overlay markers, I suspect that the call to
before-change-functions caused the memory to be unmapped (perhaps due
to GC). As you see below, the value of before-change-functions is

  (t syntax-ppss-flush-cache)

So the prime suspect is what happens when syntax-ppss-flush-cache
runs, and thus I CC Stefan. The main question to answer now from my
POV is how come a marker on buffer position 3116 which was valid
before before-change-functions was called became invalid as a result of
some Lisp, in particular as a result of calling before-change-functions.

Here's the backtrace; ideas for further debugging are welcome.

Thread 1 received signal SIGSEGV, Segmentation fault.
PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720 1720 return PSEUDOVECTOR_TYPEP (XUNTAG (a, Lisp_Vectorlike, (gdb) bt #0 PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720 #1 MARKERP (x=<optimized out>) at lisp.h:2618 #2 CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133 #3 0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518)) at marker.c:452 #4 0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=3116, start_int=3116) at insdel.c:2179 #5 prepare_to_modify_buffer_1 (start=start@entry=3116, end=end@entry=3116, preserve_ptr=preserve_ptr@entry=0x0) at insdel.c:2007 #6 0x010ee27d in prepare_to_modify_buffer (start=3116, end=3116, preserve_ptr=preserve_ptr@entry=0x0) at insdel.c:2018 #7 0x010ee54d in insert_1_both (string=string@entry=0x82ef1b "r", nchars=nchars@entry=1, nbytes=nbytes@entry=1, inherit=inherit@entry=true, prepare=prepare@entry=true, before_markers=before_markers@entry=false) at insdel.c:896 #8 0x010ef005 in insert_1_both (before_markers=false, prepare=true, inherit=true, nbytes=1, nchars=1, string=0x82ef1b "r") at insdel.c:697 #9 insert_and_inherit (string=string@entry=0x82ef1b "r", nbytes=nbytes@entry=1) at insdel.c:692 #10 0x01107160 in internal_self_insert (c=114, n=<optimized out>) at cmds.c:477 #11 0x01107804 in Fself_insert_command (n=make_fixnum(1), c=<optimized out>) at cmds.c:302 #12 0x0114fb6c in funcall_subr (subr=<optimized out>, numargs=<optimized out>, numargs@entry=2, args=<optimized out>, args@entry=0x82f120) at eval.c:2869 #13 0x0114d9fd in Ffuncall (nargs=nargs@entry=3, args=args@entry=0x82f118) at eval.c:2794 #14 0x01148f7d in Ffuncall_interactively (nargs=3, args=0x82f118) at callint.c:254 #15 0x0114d9fd in Ffuncall (nargs=4, args=0x82f110) at eval.c:2794 #16 0x0114dca3 in Fapply (nargs=<optimized out>, nargs@entry=3, args=<optimized out>, args@entry=0x82f288) at eval.c:2424 #17 0x0114aecb in Fcall_interactively (function=XIL(0x43b3350), record_flag=<optimized out>, 
keys=XIL(0xa000000016a31530)) at callint.c:342 #18 0x0114fb99 in funcall_subr (subr=<optimized out>, numargs=<optimized out>, numargs@entry=3, args=<optimized out>, args@entry=0x82f430) at eval.c:2872 #19 0x0114d9fd in Ffuncall (nargs=4, args=args@entry=0x82f428) at eval.c:2794 #20 0x0118eaf7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=1, args=<optimized out>, args@entry=0x82f7b8) at bytecode.c:633 #21 0x0115125f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=1, arg_vector=arg_vector@entry=0x82f7b8) at eval.c:2989 #22 0x0114d953 in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x82f7b0) at eval.c:2808 #23 0x0114db2c in call1 (fn=XIL(0x3f30), arg1=XIL(0x43b3350)) at eval.c:2654 #24 0x010d0efe in command_loop_1 () at keyboard.c:1463 #25 0x0114c91f in internal_condition_case ( bfun=bfun@entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90), hfun=hfun@entry=0x10c5049 <cmd_error>) at eval.c:1355 #26 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091 #27 0x0114c8a6 in internal_catch (tag=XIL(0xdfb0), func=func@entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116 #28 0x010bdb5d in command_loop () at keyboard.c:1070 #29 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714 #30 0x010c4f0c in Frecursive_edit () at keyboard.c:786 #31 0x0124a4a4 in main (argc=<optimized out>, argv=<optimized out>) at emacs.c:2054 Lisp Backtrace: "self-insert-command" (0x82f120) "funcall-interactively" (0x82f118) "call-interactively" (0x82f430) "command-execute" (0x82f7b8) (gdb) fr 3 #3 CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133 133 CHECK_TYPE (MARKERP (x), Qmarkerp, x); (gdb) up #4 0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518)) at marker.c:452 452 CHECK_MARKER (marker); (gdb) p marker $3 = XIL(0xa000000018ac0518) (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x18ac0518 (gdb) p marker+0 $4 = 
-6917529027227155176 (gdb) p/x marker+0 $5 = 0xa000000018ac0518 (gdb) up #5 0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=3116, start_int=3116) at insdel.c:2179 2179 report_overlay_modification (FETCH_START, FETCH_END, 0, (gdb) p Vbefore_change_functions $6 = XIL(0xc000000018dbef20) (gdb) xtype Lisp_Cons (gdb) xcar $7 = 0x30 (gdb) xtype Lisp_Symbol (gdb) xsymbol $8 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48> "t" (gdb) p Vbefore_change_functions $9 = XIL(0xc000000018dbef20) (gdb) xcdr $10 = 0xc000000018dbf410 (gdb) xcar $11 = 0xd5c0 (gdb) xtype Lisp_Symbol (gdb) xsym xsymbol xsymname (gdb) xsymbol $12 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720> "syntax-ppss-flush-cache" (gdb) p Vbefore_change_functions $13 = XIL(0xc000000018dbef20) (gdb) xcdr $14 = 0xc000000018dbf410 (gdb) xcdr $15 = 0x0 (gdb) p start $16 = <optimized out> (gdb) p start_int $17 = 3116 (gdb) p end_int $18 = 3116 (gdb) p start_marker $19 = XIL(0xa000000018ac04f8) (gdb) p end_marker $20 = XIL(0xa000000018ac0518) (gdb) p start_marker $21 = XIL(0xa000000018ac04f8) (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x18ac04f8 (gdb) p preserve_ptr $22 = (ptrdiff_t *) 0x0 (gdb) p *(current_buffer->text->beg+3000) $23 = 115 's' (gdb) p *(current_buffer->text->beg+3000)@200 $24 = "sense would then\nsuggest us that the feature should be extended to other means of\ndkispaying messages in the echo a", '\000' <repeats 84 times> (gdb) p *(current_buffer->text->beg+3116) $25 = 0 '\000' (gdb) p GPT $26 = 3116 (gdb) p GPT_ADDR $27 = (unsigned char *) 0x7d80c2b "" (gdb) p current_buffer->overlays_before $28 = (struct Lisp_Overlay *) 0x170cb080 (gdb) p $28->start $29 = XIL(0xa0000000170cb040) (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x18ac04f8 (gdb) p $28->next $30 = (struct Lisp_Overlay *) 0x13050320 (gdb) p $28->next->start $31 = XIL(0xa000000016172310) (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x18ac04f8 (gdb) p current_buffer->overlays_after 
$32 = (struct Lisp_Overlay *) 0x0 (gdb) p $28->next->next $33 = (struct Lisp_Overlay *) 0x0 (gdb) p rvoe_arg.location $35 = (Lisp_Object *) 0x15c9298 <globals+120> (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x18ac04f8 (gdb) p rvoe_arg.errorp $36 = false ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Andrea Corallo @ 2020-05-22 8:35 UTC
To: Eli Zaretskii; +Cc: 41321, Stefan Monnier, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

> FETCH_END calls marker-position, and that segfaults because the marker
> points to invalid memory, which was probably unmapped from the process
> address space (so I guess this is w32-specific, as GNU systems don't
> really return memory to the system). The start_marker is also
> invalid, it's just that FETCH_END is called first.
>
> Since the previous call to before-change-functions already used the
> same overlay markers, I suspect that the call to
> before-change-functions caused the memory to be unmapped (perhaps due
> to GC). As you see below, the value of before-change-functions is
>
>   (t syntax-ppss-flush-cache)
>
> So the prime suspect is what happens when syntax-ppss-flush-cache
> runs, and thus I CC Stefan. The main question to answer now from my
> POV is how come a marker on buffer position 3116 which was valid
> before before-change-functions was called became invalid as result of
> some Lisp, in particular as result of calling before-change-functions.
>
> Here's the backtrace; ideas for further debugging are welcome.

Hi Eli,

I'd be curious of the outcome if you had a look at your 'garbage_collect'
assembly to investigate the possible relation with 41357 as suggested
here
https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html

Hope it helps

  Andrea

--
akrl@sdf.org

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-22 11:04 UTC
To: Andrea Corallo; +Cc: 41321, monnier, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, Stefan Monnier <monnier@iro.umontreal.ca>,
>         41321@debbugs.gnu.org
> Date: Fri, 22 May 2020 08:35:55 +0000
>
> I'd be curious of the outcome if you had a look at your 'garbage_collect'
> assembly to investigate the possible relation with 41357 as suggested
> here
> https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html

Sorry, I'm not sure I understand what you mean by the above. Did you
mean whether I disassembled garbage_collect and looked at the code?
If so, the answer is NO, I didn't yet have time for that.

However, given the latest findings, I now doubt even more that the
issue you identified can have any relation to this problem. As seen
in the backtrace I've shown in my last message, the buffer's overlay
list has invalid overlay objects at the point of the crash. The 2
pointers to the overlay lists of a buffer are unconditionally marked
in mark_buffer, so I don't see how problems in GC with Lisp objects in
registers could interfere in this case. Am I missing something?

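The argument above rests on the overlay chains being reachable from the buffer itself, so that the mark phase visits them no matter what happens to be in registers. A minimal stand-alone sketch of that idea follows; it is a model under stated assumptions, not the code in alloc.c. The structures are reduced to the fields the argument needs, and `mark_overlay_chain` and `mark_buffer_overlays` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical models of the objects involved.  */
struct marker { bool marked; };

struct overlay
{
  bool marked;
  struct marker *start, *end;
  struct overlay *next;
};

struct buffer
{
  struct overlay *overlays_before;
  struct overlay *overlays_after;
};

/* Mark every overlay on the chain, together with its two markers.
   Stop early if an overlay was already visited.  */
void
mark_overlay_chain (struct overlay *ov)
{
  for (; ov != NULL && !ov->marked; ov = ov->next)
    {
      ov->marked = true;
      ov->start->marked = true;
      ov->end->marked = true;
    }
}

/* Both chains are marked unconditionally, so anything reachable from
   them survives a collection as long as the buffer itself is live.  */
void
mark_buffer_overlays (struct buffer *b)
{
  mark_overlay_chain (b->overlays_before);
  mark_overlay_chain (b->overlays_after);
}
```

In this model a marker can only be collected while still in use if it is reachable solely through something the collector does not scan, which is why the question of where start_marker and end_marker live matters.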
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Andrea Corallo @ 2020-05-22 12:55 UTC
To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, Stefan Monnier <monnier@iro.umontreal.ca>,
>>         41321@debbugs.gnu.org
>> Date: Fri, 22 May 2020 08:35:55 +0000
>>
>> I'd be curious of the outcome if you had a look at your 'garbage_collect'
>> assembly to investigate the possible relation with 41357 as suggested
>> here
>> https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html
>
> Sorry, I'm not sure I understand what you mean by the above. Did you
> mean whether I disassembled garbage_collect and looked at the code?

Yes, it should be quick to see if callee-save regs are pushed.

> However, given the latest findings, I now doubt even more that the
> issue you identified can have any relation to this problem. As seen
> by the backtrace I've shown in my last message, the buffer's overlay
> list has invalid overlay objects at the point of the crash. The 2
> pointers to the overlay lists of a buffer are unconditionally marked
> in mark_buffer, so I don't see how problems in GC with Lisp objects in
> registers could interfere in this case. Am I missing something?

Not that I'm aware of. I'm no expert on the piece of code you are
looking at and haven't investigated it. It was just a 'cheap' idea to
take a potential problem off the table.

  Andrea

--
akrl@sdf.org

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-22 10:54 UTC
To: Stefan Monnier; +Cc: 41321, pipcet

> Date: Fri, 22 May 2020 10:22:56 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org
>
> Since the previous call to before-change-functions already used the
> same overlay markers, I suspect that the call to
> before-change-functions caused the memory to be unmapped (perhaps due
> to GC).

FTR: I am now running the 27.0.91 pretest with the patch for bug#40661
applied. It's a long shot, since the problem here is not with
pointers to buffer text, but I just want to be sure I didn't
rediscover a complicated way to reproduce that bug ;-)

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Pip Cet @ 2020-05-22 11:47 UTC
To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote:
> (gdb) p current_buffer->overlays_before
> $28 = (struct Lisp_Overlay *) 0x170cb080
> (gdb) p $28->start
> $29 = XIL(0xa0000000170cb040)
> (gdb) xtype
> Lisp_Vectorlike
> Cannot access memory at address 0x18ac04f8

Note that this didn't try to print $29, but the original invalid marker. In
particular, I believe 0x170cb040 is a pointer to a valid marker.

> (gdb) p $28->next
> $30 = (struct Lisp_Overlay *) 0x13050320
> (gdb) p $28->next->start
> $31 = XIL(0xa000000016172310)
> (gdb) xtype
> Lisp_Vectorlike
> Cannot access memory at address 0x18ac04f8

Same here.

If you could disassemble signal_before_change, we'd know whether
start_marker and end_marker live in callee-saved registers, and thus
whether this is likely to be Andrea's bug.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-22 12:13 UTC
To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 11:47:03 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
>
> On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > (gdb) p current_buffer->overlays_before
> > $28 = (struct Lisp_Overlay *) 0x170cb080
> > (gdb) p $28->start
> > $29 = XIL(0xa0000000170cb040)
> > (gdb) xtype
> > Lisp_Vectorlike
> > Cannot access memory at address 0x18ac04f8
>
> Note that this didn't try to print $29, but the original invalid marker.

Sorry, I don't follow. "xtype" shows the type of the last result,
AFAIK, in this case the type of $29. If this changed somehow, either
we have a bug in .gdbinit or I have been using GDB incorrectly for I
don't know how many years.

What am I missing?

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Pip Cet @ 2020-05-22 12:39 UTC
To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Fri, May 22, 2020 at 12:13 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 22 May 2020 11:47:03 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > > (gdb) p current_buffer->overlays_before
> > > $28 = (struct Lisp_Overlay *) 0x170cb080
> > > (gdb) p $28->start
> > > $29 = XIL(0xa0000000170cb040)
> > > (gdb) xtype
> > > Lisp_Vectorlike
> > > Cannot access memory at address 0x18ac04f8
> >
> > Note that this didn't try to print $29, but the original invalid marker.
>
> Sorry, I don't follow. "xtype" shows the type of the last result,
> AFAIK, in this case the type of $29. If this changed somehow, either
> we have a bug in .gdbinit or I have been using GDB incorrectly for I
> don't know how many years.

I think it's most likely to be a GDB bug, and I can't reproduce it here.

But it's definitely trying to access memory at address 0x18ac04f8,
which corresponds to start_marker.

(gdb) p rvoe_arg.location
$35 = (Lisp_Object *) 0x15c9298 <globals+120>
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x18ac04f8
(gdb) p rvoe_arg.errorp
$36 = false

Surely rvoe_arg.location isn't a vectorlike, so that also points to
GDB not dealing with things correctly.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 12:39 ` Pip Cet @ 2020-05-22 12:48 ` Eli Zaretskii 2020-05-22 14:04 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-22 12:48 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Fri, 22 May 2020 12:39:27 +0000 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > > Sorry, I don't follow. "xtype" shows the type of the last result, > > AFAIK, in this case the type of $29. If this changed somehow, either > > we have a bug in .gdbinit or I have been using GDB incorrectly for I > > don't know how many years. > > I think it's most likely to be a GDB bug, and I can't reproduce it here. > > But it's definitely trying to access memory at address 0x18ac04f8, > which corresponds to start_marker. My interpretation of that equality was that both start_marker and the buffer's overlay chain got invalidated because some code relocated objects and unmapped the previously referenced memory, perhaps due to GC. I don't yet have an explanation for how this could happen, so maybe this hypothesis is wrong. > (gdb) p rvoe_arg.location > $35 = (Lisp_Object *) 0x15c9298 <globals+120> > (gdb) xtype > Lisp_Vectorlike > Cannot access memory at address 0x18ac04f8 > (gdb) p rvoe_arg.errorp > $36 = false > > Surely rvoe_arg.location isn't a vectorlike, so that also points to > GDB not dealing with things correctly. rvoe_arg.location should be a pointer to the value of before-change-functions, so yes, it isn't supposed to be vectorlike. But I very much doubt there's such a blatant bug in GDB: this is the latest GDB 9.1, and I'm using these commands from .gdbinit all the time. I tend to think this is somehow part of the bug that caused the crash. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 12:48 ` Eli Zaretskii @ 2020-05-22 14:04 ` Pip Cet 2020-05-22 14:26 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-22 14:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, Stefan Monnier On Fri, May 22, 2020 at 12:48 PM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Fri, 22 May 2020 12:39:27 +0000 > > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > > > > Sorry, I don't follow. "xtype" shows the type of the last result, > > > AFAIK, in this case the type of $29. If this changed somehow, either > > > we have a bug in .gdbinit or I have been using GDB incorrectly for I > > > don't know how many years. > > > > I think it's most likely to be a GDB bug, and I can't reproduce it here. > > > > But it's definitely trying to access memory at address 0x18ac04f8, > > which corresponds to start_marker. > > My interpretation of that equality was that both start_marker and the > buffer's overlay chain got invalidated because some code relocated > objects and unmapped the previously referenced memory, perhaps due to > GC. I don't yet have an explanation for how this could happen, so > maybe this hypothesis is wrong. I think it has to be, because the error message would then read "Cannot access memory at address 0x170cb040", which is the only address xvectype is supposed to look at. > But I very much doubt there's such a blatant bug in GDB: this is the > latest GDB 9.1, and I'm using these commands from .gdbinit all the > time. I tend to think this is somehow part of the bug that caused the > crash. I'm not sure how it could be. I don't think posting the disassembled code for `signal_before_change' can hurt, since there's no easy way for anyone else to reproduce it. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 14:04 ` Pip Cet @ 2020-05-22 14:26 ` Eli Zaretskii 2020-05-22 14:40 ` Andrea Corallo 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-22 14:26 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Fri, 22 May 2020 14:04:03 +0000 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > I don't think posting the disassembled code for > `signal_before_change' can hurt, since there's no easy way for > anyone else to reproduce it. I see this on two different systems where Emacs was compiled with two different versions of GCC. So if you want to see the disassembly, any 32-bit GCC will do, I think. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 14:26 ` Eli Zaretskii @ 2020-05-22 14:40 ` Andrea Corallo 2020-05-22 19:03 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Andrea Corallo @ 2020-05-22 14:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, Pip Cet Eli Zaretskii <eliz@gnu.org> writes: >> From: Pip Cet <pipcet@gmail.com> >> Date: Fri, 22 May 2020 14:04:03 +0000 >> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org >> >> I don't think posting the disassembled code for >> `signal_before_change' can hurt, since there's no easy way for >> anyone else to reproduce it. > > I see this on two different systems where Emacs was compiled with two > different versions of GCC. So if you want to see the disassembly, any > 32-bit GCC will do, I think. I believe the triplet can make a difference given the calling convention can change no? Also CFLAGS are clearly a factor. -- akrl@sdf.org ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 14:40 ` Andrea Corallo @ 2020-05-22 19:03 ` Eli Zaretskii [not found] ` <CAOqdjBdpU4U1NqErNH0idBmUxNeE3fL=2=KKpo9kbCM3DhW5gA@mail.gmail.com> 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-22 19:03 UTC (permalink / raw) To: Andrea Corallo; +Cc: 41321, monnier, pipcet > From: Andrea Corallo <akrl@sdf.org> > Cc: Pip Cet <pipcet@gmail.com>, 41321@debbugs.gnu.org, > monnier@iro.umontreal.ca > Date: Fri, 22 May 2020 14:40:05 +0000 > > > I see this on two different systems where Emacs was compiled with two > > different versions of GCC. So if you want to see the disassembly, any > > 32-bit GCC will do, I think. > > I believe the triplet can make a difference given the calling convention > can change no? Also CFLAGS are clearly a factor. My CFLAGS are in my original report of this bug. ^ permalink raw reply [flat|nested] 132+ messages in thread
[parent not found: <CAOqdjBdpU4U1NqErNH0idBmUxNeE3fL=2=KKpo9kbCM3DhW5gA@mail.gmail.com>]
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects [not found] ` <CAOqdjBdpU4U1NqErNH0idBmUxNeE3fL=2=KKpo9kbCM3DhW5gA@mail.gmail.com> @ 2020-05-23 17:58 ` Andrea Corallo 2020-05-23 22:37 ` Stefan Monnier 0 siblings, 1 reply; 132+ messages in thread From: Andrea Corallo @ 2020-05-23 17:58 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier Pip Cet <pipcet@gmail.com> writes: > I believe this isn't the problem we're looking for, but it might be > related anyway. > > I'm seeing this in the assembler source code for insdel.c produced > with the mingw cross compiler (i686-w64-mingw32-gcc-win32): > > movl 60(%esp), %eax > movl %eax, (%esp) > movl 72(%esp), %eax > movl %eax, 4(%esp) > call _Fmarker_position > If I'm reading this correctly, it's of some concern for wide-int > builds: the two 32-bit halves of a Lisp_Object are stored > non-consecutively. > > Our stack marking doesn't catch that; at least, it doesn't for > symbols, where the less-significant half isn't a valid pointer. For > pseudovectors, things should still work... > > So I think we have a problem with such --wide-int builds in cases > where a stack temporary holds an unpinned uninterned symbol while GC > is called. Something like > > (prog1 > (gensym) > (garbage-collect)) > > might trigger it. No problem with gcc -m32 on GNU/Linux, for some reason. Very interesting. AFAIK there's no guarantees for the compiler to spill a DI reg in adjacent memory. Also reading the GC code your observation seems correct to me. -- akrl@sdf.org ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-23 17:58 ` Andrea Corallo @ 2020-05-23 22:37 ` Stefan Monnier 2020-05-23 22:41 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Stefan Monnier @ 2020-05-23 22:37 UTC (permalink / raw) To: Andrea Corallo; +Cc: 41321, Pip Cet >> If I'm reading this correctly, it's of some concern for wide-int >> builds: the two 32-bit halves of a Lisp_Object are stored >> non-consecutively. This shouldn't be a problem: wide-int builds use MSB tagging, so all Lisp_Objects which contain a pointer have their lowest 32bits exactly identical to that pointer (and the higher 32bits just contain the tag). So we'll find them in the stack even if the two halves are separate simply because the pointer-part will be found like any other pointer. Stefan ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-23 22:37 ` Stefan Monnier @ 2020-05-23 22:41 ` Pip Cet 2020-05-23 23:26 ` Stefan Monnier 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-23 22:41 UTC (permalink / raw) To: Stefan Monnier; +Cc: 41321, Andrea Corallo On Sat, May 23, 2020 at 10:38 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote: > >> If I'm reading this correctly, it's of some concern for wide-int > >> builds: the two 32-bit halves of a Lisp_Object are stored > >> non-consecutively. > > This shouldn't be a problem: wide-int builds use MSB tagging, so all > Lisp_Objects which contain a pointer have their lowest 32bits exactly > identical to that pointer (and the higher 32bits just contain the tag). As I said, I don't believe that's true for symbols. Qnil is always binary 0, so we offset all symbols by the offset of lispsym. > So we'll find them in the stack even if the two halves are separate > simply because the pointer-part will be found like any other pointer. Yes, that's what I meant to say when I said it should still work for pseudovectors. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-23 22:41 ` Pip Cet @ 2020-05-23 23:26 ` Stefan Monnier 0 siblings, 0 replies; 132+ messages in thread From: Stefan Monnier @ 2020-05-23 23:26 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Andrea Corallo >> This shouldn't be a problem: wide-int builds use MSB tagging, so all >> Lisp_Objects which contain a pointer have their lowest 32bits exactly >> identical to that pointer (and the higher 32bits just contain the tag). > As I said, I don't believe that's true for symbols. Qnil is always > binary 0, so we offset all symbols by the offset of lispsym. Oh, right, good point: I had completely forgotten about that "detail". We should probably adjust our conservative stack scanning accordingly. Stefan ^ permalink raw reply [flat|nested] 132+ messages in thread
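The point about symbols in the exchange above can be illustrated with a small sketch. It is not Emacs code: the type `obj_t`, the helper names, and the value of `symtab_base` (standing in for the address of lispsym) are all hypothetical. The only things taken from the thread are the general scheme (tag in the high bits, payload in the low 32 bits, symbols stored as an offset from lispsym) and the fact that vectorlike objects print as XIL(0xa0000000...), which corresponds to a tag value of 5 shifted left by 61 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit tagged object: 3-bit tag on top, 32-bit payload
   in the low half.  Illustration only, not the real Lisp_Object.  */
typedef uint64_t obj_t;

enum { TAG_SHIFT = 61, TAG_SYMBOL = 0, TAG_VECTORLIKE = 5 };

/* Made-up stand-in for the address of the built-in symbol table.  */
static const uint32_t symtab_base = 0x01500000u;

/* Vectorlike objects keep the raw address in the low half.  */
static obj_t
make_vectorlike (uint32_t addr)
{
  return ((uint64_t) TAG_VECTORLIKE << TAG_SHIFT) | addr;
}

/* Symbols keep an offset from the symbol table, so that the first
   symbol in the table is represented by all-zero bits.  */
static obj_t
make_symbol (uint32_t addr)
{
  return ((uint64_t) TAG_SYMBOL << TAG_SHIFT) | (uint32_t) (addr - symtab_base);
}

static uint32_t
low_half (obj_t o)
{
  return (uint32_t) o;
}

/* Would a scanner that treats each 32-bit stack word as a bare
   pointer recognize ADDR when it sees only the low half of O?  */
static bool
low_half_is_pointer (obj_t o, uint32_t addr)
{
  return low_half (o) == addr;
}
```

Under these assumptions a vectorlike object whose halves are stored apart is still found through its low half, while a symbol is not, because its low half holds an offset rather than an address.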
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 11:47 ` Pip Cet 2020-05-22 12:13 ` Eli Zaretskii @ 2020-05-22 12:32 ` Eli Zaretskii 2020-05-29 9:51 ` Eli Zaretskii 2 siblings, 0 replies; 132+ messages in thread From: Eli Zaretskii @ 2020-05-22 12:32 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Fri, 22 May 2020 11:47:03 +0000 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote: > > (gdb) p current_buffer->overlays_before > > $28 = (struct Lisp_Overlay *) 0x170cb080 > > (gdb) p $28->start > > $29 = XIL(0xa0000000170cb040) > > (gdb) xtype > > Lisp_Vectorlike > > Cannot access memory at address 0x18ac04f8 > > Note that didn't try to print $29, but the original invalid marker. In > particular, I believe 0x170cb040 is a pointer to a valid marker. > > > (gdb) p $28->next > > $30 = (struct Lisp_Overlay *) 0x13050320 > > (gdb) p $28->next->start > > $31 = XIL(0xa000000016172310) > > (gdb) xtype > > Lisp_Vectorlike > > Cannot access memory at address 0x18ac04f8 > > Same here. > > If you could disassemble signal_before_change, we'd know whether > start_marker and end_marker live in callee-saved registers, and thus > whether this is likely to be Andrea's bug. Since $28 is neither start_marker nor end_marker, but the first overlay on the buffer's overlay chain, how could it be affected by whether start_marker or end_marker are in a callee-saved register? What am I missing here? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 11:47 ` Pip Cet 2020-05-22 12:13 ` Eli Zaretskii 2020-05-22 12:32 ` Eli Zaretskii @ 2020-05-29 9:51 ` Eli Zaretskii 2020-05-29 10:00 ` Pip Cet 2 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-29 9:51 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Fri, 22 May 2020 11:47:03 +0000 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > If you could disassemble signal_before_change, we'd know whether > start_marker and end_marker live in callee-saved registers, and thus > whether this is likely to be Andrea's bug. signal_before_change cannot be disassembled because it's inlined. Disassembling its caller, prepare_to_modify_buffer_1, seems to indicate that start_marker and end_marker are pushed onto the stack when they are returned by copy-marker, and taken from there when we later call marker-position (which segfaults): 2163 PRESERVE_START_END; 0x010ed99e <+834>: mov 0x58(%esp),%eax 0x010ed9a2 <+838>: or 0x4c(%esp),%eax 0x010ed9a6 <+842>: je 0x10edd77 <prepare_to_modify_buffer_1+1819> 0x010ed9ac <+848>: mov 0x44(%esp),%ecx 0x010ed9b0 <+852>: or 0x38(%esp),%ecx 0x010ed9b4 <+856>: je 0x10edf90 <prepare_to_modify_buffer_1+2356> 0x010edd77 <+1819>: movl $0x0,0x8(%esp) 0x010edd7f <+1827>: movl $0x0,0xc(%esp) 0x010edd87 <+1835>: mov 0x50(%esp),%eax 0x010edd8b <+1839>: mov 0x54(%esp),%edx 0x010edd8f <+1843>: mov %eax,(%esp) 0x010edd92 <+1846>: mov %edx,0x4(%esp) 0x010edd96 <+1850>: call 0x10f15a5 <Fcopy_marker> 0x010edd9b <+1855>: mov %eax,0x4c(%esp) <<<<<<<<<<<<<<<<<<<<< 0x010edd9f <+1859>: mov %edx,0x58(%esp) <<<<<<<<<<<<<<<<<<<<< 0x010edda3 <+1863>: mov 0x44(%esp),%ecx 0x010edda7 <+1867>: or 0x38(%esp),%ecx 0x010eddab <+1871>: jne 0x10ede59 <prepare_to_modify_buffer_1+2045> 0x010eddb1 <+1877>: movl $0x0,0x8(%esp) 0x010eddb9 <+1885>: movl $0x0,0xc(%esp) 0x010eddc1 <+1893>: mov %esi,(%esp) 0x010eddc4 
<+1896>: mov %edi,0x4(%esp) 0x010eddc8 <+1900>: call 0x10f15a5 <Fcopy_marker> 0x010eddcd <+1905>: mov %eax,0x38(%esp) <<<<<<<<<<<<<<<<<<<< 0x010eddd1 <+1909>: mov %edx,0x44(%esp) <<<<<<<<<<<<<<<<<<<< 0x010edf90 <+2356>: movl $0x0,0x8(%esp) 0x010edf98 <+2364>: movl $0x0,0xc(%esp) 0x010edfa0 <+2372>: mov %esi,(%esp) 0x010edfa3 <+2375>: mov %edi,0x4(%esp) 0x010edfa7 <+2379>: call 0x10f15a5 <Fcopy_marker> 0x010edfac <+2384>: mov %eax,0x38(%esp) 0x010edfb0 <+2388>: mov %edx,0x44(%esp) [...] 2179 report_overlay_modification (FETCH_START, FETCH_END, 0, 0x010eda5f <+1027>: mov 0x44(%esp),%eax 0x010eda63 <+1031>: or 0x38(%esp),%eax 0x010eda67 <+1035>: jne 0x10edd20 <prepare_to_modify_buffer_1+1732> 0x010eda6d <+1041>: mov 0x58(%esp),%ecx 0x010eda71 <+1045>: or 0x4c(%esp),%ecx 0x010eda75 <+1049>: jne 0x10edf1e <prepare_to_modify_buffer_1+2242> 0x010eda7b <+1055>: mov %esi,0x68(%esp) 0x010eda7f <+1059>: mov %edi,0x6c(%esp) 0x010eda83 <+1063>: mov 0x50(%esp),%eax 0x010eda87 <+1067>: mov 0x54(%esp),%edx 0x010eda8b <+1071>: mov %eax,0x60(%esp) 0x010eda8f <+1075>: mov %edx,0x64(%esp) 0x010eda93 <+1079>: movl $0x0,0x24(%esp) 0x010eda9b <+1087>: movl $0x0,0x28(%esp) 0x010edaa3 <+1095>: mov 0x68(%esp),%eax 0x010edaa7 <+1099>: mov 0x6c(%esp),%edx 0x010edaab <+1103>: mov %eax,0x1c(%esp) 0x010edaaf <+1107>: mov %edx,0x20(%esp) 0x010edab3 <+1111>: mov 0x60(%esp),%eax 0x010edab7 <+1115>: mov 0x64(%esp),%edx 0x010edabb <+1119>: mov %eax,0x14(%esp) 0x010edabf <+1123>: mov %edx,0x18(%esp) 0x010edac3 <+1127>: movl $0x0,0x10(%esp) 0x010edacb <+1135>: mov %esi,0x8(%esp) 0x010edacf <+1139>: mov %edi,0xc(%esp) 0x010edad3 <+1143>: mov 0x50(%esp),%eax 0x010edad7 <+1147>: mov 0x54(%esp),%edx 0x010edadb <+1151>: mov %eax,(%esp) 0x010edade <+1154>: mov %edx,0x4(%esp) 0x010edae2 <+1158>: call 0x10e76ea <report_overlay_modification> 0x010edd20 <+1732>: mov 0x38(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edd24 <+1736>: mov %eax,(%esp) 0x010edd27 <+1739>: mov 0x44(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edd2b 
<+1743>: mov %eax,0x4(%esp) 0x010edd2f <+1747>: call 0x10f072a <Fmarker_position> 0x010edd34 <+1752>: mov %eax,0x68(%esp) 0x010edd38 <+1756>: mov %edx,0x6c(%esp) 0x010edd3c <+1760>: mov 0x58(%esp),%eax 0x010edd40 <+1764>: or 0x4c(%esp),%eax 0x010edd44 <+1768>: jne 0x10edeba <prepare_to_modify_buffer_1+2142> 0x010edd4a <+1774>: mov 0x38(%esp),%eax <<<<<<<<<<<<<<<<<<<<< 0x010edd4e <+1778>: mov %eax,(%esp) 0x010edd51 <+1781>: mov 0x44(%esp),%eax <<<<<<<<<<<<<<<<<<<<< 0x010edd55 <+1785>: mov %eax,0x4(%esp) 0x010edd59 <+1789>: call 0x10f072a <Fmarker_position> 0x010edd5e <+1794>: mov %eax,%esi 0x010edd60 <+1796>: mov %edx,%edi 0x010edd62 <+1798>: mov 0x50(%esp),%eax 0x010edd66 <+1802>: mov 0x54(%esp),%edx 0x010edd6a <+1806>: mov %eax,0x60(%esp) 0x010edd6e <+1810>: mov %edx,0x64(%esp) 0x010edd72 <+1814>: jmp 0x10eda93 <prepare_to_modify_buffer_1+1079> 0x010edeba <+2142>: mov 0x4c(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edebe <+2146>: mov %eax,(%esp) 0x010edec1 <+2149>: mov 0x58(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edec5 <+2153>: mov %eax,0x4(%esp) 0x010edec9 <+2157>: call 0x10f072a <Fmarker_position> 0x010edece <+2162>: mov %eax,0x60(%esp) 0x010eded2 <+2166>: mov %edx,0x64(%esp) 0x010eded6 <+2170>: mov 0x38(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010ededa <+2174>: mov %eax,(%esp) 0x010ededd <+2177>: mov 0x44(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edee1 <+2181>: mov %eax,0x4(%esp) 0x010edee5 <+2185>: call 0x10f072a <Fmarker_position> 0x010edeea <+2190>: mov %eax,%esi 0x010edeec <+2192>: mov %edx,%edi 0x010edeee <+2194>: mov 0x4c(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edef2 <+2198>: mov %eax,(%esp) 0x010edef5 <+2201>: mov 0x58(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edef9 <+2205>: mov %eax,0x4(%esp) 0x010edefd <+2209>: call 0x10f072a <Fmarker_position> 0x010edf02 <+2214>: mov %eax,0x50(%esp) 0x010edf06 <+2218>: mov %edx,0x54(%esp) 0x010edf0a <+2222>: jmp 0x10eda93 <prepare_to_modify_buffer_1+1079> 0x010edf1e <+2242>: mov 0x4c(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edf22 
<+2246>: mov %eax,(%esp) 0x010edf25 <+2249>: mov 0x58(%esp),%eax <<<<<<<<<<<<<<<<<<<<<< 0x010edf29 <+2253>: mov %eax,0x4(%esp) 0x010edf2d <+2257>: call 0x10f072a <Fmarker_position> ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 9:51 ` Eli Zaretskii @ 2020-05-29 10:00 ` Pip Cet 0 siblings, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-29 10:00 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, Stefan Monnier On Fri, May 29, 2020 at 9:51 AM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Fri, 22 May 2020 11:47:03 +0000 > > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > > > If you could disassemble signal_before_change, we'd know whether > > start_marker and end_marker live in callee-saved registers, and thus > > whether this is likely to be Andrea's bug. > > signal_before_change cannot be disassembled because it's inlined. Sorry. On my system, gdb does the right thing if I enter "disassemble signal_before_change". > Disassembling its caller, prepare_to_modify_buffer_1, seems to > indicate that start_marker and end_marker are pushed onto the stack > when they are returned by copy-marker, and taken from there when we > later call marker-position (which segfaults): That's my reading as well. > 0x010edd96 <+1850>: call 0x10f15a5 <Fcopy_marker> > 0x010edd9b <+1855>: mov %eax,0x4c(%esp) <<<<<<<<<<<<<<<<<<<<< > 0x010edd9f <+1859>: mov %edx,0x58(%esp) <<<<<<<<<<<<<<<<<<<<< As you can see, the stack positions aren't consecutive: the Lisp_Object is split between bytes 0x58..0x5b(%esp) and bytes 0x4c..0x4f(%esp). > 0x010eddc8 <+1900>: call 0x10f15a5 <Fcopy_marker> > 0x010eddcd <+1905>: mov %eax,0x38(%esp) <<<<<<<<<<<<<<<<<<<< > 0x010eddd1 <+1909>: mov %edx,0x44(%esp) <<<<<<<<<<<<<<<<<<<< Same here. So we know (from your backtrace) these objects aren't 16-byte-aligned, and we know your GC won't mark them because they're discontinuously-stored and max_align_t has an alignment of 16 on your system. We also know the only reference to them is on the stack. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 7:22 ` Eli Zaretskii ` (2 preceding siblings ...) 2020-05-22 11:47 ` Pip Cet @ 2020-05-23 23:54 ` Pip Cet 2020-05-24 14:24 ` Eli Zaretskii 2020-05-29 10:16 ` Eli Zaretskii 4 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-23 23:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, Stefan Monnier [-- Attachment #1: Type: text/plain, Size: 1003 bytes --] On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote: > #0 PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720 > #1 MARKERP (x=<optimized out>) at lisp.h:2618 > #2 CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133 > #3 0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518)) > at marker.c:452 I think I've worked it out: it's this mingw bug: https://sourceforge.net/p/mingw-w64/bugs/778/ On mingw, if <stdint.h> is included before/instead of stddef.h, alignof (max_align_t) == 16. However, as can be seen by the backtrace above, Eli's malloc only returned an 8-byte-aligned block. That's not normally a problem, because mark_maybe_object doesn't care about alignment; but in conjunction with the gcc behavior change, we rely on mark_maybe_pointer to mark the pointer, and it doesn't, because the pointer is not aligned to a LISP_ALIGNMENT = 16-byte boundary. Brute-force patch attached until we can work out how to fix this properly. [-- Attachment #2: 0001-Accept-unaligned-pointers-in-maybe_lisp_pointer.patch --] [-- Type: text/x-patch, Size: 839 bytes --] From abb79bf33622b4e8407565ab8e82771b6a35945e Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet@gmail.com> Date: Sat, 23 May 2020 23:51:55 +0000 Subject: [PATCH] Accept unaligned pointers in maybe_lisp_pointer * src/alloc.c (maybe_lisp_pointer): Don't require pointers be aligned to a LISP_ALIGNMENT boundary, as this is false on mingw builds. 
--- src/alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/alloc.c b/src/alloc.c index 1c6b664b22..86e81cd1f6 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4594,7 +4594,7 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) static bool maybe_lisp_pointer (void *p) { - return (uintptr_t) p % LISP_ALIGNMENT == 0; + return (uintptr_t) p % GCALIGNMENT == 0; } /* If P points to Lisp data, mark that as live if it isn't already -- 2.27.0.rc0 ^ permalink raw reply related [flat|nested] 132+ messages in thread
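The effect of the alignment filter discussed in this message can be checked with plain arithmetic on the address from the backtrace, XIL(0xa000000018ac0518). The sketch below is a stand-alone illustration, not the alloc.c code. The helper names are hypothetical, and the alignments 16 and 8 are assumptions taken from the discussion: 16 is the LISP_ALIGNMENT value stated in the thread, and 8 is the alignment the message says malloc actually delivered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 32 bits of a 64-bit tagged object; for a vectorlike object in
   the layout discussed here this is the raw address.  */
static uint32_t
pointer_half (uint64_t obj)
{
  return (uint32_t) obj;
}

/* Alignment test of the same shape as the one changed by the patch:
   a stack word is considered a candidate pointer only if it is a
   multiple of ALIGNMENT.  */
static bool
passes_alignment_filter (uint32_t word, uint32_t alignment)
{
  return word % alignment == 0;
}
```

With these helpers, pointer_half (0xa000000018ac0518) gives 0x18ac0518, which passes the filter for an alignment of 8 but is rejected for an alignment of 16. A scanner that sees only that half on the stack would therefore skip the object under the stricter check.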
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-23 23:54 ` Pip Cet @ 2020-05-24 14:24 ` Eli Zaretskii 2020-05-24 15:00 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-24 14:24 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Sat, 23 May 2020 23:54:17 +0000 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > I think I've worked it out: it's this mingw bug: > https://sourceforge.net/p/mingw-w64/bugs/778/ Thank you for working on this tricky problem. FTR, I don't use that flavor of MinGW. > On mingw, if <stdint.h> is included before/instead of stddef.h, > alignof (max_align_t) == 16. The problem with the order of inclusion doesn't exist in my header files, so alignof (max_align_t) is always 16. > However, as can be seen by the backtrace > above, Eli's malloc only returned an 8-byte-aligned block. Isn't that strange? Lisp data is allocated via lmalloc, AFAIK, and lmalloc is supposed to guarantee LISP_ALIGNMENT alignment. Or am I missing something? > That's not normally a problem, because mark_maybe_object doesn't > care about alignment; but in conjunction with the gcc behavior > change, we rely on mark_maybe_pointer to mark the pointer, and it > doesn't, because the pointer is not aligned to a LISP_ALIGNMENT = > 16-byte boundary. I still very much doubt that this has anything to do with stack marking during GC, since I've shown in my backtrace that current_buffer->overlays_before points to an overlay with invalid markers. And GC always marks buffer's overlays (and thus their markers), as can be seen in mark_buffer. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 14:24 ` Eli Zaretskii @ 2020-05-24 15:00 ` Pip Cet 2020-05-24 16:25 ` Eli Zaretskii 2020-05-24 19:00 ` Andy Moreton 0 siblings, 2 replies; 132+ messages in thread From: Pip Cet @ 2020-05-24 15:00 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, Stefan Monnier On Sun, May 24, 2020 at 2:24 PM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Sat, 23 May 2020 23:54:17 +0000 > > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > > > I think I've worked it out: it's this mingw bug: > > https://sourceforge.net/p/mingw-w64/bugs/778/ > > Thank you for working on this tricky problem. > > FTR, I don't use that flavor of MinGW. So your flavor is even more broken than what Debian ships? That's interesting, which flavor is it? > > On mingw, if <stdint.h> is included before/instead of stddef.h, > > alignof (max_align_t) == 16. > > The problem with the order of inclusion doesn't exist in my header > files, so alignof (max_align_t) is always 16. Okay, so that is our bug. > > However, as can be seen by the backtrace > > above, Eli's malloc only returned an 8-byte-aligned block. > > Isn't that strange? Lisp data is allocated via lmalloc, AFAIK, and > lmalloc is supposed to guarantee LISP_ALIGNMENT alignment. Or am I > missing something? No, it relies on the compile-time constants and never checks. The relevant code is: enum { MALLOC_IS_LISP_ALIGNED = alignof (max_align_t) % LISP_ALIGNMENT == 0 }; static bool laligned (void *p, size_t size) { return (MALLOC_IS_LISP_ALIGNED || (intptr_t) p % LISP_ALIGNMENT == 0 || size % LISP_ALIGNMENT != 0); } ... so laligned is a constant "true" function on your machine, since alignof (max_align_t) is 16 and LISP_ALIGNMENT is 16. static void * lmalloc (size_t size, bool clearit) { #ifdef USE_ALIGNED_ALLOC if (! 
MALLOC_IS_LISP_ALIGNED && size % LISP_ALIGNMENT == 0) { void *p = aligned_alloc (LISP_ALIGNMENT, size); if (clearit && p) memclear (p, size); return p; } #endif while (true) { void *p = clearit ? calloc (1, size) : malloc (size); if (laligned (p, size)) return p; free (p); size_t bigger = size + LISP_ALIGNMENT; if (size < bigger) size = bigger; } } That optimizes down to returning the malloc/calloc return value directly. IOW, alloc.c relies on malloc() being max_align_t-aligned, and never checks, not even in debug builds. That's something that needs to be fixed, since broken-malloc environments such as yours exist. > > That's not normally a problem, because mark_maybe_object doesn't > > care about alignment; but in conjunction with the gcc behavior > > change, we rely on mark_maybe_pointer to mark the pointer, and it > > doesn't, because the pointer is not aligned to a LISP_ALIGNMENT = > > 16-byte boundary. > > I still very much doubt that this has anything to do with stack > marking during GC, since I've shown in my backtrace that > current_buffer->overlays_before points to an overlay with invalid > markers. You haven't. ^ permalink raw reply [flat|nested] 132+ messages in thread
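The message above argues that the allocator trusts a compile-time constant and never inspects what malloc actually returns. One way to test that assumption on a given system is a small stand-alone probe like the one below. It is only a sketch and not part of Emacs: the function names are hypothetical, and it assumes a C11 compiler that provides max_align_t in <stddef.h> and alignof in <stdalign.h>.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if P is a multiple of ALIGNMENT (ALIGNMENT must be nonzero).  */
static bool
is_aligned (const void *p, size_t alignment)
{
  return (uintptr_t) p % alignment == 0;
}

/* Allocate SIZE bytes once and report whether the block honours
   alignof (max_align_t).  Returns false if the allocation fails.
   A single allocation proves nothing in general; a real check would
   sample many sizes.  */
static bool
malloc_honours_max_align (size_t size)
{
  void *p = malloc (size);
  if (!p)
    return false;
  bool ok = is_aligned (p, alignof (max_align_t));
  free (p);
  return ok;
}
```

Calling malloc_honours_max_align with a handful of object sizes on the affected build would show directly whether the 16-byte claim made by the headers matches the 8-byte alignment seen in the backtrace.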
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 15:00 ` Pip Cet @ 2020-05-24 16:25 ` Eli Zaretskii 2020-05-24 16:55 ` Eli Zaretskii 2020-05-24 19:00 ` Andy Moreton 1 sibling, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-24 16:25 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Sun, 24 May 2020 15:00:36 +0000 > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org > > > FTR, I don't use that flavor of MinGW. > > So your flavor is even more broken than what Debian ships? Why _more_ broken? > That's interesting, which flavor is it? mingw.org's MinGW. > > Isn't that strange? Lisp data is allocated via lmalloc, AFAIK, and > > lmalloc is supposed to guarantee LISP_ALIGNMENT alignment. Or am I > > missing something? > > No, it relies on the compile-time constants and never checks. So that is the bug to fix, no? > > I still very much doubt that this has anything to do with stack > > marking during GC, since I've shown in my backtrace that > > current_buffer->overlays_before points to an overlay with invalid > > markers. > > You haven't. Of course, I have. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 16:25 ` Eli Zaretskii @ 2020-05-24 16:55 ` Eli Zaretskii 2020-05-24 18:03 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-24 16:55 UTC (permalink / raw) To: pipcet; +Cc: 41321, monnier > Date: Sun, 24 May 2020 19:25:14 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > > > > I still very much doubt that this has anything to do with stack > > > marking during GC, since I've shown in my backtrace that > > > current_buffer->overlays_before points to an overlay with invalid > > > markers. > > > > You haven't. > > Of course, I have. Here's how healthy overlays look in a healthy buffer: (gdb) p current_buffer->overlays_after $10 = (struct Lisp_Overlay *) 0x0 (gdb) p current_buffer->overlays_before $11 = (struct Lisp_Overlay *) 0x7728258 (gdb) p $11->start $12 = XIL(0xa000000007728218) (gdb) xtype Lisp_Vectorlike PVEC_MARKER (gdb) xmarker $13 = (struct Lisp_Marker *) 0x7728218 (gdb) p *$ $14 = { header = { size = 1124081664 }, buffer = 0x728fc38, need_adjustment = 0, insertion_type = 0, next = 0x765eae8, charpos = 13968, bytepos = 13968 } (gdb) p $11->next $15 = (struct Lisp_Overlay *) 0x0 And here's a reminder from how the same looked in the session that segfaulted: (gdb) p current_buffer->overlays_before $28 = (struct Lisp_Overlay *) 0x170cb080 (gdb) p $28->start $29 = XIL(0xa0000000170cb040) (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x18ac04f8 (gdb) p $28->next $30 = (struct Lisp_Overlay *) 0x13050320 (gdb) p $28->next->start $31 = XIL(0xa000000016172310) (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x18ac04f8 (gdb) p current_buffer->overlays_after $32 = (struct Lisp_Overlay *) 0x0 (gdb) p $28->next->next $33 = (struct Lisp_Overlay *) 0x0 If you still claim that I didn't demonstrate that the buffer's overlay chain got corrupted as part of the bug that caused the segfault, 
please point out what I missed here. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 16:55 ` Eli Zaretskii @ 2020-05-24 18:03 ` Pip Cet 2020-05-24 18:40 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-24 18:03 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, Stefan Monnier On Sun, May 24, 2020 at 4:55 PM Eli Zaretskii <eliz@gnu.org> wrote: > And here's a reminder from how the same looked in the session that > segfaulted: > > (gdb) p current_buffer->overlays_before > $28 = (struct Lisp_Overlay *) 0x170cb080 > (gdb) p $28->start > $29 = XIL(0xa0000000170cb040) > (gdb) xtype > Lisp_Vectorlike > Cannot access memory at address 0x18ac04f8 That should read "Cannot access memory at address 0x170cb040". It doesn't. It doesn't tell you whether the memory at page 0x170cb000 is mapped, because gdb, for whatever reason (a bug in .gdbinit, a bug in gdb, some weird command entered at the gdb prompt before the transcript started, or even, as you yourself suggested, somehow as the result of the memory corruption that caused the crash), looked in the wrong place. Instead, it tells you that the page at 0x18ac0000 isn't mapped. Which we knew. > (gdb) p $28->next > $30 = (struct Lisp_Overlay *) 0x13050320 > (gdb) p $28->next->start > $31 = XIL(0xa000000016172310) > (gdb) xtype > Lisp_Vectorlike > Cannot access memory at address 0x18ac04f8 Same here. It should read "Cannot access memory at address 0x16172310". > If you still claim that I didn't demonstrate that the buffer's overlay > chain got corrupted I do, of course. The message GDB prints simply does not say anything problematic about the buffer's overlay chain. > as part of the bug that caused the segfault, > please point out what I missed here. You omitted the third call to xtype, which was even more clearly nonsensical: xtype was misbehaving. We don't know in which way it was misbehaving. So there's no evidence either way. 
FWIW, running into gdb bugs is something that happens to me almost on a regular basis. There's no point reporting those, as there's generally no response. In your case, you're in an unusual environment with a rather large and complicated .gdbinit file which does very strange things to avoid running into GDB bugs that we know about. All that increases the likelihood of your encountering a gdb bug that no one else has, or that has been reported but never responded to. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 18:03 ` Pip Cet @ 2020-05-24 18:40 ` Eli Zaretskii 2020-05-24 19:40 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-24 18:40 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Sun, 24 May 2020 18:03:57 +0000 > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > > > If you still claim that I didn't demonstrate that the buffer's overlay > > chain got corrupted > > I do, of course. The message GDB prints simply does not say anything > problematic about the buffer's overlay chain. > > > as part of the bug that caused the segfault, > > please point out what I missed here. > > You omitted the third call to xtype, which was even more clearly > nonsensical: xtype was misbehaving. We don't know in which way it was > misbehaving. So there's no evidence either way. > > FWIW, running into gdb bugs is something that happens to me almost on > a regular basis. There's no point reporting those, as there's > generally no response. In your case, you're in an unusual environment > with a rather large and complicated .gdbinit file which does very > strange things to avoid running into GDB bugs that we know about. All > that increases the likelihood of your encountering a gdb bug that no > one else has, or that has been reported but never responded to. I don't buy this, sorry. I use GDB every day in this very "unusual environment", both when debugging Emacs and other programs. The probability of these being due to some bug in GDB or in .gdbinit commands is very low, as I and others use them all the time. It is much more probable that the commands I've shown are signs of a real trouble in Emacs and not in GDB. I'm not willing to disregard what those commands show me because they don't match your theory. I prefer facts. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 18:40 ` Eli Zaretskii @ 2020-05-24 19:40 ` Pip Cet 2020-05-25 2:30 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-24 19:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, Stefan Monnier On Sun, May 24, 2020 at 6:40 PM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Sun, 24 May 2020 18:03:57 +0000 > > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > > > > > If you still claim that I didn't demonstrate that the buffer's overlay > > > chain got corrupted > > > > I do, of course. The message GDB prints simply does not say anything > > problematic about the buffer's overlay chain. > > > > > as part of the bug that caused the segfault, > > > please point out what I missed here. > > > > You omitted the third call to xtype, which was even more clearly > > nonsensical: xtype was misbehaving. We don't know in which way it was > > misbehaving. So there's no evidence either way. > > > > FWIW, running into gdb bugs is something that happens to me almost on > > a regular basis. There's no point reporting those, as there's > > generally no response. In your case, you're in an unusual environment > > with a rather large and complicated .gdbinit file which does very > > strange things to avoid running into GDB bugs that we know about. All > > that increases the likelihood of your encountering a gdb bug that no > > one else has, or that has been reported but never responded to. > > I don't buy this, sorry. So you think there's a second bug, located in Emacs, which causes GDB, which isn't supposed to be broken by anything the debuggee does, to be broken and respond in nonsensical ways? > I use GDB every day in this very "unusual > environment", both when debugging Emacs and other programs. And you've never run into GDB bugs? 
> The > probability of these being due to some bug in GDB or in .gdbinit > commands is very low, as I and others use them all the time. I'm perfectly willing to help you trace down this bug (in GDB or .gdbinit; we've already found the bug in mingw and the one in Emacs) if it serves any purpose, but I suspect you don't have the time. But I can't conceive of an explanation in which a bug in Emacs could cause a bug-free GDB to respond in the nonsensical way your last invocation of xtype did. > It is much more probable that the commands I've shown are signs of a real > trouble in Emacs and not in GDB. Are you saying the bug I've found isn't "a real trouble"? I'm curious as to what trouble you're imagining. > I'm not willing to disregard what > those commands show me because they don't match your theory. What they show you is that memory at a certain address, which they helpfully specify, isn't mapped. You conclude that memory at a totally different address isn't mapped, even though GDB quite explicitly never says so. That conclusion is invalid. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 19:40 ` Pip Cet @ 2020-05-25 2:30 ` Eli Zaretskii 2020-05-25 6:40 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-25 2:30 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Sun, 24 May 2020 19:40:09 +0000 > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > > > I use GDB every day in this very "unusual > > environment", both when debugging Emacs and other programs. > > And you've never run into GDB bugs? Not such blatant ones, no, and not lately. > Are you saying the bug I've found isn't "a real trouble"? I'm saying I'm not convinced that problem has anything to do with this particular segfault. > What they show you is that memory at a certain address, which they > helpfully specify, isn't mapped. > > You conclude that memory at a totally different address isn't mapped, > even though GDB quite explicitly never says so. > > That conclusion is invalid. Your opinion, not mine, not yet anyway. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-25 2:30 ` Eli Zaretskii @ 2020-05-25 6:40 ` Pip Cet 2020-05-25 11:28 ` Pip Cet 2020-05-25 15:14 ` Eli Zaretskii 0 siblings, 2 replies; 132+ messages in thread From: Pip Cet @ 2020-05-25 6:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, Stefan Monnier On Mon, May 25, 2020 at 2:30 AM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Sun, 24 May 2020 19:40:09 +0000 > > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > > What they show you is that memory at a certain address, which they > > helpfully specify, isn't mapped. > > > > You conclude that memory at a totally different address isn't mapped, > > even though GDB quite explicitly never says so. > > > > That conclusion is invalid. > Your opinion, not mine, not yet anyway. Maybe I'm approaching this the wrong way: What are you actually planning to do? I think we should work around the mingw bug on both the master and emacs-27 branches. We should also fix the (symbol-related) Emacs bug before it bites us: on both branches, unless we can get a mingw user to provide the output of "disassemble Fprog1" (and a bunch of other functions). (OTOH, we've already decided to keep crashable GC bugs on the emacs-27 branch). And we should wait and see whether similar crashes keep happening. What we should not do is encourage people to keep looking for another Emacs bug based on the existing backtraces. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-25 6:40 ` Pip Cet @ 2020-05-25 11:28 ` Pip Cet 2020-05-25 14:53 ` Eli Zaretskii 2020-05-26 3:33 ` Paul Eggert 2020-05-25 15:14 ` Eli Zaretskii 1 sibling, 2 replies; 132+ messages in thread From: Pip Cet @ 2020-05-25 11:28 UTC (permalink / raw) To: Eli Zaretskii, eggert; +Cc: 41321, Stefan Monnier On Mon, May 25, 2020 at 6:40 AM Pip Cet <pipcet@gmail.com> wrote: > We should also fix the (symbol-related) Emacs bug before it bites us: > on both branches, unless we can get a mingw user to provide the output > of "disassemble Fprog1" (and a bunch of other functions). (OTOH, we've > already decided to keep crashable GC bugs on the emacs-27 branch). And I just noticed strings aren't aligned to LISP_ALIGNMENT on x86_64-pc-linux-gnu. I think we're going to have to weaken the maybe_lisp_pointer check to check only for GC_ALIGNMENT. The commit that introduced this problem, for what it's worth, is 967d2c55ef3908fd378e05b2a0070663ae45f6de ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-25 11:28 ` Pip Cet @ 2020-05-25 14:53 ` Eli Zaretskii 2020-05-25 15:12 ` Stefan Monnier 2020-05-26 3:39 ` Paul Eggert 2020-05-26 3:33 ` Paul Eggert 1 sibling, 2 replies; 132+ messages in thread From: Eli Zaretskii @ 2020-05-25 14:53 UTC (permalink / raw) To: Pip Cet, eggert; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Mon, 25 May 2020 11:28:46 +0000 > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > > And I just noticed strings aren't aligned to LISP_ALIGNMENT on > x86_64-pc-linux-gnu. > > I think we're going to have to weaken the maybe_lisp_pointer check to > check only for GC_ALIGNMENT. I tend to agree. Paul, why did we move to max_align_t as the alignment requirement? AFAIU, GCC enlarged that recently to allow for the _Float128 type (at least on 32-bit hosts), but do we really need that? Also, what does this mean for stack-based Lisp objects? AFAIU, we previously required 8-byte alignment on 32-bit hosts (and on MS-Windows we jump through some hoops to guarantee that in callbacks of Windows APIs and in thread functions that manipulate Lisp objects). Does the use of max_align_t mean that now stack-based Lisp objects will need to have 16-byte alignment on 32-bit Windows? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-25 14:53 ` Eli Zaretskii @ 2020-05-25 15:12 ` Stefan Monnier 2020-05-26 3:39 ` Paul Eggert 1 sibling, 0 replies; 132+ messages in thread From: Stefan Monnier @ 2020-05-25 15:12 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, 41321, Pip Cet >> I think we're going to have to weaken the maybe_lisp_pointer check to >> check only for GC_ALIGNMENT. Sounds about right: the only alignment we really need for Lisp_Objects is the GC_ALIGNMENT that allows us to use the 3 LSB for tags. src/alloc.c makes efforts to ensure this alignment and for some objects (e.g. Lisp_Floats as well as (on 32bit hosts) Lisp_Cons cells) that's the only alignment we can meaningfully impose since those objects are only 64bit in size. Stefan ^ permalink raw reply [flat|nested] 132+ messages in thread
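A minimal sketch of the tagging scheme Stefan describes, added for illustration only. It assumes 8-byte alignment and 3 tag bits; the names TAG_BITS, make_tagged, tag_of, and pointer_of are hypothetical and are not the macros Emacs uses in src/lisp.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: 3 tag bits imply 8-byte alignment.  */
enum { TAG_BITS = 3, TAG_ALIGNMENT = 1 << TAG_BITS };
#define TAG_MASK ((uintptr_t) (TAG_ALIGNMENT - 1))

/* Pack an 8-byte-aligned address and a small tag into one word.
   This only works because the low 3 bits of the address are zero.  */
static uintptr_t
make_tagged (void *p, unsigned tag)
{
  assert (((uintptr_t) p & TAG_MASK) == 0);
  assert (tag <= TAG_MASK);
  return (uintptr_t) p | tag;
}

/* Recover the tag from the low bits.  */
static unsigned
tag_of (uintptr_t word)
{
  return (unsigned) (word & TAG_MASK);
}

/* Recover the original address by clearing the low bits.  */
static void *
pointer_of (uintptr_t word)
{
  return (void *) (word & ~TAG_MASK);
}
```

If an object were only 4-byte aligned, bit 2 of its address could be set and the tag would no longer be recoverable, which is why this alignment is the one the allocator really has to guarantee.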
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-25 14:53 ` Eli Zaretskii 2020-05-25 15:12 ` Stefan Monnier @ 2020-05-26 3:39 ` Paul Eggert 1 sibling, 0 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-26 3:39 UTC (permalink / raw) To: Eli Zaretskii, Pip Cet; +Cc: 41321, monnier On 5/25/20 7:53 AM, Eli Zaretskii wrote: > why did we move to max_align_t as the alignment requirement? > AFAIU, GCC enlarged that recently to allow for _Float128 type (at > least on 32-bit hosts), but do we really need that? Not on current glibc on any platform that I know, no. I was merely trying to keep the code portable to platforms where (say) alignof (pthread_cond_t) == 16. POSIX allows this, and this sort of thing is likely to happen somewhere in the not-too-distant future, for performance reasons. > Does the use of max_align_t mean that now stack-based Lisp objects > will need to have 16-byte alignment on 32-bit Windows? No, because we don't need to GC stack-based objects themselves (the stack will reclaim them) and the GC finds everything they point to (as it scans the stack). ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-25 11:28 ` Pip Cet 2020-05-25 14:53 ` Eli Zaretskii @ 2020-05-26 3:33 ` Paul Eggert 2020-05-26 6:18 ` Pip Cet 2020-05-26 6:46 ` Paul Eggert 1 sibling, 2 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-26 3:33 UTC (permalink / raw) To: Pip Cet, Eli Zaretskii; +Cc: 41321, Stefan Monnier On 5/25/20 4:28 AM, Pip Cet wrote: > And I just noticed strings aren't aligned to LISP_ALIGNMENT on > x86_64-pc-linux-gnu. Could you explain? Strings are allocated via allocate_string -> lisp_malloc -> lmalloc, and lmalloc is supposed to align to LISP_ALIGNMENT for strings just like it does for other Lisp objects. String data (struct sdata) is not Lisp-aligned, but it doesn't need to be. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-26 3:33 ` Paul Eggert @ 2020-05-26 6:18 ` Pip Cet 2020-05-26 7:51 ` Paul Eggert 2020-05-26 6:46 ` Paul Eggert 1 sibling, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-26 6:18 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Tue, May 26, 2020 at 3:33 AM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/25/20 4:28 AM, Pip Cet wrote: > > > And I just noticed strings aren't aligned to LISP_ALIGNMENT on > > x86_64-pc-linux-gnu. > > Could you explain? Strings are allocated via allocate_string -> lisp_malloc -> > lmalloc, and lmalloc is supposed to align to LISP_ALIGNMENT for strings just > like it does for other Lisp objects. Sorry. You're right, the non-aligned strings aren't relevant for GC. However, this is only because struct Lisp_String happens to have an even number of words. If someone changes that, the old code would break... We're still going to have to deal with symbols on --wide-int builds when the two halves of the wide int are saved non-consecutively. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-26 6:18 ` Pip Cet @ 2020-05-26 7:51 ` Paul Eggert 2020-05-26 8:27 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-26 7:51 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier [-- Attachment #1: Type: text/plain, Size: 872 bytes --] On 5/25/20 11:18 PM, Pip Cet wrote: > However, this is only because struct Lisp_String happens to have an > even number of words. If someone changes that, the old code would > break... No, because struct Lisp_String contains a GCALIGNED_UNION_MEMBER, so it is always GC-aligned, and (for older compilers that don't support alignas (8)) this is checked statically via 'verify (GCALIGNED (struct Lisp_String))'. Now that I've looked at it, though, I see that I forgot to do something similar with struct Lisp_Float, which has the same issue. Fixed by installing the attached patch on master. > We're still going to have to deal with symbols on --wide-int builds > when the two halves of the wide int are saved non-consecutively. Yes, I think that's the most pressing issue in this area. I will have to take a break now, though, since I have sleep and other work to do. [-- Attachment #2: 0001-Port-struct-Lisp_FLoat-to-oddball-platforms.patch --] [-- Type: text/x-patch, Size: 1203 bytes --] From cfd5e106c3c9334de93ccda0d65523476553fb1f Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Tue, 26 May 2020 00:47:24 -0700 Subject: [PATCH] Port struct Lisp_FLoat to oddball platforms MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * src/lisp.h (struct Lisp_Float): Declare via GCALIGNED_UNION_MEMBER, not via GCALIGNED_STRUCT, since alloc.c creates these in arrays and GCALIGNED_STRUCT does not necessarily suffice to align struct Lisp_Float when it’s used in an array. 
This avoids undefined behavior on oddball machines where sizeof (struct Lisp_Float) is not a multiple of 8 and the compiler does not support __attribute__ ((aligned 8)). --- src/lisp.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/lisp.h b/src/lisp.h index 8bd83a888c..f5d581a2f1 100644 --- a/src/lisp.h +++ b/src/lisp.h @@ -2801,8 +2801,10 @@ XBUFFER_OBJFWD (lispfwd a) { double data; struct Lisp_Float *chain; + GCALIGNED_UNION_MEMBER } u; - } GCALIGNED_STRUCT; + }; +verify (GCALIGNED (struct Lisp_Float)); INLINE bool (FLOATP) (Lisp_Object x) -- 2.17.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
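The idea behind the patch above can be sketched in standalone C11. This is an illustration under assumptions, not the Emacs source: GC_ALIGN stands in for GCALIGNMENT, ALIGNED_UNION_MEMBER stands in for GCALIGNED_UNION_MEMBER, and struct sketch_float is a hypothetical model of struct Lisp_Float.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

enum { GC_ALIGN = 8 };   /* assumed stand-in for GCALIGNMENT */

/* A dummy union member whose only job is to raise the alignment of
   the enclosing union, and therefore of the struct, to GC_ALIGN.  */
#define ALIGNED_UNION_MEMBER alignas (GC_ALIGN) char aligner_;

struct sketch_float
{
  union
  {
    double data;
    struct sketch_float *chain;
    ALIGNED_UNION_MEMBER
  } u;
};

/* Checked at compile time, in the spirit of 'verify (GCALIGNED (...))'.  */
static_assert (alignof (struct sketch_float) % GC_ALIGN == 0,
               "struct must be GC-aligned");

/* Because sizeof is a multiple of alignof, every element of an array
   of these structs starts at a multiple of GC_ALIGN from the base.  */
static bool
element_offset_is_aligned (size_t i)
{
  return (i * sizeof (struct sketch_float)) % GC_ALIGN == 0;
}
```

Putting the alignment request on a member rather than on the struct as a whole means the size is padded along with the alignment, which is what matters when the allocator lays these objects out in arrays.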
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-26 7:51 ` Paul Eggert @ 2020-05-26 8:27 ` Pip Cet 0 siblings, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-26 8:27 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Tue, May 26, 2020 at 7:51 AM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/25/20 11:18 PM, Pip Cet wrote: > > However, this is only because struct Lisp_String happens to have an > > even number of words. If someone changes that, the old code would > > break... > > No, because struct Lisp_String contains a GCALIGNED_UNION_MEMBER, so it is > always GC-aligned, and (for older compilers that don't support alignas (8)) this > is checked statically via 'verify (GCALIGNED (struct Lisp_String))'. As I said, this was specific to the old code, where LISP_ALIGNMENT, not GCALIGNMENT, was used by maybe_lisp_pointer. Things should be fine now (apart from the issue below)! > Now that I've looked at it, though, I see that I forgot to do something similar > with struct Lisp_Float, which has the same issue. Fixed by installing the > attached patch on master. LGTM. > > We're still going to have to deal with symbols on --wide-int builds > > when the two halves of the wide int are saved non-consecutively. > > Yes, I think that's the most pressing issue in this area. > I will have to take a > break now, though, since I have sleep and other work to do. Thanks for all the patches and comments! ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-26 3:33 ` Paul Eggert 2020-05-26 6:18 ` Pip Cet @ 2020-05-26 6:46 ` Paul Eggert 2020-05-26 15:17 ` Eli Zaretskii 1 sibling, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-26 6:46 UTC (permalink / raw) To: Pip Cet, Eli Zaretskii; +Cc: 41321, Stefan Monnier [-- Attachment #1: Type: text/plain, Size: 629 bytes --] On 5/25/20 8:33 PM, Paul Eggert wrote: > On 5/25/20 4:28 AM, Pip Cet wrote: > >> And I just noticed strings aren't aligned to LISP_ALIGNMENT on >> x86_64-pc-linux-gnu. > > Could you explain? Oh, never mind, I figured it out. Sorry about the noise. I installed the first attached patch to fix the bug on master (as a series of commits, the leading ones not quite right unfortunately). This patch does what you proposed, and also tightens up some of the related alignment checks. I propose the second patch for emacs-27; it's limited to what you proposed, namely, it weakens maybe_lisp_pointer to check only for GC_ALIGNMENT. [-- Attachment #2: emacs.diff --] [-- Type: text/x-patch, Size: 6907 bytes --] diff --git a/src/alloc.c b/src/alloc.c index d5a6d9167e..f8609398a3 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -104,6 +104,46 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2020 Free Software #include "w32heap.h" /* for sbrk */ #endif +/* A type with alignment at least as large as any object that Emacs + allocates. This is not max_align_t because some platforms (e.g., + mingw) have buggy malloc implementations that do not align for + max_align_t. This union contains types of all GCALIGNED_STRUCT + components visible here. 
*/ +union emacs_align_type +{ + struct frame frame; + struct Lisp_Bignum Lisp_Bignum; + struct Lisp_Bool_Vector Lisp_Bool_Vector; + struct Lisp_Char_Table Lisp_Char_Table; + struct Lisp_CondVar Lisp_CondVar; + struct Lisp_Finalizer Lisp_Finalizer; + struct Lisp_Float Lisp_Float; + struct Lisp_Hash_Table Lisp_Hash_Table; + struct Lisp_Marker Lisp_Marker; + struct Lisp_Misc_Ptr Lisp_Misc_Ptr; + struct Lisp_Mutex Lisp_Mutex; + struct Lisp_Overlay Lisp_Overlay; + struct Lisp_Sub_Char_Table Lisp_Sub_Char_Table; + struct Lisp_Subr Lisp_Subr; + struct Lisp_User_Ptr Lisp_User_Ptr; + struct Lisp_Vector Lisp_Vector; + struct terminal terminal; + struct thread_state thread_state; + struct window window; + + /* Omit the following since they would require including process.h + etc. In practice their alignments never exceed that of the + structs already listed. */ +#if 0 + struct Lisp_Module_Function Lisp_Module_Function; + struct Lisp_Process Lisp_Process; + struct save_window_data save_window_data; + struct scroll_bar scroll_bar; + struct xwidget_view xwidget_view; + struct xwidget xwidget; +#endif +}; + /* MALLOC_SIZE_NEAR (N) is a good number to pass to malloc when allocating a block of memory with size close to N bytes. For best results N should be a power of 2. @@ -112,9 +152,9 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2020 Free Software adds sizeof (size_t) to SIZE for internal overhead, and then rounds up to a multiple of MALLOC_ALIGNMENT. Emacs can improve performance a bit on GNU platforms by arranging for the resulting - size to be a power of two. This heuristic is good for glibc 2.0 - (1997) through at least glibc 2.31 (2020), and does not affect - correctness on other platforms. */ + size to be a power of two. This heuristic is good for glibc 2.26 + (2017) and later, and does not affect correctness on other + platforms. 
*/ #define MALLOC_SIZE_NEAR(n) \ (ROUNDUP (max (n, sizeof (size_t)), MALLOC_ALIGNMENT) - sizeof (size_t)) @@ -655,25 +695,19 @@ buffer_memory_full (ptrdiff_t nbytes) #define COMMON_MULTIPLE(a, b) \ ((a) % (b) == 0 ? (a) : (b) % (a) == 0 ? (b) : (a) * (b)) -/* LISP_ALIGNMENT is the alignment of Lisp objects. It must be at - least GCALIGNMENT so that pointers can be tagged. It also must be - at least as strict as the alignment of all the C types used to - implement Lisp objects; since pseudovectors can contain any C type, - this is max_align_t. On recent GNU/Linux x86 and x86-64 this can - often waste up to 8 bytes, since alignof (max_align_t) is 16 but - typical vectors need only an alignment of 8. Although shrinking - the alignment to 8 would save memory, it cost a 20% hit to Emacs - CPU performance on Fedora 28 x86-64 when compiled with gcc -m32. */ -enum { LISP_ALIGNMENT = alignof (union { max_align_t x; +/* Alignment needed for memory blocks that are allocated via malloc + and that contain Lisp objects. On typical hosts malloc already + aligns sufficiently, but extra work is needed on oddball hosts + where Emacs would crash if malloc returned a non-GCALIGNED pointer. */ +enum { LISP_ALIGNMENT = alignof (union { union emacs_align_type x; GCALIGNED_UNION_MEMBER }) }; verify (LISP_ALIGNMENT % GCALIGNMENT == 0); /* True if malloc (N) is known to return storage suitably aligned for Lisp objects whenever N is a multiple of LISP_ALIGNMENT. In practice this is true whenever alignof (max_align_t) is also a - multiple of LISP_ALIGNMENT. This works even for x86, where some - platform combinations (e.g., GCC 7 and later, glibc 2.25 and - earlier) have bugs where alignof (max_align_t) is 16 even though + multiple of LISP_ALIGNMENT. This works even for buggy platforms + like MinGW circa 2020, where alignof (max_align_t) is 16 even though the malloc alignment is only 8, and where Emacs still works because it never does anything that requires an alignment of 16. 
*/ enum { MALLOC_IS_LISP_ALIGNED = alignof (max_align_t) % LISP_ALIGNMENT == 0 }; @@ -4657,12 +4691,12 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) collected, and false otherwise (i.e., false if it is easy to see that P cannot point to Lisp data that can be garbage collected). Symbols are implemented via offsets not pointers, but the offsets - are also multiples of LISP_ALIGNMENT. */ + are also multiples of GCALIGNMENT. */ static bool maybe_lisp_pointer (void *p) { - return (uintptr_t) p % LISP_ALIGNMENT == 0; + return (uintptr_t) p % GCALIGNMENT == 0; } /* If P points to Lisp data, mark that as live if it isn't already @@ -4885,9 +4919,10 @@ test_setjmp (void) as a stack scan limit. */ typedef union { - /* Align the stack top properly. Even if !HAVE___BUILTIN_UNWIND_INIT, - jmp_buf may not be aligned enough on darwin-ppc64. */ - max_align_t o; + /* Make sure stack_top and m_stack_bottom are properly aligned as GC + expects. */ + Lisp_Object o; + void *p; #ifndef HAVE___BUILTIN_UNWIND_INIT sys_jmp_buf j; char c; diff --git a/src/lisp.h b/src/lisp.h index 85bdc172b2..8bd83a888c 100644 --- a/src/lisp.h +++ b/src/lisp.h @@ -277,7 +277,8 @@ DEFINE_GDB_SYMBOL_END (VALMASK) allocation in a containing union that has GCALIGNED_UNION_MEMBER) and does not contain a GC-aligned struct or union, putting GCALIGNED_STRUCT after its closing '}' can help the compiler - generate better code. + generate better code. Also, such structs should be added to the + emacs_align_type union in alloc.c. Although these macros are reasonably portable, they are not guaranteed on non-GCC platforms, as C11 does not require support diff --git a/src/thread.c b/src/thread.c index df1a705382..b638dd77f8 100644 --- a/src/thread.c +++ b/src/thread.c @@ -717,12 +717,17 @@ run_thread (void *state) { /* Make sure stack_top and m_stack_bottom are properly aligned as GC expects. 
*/ - max_align_t stack_pos; + union + { + Lisp_Object o; + void *p; + char c; + } stack_pos; struct thread_state *self = state; struct thread_state **iter; - self->m_stack_bottom = self->stack_top = (char *) &stack_pos; + self->m_stack_bottom = self->stack_top = &stack_pos.c; self->thread_id = sys_thread_self (); if (self->thread_name) [-- Attachment #3: 0001-Fix-aborts-due-to-GC-losing-pseudovectors.patch --] [-- Type: text/x-patch, Size: 1191 bytes --] From 7466aeeb4ac3dace283ecef00c2b38148b56b3b3 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Mon, 25 May 2020 23:39:37 -0700 Subject: [PATCH] Fix aborts due to GC losing pseudovectors Problem reported by Eli Zaretskii (Bug#41321). * src/alloc.c (maybe_lisp_pointer): Modulo GCALIGNMENT, not modulo LISP_ALIGNMENT. Master has a more-elaborate fix. Do not merge to master. --- src/alloc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index 1c6b664b22..c7a4a3ee86 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4589,12 +4589,12 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) collected, and false otherwise (i.e., false if it is easy to see that P cannot point to Lisp data that can be garbage collected). Symbols are implemented via offsets not pointers, but the offsets - are also multiples of LISP_ALIGNMENT. */ + are also multiples of GCALIGNMENT. */ static bool maybe_lisp_pointer (void *p) { - return (uintptr_t) p % LISP_ALIGNMENT == 0; + return (uintptr_t) p % GCALIGNMENT == 0; } /* If P points to Lisp data, mark that as live if it isn't already -- 2.17.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
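To see why the modulus matters, here is a small sketch comparing the two filters. The values 8 and 16 are assumptions taken from the comments in the patch (malloc alignment of 8, alignof (max_align_t) of 16 on MinGW), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: tagging needs 8, max_align_t asks for 16.  */
enum { TAG_ALIGNMENT = 8, STRICT_ALIGNMENT = 16 };

/* Mirrors the spirit of 'verify (LISP_ALIGNMENT % GCALIGNMENT == 0)'.  */
static_assert (STRICT_ALIGNMENT % TAG_ALIGNMENT == 0,
               "strict alignment must be a multiple of tag alignment");

/* Old filter: reject anything not aligned to the strict alignment.  */
static bool
maybe_pointer_strict (uintptr_t p)
{
  return p % STRICT_ALIGNMENT == 0;
}

/* New filter: reject only what cannot be a tagged object address.  */
static bool
maybe_pointer_weak (uintptr_t p)
{
  return p % TAG_ALIGNMENT == 0;
}
```

An address such as 0x1008 passes the weak filter but fails the strict one. If malloc really hands out such addresses, a conservative stack scan using the strict filter would skip a live object, which would fit the "GC losing pseudovectors" symptom named in the commit message.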
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-26 6:46 ` Paul Eggert @ 2020-05-26 15:17 ` Eli Zaretskii 2020-05-26 22:49 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-26 15:17 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > From: Paul Eggert <eggert@cs.ucla.edu> > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > Date: Mon, 25 May 2020 23:46:02 -0700 > > I propose the second patch for emacs-27; it's limited to what you proposed, > namely, it weakens maybe_lisp_pointer to check only for GC_ALIGNMENT. > > static bool > maybe_lisp_pointer (void *p) > { > - return (uintptr_t) p % LISP_ALIGNMENT == 0; > + return (uintptr_t) p % GCALIGNMENT == 0; > } On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look right (or maybe I'm missing something). ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-26 15:17 ` Eli Zaretskii @ 2020-05-26 22:49 ` Paul Eggert 2020-05-27 15:26 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-26 22:49 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet [-- Attachment #1: Type: text/plain, Size: 982 bytes --] On 5/26/20 8:17 AM, Eli Zaretskii wrote: >> static bool >> maybe_lisp_pointer (void *p) >> { >> - return (uintptr_t) p % LISP_ALIGNMENT == 0; >> + return (uintptr_t) p % GCALIGNMENT == 0; >> } > On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look > right (or maybe I'm missing something). Good point; I'd neglected that. I.e., on !USE_LSB_TAG systems the proposed emacs-27 patch is overly-conservative, as it causes maybe_lisp_pointer to always return true. Although this hurts GC performance it doesn't affect correctness and the patch does fix a crash on USE_LSB_TAG systems, so it (or something like it) is needed for emacs-27. I installed the attached patch into master to fix the !USE_LSB_TAG performance issue you raised. This patch does not fix crashes; it's merely a performance tweak. I am planning on looking into related crashes for Lisp_Symbol next. Perhaps we should wait on that before worrying about what exact patch should go into emacs-27. [-- Attachment #2: 0001-Tweak-GC-performance-if-USE_LSB_TAG.patch --] [-- Type: text/x-patch, Size: 2104 bytes --] From 7a83c4f66cb945d43dcaf8c37f4af1334d34f501 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Tue, 26 May 2020 15:47:59 -0700 Subject: [PATCH] Tweak GC performance if !USE_LSB_TAG Performance issue reported by Eli Zaretskii (Bug#41321#149). * src/alloc.c (GC_OBJECT_ALIGNMENT_MINIMUM): New constant. (maybe_lisp_pointer): Use it instead of GCALIGNMENT. 
--- src/alloc.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index f8609398a3..e241b9933a 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4687,16 +4687,33 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) mark_maybe_object (*array); } +/* A lower bound on the alignment of Lisp objects that need marking. + Although 1 is safe, higher values speed up mark_maybe_pointer. + If USE_LSB_TAG, this value is typically GCALIGNMENT; otherwise, + it's determined by the natural alignment of Lisp structs. + All vectorlike objects have alignment at least that of union + vectorlike_header and it's unlikely they all have alignment greater, + so use the union as a safe and likely-accurate standin for + vectorlike objects. */ + +enum { GC_OBJECT_ALIGNMENT_MINIMUM + = max (GCALIGNMENT, + min (alignof (union vectorlike_header), + min (min (alignof (struct Lisp_Cons), + alignof (struct Lisp_Float)), + min (alignof (struct Lisp_String), + alignof (struct Lisp_Symbol))))) }; + /* Return true if P might point to Lisp data that can be garbage collected, and false otherwise (i.e., false if it is easy to see that P cannot point to Lisp data that can be garbage collected). Symbols are implemented via offsets not pointers, but the offsets - are also multiples of GCALIGNMENT. */ + are also multiples of GC_OBJECT_ALIGNMENT_MINIMUM. */ static bool maybe_lisp_pointer (void *p) { - return (uintptr_t) p % GCALIGNMENT == 0; + return (uintptr_t) p % GC_OBJECT_ALIGNMENT_MINIMUM == 0; } /* If P points to Lisp data, mark that as live if it isn't already -- 2.17.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
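The computation in this patch can be illustrated with toy types. This is a sketch only: struct toy_cons and struct toy_float are hypothetical stand-ins for the Lisp structs, and TOY_TAG_ALIGNMENT is set to 1 to model a !USE_LSB_TAG build.

```c
#include <assert.h>
#include <stdalign.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical object layouts.  */
struct toy_cons { void *car; void *cdr; };
struct toy_float { double data; };

/* With tags in the high bits, tagging imposes no alignment at all.  */
enum { TOY_TAG_ALIGNMENT = 1 };

/* A lower bound on the alignment of any object the GC must mark:
   at least the tag alignment, and at least the smallest natural
   alignment among the object types.  */
enum { TOY_OBJECT_ALIGNMENT_MINIMUM
         = MAX (TOY_TAG_ALIGNMENT,
                MIN ((int) alignof (struct toy_cons),
                     (int) alignof (struct toy_float))) };
```

With the tag alignment at 1, the bound falls back to the natural alignment of the structs, so the pointer filter can still reject most random stack words instead of accepting every one of them.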
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-26 22:49 ` Paul Eggert @ 2020-05-27 15:26 ` Eli Zaretskii 2020-05-27 16:58 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-27 15:26 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Tue, 26 May 2020 15:49:24 -0700 > > > On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look > > right (or maybe I'm missing something). > > Good point; I'd neglected that. I.e., on !USE_LSB_TAG systems the proposed > emacs-27 patch is overly-conservative, as it causes maybe_lisp_pointer to always > return true. Although this hurts GC performance it doesn't affect correctness > and the patch does fix a crash on USE_LSB_TAG systems, so it (or something like > it) is needed for emacs-27. We used to rely on 8-byte alignment on those systems, and I don't see any reason not to continue relying on that and punishing those systems' performance. What would we gain? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 15:26 ` Eli Zaretskii @ 2020-05-27 16:58 ` Paul Eggert 2020-05-27 17:33 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-27 16:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/27/20 8:26 AM, Eli Zaretskii wrote: > We used to rely on 8-byte alignment on those systems, and I don't see > any reason not to continue relying on that and punishing those > systems' performance. What would we gain? In looking into this more, it appears that the maybe_lisp_pointer idea is wrong, in that compilers can make pointers into a Lisp object while losing the address of the original object (and we've seen them do this) and there's no guarantee that these sub-pointers are GCALIGNED. This sort of failure should be quite rare but can cause crashes such as the one you observed. I am looking into a fix and plan to apply it to master (I've already installed fixes for some minor glitches I observed on the way); we can then talk about what to do with emacs-27. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 16:58 ` Paul Eggert @ 2020-05-27 17:33 ` Eli Zaretskii 2020-05-27 17:53 ` Paul Eggert 2020-05-27 17:57 ` Pip Cet 2020-05-28 18:27 ` Eli Zaretskii 2 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-27 17:33 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Wed, 27 May 2020 09:58:11 -0700 > > In looking into this more, it appears that the maybe_lisp_pointer idea is wrong, > in that compilers can make pointers into a Lisp object while losing the address > of the original object (and we've seen them do this) and there's no guarantee > that these sub-pointers are GCALIGNED. Sorry, I don't follow: what do you mean by "losing the address of the original object" in this case? Can you show an example? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 17:33 ` Eli Zaretskii @ 2020-05-27 17:53 ` Paul Eggert 2020-05-27 18:24 ` Eli Zaretskii 2020-05-28 2:43 ` Stefan Monnier 0 siblings, 2 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-27 17:53 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/27/20 10:33 AM, Eli Zaretskii wrote: > Sorry, I don't follow: what do you mean by "losing the address of the > original object" in this case? Can you show an example? The source code says for (i = 0; i < size; i++) foo (AREF (obj, i)); This is the last reference to obj, so the compiler reuses the register R holding obj, and has that register R contain &XVECTOR (obj)->contents[0], &XVECTOR (obj)->contents[1], etc. each time through the loop, and transforms the call into foo (*R) as an optimization. When foo calls the garbage collector, maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp object: it points somewhere into the middle of a Lisp object and R's value is not GC-aligned. We've seen compilers do that. ^ permalink raw reply [flat|nested] 132+ messages in thread
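The interior-pointer scenario described above can be sketched in standalone C. This is only an illustration under stated assumptions, not Emacs code: `demo_vector`, `DEMO_GCALIGNMENT`, `demo_maybe_lisp_pointer` and `demo_interior_pointer_rejected` are hypothetical names, and the 4-byte slots stand in for the Lisp_Object words of a 32-bit build. It shows that a pointer to the start of an 8-byte-aligned object passes a `p % alignment == 0` test, while a pointer into the middle of the same object can fail it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the Emacs definitions.  */
enum { DEMO_GCALIGNMENT = 8 };

struct demo_vector
{
  _Alignas (8) int64_t header;  /* object start, 8-byte aligned */
  int32_t contents[4];          /* 4-byte slots, as with 32-bit words */
};

/* Same shape as the alignment test under discussion.  */
static bool
demo_maybe_lisp_pointer (void *p)
{
  return (uintptr_t) p % DEMO_GCALIGNMENT == 0;
}

/* Return true if a pointer to slot I of V fails the alignment test
   even though it points into a live object.  */
static bool
demo_interior_pointer_rejected (struct demo_vector *v, int i)
{
  void *r = &v->contents[i];    /* what register R would hold */
  return !demo_maybe_lisp_pointer (r);
}
```

In this layout slot 0 sits at offset 8 and still passes the test, while slot 1 sits at offset 12 and is rejected, so whether the object is kept alive would depend on which loop iteration the collector happens to interrupt.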
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 17:53 ` Paul Eggert @ 2020-05-27 18:24 ` Eli Zaretskii 2020-05-27 18:39 ` Paul Eggert 2020-05-28 2:43 ` Stefan Monnier 1 sibling, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-27 18:24 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Wed, 27 May 2020 10:53:22 -0700 > > The source code says > > for (i = 0; i < size; i++) > foo (AREF (obj, i)); > > This is the last reference to obj, so the compiler reuses the register R holding > obj, and has that register R contain &XVECTOR (obj)->contents[0], &XVECTOR > (obj)->contents[1], etc. each time through the loop, and transforms the call > into foo (*R) as an optimization. When foo calls the garbage collector, > maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp > object: it points somewhere into the middle of a Lisp object and R's value is > not GC-aligned. For this to cause trouble, you'd need to arrange for no other reference to obj, neither anywhere else up the callstack, nor from another object we will mark. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 18:24 ` Eli Zaretskii @ 2020-05-27 18:39 ` Paul Eggert 0 siblings, 0 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-27 18:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/27/20 11:24 AM, Eli Zaretskii wrote: > For this to cause trouble, you'd need to arrange for no other > reference to obj, neither anywhere else up the callstack, nor from > another object we will mark. Yes, that's right. It's unlikely, but it does happen and we've seen it happen in the past. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 17:53 ` Paul Eggert 2020-05-27 18:24 ` Eli Zaretskii @ 2020-05-28 2:43 ` Stefan Monnier 2020-05-28 7:27 ` Eli Zaretskii 1 sibling, 1 reply; 132+ messages in thread From: Stefan Monnier @ 2020-05-28 2:43 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, pipcet > (obj)->contents[1], etc. each time through the loop, and transforms the call > into foo (*R) as an optimization. When foo calls the garbage collector, > maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp > object: it points somewhere into the middle of a Lisp object and R's value is > not GC-aligned. Indeed, basically `maybe_lisp_pointer` goes against the effort we've put into replacing `live_string_p` with `live_string_holding` (i.e. to recognize anything that points into any part of a Lisp_String so as to prevent collecting it). Stefan ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 2:43 ` Stefan Monnier @ 2020-05-28 7:27 ` Eli Zaretskii 2020-05-28 7:41 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-28 7:27 UTC (permalink / raw) To: Stefan Monnier; +Cc: eggert, 41321, pipcet > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Eli Zaretskii <eliz@gnu.org>, pipcet@gmail.com, 41321@debbugs.gnu.org > Date: Wed, 27 May 2020 22:43:52 -0400 > > Indeed, basically `maybe_lisp_pointer` goes against the effort we've put > into replacing `live_string_p` with `live_string_holding` (i.e. to > recognize anything that points into any part of a Lisp_String so as to > prevent collecting it). You are suggesting that we go back to using live_string_p? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 7:27 ` Eli Zaretskii @ 2020-05-28 7:41 ` Paul Eggert 2020-05-28 13:30 ` Stefan Monnier 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-28 7:41 UTC (permalink / raw) To: Eli Zaretskii, Stefan Monnier; +Cc: 41321, pipcet On 5/28/20 12:27 AM, Eli Zaretskii wrote: >> From: Stefan Monnier <monnier@iro.umontreal.ca> >> Date: Wed, 27 May 2020 22:43:52 -0400 >> >> Indeed, basically `maybe_lisp_pointer` goes against the effort we've put >> into replacing `live_string_p` with `live_string_holding` (i.e. to >> recognize anything that points into any part of a Lisp_String so as to >> prevent collecting it). > > You are suggesting that we go back to using live_string_p? I think he's saying just the opposite: namely, that maybe_lisp_pointer is a mistake, in that it goes against the (solid) reasons we've replaced some calls to live_string_p with calls to live_string_holding. After looking into it I agree. I'll propose a patch shortly that does away with maybe_lisp_pointer. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 7:41 ` Paul Eggert @ 2020-05-28 13:30 ` Stefan Monnier 2020-05-28 14:28 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Stefan Monnier @ 2020-05-28 13:30 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, pipcet >> You are suggesting that we go back to using live_string_p? > I think he's saying just the opposite: namely, that maybe_lisp_pointer is a > mistake, in that it goes against the (solid) reasons we've replaced some calls > to live_string_p with calls to live_string_holding. > After looking into it I agree. I'll propose a patch shortly that does away with > maybe_lisp_pointer. Exactly. More specifically, `maybe_lisp_pointer` tries to filter out false positives but does it based on the assumption that we should only accept numbers that look like pointers to the beginning of a Lisp_Object. If we still want to try and filter out false positives we need to do it more carefully by considering what is the smallest alignment possible for a pointer to an internal field of a Lisp_Object. And if this least alignment is not the same for all Lisp_Objects, then this test should likely be moved to the respective `live_<foo>_holding`. I suspect that for vectorlike objects, the least alignment is 1 because of some `char` or `bool` fields in some of the pseudovectors. Of course, we could do better by checking for "false positives" after checking the specific kind of vectorlike object (so as to use a different least-alignment-check for those objects that contain `char`s than for those that only contain `int`s, for example). Stefan ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 13:30 ` Stefan Monnier @ 2020-05-28 14:28 ` Pip Cet 2020-05-28 16:24 ` Stefan Monnier 2020-05-29 9:43 ` Pip Cet 0 siblings, 2 replies; 132+ messages in thread From: Pip Cet @ 2020-05-28 14:28 UTC (permalink / raw) To: Stefan Monnier; +Cc: Paul Eggert, 41321 On Thu, May 28, 2020 at 1:30 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote: > >> You are suggesting that we go back to using live_string_p? > > I think he's saying just the opposite: namely, that maybe_lisp_pointer is a > > mistake, in that it goes against the (solid) reasons we've replaced some calls > > to live_string_p with calls to live_string_holding. > > After looking into it I agree. I'll propose a patch shortly that does away with > > maybe_lisp_pointer. > > Exactly. More specifically, `maybe_lisp_pointer` tries to filter out > false positives but does it based on the assumption that we should only > accept numbers that look like pointers to the beginning of > a Lisp_Object. > > If we still want to try and filter out false positives we need to do it > more carefully by considering what is the smallest alignment possible > for a pointer to an internal field of a Lisp_Object. > > And if this least alignment is not the same for all Lisp_Objects, then > this test should likely be moved to the respective `live_<foo>_holding`. But at that point, we already have walked the rbtree, which is probably the main performance problem. My suggestion is instead to put MEM_TYPE_SYMBOL blocks into the rbtree twice, once at their proper address and once at the lispsym-based offset. We could then look up each pointer precisely once, though sometimes the blocks might overlap and we'd end up marking two objects for one pointer. But that would lead to overlapping rbtree entries, and that requires some extra code which wouldn't be exercised very often... 
still, I think it might be worth doing, particularly since there are relatively few symbol blocks on most systems. > I suspect that for vectorlike objects, the least alignment is 1 because > of some `char` or `bool` fields in some of the pseudovectors. > Of course, we could do better by checking for "false positives" after > checking the specific kind of vectorlike object (so as to use > a different least-alignment-check for those objects that contain > `char`s than for those that only contain `int`s, for example). I think the point of maybe_lisp_pointer wasn't to mark fewer objects, it was to look up fewer pointers in the rbtree. I might be wrong. On 64-bit systems with ASLR, at least, it's quite unlikely that we have what looks like a valid pointer into a Lisp object that we can conclude is not one, based on its offset or alignment... ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 14:28 ` Pip Cet @ 2020-05-28 16:24 ` Stefan Monnier 2020-05-29 9:43 ` Pip Cet 1 sibling, 0 replies; 132+ messages in thread From: Stefan Monnier @ 2020-05-28 16:24 UTC (permalink / raw) To: Pip Cet; +Cc: Paul Eggert, 41321 > But at that point, we already have walked the rbtree, which is > probably the main performance problem. Indeed, maybe_lisp_pointer can avoid this cost, but I was more concerned with the risk of increasing the number of objects kept live because of false-positives (i.e. a random integer/float/younameit that happens to look like it's pointing into the object). > I think the point of maybe_lisp_pointer wasn't to mark fewer objects, > it was to look up fewer pointers in the rbtree. I might be wrong. You might be right. Stefan ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 14:28 ` Pip Cet 2020-05-28 16:24 ` Stefan Monnier @ 2020-05-29 9:43 ` Pip Cet 2020-05-29 18:31 ` Paul Eggert 1 sibling, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-29 9:43 UTC (permalink / raw) To: Stefan Monnier; +Cc: Paul Eggert, 41321 [-- Attachment #1: Type: text/plain, Size: 2013 bytes --] On Thu, May 28, 2020 at 2:28 PM Pip Cet <pipcet@gmail.com> wrote: > My suggestion is instead to put MEM_TYPE_SYMBOL blocks into the rbtree > twice, once at their proper address and once at the lispsym-based > offset. > > We could then look up each pointer precisely once, though sometimes > the blocks might overlap and we'd end up marking two objects for one > pointer. > > But that would lead to overlapping rbtree entries, and that requires > some extra code which wouldn't be exercised very often... still, I > think it might be worth doing, particularly since there are relatively > few symbol blocks on most systems. Okay, here's some initial code that does that. It's a little tricky, because real addresses and symbol offsets can overlap arbitrarily and become mapped and unmapped in any order. The basic idea is that symbol offsets are marked two ways: 1. an overlaps_with_symbols flag on a "normal" memory node 2. a mem node type of MEM_TYPE_SYMBOL_ADJUSTED (2) implies (1), but not the other way around. There's only one flag per normal memory node, which is true if any of the addresses in the node are also valid symbol offsets. MEM_TYPE_SYMBOL_ADJUSTED nodes have start and end addresses that do not necessarily correspond to symbol blocks or even symbols; their length is arbitrary. 
When we insert or delete memory nodes, we perform the obvious operations to keep MEM_TYPE_SYMBOL_ADJUSTED blocks accurate: i.e., when a MEM_TYPE_SYMBOL_ADJUSTED node is split by an intervening/overlapping normal node, we insert one or two new MEM_TYPE_SYMBOL_ADJUSTED nodes to cover the remaining offsets, and set the overlaps_with_symbols flag on the normal node, to cover those, etc. As I said, the code is tricky (i.e. might contain bugs that can only be discovered through extensive testing on 32-bit systems), and it complicates what should be generic functions for the rbtree implementation, so this is probably a 32-bit optimization that is too late because 32-bit systems are no longer that relevant... [-- Attachment #2: 0001-snapshot.patch --] [-- Type: text/x-patch, Size: 8603 bytes --] From 246493425f01fc6876ed2222fd4c1806dc0e12f1 Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet@gmail.com> Date: Fri, 29 May 2020 09:40:36 +0000 Subject: [PATCH] snapshot --- src/alloc.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 199 insertions(+), 1 deletion(-) diff --git a/src/alloc.c b/src/alloc.c index e241b9933a..65cbacbe87 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -475,6 +475,7 @@ no_sanitize_memcpy (void *dest, void const *src, size_t size) MEM_TYPE_CONS, MEM_TYPE_STRING, MEM_TYPE_SYMBOL, + MEM_TYPE_SYMBOL_ADJUSTED, MEM_TYPE_FLOAT, /* Since all non-bool pseudovectors are small enough to be allocated from vector blocks, this memory type denotes @@ -534,6 +535,12 @@ deadp (Lisp_Object x) /* Start and end of allocated region. */ void *start, *end; + /* Whether any symbol blocks are known to exist whose adjusted + offsets fall in this region. If only symbol offsets in this + region are valid, type == MEM_TYPE_SYMBOL_ADJUSTED, but this + flag will also be true. */ + bool overlaps_with_symbols; + /* Node color. 
*/ enum {MEM_BLACK, MEM_RED} color; @@ -981,6 +988,17 @@ record_xmalloc (size_t size) return p; } +static void * +adjust_symbol (void *ptr) +{ + return (void *)((uintptr_t) ptr - (uintptr_t) &lispsym); +} + +static void * +unadjust_symbol (void *ptr) +{ + return (void *)((uintptr_t) ptr + (uintptr_t) &lispsym); +} /* Like malloc but used for allocating Lisp data. NBYTES is the number of bytes to allocate, TYPE describes the intended use of the @@ -1023,6 +1041,9 @@ lisp_malloc (size_t nbytes, bool clearit, enum mem_type type) #ifndef GC_MALLOC_CHECK if (val && type != MEM_TYPE_NON_LISP) mem_insert (val, (char *) val + nbytes, type); + if (val && type == MEM_TYPE_SYMBOL) + mem_insert (adjust_symbol (val), (char *) adjust_symbol (val) + nbytes, + MEM_TYPE_SYMBOL_ADJUSTED)->overlaps_with_symbols = true; #endif MALLOC_UNBLOCK_INPUT; @@ -1259,6 +1280,9 @@ lisp_align_malloc (size_t nbytes, enum mem_type type) #ifndef GC_MALLOC_CHECK if (type != MEM_TYPE_NON_LISP) mem_insert (val, (char *) val + nbytes, type); + if (val && type == MEM_TYPE_SYMBOL) + mem_insert (adjust_symbol (val), (char *) adjust_symbol (val) + nbytes, + MEM_TYPE_SYMBOL_ADJUSTED)->overlaps_with_symbols = true; #endif MALLOC_UNBLOCK_INPUT; @@ -4073,6 +4097,36 @@ mem_init (void) mem_root = MEM_NIL; } +/* Value is a pointer to the first mem node not to start before START. + Value is MEM_NIL if there is no such node. */ + +static struct mem_node * +mem_find_next (void *start) +{ + struct mem_node *p, *parent; + + p = mem_root; + parent = p; + while (p != MEM_NIL) + { + if (start >= p->end) + { + p = p->right; + } + else if (start <= p->start) + { + parent = p; + p = p->left; + } + else + return p; + } + + if (start <= parent->start) + return parent; + + return MEM_NIL; +} /* Value is a pointer to the mem_node containing START. Value is MEM_NIL if there is no node in the tree containing START. 
*/ @@ -4119,9 +4173,42 @@ mem_insert (void *start, void *end, enum mem_type type) while (c != MEM_NIL) { parent = c; - c = start < c->start ? c->left : c->right; + if (start < c->end && c->start < end) + break; + if (start < c->start) + c = c->left; + else if (end >= c->end) + c = c->right; + else + break; } + if (parent && parent->end > start && parent->start < end) + { + void *old_start = parent->start; + void *old_end = parent->end; + enum mem_type old_type = parent->type; + if (type == MEM_TYPE_SYMBOL_ADJUSTED + && old_type != MEM_TYPE_SYMBOL_ADJUSTED) + { + if (start < old_start) + mem_insert (start, old_start, type)->overlaps_with_symbols = true; + if (old_end < end) + mem_insert (old_end, end, type)->overlaps_with_symbols = true; + } + else + { + eassert (parent->type == MEM_TYPE_SYMBOL_ADJUSTED); + mem_delete (parent); + parent = mem_insert (start, end, type); + if (old_start < start) + mem_insert (old_start, start, old_type)->overlaps_with_symbols = true; + if (old_end > end) + mem_insert (end, old_end, old_type)->overlaps_with_symbols = true; + } + parent->overlaps_with_symbols = true; + return parent; + } /* Create a new node. */ #ifdef GC_MALLOC_CHECK x = malloc (sizeof *x); @@ -4136,6 +4223,7 @@ mem_insert (void *start, void *end, enum mem_type type) x->parent = parent; x->left = x->right = MEM_NIL; x->color = MEM_RED; + x->overlaps_with_symbols = false; /* Insert it as child of PARENT or install it as root. */ if (parent) @@ -4301,12 +4389,92 @@ mem_rotate_right (struct mem_node *x) x->parent = y; } +/* Set the overlaps_with_symbols flag based on MEM_TYPE_SYMBOL + blocks. 
*/ + +static void +mem_set_overlaps_with_symbols (struct mem_node *x) +{ + x->overlaps_with_symbols = false; + + for (void *p = unadjust_symbol (x->start); + p < unadjust_symbol (x->end);) + { + struct mem_node *y = mem_find_next (p); + p = y->end; + if (y->start >= x->end) + break; + if (y->type == MEM_TYPE_SYMBOL) + { + x->overlaps_with_symbols = true; + return; + } + } +} /* Delete node Z from the tree. If Z is null or MEM_NIL, do nothing. */ static void mem_delete (struct mem_node *z) { + if (z->overlaps_with_symbols) + { + void *z_start = z->start; + void *z_end = z->end; + z->overlaps_with_symbols = false; + mem_delete (z); + /* Find all the symbol blocks that intersected with z, and add + them to the rbtree. */ + for (void *unadjusted_start = unadjust_symbol (z_start); + unadjusted_start < unadjust_symbol (z_end);) + { + struct mem_node *x = mem_find_next (unadjusted_start); + unadjusted_start = x->end; + + if (x == MEM_NIL) + break; + + if (x->start > unadjust_symbol (z_end)) + break; + + if (x->type == MEM_TYPE_SYMBOL) + { + mem_insert (max (z_start, adjust_symbol (x->start)), + min (z_end, adjust_symbol (x->end)), + MEM_TYPE_SYMBOL_ADJUSTED) + ->overlaps_with_symbols = true; + } + } + return; + } + if (z->type == MEM_TYPE_SYMBOL) + { + for (void *adjusted_start = adjust_symbol (z->start); + adjusted_start < adjust_symbol (z->end);) + { + struct mem_node *x = mem_find_next (adjusted_start); + adjusted_start = x->end; + if (x->type == MEM_TYPE_SYMBOL_ADJUSTED) + { + void *x_start = x->start; + void *x_end = x->end; + mem_delete (x); + if (x_start < adjust_symbol (z->start)) + mem_insert (x_start, adjust_symbol (z->start), + MEM_TYPE_SYMBOL_ADJUSTED)->overlaps_with_symbols + = true; + if (x_end > adjust_symbol (z->end)) + mem_insert (adjust_symbol (z->end), x_end, + MEM_TYPE_SYMBOL_ADJUSTED)->overlaps_with_symbols + = true; + } + else + { + eassert (x->overlaps_with_symbols); + mem_set_overlaps_with_symbols (x); + } + } + } struct mem_node *x, *y; if (!z || 
z == MEM_NIL) @@ -4342,6 +4510,7 @@ mem_delete (struct mem_node *z) z->start = y->start; z->end = y->end; z->type = y->type; + z->overlaps_with_symbols = y->overlaps_with_symbols; } if (y->color == MEM_BLACK) @@ -4766,6 +4935,10 @@ mark_maybe_pointer (void *p) obj = live_symbol_holding (m, p); break; + case MEM_TYPE_SYMBOL_ADJUSTED: + /* handled below */ + break; + case MEM_TYPE_FLOAT: if (live_float_p (m, p)) obj = make_lisp_ptr (p, Lisp_Float); @@ -4782,6 +4955,18 @@ mark_maybe_pointer (void *p) if (!NILP (obj)) mark_object (obj); + + if (m->overlaps_with_symbols) + { + obj = Qnil; + p = unadjust_symbol (p); + m = mem_find (p); + if (m != MEM_NIL + && m->type == MEM_TYPE_SYMBOL) + obj = live_symbol_holding (m, unadjust_symbol (p)); + if (!NILP (obj)) + mark_object (obj); + } } } @@ -7077,6 +7262,19 @@ sweep_symbols (void) /* Unhook from the free list. */ symbol_free_list = sblk->symbols[0].u.s.next; lisp_free (sblk); + void *p = adjust_symbol (sblk); + while (true) + { + struct mem_node *m = mem_find_next (p); + if (m->start >= + adjust_symbol (sblk + 1)) + break; + p = m->end; + mem_set_overlaps_with_symbols (m); + if (m->type == MEM_TYPE_SYMBOL_ADJUSTED + && !m->overlaps_with_symbols) + mem_delete (m); + } } else { -- 2.27.0.rc0 ^ permalink raw reply related [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 9:43 ` Pip Cet @ 2020-05-29 18:31 ` Paul Eggert 2020-05-29 18:37 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-29 18:31 UTC (permalink / raw) To: Pip Cet, Stefan Monnier; +Cc: 41321 On 5/29/20 2:43 AM, Pip Cet wrote: > As I said, the code is tricky (i.e. might contain bugs that can only > be discovered through extensive testing on 32-bit systems), and it > complicates what should be generic functions for the rbtree > implementation, so this is probably a 32-bit optimization that is too > late because 32-bit systems are no longer that relevant... At least at first, it may make more sense to keep the red-black trees as-is, and to look up what appear to be symbol-tagged pointers twice, once as-is (to find any kind of object) and once offset by '(char *) lispsym - Lisp_Symbol' (to find only symbols). Although a bit slower, this won't require any changes to the rbtree code so it's cleaner. We can then time the optimization you have in mind, to see whether it's worth doing. ^ permalink raw reply [flat|nested] 132+ messages in thread
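The "look it up twice" idea above can be sketched as follows. This is a rough illustration under stated assumptions, not the proposed patch: a linear scan over a small array stands in for the red-black tree, and `demo_block`, `demo_find`, `demo_lookup_twice` and the `symbol_base` parameter (playing the role of the lispsym-based offset) are all hypothetical names. The word is first looked up as-is, where any kind of block counts, and then again after adding the symbol base, where only symbol blocks count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mem_node: a half-open range [start, end).  */
struct demo_block
{
  uintptr_t start, end;
  bool is_symbol_block;
};

/* Linear stand-in for the tree lookup: return the block containing P,
   or NULL if there is none.  */
static const struct demo_block *
demo_find (const struct demo_block *blocks, size_t n, uintptr_t p)
{
  for (size_t i = 0; i < n; i++)
    if (blocks[i].start <= p && p < blocks[i].end)
      return &blocks[i];
  return NULL;
}

/* Count how many candidate objects WORD might refer to: once taken
   as a plain address, once taken as an offset from SYMBOL_BASE.  */
static int
demo_lookup_twice (const struct demo_block *blocks, size_t n,
                   uintptr_t word, uintptr_t symbol_base)
{
  int hits = 0;
  if (demo_find (blocks, n, word))
    hits++;                     /* any kind of object */
  const struct demo_block *b = demo_find (blocks, n, word + symbol_base);
  if (b && b->is_symbol_block)
    hits++;                     /* symbols only */
  return hits;
}
```

The second lookup ignores hits in non-symbol blocks, which matches the remark that values offset by lispsym need to be followed only when they land in a symbol block.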
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 18:31 ` Paul Eggert @ 2020-05-29 18:37 ` Pip Cet 2020-05-29 19:32 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-29 18:37 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Fri, May 29, 2020 at 6:31 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/29/20 2:43 AM, Pip Cet wrote: > > As I said, the code is tricky (i.e. might contain bugs that can only > > be discovered through extensive testing on 32-bit systems), and it > > complicates what should be generic functions for the rbtree > > implementation, so this is probably a 32-bit optimization that is too > > late because 32-bit systems are no longer that relevant... > > At least at first, it may make more sense to keep the red-black trees as-is, and > to look up what appear to be symbol-tagged pointers twice, once as-is (to find > any kind of object) and once offset by '(char *) lispsym - Lisp_Symbol' (to find > only symbols). Having had some time to think about this, I agree. I'm certainly not very confident in that code. But the main reason is that it's not an optimization in all circumstances: if you have a very large vector, and a symbol block aliasing it as symbol offsets goes away, you have to search for other symbol blocks with that property, which might take a long time. However, I wonder what you mean by "what appear to be symbol-tagged pointers"? Surely we need to look up all pointers twice, no matter what their tag is, since they might be a reference to something inside the struct Lisp_Symbol. Of course, on 64-bit machines, this line of code would usually save us the trouble: if (start < min_heap_address || start > max_heap_address) return MEM_NIL; So that's another reason to leave the code as it is for now. > Although a bit slower, this won't require any changes to the > rbtree code so it's cleaner. 
> We can then time the optimization you have in mind, to see whether it's worth doing. ... or something simpler that might actually work better :-) ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 18:37 ` Pip Cet @ 2020-05-29 19:32 ` Paul Eggert 2020-05-29 19:37 ` Pip Cet 2020-05-29 20:26 ` Stefan Monnier 0 siblings, 2 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-29 19:32 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier On 5/29/20 11:37 AM, Pip Cet wrote: > if you have a very large vector, and a symbol block > aliasing it as symbol offsets goes away, you have to search for other > symbol blocks with that property, which might take a long time. It shouldn't be that bad, because when you are worrying about symbols offset by 'lispsym', you need to look only for symbol blocks; it won't matter if these values appear to point into a vector because you won't follow them in that case. > However, I wonder what you mean by "what appear to be symbol-tagged > pointers"? Surely we need to look up all pointers twice, no matter > what their tag is, since they might be a reference to something inside > the struct Lisp_Symbol. What I was trying to say is that if a pointer lacks the symbol tag, then we needn't worry about it being offset by 'lispsym'. These pointers need to be looked up only once, even if they happen to be pointers into a struct Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that is a symbol, and add a small offset to it without also adding 'lispsym'. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 19:32 ` Paul Eggert @ 2020-05-29 19:37 ` Pip Cet 2020-05-29 20:26 ` Stefan Monnier 1 sibling, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-29 19:37 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Fri, May 29, 2020 at 7:32 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/29/20 11:37 AM, Pip Cet wrote: > > if you have a very large vector, and a symbol block > > aliasing it as symbol offsets goes away, you have to search for other > > symbol blocks with that property, which might take a long time. > > It shouldn't be that bad, because when you are worrying about symbols offset by > 'lispsym', you need to look only for symbol blocks; it won't matter if these > values appear to point into a vector because you won't follow them in that case. You mean it shouldn't be that bad with the existing code? You're probably right. It would have been very bad with the code I posted though, so best ignore that. > > However, I wonder what you mean by "what appear to be symbol-tagged > > pointers"? Surely we need to look up all pointers twice, no matter > > what their tag is, since they might be a reference to something inside > > the struct Lisp_Symbol. > > What I was trying to say is that if a pointer lacks the symbol tag, then we > needn't worry about it being offset by 'lispsym'. These pointers need to be > looked up only once, even if they happen to be pointers into a struct > Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that > is a symbol, and add a small offset to it without also adding 'lispsym'. Oh! You're right, of course. How silly of me not to realize. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 19:32 ` Paul Eggert 2020-05-29 19:37 ` Pip Cet @ 2020-05-29 20:26 ` Stefan Monnier 2020-05-29 20:40 ` Paul Eggert 2020-05-30 5:51 ` Eli Zaretskii 1 sibling, 2 replies; 132+ messages in thread From: Stefan Monnier @ 2020-05-29 20:26 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Pip Cet > What I was trying to say is that if a pointer lacks the symbol tag, then we > needn't worry about it being offset by 'lispsym'. These pointers need to be > looked up only once, even if they happen to be pointers into a struct > Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that > is a symbol, and add a small offset to it without also adding 'lispsym'. I don't think that's true. The original problematic case is for wide-int where a 64bit Lisp_Object containing a symbol is split into a 32bit tag saying "this is a symbol" and a 32bit pointer to which an offset has been added. So when we encounter a 32bit word on the stack, it may be a "plain pointer" or it may be the 32bit of a pointer to a symbol with an offset applied but we can't tell which it is because we don't have the tag at that point. Stefan "looking forward to bignums replacing wide-ints" ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 20:26 ` Stefan Monnier @ 2020-05-29 20:40 ` Paul Eggert 2020-05-30 5:54 ` Eli Zaretskii 2020-05-30 5:51 ` Eli Zaretskii 1 sibling, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-29 20:40 UTC (permalink / raw) To: Stefan Monnier; +Cc: 41321, Pip Cet On 5/29/20 1:26 PM, Stefan Monnier wrote: > The original problematic case is for wide-int where a 64bit Lisp_Object > containing a symbol is split into a 32bit tag saying "this is a symbol" > and a 32bit pointer to which an offset has been added. > > So when we encounter a 32bit word on the stack, it may be a "plain > pointer" or it may be the 32bit of a pointer to a symbol with an > offset applied but we can't tell which it is because we don't have the > tag at that point. Oh, you're right. Thanks, I was thinking only of the USE_LSB_TAG case. For the !USE_LSB_TAG case, we should check whether the word is aligned for 'struct Lisp_Symbol', not whether it has the Lisp_Symbol tag, when deciding quickly whether to add 'lispsym' and then do the second rbtree lookup. Something like this: (USE_LSB_TAG ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0) I'll fold this idea into the next iteration of the patch I'm working on. ^ permalink raw reply [flat|nested] 132+ messages in thread
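The quick test proposed above could be sketched as a standalone predicate. This is a minimal sketch, not the patch itself: the constants are stand-ins chosen for illustration, and toy_word_may_be_symbol and its parameters are hypothetical names. Only the shape of the two-branch test comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values, assumed for illustration.  */
enum { TOY_GCALIGNMENT = 8 };   /* stand-in for GCALIGNMENT */
enum { TOY_SYMBOL_TAG = 0 };    /* stand-in for the Lisp_Symbol tag */

/* Return true if WORD might be (the pointer half of) a symbol, so that
   a second lookup at WORD + lispsym is worth doing.  SYMBOL_ALIGNMENT
   stands in for alignof (struct Lisp_Symbol).  */
static bool
toy_word_may_be_symbol (uintptr_t word, bool use_lsb_tag,
                        uintptr_t symbol_alignment)
{
  assert (symbol_alignment != 0);
  return (use_lsb_tag
          ? word % TOY_GCALIGNMENT == TOY_SYMBOL_TAG
          : word % symbol_alignment == 0);
}
```

The predicate is only a filter: a true result means "do the second lookup", not "this is a symbol".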
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 20:40 ` Paul Eggert @ 2020-05-30 5:54 ` Eli Zaretskii 2020-05-30 17:52 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 5:54 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: Pip Cet <pipcet@gmail.com>, Eli Zaretskii <eliz@gnu.org>, > 41321@debbugs.gnu.org > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Fri, 29 May 2020 13:40:33 -0700 > > (USE_LSB_TAG > ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol > : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0) I don't understand how this will work, given that Lisp object on the stack can be pushed as 2 non-contiguous 32-bit words. Can you explain? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 5:54 ` Eli Zaretskii @ 2020-05-30 17:52 ` Paul Eggert 2020-05-30 18:11 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-30 17:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/29/20 10:54 PM, Eli Zaretskii wrote: >> (USE_LSB_TAG >> ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol >> : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0) > I don't understand how this will work, given that Lisp object on the > stack can be pushed as 2 non-contiguous 32-bit words. Can you > explain? On a --with-wide-int host where !USE_LSB_TAG, the above test will work correctly on the low-order word of a Lisp object that is a symbol, because ((uintptr_t) word % alignof (struct Lisp_Symbol) == 0) must be true on such a word. The test is only for symbols; it's not for other Lisp objects. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 17:52 ` Paul Eggert @ 2020-05-30 18:11 ` Eli Zaretskii 2020-05-30 18:17 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 18:11 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: monnier@iro.umontreal.ca, pipcet@gmail.com, 41321@debbugs.gnu.org > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sat, 30 May 2020 10:52:07 -0700 > > On 5/29/20 10:54 PM, Eli Zaretskii wrote: > >> (USE_LSB_TAG > >> ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol > >> : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0) > > I don't understand how this will work, given that Lisp object on the > > stack can be pushed as 2 non-contiguous 32-bit words. Can you > > explain? > > On a --with-wide-int host where !USE_LSB_TAG, the above test will work > correctly on the low-order word of a Lisp object that is a symbol, > because ((uintptr_t) word % alignof (struct Lisp_Symbol) == 0) must be > true on such a word. > > The test is only for symbols; it's not for other Lisp objects. So any pointer whose alignment is the same as 'struct Lisp_Symbol' will pass the test, regardless of the tag bits? That's basically most of the struct pointers on those architectures, no? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:11 ` Eli Zaretskii @ 2020-05-30 18:17 ` Paul Eggert 0 siblings, 0 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-30 18:17 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/30/20 11:11 AM, Eli Zaretskii wrote: > So any pointer whose alignment is the same as 'struct Lisp_Symbol' > will pass the test, regardless of the tag bits? That's basically most > of the struct pointers on those architectures, no? Yes, pretty much. This is an inevitable consequence of the problem at hand. For aligned pointers we must consult the red-black tree no matter what solution we pick, because the compiler may have aligned a pointer for us. Just to make sure we're on the same page here. This stuff is only about how to improve performance (compared to the patch proposed for emacs-27 in Bug#41321#332) by doing fast checks on words before giving them to the red-black search. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 20:26 ` Stefan Monnier 2020-05-29 20:40 ` Paul Eggert @ 2020-05-30 5:51 ` Eli Zaretskii 2020-05-30 14:26 ` Stefan Monnier 1 sibling, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 5:51 UTC (permalink / raw) To: Stefan Monnier; +Cc: eggert, 41321, pipcet > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: Pip Cet <pipcet@gmail.com>, Eli Zaretskii <eliz@gnu.org>, > 41321@debbugs.gnu.org > Date: Fri, 29 May 2020 16:26:59 -0400 > > Stefan "looking forward to bignums replacing wide-ints" Why? So that Emacs could be slower still? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 5:51 ` Eli Zaretskii @ 2020-05-30 14:26 ` Stefan Monnier 0 siblings, 0 replies; 132+ messages in thread From: Stefan Monnier @ 2020-05-30 14:26 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, 41321, pipcet >> Stefan "looking forward to bignums replacing wide-ints" > Why? so that Emacs could be slower still? Well, if performance is a serious problem, then maybe "bignums replacing wide-ints" will never happen. IOW the above assumes that we can make them work as fast if not faster (more specifically, using bignums should(!?) result in better performance in buffers <512MB, while it will indeed likely result in worse performance in buffers bigger than that). Stefan ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 16:58 ` Paul Eggert 2020-05-27 17:33 ` Eli Zaretskii @ 2020-05-27 17:57 ` Pip Cet 2020-05-27 18:39 ` Paul Eggert 2020-05-28 18:27 ` Eli Zaretskii 2 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-27 17:57 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Wed, May 27, 2020 at 4:58 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/27/20 8:26 AM, Eli Zaretskii wrote: > > We used to rely on 8-byte alignment on those systems, and I don't see > > any reason not to continue relying on that and punishing those > > systems' performance. What would we gain? > > In looking into this more, it appears that the maybe_lisp_pointer idea is wrong, > in that compilers can make pointers into a Lisp object while losing the address > of the original object (and we've seen them do this) and there's no guarantee > that these sub-pointers are GCALIGNED. Do you know of anything like this happening on 64-bit systems? Because I think it doesn't; Emacs GC does rely, and has always relied since GCPRO was removed, on compilers being sensible about what they put on the stack. There's no guarantee in the C standard that that's true, but there never will be. > This sort of failure should be quite rare > but can cause crashes such as the one you observed. I'm pretty sure we figured out the crash that Eli observed. It's not anything that involved, just a Lisp_Object being stored non-consecutively and simultaneously being misaligned for the purposes of maybe_lisp_pointer. > I am looking into a fix and > plan to apply it to master (I've already installed some minor glitches I > observed on the way); we can then talk about what to do with emacs-27. I may be out of line, but I think it's rash to change things like that, even on master, with no opportunity for prior discussion. 
This isn't a minor bug, or a spelling fix: it's a fundamental change in what we expect from our C compiler and how GC works. In particular, I don't see how you plan to solve it without treating any pointer that points even in the vicinity of a valid lisp object as keeping that object alive. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 17:57 ` Pip Cet @ 2020-05-27 18:39 ` Paul Eggert 2020-05-27 18:56 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-27 18:39 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier On 5/27/20 10:57 AM, Pip Cet wrote: > Do you know of anything like this happening on 64-bit systems? I think it's unlikely on 64-bit systems; it'd happen only on platforms where alignof (void *) < 8, such as x86. > Emacs GC does rely, and has always relied since > GCPRO was removed, on compilers being sensible about what they put on > the stack. This isn't merely an issue about what compilers put into the stack; it's also an issue of what's in registers. There may not be any pointer in the stack that points into the Lisp object. And compilers are not always "sensible" about temps; they may cache &P->x into a temp with no copy of P anywhere. > I'm pretty sure we figured out the crash that Eli observed. It's not > anything that involved, just a Lisp_Object being stored > non-consecutively and simultaneously being misaligned for the purposes > of maybe_lisp_pointer. Not sure what the point is here. None of this is "that involved". We can have pointers into Lisp objects, pointers that are not aligned for the purposes of maybe_lisp_pointer. Emacs should follow all of them, not just the one that Eli happened to observe. > I don't see how you plan to solve it without treating any pointer that > points even in the vicinity of a valid lisp object as keeping that > object alive. Yes, of course. Any pointer that points somewhere within a Lisp object (in the C sense) should count as pointing to the object. If memory serves, we already treat pointers that way in some places; unfortunately we're not doing it consistently. But I take your point; I'll post the change here before committing to master. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 18:39 ` Paul Eggert @ 2020-05-27 18:56 ` Pip Cet 2020-05-28 1:21 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-27 18:56 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Wed, May 27, 2020 at 6:39 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/27/20 10:57 AM, Pip Cet wrote: > > > Do you know of anything like this happening on 64-bit systems? > > I think it's unlikely on 64-bit systems; it'd happen only on platforms where > alignof (void *) < 8, such as x86. > > > Emacs GC does rely, and has always relied since > > GCPRO was removed, on compilers being sensible about what they put on > > the stack. > > This isn't merely an issue about what compilers put into the stack; it's an also > an issue of what's in registers. There may not be any pointer in the stack that > points into the Lisp object. And compilers are not always "sensible" about > temps; they may cache &P->x into a temp with no copy of P anywhere. Or they may cache &P->x + 1, and use negative offsets to access it. That used to be the most efficient way of accessing arrays on some machines. We simply can't cater to that. Think about code like: Lisp_Object reverse(Lisp_Object vector) { ptrdiff_t count = ASIZE (vector); Lisp_Object new_vector = make_nil_vector (count); Lisp_Object *p = aref_addr (vector, count); Lisp_Object *q = new_vector->contents; while (count--) { garbage_collect (); *q++ = *--p; } } (which is what many compilers would generate from more sensible code). On the first iteration, p points to a totally different vector, or some random other object, but it still needs to keep its vector alive. So, at the very least, we need to always keep the immediately preceding object alive if we go that way. > > I'm pretty sure we figured out the crash that Eli observed. 
It's not > > anything that involved, just a Lisp_Object being stored > > non-consecutively and simultaneously being misaligned for the purposes > > of maybe_lisp_pointer. > > Not sure what the point is here. None of this is "that involved". We can have > pointers into Lisp objects, pointers that are not aligned for the purposes of > maybe_lisp_pointer. Emacs should follow all of them, not just the one that Eli > happened to observe. Or pointers past them, and that's a significant overhead because it usually means two objects are being kept alive by one reference. > > I don't see how you plan to solve it without treating any pointer that > > points even in the vicinity of a valid lisp object as keeping that > > object alive. > Yes, of course. I didn't mean just "within the object", I did mean "in the vicinity". With prefetch instructions, it's quite likely the compiler concludes it's easiest to prefetch something 256 bytes ahead of where it actually makes the access, then make the actual access relative to that address... > Any pointer that points somewhere within a Lisp object (in the C > sense) should count as pointing to the object. The C standard explicitly allows pointers (and that's C pointers) to point one past the end of an allocated array, I believe. > If memory serves, we already > treat pointers that way in some places; unfortunately we're not doing it > consistently. Yes, we do. > But I take your point; I'll post the change here before committing to master. I'm sorry, I misunderstood. If you want to fix only pointers within objects, that is quite a small change, but I believe it is incomplete. ^ permalink raw reply [flat|nested] 132+ messages in thread
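The difference between "points within an object" and "points one past its end" can be made concrete with a small sketch. It assumes a toy table of objects searched linearly, where the collector discussed in this thread uses a red-black tree; struct toy_object and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A toy heap object: start address and size in bytes.  */
struct toy_object
{
  uintptr_t start;
  size_t size;
};

/* True if P lies in the half-open range [start, start + size).
   Unsigned subtraction wraps when P < start, giving a huge value that
   fails the comparison, so one test covers both bounds.  */
static bool
toy_points_within (struct toy_object obj, uintptr_t p)
{
  return p - obj.start < obj.size;
}

/* Return the index of the object containing P, or -1 if none does.  */
static ptrdiff_t
toy_find_holder (struct toy_object const *objs, size_t n, uintptr_t p)
{
  assert (objs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (toy_points_within (objs[i], p))
      return (ptrdiff_t) i;
  return -1;
}
```

With two adjacent objects, a pointer one past the end of the first is attributed to the second, which is the situation the reverse-copy example above worries about.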
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 18:56 ` Pip Cet @ 2020-05-28 1:21 ` Paul Eggert 2020-05-28 6:31 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-28 1:21 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier On 5/27/20 11:56 AM, Pip Cet wrote: > So, at the very least, we need to always keep the immediately > preceding object alive if we go that way. Yes, I'm assuming that. I'll check that the code is doing that (if it isn't doing it already). > that's a significant overhead because it > usually means two objects are being kept alive by one reference. For Lisp_Objects with nonzero tags this shouldn't be an issue, since the tags mean the pointers won't tie down two objects. For Lisp_Symbols (whose tags are zero) it is an issue; also for untagged pointers to the start of objects. I'll measure how much overhead is involved in my usual 'make compile-always' benchmark. If it's not that much then we'll be OK. I'm hoping that's the case. If not, there are some more measures we can take. > With prefetch instructions, it's quite likely the compiler concludes > it's easiest to prefetch something 256 bytes ahead of where it > actually makes the access, then make the actual access relative to > that address... I wouldn't worry about that; it's so unlikely that it's not a practical concern. "Some C optimizers may lose the last undisguised pointer to a memory object as a consequence of clever optimizations. This has almost never been observed in practice." <https://github.com/ivmai/bdwgc> As I understand it, the times "in practice" that Hans-J. Boehm was talking about were for C code deliberately designed to fool the compiler / GC combination. I think it unlikely that a modern compiler would break all the code out there that uses conservative GC. (Besides, if that stuff really were of practical concern we'd have to give up on conservative GC entirely. 
:-) ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 1:21 ` Paul Eggert @ 2020-05-28 6:31 ` Pip Cet 2020-05-28 7:47 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-28 6:31 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Thu, May 28, 2020 at 1:21 AM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/27/20 11:56 AM, Pip Cet wrote: > > > So, at the very least, we need to always keep the immediately > > preceding object alive if we go that way. > > Yes, I'm assuming that. I'll check that the code is doing that (if it isn't > doing it already). Okay, that makes sense. > > that's a significant overhead because it > > usually means two objects are being kept alive by one reference. > > For Lisp_Objects with nonzero tags this shouldn't be an issue, since the tags > mean the pointers won't tie down two objects. On USE_LSB_TAG systems, you're correct. > I'll measure how much overhead is involved in my usual 'make compile-always' > benchmark. If it's not that much then we'll be OK. I'm hoping that's the case. > If not, there are some more measures we can take. I suspect that garbage collection is only slowed down significantly when there are large objects on the stack; that happens when GC happens during redisplay, for example. (All the more reason to make the struct it stack heap-allocated as I'd proposed). > > With prefetch instructions, it's quite likely the compiler concludes > > it's easiest to prefetch something 256 bytes ahead of where it > > actually makes the access, then make the actual access relative to > > that address... > > I wouldn't worry about that; it's so unlikely that it's not a practical concern. Fingers crossed. > "Some C optimizers may lose the last undisguised pointer to a memory object as a > consequence of clever optimizations. This has almost never been observed in > practice." <https://github.com/ivmai/bdwgc> As I understand it, the times "in > practice" that Hans-J. 
Boehm was talking about were for C code deliberately > designed to fool the compiler / GC combination. > > I think it unlikely that a modern compiler would break all the code out there > that uses conservative GC. > > (Besides, if that stuff really were of practical concern we'd have to give up on > conservative GC entirely. :-) I hope you're right, in that compilers will support GC better before they move on to clever optimizations that break it :-) (I'm not sure what the current state is of "real" GC support in LLVM; I'm pretty sure not much has happened in GCC.) ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 6:31 ` Pip Cet @ 2020-05-28 7:47 ` Paul Eggert 2020-05-28 8:11 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-28 7:47 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier [-- Attachment #1: Type: text/plain, Size: 1317 bytes --] On 5/27/20 11:31 PM, Pip Cet wrote: > I hope you're right, in that compilers will support GC better before > they move on to clever optimizations that break it :-) After looking into it, I decided it wasn't worth the hassle of treating pointers just past the end of a Lisp object as pointing into the object. Although such pointers can exist, I can't think of a realistic-with-today's-compilers scenario at the machine level where (1) a pointer like that will exist, (2) no pointers into the middle or start of the object will exist, and (3) the object might be accessed later. In contrast we have seen scenarios with pointers into the middle of Lisp objects. With that in mind, attached is a proposed patch to master that I hope deals with some of the more-serious problems mentioned so far in this thread, in particular the problem with Lisp_Object representations of symbols being split into two registers in a --with-wide-int build. I haven't tested this as much as I'd like, but I need to turn my attention to sleep and work and so this is a good place to broadcast a checkpoint. This patch doesn't address the LISP_ALIGNMENT issues you mentioned, both in lisp.h and in the pdumper; I can work on that soon, I think. PS. Thanks for helping bring this problem to our attention; it's been fun to look into it. 
[-- Attachment #2: 0001-Fix-crashes-due-to-misidentified-pointers.patch --] [-- Type: text/x-patch, Size: 5798 bytes --] From 023344217e05d2a23b5e8157da2f9aea16a5df78 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Thu, 28 May 2020 00:11:08 -0700 Subject: [PATCH] Fix crashes due to misidentified pointers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Problem reported by Pip Cet (Bug#41321#74, Bug#41321#80) A compiler can create temporaries that point somewhere into a Lisp object but are not GCALIGNED, and these temporaries may be the only thing that addresses the object. So, if any value points within an object, treat the object as being addressed. However, do not worry about pointers that point just past the end of an object, as these do not seem to be a problem in practice and attempting to worry about them would complicate and slow the code. * src/alloc.c (live_float_p): Don’t insist that the offset be aligned properly for a float, since it might be tagged or offset. (GC_OBJECT_ALIGNMENT_MINIMUM, maybe_lisp_pointer): Remove. All uses removed. (mark_maybe_pointer): New arg SYMBOL_ONLY. All callers changed. Don’t insist on pointers being aligned. Align pointers before doing pdumper checks on them, and before giving them to make_lisp_ptr. (mark_memory): Do not use mark_maybe_object here. Instead, use mark_maybe_pointer alone; that suffices. Also look for offsets from lispsym, to mark symbols more reliably. --- src/alloc.c | 65 +++++++++++++++-------------------------------------- 1 file changed, 18 insertions(+), 47 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index e241b9933a..b1d45dbb33 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4560,7 +4560,6 @@ live_float_p (struct mem_node *m, void *p) /* P must point to the start of a Lisp_Float and not be one of the unused cells in the current float block. 
*/ return (offset >= 0 - && offset % sizeof b->floats[0] == 0 && offset < (FLOAT_BLOCK_SIZE * sizeof b->floats[0]) && (b != float_block || offset / sizeof b->floats[0] < float_block_index)); @@ -4687,54 +4686,25 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) mark_maybe_object (*array); } -/* A lower bound on the alignment of Lisp objects that need marking. - Although 1 is safe, higher values speed up mark_maybe_pointer. - If USE_LSB_TAG, this value is typically GCALIGNMENT; otherwise, - it's determined by the natural alignment of Lisp structs. - All vectorlike objects have alignment at least that of union - vectorlike_header and it's unlikely they all have alignment greater, - so use the union as a safe and likely-accurate standin for - vectorlike objects. */ - -enum { GC_OBJECT_ALIGNMENT_MINIMUM - = max (GCALIGNMENT, - min (alignof (union vectorlike_header), - min (min (alignof (struct Lisp_Cons), - alignof (struct Lisp_Float)), - min (alignof (struct Lisp_String), - alignof (struct Lisp_Symbol))))) }; - -/* Return true if P might point to Lisp data that can be garbage - collected, and false otherwise (i.e., false if it is easy to see - that P cannot point to Lisp data that can be garbage collected). - Symbols are implemented via offsets not pointers, but the offsets - are also multiples of GC_OBJECT_ALIGNMENT_MINIMUM. */ - -static bool -maybe_lisp_pointer (void *p) -{ - return (uintptr_t) p % GC_OBJECT_ALIGNMENT_MINIMUM == 0; -} - /* If P points to Lisp data, mark that as live if it isn't already - marked. */ + marked. If SYMBOL_ONLY, mark it only if it is a symbol. 
*/ static void -mark_maybe_pointer (void *p) +mark_maybe_pointer (void *p, bool symbol_only) { + char *cp = p; struct mem_node *m; #ifdef USE_VALGRIND VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p)); #endif - if (!maybe_lisp_pointer (p)) - return; - if (pdumper_object_p (p)) { + p = cp - (uintptr_t) cp % GCALIGNMENT; int type = pdumper_find_object_type (p); - if (pdumper_valid_object_type_p (type)) + if (pdumper_valid_object_type_p (type) + && (type == Lisp_Symbol || !symbol_only)) mark_object (type == Lisp_Symbol ? make_lisp_symbol (p) : make_lisp_ptr (p, type)); @@ -4755,11 +4725,13 @@ mark_maybe_pointer (void *p) break; case MEM_TYPE_CONS: - obj = live_cons_holding (m, p); + if (!symbol_only) + obj = live_cons_holding (m, p); break; case MEM_TYPE_STRING: - obj = live_string_holding (m, p); + if (!symbol_only) + obj = live_string_holding (m, p); break; case MEM_TYPE_SYMBOL: @@ -4767,13 +4739,14 @@ mark_maybe_pointer (void *p) break; case MEM_TYPE_FLOAT: - if (live_float_p (m, p)) - obj = make_lisp_ptr (p, Lisp_Float); + if (!symbol_only && live_float_p (m, p)) + obj = make_lisp_ptr (cp - (uintptr_t) cp % GCALIGNMENT, Lisp_Float); break; case MEM_TYPE_VECTORLIKE: case MEM_TYPE_VECTOR_BLOCK: - obj = live_vector_holding (m, p); + if (!symbol_only) + obj = live_vector_holding (m, p); break; default: @@ -4830,12 +4803,10 @@ mark_memory (void const *start, void const *end) for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT) { - mark_maybe_pointer (*(void *const *) pp); - - verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0); - if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT - || (uintptr_t) pp % alignof (Lisp_Object) == 0) - mark_maybe_object (*(Lisp_Object const *) pp); + char *p = *(char *const *) pp; + mark_maybe_pointer (p, false); + p += (intptr_t) lispsym; + mark_maybe_pointer (p, true); } } -- 2.17.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
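The reworked mark_memory loop in the patch above looks up every scanned word twice: once as is, and once offset by lispsym with only symbols accepted. A minimal sketch of that control flow follows, assuming a caller-supplied callback in place of mark_maybe_pointer and an integer symbol_base in place of lispsym; all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback standing in for mark_maybe_pointer.  */
typedef void (*toy_visit_fn) (uintptr_t candidate, bool symbol_only,
                              void *ctx);

/* Visit every word twice: as a plain pointer, and as a symbol offset
   relative to SYMBOL_BASE.  */
static void
toy_scan_words (uintptr_t const *words, size_t nwords,
                uintptr_t symbol_base, toy_visit_fn visit, void *ctx)
{
  assert (visit != NULL);
  for (size_t i = 0; i < nwords; i++)
    {
      visit (words[i], false, ctx);
      visit (words[i] + symbol_base, true, ctx);
    }
}

/* A visitor that only counts calls, for demonstration.  */
struct toy_counts
{
  size_t plain;
  size_t symbol_only;
  uintptr_t last_candidate;
};

static void
toy_count_visit (uintptr_t candidate, bool symbol_only, void *ctx)
{
  struct toy_counts *c = ctx;
  if (symbol_only)
    c->symbol_only++;
  else
    c->plain++;
  c->last_candidate = candidate;
}
```

The cost is visible in the structure: the number of lookups doubles, which is why the later messages in this thread discuss cheap tests to skip the second one.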
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 7:47 ` Paul Eggert @ 2020-05-28 8:11 ` Pip Cet 0 siblings, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-28 8:11 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Thu, May 28, 2020 at 7:47 AM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/27/20 11:31 PM, Pip Cet wrote: > > I hope you're right, in that compilers will support GC better before > > they move on to clever optimizations that break it :-) > > After looking into it, I decided it wasn't worth the hassle of treating pointers > just past the end of a Lisp object as pointing into the object. Although such > pointers can exist, I can't think of a realistic-with-today's-compilers scenario > at the machine level where (1) a pointer like that will exist, (2) no pointers > into the middle or start of the object will exist, and (3) the object might be > accessed later. In contrast we have seen scenarios with pointers into the middle > of Lisp objects. Okay. I was about to write that I'd concluded the same thing, after failing to come up with an example other than that hypothetical Freverse implementation. > With that in mind, attached is a proposed patch to master that I hope deals with > some of the more-serious problems mentioned so far in this thread, in particular > the problem with Lisp_Object representations of symbols being split into two > registers in a --with-wide-int build. I haven't tested this as much as I'd like, > but I need to turn my attention to sleep and work and so this is a good place to > broadcast a checkpoint. Thanks! Looks great generally, though I confess I haven't checked what would happen in a (hypothetical?) !USE_LSB_TAG 64-bit case. + if (!symbol_only && live_float_p (m, p)) + obj = make_lisp_ptr (cp - (uintptr_t) cp % GCALIGNMENT, Lisp_Float); break; I'm not sure about this code, though, it assumes GCALIGNMENT == sizeof Lisp_Float. > PS. 
Thanks for helping bring this problem to our attention; it's been fun to > look into it. I agree. I'll certainly continue looking for bugs and working on Emacs, but at this point I'm unsure it's worth it to actually share such work with anyone. But that doesn't really belong here. ^ permalink raw reply [flat|nested] 132+ messages in thread
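The concern about the float case can be sketched with two ways of rounding an interior pointer down. This is an illustration under assumed numbers, not the code in alloc.c: rounding to an alignment boundary finds the element start only when the element size equals that alignment. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round P down to a multiple of ALIGNMENT, as the patch does with
   GCALIGNMENT.  */
static uintptr_t
toy_round_to_alignment (uintptr_t p, uintptr_t alignment)
{
  assert (alignment != 0);
  return p - p % alignment;
}

/* Round P down to the start of the element that contains it, for a
   block of ELT_SIZE-byte elements beginning at BLOCK_BASE.  */
static uintptr_t
toy_round_to_element (uintptr_t block_base, uintptr_t p, size_t elt_size)
{
  assert (elt_size != 0 && p >= block_base);
  uintptr_t offset = p - block_base;
  return block_base + offset / elt_size * elt_size;
}
```

With 8-byte elements and 8-byte alignment the two agree; with 16-byte elements they can differ, which is the assumption being questioned.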
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-27 16:58 ` Paul Eggert 2020-05-27 17:33 ` Eli Zaretskii 2020-05-27 17:57 ` Pip Cet @ 2020-05-28 18:27 ` Eli Zaretskii 2020-05-28 19:33 ` Paul Eggert 2 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-28 18:27 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Wed, 27 May 2020 09:58:11 -0700 > > we can then talk about what to do with emacs-27. After thinking about this some, I think the only sensible thing to do on emacs-27 is to return to 8-byte alignment test in GC for 32-bit MinGW builds. That is, replace max_align_t with just 8 in the definition of LISP_ALIGNMENT in that case. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 18:27 ` Eli Zaretskii @ 2020-05-28 19:33 ` Paul Eggert 2020-05-29 6:19 ` Eli Zaretskii 2020-05-29 8:25 ` Pip Cet 0 siblings, 2 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-28 19:33 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet [-- Attachment #1: Type: text/plain, Size: 856 bytes --] On 5/28/20 11:27 AM, Eli Zaretskii wrote: >> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca >> From: Paul Eggert <eggert@cs.ucla.edu> >> Date: Wed, 27 May 2020 09:58:11 -0700 >> >> we can then talk about what to do with emacs-27. > > After thinking about this some, I think the only sensible thing to do > on emacs-27 is to return to 8-byte alignment test in GC for 32-bit > MinGW builds. That is, replace max_align_t with just 8 in the > definition of LISP_ALIGNMENT in that case. Exactly the same problem can occur for other x86 platforms (e.g., GNU/Linux, GCC 7-and-later, glibc 2.25-and-earlier), because these other platforms also have the bug that malloc can return a pointer that is 8 modulo 16 even though alignof (max_align_t) is 16, so I suggest doing the replacement for those platforms too, as in the attached patch. [-- Attachment #2: 0001-Fix-aborts-due-to-GC-losing-pseudovectors.patch --] [-- Type: text/x-patch, Size: 1191 bytes --] From b3501c978f315d980f7a26481989725d63953558 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Thu, 28 May 2020 12:27:27 -0700 Subject: [PATCH] Fix aborts due to GC losing pseudovectors Problem reported by Eli Zaretskii (Bug#41321). * src/alloc.c (maybe_lisp_pointer): Modulo GCALIGNMENT, not modulo LISP_ALIGNMENT. Master has a more-elaborate fix. Do not merge to master.
--- src/alloc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index 1c6b664b22..c7a4a3ee86 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4589,12 +4589,12 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) collected, and false otherwise (i.e., false if it is easy to see that P cannot point to Lisp data that can be garbage collected). Symbols are implemented via offsets not pointers, but the offsets - are also multiples of LISP_ALIGNMENT. */ + are also multiples of GCALIGNMENT. */ static bool maybe_lisp_pointer (void *p) { - return (uintptr_t) p % LISP_ALIGNMENT == 0; + return (uintptr_t) p % GCALIGNMENT == 0; } /* If P points to Lisp data, mark that as live if it isn't already -- 2.17.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
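The effect of the one-line change above can be checked with plain arithmetic. A minimal sketch, assuming the values mentioned in the message (a malloc result that is 8 modulo 16, a 16-byte LISP_ALIGNMENT, an 8-byte GCALIGNMENT); toy_passes_filter is a hypothetical stand-in for maybe_lisp_pointer with the modulus made a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for maybe_lisp_pointer: accept P only if it is a multiple
   of ALIGNMENT.  */
static bool
toy_passes_filter (uintptr_t p, uintptr_t alignment)
{
  assert (alignment != 0);
  return p % alignment == 0;
}
```

An address such as 0x1008 is rejected with a modulus of 16 and accepted with a modulus of 8, so under the old test an object allocated there would never be considered for marking.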
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 19:33 ` Paul Eggert @ 2020-05-29 6:19 ` Eli Zaretskii 2020-05-29 20:24 ` Paul Eggert 2020-05-29 8:25 ` Pip Cet 1 sibling, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-29 6:19 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Thu, 28 May 2020 12:33:10 -0700 > > > After thinking about this some, I think the only sensible thing to do > > on emacs-27 is to return to 8-byte alignment test in GC for 32-bit > > MinGW builds. That is, replace max_align_t with just 8 in the > > definition of LISP_ALIGNMENT in that case. > > Exactly the same problem can occur for other x86 platforms (e.g., GNU/Linux, GCC > 7-and-later, glibc 2.25-and-earlier), because these other platforms also have > the bug that malloc can return a pointer that is 8 modulo 16 even though alignof > (max_align_t) is 16. so I suggest doing the replacement for those platforms > too, as in the attached patch. I'm okay with doing this on other platforms, but... > static bool > maybe_lisp_pointer (void *p) > { > - return (uintptr_t) p % LISP_ALIGNMENT == 0; > + return (uintptr_t) p % GCALIGNMENT == 0; > } ...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound right to me: by keeping the current value of LISP_ALIGNMENT, we basically declare that Lisp objects shall be aligned on that boundary, whereas that isn't really the case. Why not change the value of LISP_ALIGNMENT instead? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 6:19 ` Eli Zaretskii @ 2020-05-29 20:24 ` Paul Eggert 2020-05-29 21:01 ` Pip Cet 2020-05-30 5:50 ` Eli Zaretskii 0 siblings, 2 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-29 20:24 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet [-- Attachment #1: Type: text/plain, Size: 1570 bytes --] On 5/28/20 11:19 PM, Eli Zaretskii wrote: >> - return (uintptr_t) p % LISP_ALIGNMENT == 0; >> + return (uintptr_t) p % GCALIGNMENT == 0; >> } > ...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound > right to me: by keeping the current value of LISP_ALIGNMENT, we > basically declare that Lisp objects shall be aligned on that boundary, > whereas that isn't really the case. Why not change the value of > LISP_ALIGNMENT instead? There are really two bugs here. 1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer can point into the middle of (say) a pseudovector and not be LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix this bug in general, because such a pointer might not be GCALIGNMENT-aligned either. This bug can cause crashes because it causes GC to think an object is garbage when it's not garbage. 2. LISP_ALIGNMENT is too large on MinGW and some other platforms. The patch I sent earlier attempted to be the simplest patch that would fix the bug you observed on MinGW, which is a special case of (1). It does not attempt to fix all plausible cases of (1), nor does it address (2). We can fix these two bugs separately, by installing the attached patches into emacs-27. The first patch fixes (1) and thus fixes the crash along with other plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a different way but does not fix the crash on other plausible platforms. (1) probably has better performance than (2), though I doubt whether users will notice. 
[-- Attachment #2: 0001-Remove-maybe_lisp_pointer.patch --] [-- Type: text/x-patch, Size: 1669 bytes --] From 2c0bac868a7aefe7dafd2362cce42a7d3738319f Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Fri, 29 May 2020 12:56:16 -0700 Subject: [PATCH 1/2] =?UTF-8?q?Remove=20=E2=80=98maybe=5Flisp=5Fpointer?= =?UTF-8?q?=E2=80=99?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It’s an invalid optimization, since pointers can address the middle of Lisp_Object data. * src/alloc.c (maybe_lisp_pointer): Remove. Only use removed. Do not merge to master, as we’ll put in a better fix there. --- src/alloc.c | 15 --------------- 1 file changed, 15 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index 1c6b664b22..b8382aca5b 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) mark_maybe_object (*array); } -/* Return true if P might point to Lisp data that can be garbage - collected, and false otherwise (i.e., false if it is easy to see - that P cannot point to Lisp data that can be garbage collected). - Symbols are implemented via offsets not pointers, but the offsets - are also multiples of LISP_ALIGNMENT. */ - -static bool -maybe_lisp_pointer (void *p) -{ - return (uintptr_t) p % LISP_ALIGNMENT == 0; -} - /* If P points to Lisp data, mark that as live if it isn't already marked. 
*/ @@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p) VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p)); #endif - if (!maybe_lisp_pointer (p)) - return; - if (pdumper_object_p (p)) { int type = pdumper_find_object_type (p); -- 2.17.1 [-- Attachment #3: 0002-Don-t-overalign-Lisp-objects.patch --] [-- Type: text/x-patch, Size: 3666 bytes --] From f620b5b802bf2afad033c7cc7856a71fd28b2c13 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Fri, 29 May 2020 13:02:32 -0700 Subject: [PATCH 2/2] =?UTF-8?q?Don=E2=80=99t=20overalign=20Lisp=20objects?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Backport from master. * src/alloc.c (union emacs_align_type): New type, used for LISP_ALIGNMENT. (LISP_ALIGNMENT): Use it instead of max_align_t. --- src/alloc.c | 55 +++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 45 insertions(+), 10 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index b8382aca5b..48e96863db 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -104,6 +104,46 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2020 Free Software #include "w32heap.h" /* for sbrk */ #endif +/* A type with alignment at least as large as any object that Emacs + allocates. This is not max_align_t because some platforms (e.g., + mingw) have buggy malloc implementations that do not align for + max_align_t. This union contains types of all GCALIGNED_STRUCT + components visible here. 
*/ +union emacs_align_type +{ + struct frame frame; + struct Lisp_Bignum Lisp_Bignum; + struct Lisp_Bool_Vector Lisp_Bool_Vector; + struct Lisp_Char_Table Lisp_Char_Table; + struct Lisp_CondVar Lisp_CondVar; + struct Lisp_Finalizer Lisp_Finalizer; + struct Lisp_Float Lisp_Float; + struct Lisp_Hash_Table Lisp_Hash_Table; + struct Lisp_Marker Lisp_Marker; + struct Lisp_Misc_Ptr Lisp_Misc_Ptr; + struct Lisp_Mutex Lisp_Mutex; + struct Lisp_Overlay Lisp_Overlay; + struct Lisp_Sub_Char_Table Lisp_Sub_Char_Table; + struct Lisp_Subr Lisp_Subr; + struct Lisp_User_Ptr Lisp_User_Ptr; + struct Lisp_Vector Lisp_Vector; + struct terminal terminal; + struct thread_state thread_state; + struct window window; + + /* Omit the following since they would require including process.h + etc. In practice their alignments never exceed that of the + structs already listed. */ +#if 0 + struct Lisp_Module_Function Lisp_Module_Function; + struct Lisp_Process Lisp_Process; + struct save_window_data save_window_data; + struct scroll_bar scroll_bar; + struct xwidget_view xwidget_view; + struct xwidget xwidget; +#endif +}; + #ifdef DOUG_LEA_MALLOC /* Specify maximum number of areas to mmap. It would be nice to use a @@ -636,16 +676,11 @@ buffer_memory_full (ptrdiff_t nbytes) #define COMMON_MULTIPLE(a, b) \ ((a) % (b) == 0 ? (a) : (b) % (a) == 0 ? (b) : (a) * (b)) -/* LISP_ALIGNMENT is the alignment of Lisp objects. It must be at - least GCALIGNMENT so that pointers can be tagged. It also must be - at least as strict as the alignment of all the C types used to - implement Lisp objects; since pseudovectors can contain any C type, - this is max_align_t. On recent GNU/Linux x86 and x86-64 this can - often waste up to 8 bytes, since alignof (max_align_t) is 16 but - typical vectors need only an alignment of 8. Although shrinking - the alignment to 8 would save memory, it cost a 20% hit to Emacs - CPU performance on Fedora 28 x86-64 when compiled with gcc -m32. 
*/ -enum { LISP_ALIGNMENT = alignof (union { max_align_t x; +/* Alignment needed for memory blocks that are allocated via malloc + and that contain Lisp objects. On typical hosts malloc already + aligns sufficiently, but extra work is needed on oddball hosts + where Emacs would crash if malloc returned a non-GCALIGNED pointer. */ +enum { LISP_ALIGNMENT = alignof (union { union emacs_align_type x; GCALIGNED_UNION_MEMBER }) }; verify (LISP_ALIGNMENT % GCALIGNMENT == 0); -- 2.17.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 20:24 ` Paul Eggert @ 2020-05-29 21:01 ` Pip Cet 2020-05-30 5:58 ` Eli Zaretskii 2020-05-30 16:31 ` Paul Eggert 2020-05-30 5:50 ` Eli Zaretskii 1 sibling, 2 replies; 132+ messages in thread From: Pip Cet @ 2020-05-29 21:01 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Fri, May 29, 2020 at 8:24 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/28/20 11:19 PM, Eli Zaretskii wrote: > >> - return (uintptr_t) p % LISP_ALIGNMENT == 0; > >> + return (uintptr_t) p % GCALIGNMENT == 0; > >> } > > ...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound > > right to me: by keeping the current value of LISP_ALIGNMENT, we > > basically declare that Lisp objects shall be aligned on that boundary, > > whereas that isn't really the case. Why not change the value of > > LISP_ALIGNMENT instead? > > There are really two bugs here. > > 1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer > can point into the middle of (say) a pseudovector and not be > LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix > this bug in general, because such a pointer might not be GCALIGNMENT-aligned > either. This bug can cause crashes because it causes GC to think an object is > garbage when it's not garbage. > > 2. LISP_ALIGNMENT is too large on MinGW and some other platforms. > > The patch I sent earlier attempted to be the simplest patch that would fix the > bug you observed on MinGW, which is a special case of (1). I'm not convinced. I think Eli only observed (2). There were no pointers into the middle of pseudovectors in his backtrace or disassembly... > It does not attempt > to fix all plausible cases of (1), nor does it address (2). It does address (2). It doesn't address all cases of (1). 
> We can fix these two bugs separately, by installing the attached patches into > emacs-27. The first patch fixes (1) and thus fixes the crash along with other > plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a > different way but does not fix the crash on other plausible platforms. (1) > probably has better performance than (2), though I doubt whether users will notice. (1) says: It’s an invalid optimization, since pointers can address the middle of Lisp_Object data. That may be true (we still haven't observed it), but it's not what happened in Eli's case: in that case, the "pointer" was actually the lower half of a Lisp_Object, so it pointed at the beginning of a struct Lisp_Vector. That just happened to be misaligned. (2) has this comment: +/* Alignment needed for memory blocks that are allocated via malloc + and that contain Lisp objects. On typical hosts malloc already + aligns sufficiently, but extra work is needed on oddball hosts + where Emacs would crash if malloc returned a non-GCALIGNED pointer. */ I can't make sense of that comment. It describes two problems that don't happen, and omits the problem that does happen. 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell. 2. A Lisp object requires greater alignment than malloc() gives it. IIRC, there was at least one RISC architecture whose specification supported atomic operations only on the first word in each 32-byte-aligned block, but that's such a rare case (and wasn't true for the silicon implementations, I seem to recall) that it seems silly to worry about it today. I'm not saying it's the best solution, but I would prefer simply defining LISP_ALIGNMENT to be 8 to either patch. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 21:01 ` Pip Cet @ 2020-05-30 5:58 ` Eli Zaretskii 2020-05-30 7:19 ` Pip Cet 2020-05-30 16:31 ` Paul Eggert 1 sibling, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 5:58 UTC (permalink / raw) To: Pip Cet; +Cc: eggert, 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Fri, 29 May 2020 21:01:39 +0000 > Cc: Eli Zaretskii <eliz@gnu.org>, 41321@debbugs.gnu.org, > Stefan Monnier <monnier@iro.umontreal.ca> > > (2) has this comment: > +/* Alignment needed for memory blocks that are allocated via malloc > + and that contain Lisp objects. On typical hosts malloc already > + aligns sufficiently, but extra work is needed on oddball hosts > + where Emacs would crash if malloc returned a non-GCALIGNED pointer. */ > > I can't make sense of that comment. It describes two problems that > don't happen, and omits the problem that does happen. > 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell. > 2. A Lisp object requires greater alignment than malloc() gives it. > IIRC, there was at least one RISC architecture whose specification > supported atomic operations only on the first word in each > 32-byte-aligned block, but that's such a rare case (and wasn't true > for the silicon implementations, I seem to recall) that it seems silly > to worry about it today. > > I'm not saying it's the best solution, but I would prefer simply > defining LISP_ALIGNMENT to be 8 to either patch. I agree, but patch 2 basically does that, so I'm okay with saying "8" in so many words. Btw, can someone remind me why we started requiring non-default alignment from Lisp objects? Also, given the fact that in the crashing case the 2 32-bit parts of a Lisp object were pushed onto the stack non-contiguously, will fixing the alignment alone cause those Lisp objects to be marked? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 5:58 ` Eli Zaretskii @ 2020-05-30 7:19 ` Pip Cet 2020-05-30 9:08 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-30 7:19 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier On Sat, May 30, 2020 at 5:58 AM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Fri, 29 May 2020 21:01:39 +0000 > > Cc: Eli Zaretskii <eliz@gnu.org>, 41321@debbugs.gnu.org, > > Stefan Monnier <monnier@iro.umontreal.ca> > > > > (2) has this comment: > > +/* Alignment needed for memory blocks that are allocated via malloc > > + and that contain Lisp objects. On typical hosts malloc already > > + aligns sufficiently, but extra work is needed on oddball hosts > > + where Emacs would crash if malloc returned a non-GCALIGNED pointer. */ > > > > I can't make sense of that comment. It describes two problems that > > don't happen, and omits the problem that does happen. > > 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell. > > 2. A Lisp object requires greater alignment than malloc() gives it. > > IIRC, there was at least one RISC architecture whose specification > > supported atomic operations only on the first word in each > > 32-byte-aligned block, but that's such a rare case (and wasn't true > > for the silicon implementations, I seem to recall) that it seems silly > > to worry about it today. > > > > I'm not saying it's the best solution, but I would prefer simply > > defining LISP_ALIGNMENT to be 8 to either patch. > > I agree, but patch 2 basically does that, so I'm okay with saying "8" > in so many words. Okay. > Btw, can someone remind me why we started requiring non-default > alignment from Lisp objects? max_align_t was changed to include a float128 type, and alignof(float128) == 16 on x86, even though virtually all x86 systems are configured to allow unaligned accesses. 
If I understand Paul's concerns correctly, he believes it's possible a system will once again come into use in which atomic accesses only work for offsets aligned to, say, 32 bytes. Since pthread variables require atomic accesses, such a platform would see weird crashes if a pthread inside a Lisp_Vector wasn't aligned to 32 bytes. Of course, it remains to be seen/checked whether any such system would actually define max_align_t to have an alignment of 32, since it covers only primitive types. > Also, given the fact that in the crashing case the 2 32-bit parts of a > Lisp object were pushed onto the stack non-contiguously, will fixing > the alignment alone cause those Lisp objects to be marked? Yes. The lower 32-bit part was ignored because its value wasn't 16-byte aligned, not because its stack location wasn't 8-byte aligned. ^ permalink raw reply [flat|nested] 132+ messages in thread
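The distinction drawn above, between the alignment of the value held in the lower 32-bit half and the alignment of the stack slot holding it, can be sketched like this. Assumptions: a 64-bit Lisp_Object on a 32-bit build whose lower half carries the pointer value for non-symbol objects, as stated in the message. The helper names are hypothetical, and the word 0xa00000001c9866d8 is simply the Fprin1 argument from the backtrace in the original report, used as an illustrative value only.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 32 bits of a 64-bit object word: the candidate pointer value.  */
static uint32_t
sketch_low_half (uint64_t wide_object)
{
  return (uint32_t) (wide_object & 0xffffffffu);
}

/* Upper 32 bits of a 64-bit object word.  */
static uint32_t
sketch_high_half (uint64_t wide_object)
{
  return (uint32_t) (wide_object >> 32);
}

/* Remainder of VALUE modulo ALIGNMENT; zero means VALUE is aligned.  */
static uint32_t
sketch_alignment_remainder (uint32_t value, uint32_t alignment)
{
  assert (alignment != 0);	/* avoid a modulo by zero */
  return value % alignment;
}
```

For that word the lower half is 0x1c9866d8, which is a multiple of 8 but leaves a remainder of 8 modulo 16. A value-based test modulo 16 therefore rejects it regardless of where on the stack the half was stored.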
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 7:19 ` Pip Cet @ 2020-05-30 9:08 ` Eli Zaretskii 2020-05-30 11:06 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 9:08 UTC (permalink / raw) To: Pip Cet; +Cc: eggert, 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Sat, 30 May 2020 07:19:18 +0000 > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, > Stefan Monnier <monnier@iro.umontreal.ca> > > > Btw, can someone remind me why we started requiring non-default > > alignment from Lisp objects? > > max_align_t was changed to include a float128 type, and > alignof(float128) == 16 on x86, even though virtually all x86 systems > are configured to allow unaligned accesses. I understand that part, but my question was why, even before the change in max_align_t, did we start requiring 8-byte alignment on systems where that is not automatically guaranteed? > If I understand Paul's concerns correctly, he believes it's possible a > system will once again come into use in which atomic accesses only > work for offsets aligned to, say, 32 bytes. Since pthread variables > require atomic accesses, such a platform would see weird crashes if a > pthread inside a Lisp_Vector wasn't aligned to 32 bytes. So this alignment requirement is only due to pthreads being used? But MinGW doesn't use pthreads. > > Also, given the fact that in the crashing case the 2 32-bit parts of a > > Lisp object were pushed onto the stack non-contiguously, will fixing > > the alignment alone cause those Lisp objects to be marked? > > Yes. The lower 32-bit part was ignored because its value wasn't > 16-byte aligned, not because its stack location wasn't 8-byte aligned. Right, but I'm talking about marking. AFAIU, when scanning the stack finds a value that looks like a Lisp object, we mark that object. 
If the two 32-bit parts of the object are non-contiguous, will we be able to recognize such an object, and will we be able to mark it correctly, and if so, how? IOW, don't we need the upper 32-bit (which encodes the object type) for the purposes of marking it? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 9:08 ` Eli Zaretskii @ 2020-05-30 11:06 ` Pip Cet 2020-05-30 11:31 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-30 11:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier On Sat, May 30, 2020 at 9:08 AM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Sat, 30 May 2020 07:19:18 +0000 > > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, > > Stefan Monnier <monnier@iro.umontreal.ca> > > > > > Btw, can someone remind me why we started requiring non-default > > > alignment from Lisp objects? > > > > max_align_t was changed to include a float128 type, and > > alignof(float128) == 16 on x86, even though virtually all x86 systems > > are configured to allow unaligned accesses. > > I understand that part, but my question was why, even before the > change in max_align_t, did we start requiring 8-byte alignment on > systems where that is not automatically guaranteed? I don't know. As I said, I think that was always buggy on pdumper systems, though the bug was very subtle. My guess is it predates pdumper, at which time it was a valid optimization. > > If I understand Paul's concerns correctly, he believes it's possible a > > system will once again come into use in which atomic accesses only > > work for offsets aligned to, say, 32 bytes. Since pthread variables > > require atomic accesses, such a platform would see weird crashes if a > > pthread inside a Lisp_Vector wasn't aligned to 32 bytes. > > So this alignment requirement is only due to pthreads being used? I'm not sure what you're asking. Obviously there are systems on which unaligned accesses will fault or be very slow indeed, so we need to make sure, say, pure space allocations are aligned somehow. That requires a LISP_ALIGNMENT of 8. Everything beyond that is only for performance, pthreads, and SIMD types. 
> > > Also, given the fact that in the crashing case the 2 32-bit parts of a > > > Lisp object were pushed onto the stack non-contiguously, will fixing > > > the alignment alone cause those Lisp objects to be marked? > > > > Yes. The lower 32-bit part was ignored because its value wasn't > > 16-byte aligned, not because its stack location wasn't 8-byte aligned. > > Right, but I'm talking about marking. AFAIU, when scanning the stack > finds a value that looks like a Lisp object, we mark that object. And if we find a value that looks like a pointer to a Lisp structure, as the lower half of a non-symbol Lisp_Object does, we mark the corresponding Lisp object. > If > the two 32-bit parts of the object are non-contiguous, will we be able > to recognize such an object, and will we be able to mark it correctly, > and if so, how? IOW, don't we need the upper 32-bit (which encodes > the object type) for the purposes of marking it? For everything but symbols, we don't, mark_maybe_pointer called on the low 32 bits suffices. For symbols, mark_maybe_pointer needs to be changed to also check the pointer at <low 32-bit word> + &lispsym. ^ permalink raw reply [flat|nested] 132+ messages in thread
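The symbol case mentioned above can be sketched as follows. This is a simplified illustration, assuming (as the thread states) that a symbol is encoded as an offset from the symbol table rather than as a pointer, and ignoring tag bits. The table sketch_lispsym and both helper functions are hypothetical stand-ins, not the Emacs definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a symbol structure and the symbol table.  */
struct sketch_symbol
{
  int dummy;
};

enum { SKETCH_NSYMBOLS = 4 };
static struct sketch_symbol sketch_lispsym[SKETCH_NSYMBOLS];

/* Word that would encode symbol number INDEX: its byte offset from
   the start of the table, so symbol 0 is encoded as binary zero.  */
static uintptr_t
sketch_symbol_word (size_t index)
{
  assert (index < SKETCH_NSYMBOLS);
  return (uintptr_t) (index * sizeof (struct sketch_symbol));
}

/* Recover the address a conservative scan would have to check:
   the word found on the stack plus the table's base address.  */
static struct sketch_symbol *
sketch_symbol_from_word (uintptr_t word)
{
  return (struct sketch_symbol *) (word + (uintptr_t) sketch_lispsym);
}
```

Treating the raw word as a pointer would never land inside the table (symbol 0 would look like a null pointer), which is why the scan needs the extra lookup at word plus table base.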
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 11:06 ` Pip Cet @ 2020-05-30 11:31 ` Eli Zaretskii 2020-05-30 13:29 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 11:31 UTC (permalink / raw) To: Pip Cet; +Cc: eggert, 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Sat, 30 May 2020 11:06:52 +0000 > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, > Stefan Monnier <monnier@iro.umontreal.ca> > > > I understand that part, but my question was why, even before the > > change in max_align_t, did we start requiring 8-byte alignment on > > systems where that is not automatically guaranteed? > > I don't know. As I said, I think that was always buggy on pdumper > systems, though the bug was very subtle. My guess is it predates > pdumper, at which time it was a valid optimization. How is pdumper involved here? > > So this alignment requirement is only due to pthreads being used? > > I'm not sure what you're asking. Obviously there are systems on which > unaligned accesses will fault or be very slow indeed, so we need to > make sure, say, pure space allocations are aligned somehow. That > requires a LISP_ALIGNMENT of 8. Everything beyond that is only for > performance, pthreads, and SIMD types. If the system guarantees 4-byte alignment from malloc (and/or a similar alignment of the runtime C stack), then using that doesn't trigger problems related to unaligned accesses, right? So let me rephrase: why isn't 4-byte alignment "good enough" for us on systems where malloc and the runtime stack are guaranteed to be thus aligned? > > If > > the two 32-bit parts of the object are non-contiguous, will we be able > > to recognize such an object, and will we be able to mark it correctly, > > and if so, how? IOW, don't we need the upper 32-bit (which encodes > > the object type) for the purposes of marking it? > > For everything but symbols, we don't, mark_maybe_pointer called on the > low 32 bits suffices. 
For symbols, mark_maybe_pointer needs to be > changed to also check the pointer at <low 32-bit word> + &lispsym. Right, that's what I thought. So this issue also has to be fixed on emacs-27 in order for us to provide a stable Emacs 27. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 11:31 ` Eli Zaretskii @ 2020-05-30 13:29 ` Pip Cet 2020-05-30 16:32 ` Eli Zaretskii 2020-05-30 18:04 ` Paul Eggert 0 siblings, 2 replies; 132+ messages in thread From: Pip Cet @ 2020-05-30 13:29 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier [-- Attachment #1: Type: text/plain, Size: 2652 bytes --] On Sat, May 30, 2020 at 11:31 AM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Sat, 30 May 2020 11:06:52 +0000 > > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, > > Stefan Monnier <monnier@iro.umontreal.ca> > > > > > I understand that part, but my question was why, even before the > > > change in max_align_t, did we start requiring 8-byte alignment on > > > systems where that is not automatically guaranteed? > > > > I don't know. As I said, I think that was always buggy on pdumper > > systems, though the bug was very subtle. My guess is it predates > > pdumper, at which time it was a valid optimization. > > How is pdumper involved here? See the pdumper issue I described above. I can't imagine this being a significant bug, because it needs the sole surviving reference to a pdumper object to be on the stack, while simultaneously being the key in a weak-key hash table cell... > > > So this alignment requirement is only due to pthreads being used? > > > > I'm not sure what you're asking. Obviously there are systems on which > > unaligned accesses will fault or be very slow indeed, so we need to > > make sure, say, pure space allocations are aligned somehow. That > > requires a LISP_ALIGNMENT of 8. Everything beyond that is only for > > performance, pthreads, and SIMD types. > > If the system guarantees 4-byte alignment from malloc (and/or a > similar alignment of the runtime C stack), then using that doesn't > trigger problems related to unaligned accesses, right? 
So let me > rephrase: why isn't 4-byte alignment "good enough" for us on systems > where malloc and the runtime stack are guaranteed to be thus aligned? (The runtime stack isn't relevant, as far as I can tell, since we walk that in 4-byte steps on such systems anyway.) You're correct that on such a system, we could get away with a LISP_ALIGNMENT of 4, but a LISP_ALIGNMENT of 8 wouldn't hurt either. > > > If > > > the two 32-bit parts of the object are non-contiguous, will we be able > > > to recognize such an object, and will we be able to mark it correctly, > > > and if so, how? IOW, don't we need the upper 32-bit (which encodes > > > the object type) for the purposes of marking it? > > > > For everything but symbols, we don't, mark_maybe_pointer called on the > > low 32 bits suffices. For symbols, mark_maybe_pointer needs to be > > changed to also check the pointer at <low 32-bit word> + &lispsym. > > Right, that's what I thought. So this issue also has to be fixed on > emacs-27 in order for us to provide a stable Emacs 27. I'm surprised, but glad that you think so. Patch for emacs-27 attached. [-- Attachment #2: 0001-Be-more-aggressive-in-marking-objects-during-GC-bug-.patch --] [-- Type: text/x-patch, Size: 2649 bytes --] From 35d50e6108c6edbac93e80aa1b9998dc6ac19054 Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet@gmail.com> Date: Sat, 30 May 2020 13:23:24 +0000 Subject: [PATCH] Be more aggressive in marking objects during GC (bug#41321) * src/alloc.c (maybe_lisp_pointer): Remove. (mark_memory): Mark 32-bit words that might be the only reference to a Lisp_Symbol. 
--- src/alloc.c | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index 1c6b664b22..3938cdf054 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) mark_maybe_object (*array); } -/* Return true if P might point to Lisp data that can be garbage - collected, and false otherwise (i.e., false if it is easy to see - that P cannot point to Lisp data that can be garbage collected). - Symbols are implemented via offsets not pointers, but the offsets - are also multiples of LISP_ALIGNMENT. */ - -static bool -maybe_lisp_pointer (void *p) -{ - return (uintptr_t) p % LISP_ALIGNMENT == 0; -} - /* If P points to Lisp data, mark that as live if it isn't already marked. */ @@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p) VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p)); #endif - if (!maybe_lisp_pointer (p)) - return; - if (pdumper_object_p (p)) { int type = pdumper_find_object_type (p); @@ -4715,12 +4700,22 @@ mark_memory (void const *start, void const *end) for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT) { - mark_maybe_pointer (*(void *const *) pp); - - verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0); - if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT - || (uintptr_t) pp % alignof (Lisp_Object) == 0) - mark_maybe_object (*(Lisp_Object const *) pp); + uintptr_t offset = (uintptr_t) *(void *const *) pp; + mark_maybe_pointer ((void *) offset); + /* On 32-bit --with-wide-int systems, the two halves of a + Lisp_Object may be stored non-contiguously. Therefore, we + need to recognize the lower 32 bits of a Lisp_Object encoding + a symbol, and since Qnil is binary zero, that requires adding + &lispsym. 
*/ + if (GC_POINTER_ALIGNMENT < sizeof (Lisp_Object)) + mark_maybe_pointer ((void *) (offset + (uintptr_t) &lispsym)); + else + { + verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0); + if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT + || (uintptr_t) pp % alignof (Lisp_Object) == 0) + mark_maybe_object (*(Lisp_Object const *) pp); + } } } -- 2.27.0.rc0 ^ permalink raw reply related [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 13:29 ` Pip Cet @ 2020-05-30 16:32 ` Eli Zaretskii 2020-05-30 16:36 ` Pip Cet 2020-05-30 18:04 ` Paul Eggert 1 sibling, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 16:32 UTC (permalink / raw) To: Pip Cet; +Cc: eggert, 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Sat, 30 May 2020 13:29:33 +0000 > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, > Stefan Monnier <monnier@iro.umontreal.ca> > > > > > So this alignment requirement is only due to pthreads being used? > > > > > > I'm not sure what you're asking. Obviously there are systems on which > > > unaligned accesses will fault or be very slow indeed, so we need to > > > make sure, say, pure space allocations are aligned somehow. That > > > requires a LISP_ALIGNMENT of 8. Everything beyond that is only for > > > performance, pthreads, and SIMD types. > > > > If the system guarantees 4-byte alignment from malloc (and/or a > > similar alignment of the runtime C stack), then using that doesn't > > trigger problems related to unaligned accesses, right? So let me > > rephrase: why isn't 4-byte alignment "good enough" for us on systems > > where malloc and the runtime stack are guaranteed to be thus aligned? > > (The runtime stack isn't relevant, as far as I can tell, since we walk > that in 4-byte steps on such systems anyway.) I think it might be relevant for stack-based Lisp objects (if we keep requiring that Lisp objects are 8-byte aligned on 32-bit platforms). > You're correct that on such a system, we could get away with a > LISP_ALIGNMENT of 4, but a LISP_ALIGNMENT of 8 wouldn't hurt either. That's for sure. I just wondered why we started requiring 8-byte alignment back when we did. Perhaps someone still remembers the reason. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 16:32 ` Eli Zaretskii @ 2020-05-30 16:36 ` Pip Cet 2020-05-30 16:45 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-30 16:36 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier On Sat, May 30, 2020 at 4:32 PM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Sat, 30 May 2020 13:29:33 +0000 > > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, > > Stefan Monnier <monnier@iro.umontreal.ca> > > (The runtime stack isn't relevant, as far as I can tell, since we walk > > that in 4-byte steps on such systems anyway.) > > I think it might be relevant for stack-based Lisp objects (if we keep > requiring that Lisp objects are 8-byte aligned on 32-bit platforms). We should never mark stack-based Lisp objects, no matter how well-aligned they are! ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 16:36 ` Pip Cet @ 2020-05-30 16:45 ` Eli Zaretskii 0 siblings, 0 replies; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 16:45 UTC (permalink / raw) To: Pip Cet; +Cc: eggert, 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Sat, 30 May 2020 16:36:31 +0000 > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, > Stefan Monnier <monnier@iro.umontreal.ca> > > On Sat, May 30, 2020 at 4:32 PM Eli Zaretskii <eliz@gnu.org> wrote: > > > From: Pip Cet <pipcet@gmail.com> > > > Date: Sat, 30 May 2020 13:29:33 +0000 > > > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, > > > Stefan Monnier <monnier@iro.umontreal.ca> > > > (The runtime stack isn't relevant, as far as I can tell, since we walk > > > that in 4-byte steps on such systems anyway.) > > > > I think it might be relevant for stack-based Lisp objects (if we keep > > requiring that Lisp objects are 8-byte aligned on 32-bit platforms). > > We should never mark stack-based Lisp objects, no matter how > well-aligned they are! But we do require them to be aligned, at least in the current codebase. We actually had crashes in the past when the Windows build didn't force GCC to align stack on 8-byte boundary in callback functions. I don't remember if this was related to GC or not, but the requirement is definitely there. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 13:29 ` Pip Cet 2020-05-30 16:32 ` Eli Zaretskii @ 2020-05-30 18:04 ` Paul Eggert 2020-05-30 18:12 ` Pip Cet ` (2 more replies) 1 sibling, 3 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-30 18:04 UTC (permalink / raw) To: Pip Cet, Eli Zaretskii; +Cc: 41321, Stefan Monnier [-- Attachment #1: Type: text/plain, Size: 780 bytes --] On 5/30/20 6:29 AM, Pip Cet wrote: > I'm surprised, but glad that you think so. Patch for emacs-27 attached. > That patch is on the right track but it's not clear whether it will cause GC to fail to mark some objects that it should, both because it omits mark_maybe_object on platforms like x86 --with-wide-int where alignof (void *) < sizeof (Lisp_Object), and because it skips mark_maybe_pointer on more-typical platforms where alignof (void *) == sizeof (Lisp_Object). For emacs-27 I propose the attached, more-conservative patch instead. This is a backport of part of a patch I've been working on for master. As part of that effort I've found some other obscure GC-related bugs that we've been lucky to avoid; this patch focuses only on the area Eli encountered. [-- Attachment #2: 0001-Be-more-aggressive-in-marking-objects-during-GC.patch --] [-- Type: text/x-patch, Size: 2116 bytes --] From 55dbbc828346aa5aca8c56c2813baa66fdaf7e08 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Sat, 30 May 2020 10:10:02 -0700 Subject: [PATCH] Be more aggressive in marking objects during GC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Simplified version of a patch from Pip Cet (Bug#41321#299). * src/alloc.c (maybe_lisp_pointer): Remove. All uses removed. (mark_memory): Also look at the pointer offset by ‘lispsym’, for symbols. 
--- src/alloc.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index 1c6b664b22..d5a0f0aa9d 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) mark_maybe_object (*array); } -/* Return true if P might point to Lisp data that can be garbage - collected, and false otherwise (i.e., false if it is easy to see - that P cannot point to Lisp data that can be garbage collected). - Symbols are implemented via offsets not pointers, but the offsets - are also multiples of LISP_ALIGNMENT. */ - -static bool -maybe_lisp_pointer (void *p) -{ - return (uintptr_t) p % LISP_ALIGNMENT == 0; -} - /* If P points to Lisp data, mark that as live if it isn't already marked. */ @@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p) VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p)); #endif - if (!maybe_lisp_pointer (p)) - return; - if (pdumper_object_p (p)) { int type = pdumper_find_object_type (p); @@ -4715,7 +4700,10 @@ mark_memory (void const *start, void const *end) for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT) { - mark_maybe_pointer (*(void *const *) pp); + char *p = *(char *const *) pp; + mark_maybe_pointer (p); + p += (intptr_t) lispsym; + mark_maybe_pointer (p); verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0); if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT -- 2.17.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:04 ` Paul Eggert @ 2020-05-30 18:12 ` Pip Cet 2020-05-30 18:16 ` Eli Zaretskii 2020-05-30 18:39 ` Pip Cet 2 siblings, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-30 18:12 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Sat, May 30, 2020 at 6:04 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > For emacs-27 I propose the attached, more-conservative patch instead. More conservative is good! So, yes, I prefer your patch. > This is a backport of part of a patch I've been working on for master. > As part of that effort I've found some other obscure GC-related bugs > that we've been lucky to avoid; this patch focuses only on the area Eli > encountered. Looking forward to hearing about those :-) ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:04 ` Paul Eggert 2020-05-30 18:12 ` Pip Cet @ 2020-05-30 18:16 ` Eli Zaretskii 2020-05-30 18:45 ` Paul Eggert 2020-05-30 18:39 ` Pip Cet 2 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 18:16 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sat, 30 May 2020 11:04:55 -0700 > > For emacs-27 I propose the attached, more-conservative patch instead. > This is a backport of part of a patch I've been working on for master. > As part of that effort I've found some other obscure GC-related bugs > that we've been lucky to avoid; this patch focuses only on the area Eli > encountered. Please explain in comments why we are marking one more pointer in the loop. Also, I don't think I understand why this solves all of the problems we were discussing; is this in addition to another patch that you propose for emacs-27? Thanks. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:16 ` Eli Zaretskii @ 2020-05-30 18:45 ` Paul Eggert 0 siblings, 0 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-30 18:45 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet [-- Attachment #1: Type: text/plain, Size: 1238 bytes --] On 5/30/20 11:16 AM, Eli Zaretskii wrote: > Please explain in comments why we are marking one more pointer in the > loop. Sure. I'm attaching the revised patch proposed for emacs-27. This is very similar to what Pip Cet just proposed in Bug#41321#353, but the code is simpler with fewer casts (and I like my comment better :-). > I don't think I understand why this solves all of the > problems we were discussing; is this in addition to another patch that > you propose for emacs-27? This replaces all the patches that I proposed for emacs-27 in this thread. Although this patch doesn't solve all the problems we have been discussing, it does solve the urgent ones: * The problem you observed on MinGW for markers; it can also occur for many other object types. This problem can cause the GC to incorrectly reclaim storage for objects, causing the usual disasters. * The similar problem that Pip Cet noted for symbols. The patch does not solve less-urgent problems we've talked about, such as over-alignment of Lisp objects on MinGW (this is a relatively minor performance issue), or the more-obscure and unlikely GC bugs that we've been living with for a while (which I haven't had the time to think through entirely). 
[-- Attachment #2: 0001-Be-more-aggressive-in-marking-objects-during-GC.patch --] [-- Type: text/x-patch, Size: 2447 bytes --] From 6a474a55e68a2bada13db69d4099a4b2de7b1271 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@cs.ucla.edu> Date: Sat, 30 May 2020 10:10:02 -0700 Subject: [PATCH] Be more aggressive in marking objects during GC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Simplified version of a patch from Pip Cet (Bug#41321#299). * src/alloc.c (maybe_lisp_pointer): Remove. All uses removed. (mark_memory): Also look at the pointer offset by ‘lispsym’, for symbols. --- src/alloc.c | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index 1c6b664b22..568fee666f 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) mark_maybe_object (*array); } -/* Return true if P might point to Lisp data that can be garbage - collected, and false otherwise (i.e., false if it is easy to see - that P cannot point to Lisp data that can be garbage collected). - Symbols are implemented via offsets not pointers, but the offsets - are also multiples of LISP_ALIGNMENT. */ - -static bool -maybe_lisp_pointer (void *p) -{ - return (uintptr_t) p % LISP_ALIGNMENT == 0; -} - /* If P points to Lisp data, mark that as live if it isn't already marked. 
*/ @@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p) VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p)); #endif - if (!maybe_lisp_pointer (p)) - return; - if (pdumper_object_p (p)) { int type = pdumper_find_object_type (p); @@ -4715,7 +4700,16 @@ mark_memory (void const *start, void const *end) for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT) { - mark_maybe_pointer (*(void *const *) pp); + char *p = *(char *const *) pp; + mark_maybe_pointer (p); + + /* Unmask any struct Lisp_Symbol pointer that make_lisp_symbol + previously disguised by adding the address of 'lispsym'. + On a host with 32-bit pointers and 64-bit Lisp_Objects, + a Lisp_Object might be split into registers saved into + non-adjacent words and P might be the low-order word's value. */ + p += (intptr_t) lispsym; + mark_maybe_pointer (p); verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0); if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT -- 2.17.1 ^ permalink raw reply related [flat|nested] 132+ messages in thread
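[Editorial sketch, not part of the patch.] The extra `mark_maybe_pointer` call in the patch above can be illustrated with a toy model. It assumes only what the thread states: symbols are encoded as offsets from `lispsym`, so a scanned word may hold either a raw address or such an offset, and the scan has to try both. Every `toy_*` name below is hypothetical and is not an Emacs definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct Lisp_Symbol.  */
struct toy_symbol { int marked; int payload; };

enum { TOY_NSYMS = 4 };

/* Hypothetical stand-in for the 'lispsym' table.  */
static struct toy_symbol toy_lispsym[TOY_NSYMS];

/* Encode S as an offset from the start of the table, so that the
   first entry is represented by zero.  */
static uintptr_t
toy_encode_symbol (struct toy_symbol *s)
{
  return (uintptr_t) s - (uintptr_t) toy_lispsym;
}

/* If WORD is the address of a table entry, mark that entry and
   return true; otherwise return false.  */
static bool
toy_mark_maybe_pointer (uintptr_t word)
{
  uintptr_t base = (uintptr_t) toy_lispsym;
  if (word < base || word - base >= sizeof toy_lispsym)
    return false;
  if ((word - base) % sizeof (struct toy_symbol) != 0)
    return false;
  toy_lispsym[(word - base) / sizeof (struct toy_symbol)].marked = 1;
  return true;
}

/* Scan NWORDS words at START.  Each word is tried twice: once as a
   raw address, and once after adding the table address back, which
   undoes the encoding done by toy_encode_symbol.  Return the number
   of successful marks.  */
static int
toy_mark_memory (uintptr_t const *start, size_t nwords)
{
  int hits = 0;
  for (size_t i = 0; i < nwords; i++)
    {
      hits += toy_mark_maybe_pointer (start[i]);
      hits += toy_mark_maybe_pointer (start[i] + (uintptr_t) toy_lispsym);
    }
  return hits;
}
```

In this model a word holding `toy_encode_symbol (&toy_lispsym[2])` is found only by the second call, which is the case the patch comment describes for the low-order half of a split Lisp_Object.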
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:04 ` Paul Eggert 2020-05-30 18:12 ` Pip Cet 2020-05-30 18:16 ` Eli Zaretskii @ 2020-05-30 18:39 ` Pip Cet 2020-05-30 18:57 ` Paul Eggert 2 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-30 18:39 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier [-- Attachment #1: Type: text/plain, Size: 1141 bytes --] On Sat, May 30, 2020 at 6:04 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/30/20 6:29 AM, Pip Cet wrote: > > I'm surprised, but glad that you think so. Patch for emacs-27 attached. > That patch is on the right track but it's not clear whether it will > cause GC to fail to mark some objects that it should, both because it > omits mark_maybe_object on platforms like x86 --with-wide-int where > alignof (void *) < sizeof (Lisp_Object), and because it skips > mark_maybe_pointer on more-typical platforms where alignof (void *) == > sizeof (Lisp_Object). I've thought about this for a while, but I fail to see the problem with my patch. mark_maybe_object is unnecessary on x86 --with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary on platforms that don't rip apart our precious Lisp_Objects. The other call to mark_maybe_pointer isn't skipped. I still think we ought to use yours (and accept a ~25% performance penalty in this particular loop on Eli's platform), but include a comment like the one I had in mine. It might hide further bugs, but that's probably what we want to do on emacs-27. Proposed patch attached. [-- Attachment #2: 0001-Be-more-aggressive-in-marking-objects-during-GC-bug-.patch --] [-- Type: text/x-patch, Size: 2222 bytes --] From 047fc04af8f9d6b6e4587ee88d573dac4292eeb0 Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet@gmail.com> Date: Sat, 30 May 2020 13:23:24 +0000 Subject: [PATCH] Be more aggressive in marking objects during GC (bug#41321) * src/alloc.c (maybe_lisp_pointer): Remove. 
(mark_memory): Mark 32-bit words that might be the only reference to a Lisp_Symbol. --- src/alloc.c | 24 ++++++++---------------- 1 file changed, 8 insertions(+), 16 deletions(-) diff --git a/src/alloc.c b/src/alloc.c index 1c6b664b22..14f75a2259 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts) mark_maybe_object (*array); } -/* Return true if P might point to Lisp data that can be garbage - collected, and false otherwise (i.e., false if it is easy to see - that P cannot point to Lisp data that can be garbage collected). - Symbols are implemented via offsets not pointers, but the offsets - are also multiples of LISP_ALIGNMENT. */ - -static bool -maybe_lisp_pointer (void *p) -{ - return (uintptr_t) p % LISP_ALIGNMENT == 0; -} - /* If P points to Lisp data, mark that as live if it isn't already marked. */ @@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p) VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p)); #endif - if (!maybe_lisp_pointer (p)) - return; - if (pdumper_object_p (p)) { int type = pdumper_find_object_type (p); @@ -4715,7 +4700,14 @@ mark_memory (void const *start, void const *end) for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT) { - mark_maybe_pointer (*(void *const *) pp); + uintptr_t offset = (uintptr_t) *(void *const *) pp; + mark_maybe_pointer ((void *) offset); + /* On 32-bit --with-wide-int systems, the two halves of a + Lisp_Object may be stored non-contiguously. Therefore, we + need to recognize the lower 32 bits of a Lisp_Object encoding + a symbol, and since Qnil is binary zero, that requires adding + &lispsym. */ + mark_maybe_pointer ((void *) (offset + (uintptr_t) lispsym)); verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0); if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT -- 2.27.0.rc0 ^ permalink raw reply related [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:39 ` Pip Cet @ 2020-05-30 18:57 ` Paul Eggert 2020-05-30 19:06 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-30 18:57 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier On 5/30/20 11:39 AM, Pip Cet wrote: > I fail to see the problem > with my patch. mark_maybe_object is unnecessary on x86 > --with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary > on platforms that don't rip apart our precious Lisp_Objects. The other > call to mark_maybe_pointer isn't skipped. The other alloc.c code is inconsistent with respect to the live_*_holding versus live_*_p functions. There is no live_float_holding function, which means we're relying entirely on mark_maybe_object to find roots that contain Lisp floats. So it's dicey that your earlier (Bug#41321#299) patch skips the call to mark_maybe_object on some platforms. I've been working on improving this for master. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:57 ` Paul Eggert @ 2020-05-30 19:06 ` Pip Cet 2020-05-30 21:27 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-30 19:06 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Sat, May 30, 2020 at 6:57 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/30/20 11:39 AM, Pip Cet wrote: > > I fail to see the problem > > with my patch. mark_maybe_object is unnecessary on x86 > > --with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary > > on platforms that don't rip apart our precious Lisp_Objects. The other > > call to mark_maybe_pointer isn't skipped. > > The other alloc.c code is inconsistent with respect to the > live_*_holding versus live_*_p functions. There is no live_float_holding > function, Indeed. There's just live_float_p. > which means we're relying entirely on mark_maybe_object to > find roots that contain Lisp floats. No, we're not. There's code in mark_maybe_pointer to handle the float case, by calling live_float_p. Is it misaligned pointers into floats you're worried about? > So it's dicey that your earlier > (Bug#41321#299) patch skips the call to mark_maybe_object on some platforms. I still fail to see how. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 19:06 ` Pip Cet @ 2020-05-30 21:27 ` Paul Eggert 2020-05-30 21:49 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-30 21:27 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier On 5/30/20 12:06 PM, Pip Cet wrote: > Is it misaligned pointers into floats you're worried about? Yes, and it's plausible there will be pointers misaligned because Lisp_Float has been added to them. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 21:27 ` Paul Eggert @ 2020-05-30 21:49 ` Pip Cet 2020-05-30 22:23 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-05-30 21:49 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Sat, May 30, 2020 at 9:27 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > > Is it misaligned pointers into floats you're worried about? > > Yes, and it's plausible there will be pointers misaligned because > Lisp_Float has been added to them. Sorry for being dense, but I still don't understand. This is on !LSB_TAG machines, where Lisp_Float does not affect the representation of the lower 32 bits. On LSB_TAG machines, the other code path is taken. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 21:49 ` Pip Cet @ 2020-05-30 22:23 ` Paul Eggert 2020-05-30 22:54 ` Pip Cet 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-30 22:23 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier On 5/30/20 2:49 PM, Pip Cet wrote: > On Sat, May 30, 2020 at 9:27 PM Paul Eggert <eggert@cs.ucla.edu> wrote: >>> Is it misaligned pointers into floats you're worried about? >> >> Yes, and it's plausible there will be pointers misaligned because >> Lisp_Float has been added to them. > > Sorry for being dense, but I still don't understand. This is on > !LSB_TAG machines, where Lisp_Float does not affect the representation > of the lower 32 bits. On LSB_TAG machines, the other code path is > taken. > Oh, I see I am being the dense one. I was thinking based on some of my master-branch improvements. One option is to do away with mark_maybe_object entirely, so that one needn't deal with looking at each part of the stack twice (this is for efficiency). In emacs-27 the patch you proposed earlier is probably OK, though I haven't had time to think through all the possibilities. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 22:23 ` Paul Eggert @ 2020-05-30 22:54 ` Pip Cet 0 siblings, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-30 22:54 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Sat, May 30, 2020 at 10:23 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > Oh, I see I am being the dense one. I was thinking based on some of my > master-branch improvements. One option is to do away with > mark_maybe_object entirely, so that one needn't deal with looking at > each part of the stack twice (this is for efficiency). Yes, I thought you'd already done that on master. I must not have been keeping up with the patches. Much as I like thinking about putting symbols in the rbtree twice and walking it smartly to retrieve up to two overlapping nodes, I suspect there are much easier ways of fixing this, at least on 64-bit architectures. We could make sure, for example, that all symbol blocks come after lispsym in memory, and store lispsym - address in the Lisp_Object. Those values would then fall outside the 48-bit space of actually valid x86_64 addresses, so we could get away with mark_maybe_pointer (word < 0 ? lispsym - word : word) on that architecture. > In emacs-27 the patch you proposed earlier is probably OK, though I > haven't had time to think through all the possibilities. I was just curious. I think we should go with your latest patch. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 21:01 ` Pip Cet 2020-05-30 5:58 ` Eli Zaretskii @ 2020-05-30 16:31 ` Paul Eggert 2020-05-30 16:42 ` Eli Zaretskii 2020-05-30 16:53 ` Pip Cet 1 sibling, 2 replies; 132+ messages in thread From: Paul Eggert @ 2020-05-30 16:31 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, Stefan Monnier On 5/29/20 2:01 PM, Pip Cet wrote: > (1) says: > It’s an invalid optimization, since pointers can address the > middle of Lisp_Object data. > > That may be true (we still haven't observed it), I observed it earlier, in code that iterated through a Lisp vector; at the machine level the only pointer was into the middle of that vector. Addresses of Lisp_Vector elements are not GCALIGNED on x86 and other platforms. > but it's not what > happened in Eli's case: Yes, that's right. That is, the patch for (1) fixed not only Eli's case, but other plausible cases. > 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell. Although that's true of all current Emacs porting targets as far as I know, I'd rather not hardwire this into the code, as neither POSIX nor the C standard require it. This is why the comment refers to platforms where malloc() % 8 != 0 as "oddball hosts". > 2. A Lisp object requires greater alignment than malloc() gives it. > IIRC, there was at least one RISC architecture whose specification We don't need anything that obscure. Just use __int128 on x86 with glibc 2.24. On that platform __int128's alignment is 16, malloc's is 8. > I'm not saying it's the best solution, but I would prefer simply > defining LISP_ALIGNMENT to be 8 to either patch. That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native alignment (there's no need to align objects to 8 because the tags are at the high end). ^ permalink raw reply [flat|nested] 132+ messages in thread
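[Editorial sketch.] The point about pointers into the middle of a vector can be shown with a toy layout. It assumes a 32-bit host where a vector is a 4-byte header followed by 4-byte slots; the struct, the constant, and the `toy_*` functions are hypothetical stand-ins, and the filter only mirrors the `p % LISP_ALIGNMENT == 0` test that the patches in this thread remove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment for the toy model; not taken from lisp.h.  */
enum { TOY_LISP_ALIGNMENT = 8 };

/* Toy vector for a 32-bit host: 4-byte header, then 4-byte slots.  */
struct toy_vector { uint32_t header; uint32_t contents[6]; };

/* Alignment filter of the kind removed by the patches: accept P only
   if it is a multiple of the assumed alignment.  */
static bool
toy_maybe_lisp_pointer (uintptr_t p)
{
  return p % TOY_LISP_ALIGNMENT == 0;
}

/* Address of slot I of a toy vector assumed to start at BASE.  */
static uintptr_t
toy_slot_address (uintptr_t base, size_t i)
{
  assert (i < 6);
  return base + offsetof (struct toy_vector, contents)
	 + i * sizeof (uint32_t);
}
```

With the vector at an 8-aligned address such as 0x1000, slot 0 sits at 0x1004 and fails the filter, so a loop whose only live pointer is the address of that slot would leave the vector unmarked in this model.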
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 16:31 ` Paul Eggert @ 2020-05-30 16:42 ` Eli Zaretskii 2020-05-30 17:06 ` Paul Eggert 2020-05-30 16:53 ` Pip Cet 1 sibling, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 16:42 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: Eli Zaretskii <eliz@gnu.org>, 41321@debbugs.gnu.org, > Stefan Monnier <monnier@iro.umontreal.ca> > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sat, 30 May 2020 09:31:49 -0700 > > > I'm not saying it's the best solution, but I would prefer simply > > defining LISP_ALIGNMENT to be 8 to either patch. > > That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native > alignment (there's no need to align objects to 8 because the tags are at the > high end). I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless. What am I missing? ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 16:42 ` Eli Zaretskii @ 2020-05-30 17:06 ` Paul Eggert 2020-05-30 17:22 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-30 17:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/30/20 9:42 AM, Eli Zaretskii wrote: >> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native >> alignment (there's no need to align objects to 8 because the tags are at the >> high end). > I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless. That's true for your platform, since alignof (max_align_t) == 8 on your platform. But neither the C standard nor POSIX guarantee that alignof (max_align_t) is 8. Admittedly these days one would have to look hard to find a platform where alignof (max_align_t) is 4 or less. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 17:06 ` Paul Eggert @ 2020-05-30 17:22 ` Eli Zaretskii 2020-05-30 18:12 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 17:22 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sat, 30 May 2020 10:06:35 -0700 > > On 5/30/20 9:42 AM, Eli Zaretskii wrote: > >> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native > >> alignment (there's no need to align objects to 8 because the tags are at the > >> high end). > > I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless. > > That's true for your platform, since alignof (max_align_t) == 8 on your > platform. No, it's 16. And I don't understand what that has to do with LISP_ALIGNMENT on the master branch, since we all but removed max_align_t from there. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 17:22 ` Eli Zaretskii @ 2020-05-30 18:12 ` Paul Eggert 2020-05-30 18:21 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-30 18:12 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/30/20 10:22 AM, Eli Zaretskii wrote: >>>> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native >>>> alignment (there's no need to align objects to 8 because the tags are at the >>>> high end). >>> I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless. >> That's true for your platform, since alignof (max_align_t) == 8 on your >> platform. > No, it's 16. And I don't understand what does that have to do with > LISP_ALIGNMENT on the master branch, since we all but removed > max_align_t from there. Oh, I thought you were talking about the emacs-27 branch which is still using max_align_t. You're right that LISP_ALIGNMENT is 8 on your platform on the master branch. However, my comment "That's not correct for !USE_LSB_TAG ..." (Bug#41321#305) was responding to Pip Cet's earlier comment "I would prefer simply defining LISP_ALIGNMENT to be 8" (Bug#41321#272) which was talking about the emacs-27 branch. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:12 ` Paul Eggert @ 2020-05-30 18:21 ` Eli Zaretskii 2020-05-30 19:14 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 18:21 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sat, 30 May 2020 11:12:49 -0700 > > On 5/30/20 10:22 AM, Eli Zaretskii wrote: > >>>> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native > >>>> alignment (there's no need to align objects to 8 because the tags are at the > >>>> high end). > >>> I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless. > >> That's true for your platform, since alignof (max_align_t) == 8 on your > >> platform. > > No, it's 16. And I don't understand what does that have to do with > > LISP_ALIGNMENT on the master branch, since we all but removed > > max_align_t from there. > > Oh, I thought you were talking about the emacs-27 branch which is still > using max_align_t. > > You're right that LISP_ALIGNMENT is 8 on your platform on the master > branch. However, my comment "That's not correct for !USE_LSB_TAG ..." > (Bug#41321#305) was responding to Pip Cet's earlier comment "I would > prefer simply defining LISP_ALIGNMENT to be 8" (Bug#41321#272) which was > talking about the emacs-27 branch. I'm still confused, because on current emacs-27, both LISP_ALIGNMENT and alignof(max_align_t) are 16 in my builds. And I still don't understand why using LISP_ALIGNMENT of 8 is not right in this case (on emacs-27). ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 18:21 ` Eli Zaretskii @ 2020-05-30 19:14 ` Paul Eggert 2020-05-30 19:33 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-30 19:14 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/30/20 11:21 AM, Eli Zaretskii wrote: > on current emacs-27, both LISP_ALIGNMENT > and alignof(max_align_t) are 16 in my builds. And I still don't > understand why using LISP_ALIGNMENT of 8 is not right in this case (on > emacs-27). You're right that LISP_ALIGNMENT is 16 on your host on the emacs-27 branch, because alignof (max_align_t) is 16 there. And you're also right that setting LISP_ALIGNMENT to be 8 on your host would fix the marker bug you observed there, because it would work around your host's bug where malloc returns a pointer that is not a multiple of alignof(max_align_t). However, C and POSIX allow platforms where LISP_ALIGNMENT should be greater than 8, or (if !USE_LSB_TAG) should be less than 8, so I'd be leery about changing LISP_ALIGNMENT on any host that doesn't have your host's idiosyncrasies. And that specific workaround should not be needed anyway if we install the emacs-27 patch that I have most-recently suggested (or Pip Cet's very-similar recent patch), since this patch solves the problem in a more-general way that should help to prevent more bugs like this one. ^ permalink raw reply [flat|nested] 132+ messages in thread
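[Editorial sketch.] The host behaviour described above, malloc returning addresses that are not multiples of alignof (max_align_t), can be probed with a few lines of C11. This is only a rough check under the assumption of a C11 compiler; the `toy_*` helpers are hypothetical and nothing here is taken from alloc.c.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return true if address P is a multiple of ALIGN, which must be a
   power of two.  */
static bool
toy_is_aligned (uintptr_t p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (p & (align - 1)) == 0;
}

/* Allocate up to 64 blocks of SIZE bytes, keep them alive while
   counting, and return how many start at an address that is not a
   multiple of alignof (max_align_t).  All blocks are freed before
   returning.  */
static int
toy_count_underaligned_mallocs (int n, size_t size)
{
  void *blocks[64];
  int bad = 0;
  if (n > 64)
    n = 64;
  for (int i = 0; i < n; i++)
    {
      blocks[i] = malloc (size);
      if (blocks[i]
	  && !toy_is_aligned ((uintptr_t) blocks[i], alignof (max_align_t)))
	bad++;
    }
  for (int i = 0; i < n; i++)
    free (blocks[i]);
  return bad;
}
```

A nonzero count from `toy_count_underaligned_mallocs` would indicate a host of the kind discussed here, where an address can be 8-aligned (for example 0x1008) yet still fail a 16-byte alignment test.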
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 19:14 ` Paul Eggert @ 2020-05-30 19:33 ` Eli Zaretskii 2020-05-30 22:18 ` Paul Eggert 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 19:33 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sat, 30 May 2020 12:14:14 -0700 > > On 5/30/20 11:21 AM, Eli Zaretskii wrote: > > > on current emacs-27, both LISP_ALIGNMENT > > and alignof(max_align_t) are 16 in my builds. And I still don't > > understand why using LISP_ALIGNMENT of 8 is not right in this case (on > > emacs-27). > > You're right that LISP_ALIGNMENT is 16 on your host on the emacs-27 > branch, because alignof (max_align_t) is 16 there. And you're also right > that setting LISP_ALIGNMENT to be 8 on your host would fix the marker > bug you observed there, because it would work around your host's bug > where malloc returns a pointer that is not a multiple of > alignof(max_align_t). However, C and POSIX allow platforms where > LISP_ALIGNMENT should be greater than 8, or (if !USE_LSB_TAG) should be > less than 8, so I'd be leery about changing LISP_ALIGNMENT on any host > that doesn't have your host's idiosyncrasies. Posix may require it, but do we actually know of any supported important platforms where this happens? If not, let's worry about the more general fix on master, where we still have time to try various solutions, and settle for a simpler and easier fix on emacs-27. > And that specific workaround should not be needed anyway if we > install the emacs-27 patch that I have most-recently suggested (or > Pip Cet's very-similar recent patch), since this patch solves the > problem in a more-general way that should help to prevent more bugs > like this one. But your proposal is also less efficient, isn't it? If so, its more general nature comes at a price. 
^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 19:33 ` Eli Zaretskii @ 2020-05-30 22:18 ` Paul Eggert 2020-05-31 15:48 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Paul Eggert @ 2020-05-30 22:18 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, monnier, pipcet On 5/30/20 12:33 PM, Eli Zaretskii wrote: > Posix may require it, but do we actually know of any supported > important platforms where this happens? That depends on what the question is. If the question is "Are there platforms where the lost-marker bug occurs?", then no, we don't know of any supported important platforms. But if the question is "Are there platforms where LISP_ALIGNMENT should be some value other than 8?", then yes, LISP_ALIGNMENT should be 4 on Ubuntu 18.04.3 i386 when Emacs is configured --with-wide-int (I just tested this, and it is indeed 4 on that platform in the Emacs master branch). This is because on this platform Lisp objects have a native alignment of 4, and this platform is !USE_LSB_TAG so the presence of tag bits imposes no extra alignment requirement. > let's worry about the > more general fix on master, where we still have time to try various > solutions, and settle for a simpler and easier fix on emacs-27. Yes, that's what we're trying to do, and it's what's in the latest patch that Pip Cet and I proposed very similar variants of. > But your proposal is also less efficient, isn't it? If so, its more > general nature comes at a price. Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround (which is not correct as noted above but which fixes the lost-marker bug), the proposed patch is about a 1% CPU-time hit in my usual benchmark (make compile-always) on a 32-bit platform compiled with --with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). 
We can surely speed this up with some cost in complexity (that's what I was working on on the master branch), but for emacs-27 I thought that reliability took precedence over 1% performance improvements. I expect that most of the performance hit is not due to the LISP_ALIGNMENT thing, it's due to the "you have to check pointers three times" thing. In my master-branch draft I'm working on getting this down to "you have to check pointers an average of 1+epsilon times" for some suitable value of epsilon. I don't know yet what epsilon will be. But anyway, this is only about improving that 1% performance hit. ^ permalink raw reply [flat|nested] 132+ messages in thread
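The point that low-order tag bits impose an alignment requirement, while tags kept at the high end do not, can be sketched as follows. This is an illustration only: the toy_ names are hypothetical, a 3-bit tag is assumed, and the code is not how lisp.h implements tagging.

```c
#include <stdbool.h>
#include <stdint.h>

enum { TOY_TAG_BITS = 3, TOY_TAG_MASK = (1 << TOY_TAG_BITS) - 1 };

/* Hypothetical 8-byte-aligned storage standing in for a Lisp object.  */
static _Alignas (8) unsigned char toy_storage[16];

/* Store TAG in the low bits of the address, as an LSB-tagging scheme does.  */
static uintptr_t
toy_lsb_tag (const void *p, unsigned tag)
{
  return (uintptr_t) p | (tag & TOY_TAG_MASK);
}

/* Recover the address by clearing the low tag bits.  */
static void *
toy_lsb_untag (uintptr_t word)
{
  return (void *) (word & ~(uintptr_t) TOY_TAG_MASK);
}

/* True if tagging and untagging gives back both P and TAG.  This can
   only hold when the low TOY_TAG_BITS bits of P are zero, that is,
   when P is aligned to 1 << TOY_TAG_BITS bytes.  */
static bool
toy_lsb_round_trips (const void *p, unsigned tag)
{
  uintptr_t word = toy_lsb_tag (p, tag);
  return (toy_lsb_untag (word) == p
          && (word & TOY_TAG_MASK) == (tag & TOY_TAG_MASK));
}
```

With the tag in the high bits instead, the low bits of the address are left alone, which is why a !USE_LSB_TAG build needs only the native alignment of the object.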
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 22:18 ` Paul Eggert @ 2020-05-31 15:48 ` Eli Zaretskii 2020-06-01 14:48 ` Eli Zaretskii 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-31 15:48 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Sat, 30 May 2020 15:18:53 -0700 > > On 5/30/20 12:33 PM, Eli Zaretskii wrote: > > > Posix may require it, but do we actually know of any supported > > important platforms where this happens? > > > But your proposal is also less efficient, isn't it? If so, its more > > general nature comes at a price. > > Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround > (which is not correct as noted above but which fixes the lost-marker > bug), the proposed patch is about a 1% CPU-time hit in my usual > benchmark (make compile-always) on a 32-bit platform compiled with > --with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). We > can surely speed this up with some cost in complexity (that's what I was > working on on the master branch), but for emacs-27 I thought that > reliability took precedence over 1% performance improvements. > > I expect that most of the performance hit is not due to the > LISP_ALIGNMENT thing, it's due to the "you have to check pointers three > times" thing. In my master-branch draft I'm working on getting this down > to "you have to check pointers an average of 1+epsilon times" for some > suitable value of epsilon. I don't know yet what epsilon will be. But > anyway, this is only about improving that 1% performance hit. OK, then let's get this change into emacs-27, and thanks. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-31 15:48 ` Eli Zaretskii @ 2020-06-01 14:48 ` Eli Zaretskii 2020-09-27 14:39 ` Lars Ingebrigtsen 0 siblings, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-06-01 14:48 UTC (permalink / raw) To: eggert; +Cc: 41321, monnier, pipcet > Date: Sun, 31 May 2020 18:48:28 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: 41321@debbugs.gnu.org, monnier@iro.umontreal.ca, pipcet@gmail.com > > > Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround > > (which is not correct as noted above but which fixes the lost-marker > > bug), the proposed patch is about a 1% CPU-time hit in my usual > > benchmark (make compile-always) on a 32-bit platform compiled with > > --with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). We > > can surely speed this up with some cost in complexity (that's what I was > > working on on the master branch), but for emacs-27 I thought that > > reliability took precedence over 1% performance improvements. > > > > I expect that most of the performance hit is not due to the > > LISP_ALIGNMENT thing, it's due to the "you have to check pointers three > > times" thing. In my master-branch draft I'm working on getting this down > > to "you have to check pointers an average of 1+epsilon times" for some > > suitable value of epsilon. I don't know yet what epsilon will be. But > > anyway, this is only about improving that 1% performance hit. > > OK, then let's get this change into emacs-27, and thanks. FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes in commit 68b6dad1d8e22fe700871c9a5a18da3dd496cc8a. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-06-01 14:48 ` Eli Zaretskii @ 2020-09-27 14:39 ` Lars Ingebrigtsen 2020-09-27 14:45 ` Pip Cet 2020-09-27 15:16 ` Eli Zaretskii 0 siblings, 2 replies; 132+ messages in thread From: Lars Ingebrigtsen @ 2020-09-27 14:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: eggert, 41321, monnier, pipcet Eli Zaretskii <eliz@gnu.org> writes: >> OK, then let's get this change into emacs-27, and thanks. > > FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes > in commit 68b6dad1d8e22fe700871c9a5a18da3dd496cc8a. I've just lightly skimmed this thread, but does this mean that the bug was fixed and this bug report can be closed? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-09-27 14:39 ` Lars Ingebrigtsen @ 2020-09-27 14:45 ` Pip Cet 2020-09-27 15:02 ` Lars Ingebrigtsen 2020-09-27 15:16 ` Eli Zaretskii 1 sibling, 1 reply; 132+ messages in thread From: Pip Cet @ 2020-09-27 14:45 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: eggert, 41321, Stefan Monnier On Sun, Sep 27, 2020 at 2:40 PM Lars Ingebrigtsen <larsi@gnus.org> wrote: > Eli Zaretskii <eliz@gnu.org> writes: > >> OK, then let's get this change into emacs-27, and thanks. > > > > FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes > > in commit 68b6dad1d8e22fe700871c9a5a18da3dd496cc8a. > > I've just lightly skimmed this thread, but does this mean that the bug > was fixed and this bug report can be closed? I believe it can be, yes, though I'm not sure I ever managed to convince Eli that the bug I found was the bug he was seeing... (Sorry for not getting to the other bug reports, BTW, I'm incredibly busy with family business right now.) ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-09-27 14:45 ` Pip Cet @ 2020-09-27 15:02 ` Lars Ingebrigtsen 0 siblings, 0 replies; 132+ messages in thread From: Lars Ingebrigtsen @ 2020-09-27 15:02 UTC (permalink / raw) To: Pip Cet; +Cc: eggert, 41321, Stefan Monnier Pip Cet <pipcet@gmail.com> writes: > (Sorry for not getting to the other bug reports, BTW, I'm incredibly > busy with family business right now.) Sure; no problem. :-) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-09-27 14:39 ` Lars Ingebrigtsen 2020-09-27 14:45 ` Pip Cet @ 2020-09-27 15:16 ` Eli Zaretskii 1 sibling, 0 replies; 132+ messages in thread From: Eli Zaretskii @ 2020-09-27 15:16 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: eggert, 41321-done, monnier, pipcet > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca, > pipcet@gmail.com > Date: Sun, 27 Sep 2020 16:39:51 +0200 > > > FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes > > in commit 68b6dad1d8e22fe700871c9a5a18da3dd496cc8a. > > I've just lightly skimmed this thread, but does this mean that the bug > was fixed and this bug report can be closed? Yes, done. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-30 16:31 ` Paul Eggert 2020-05-30 16:42 ` Eli Zaretskii @ 2020-05-30 16:53 ` Pip Cet 1 sibling, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-30 16:53 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Sat, May 30, 2020 at 4:31 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > On 5/29/20 2:01 PM, Pip Cet wrote: > > (1) says: > > It’s an invalid optimization, since pointers can address the > > middle of Lisp_Object data. > > > > That may be true (we still haven't observed it), > > I observed it earlier, in code that iterated through a Lisp vector; Sorry, I must have missed that. > at the > machine level the only pointer was into the middle of that vector. Addresses of > Lisp_Vector elements are not GCALIGNED on x86 and other platforms. True. > > 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell. > > Although that's true of all current Emacs porting targets as far as I know, I'd > rather not hardwire this into the code, as neither POSIX nor the C standard > require it. This is why the comment refers to platforms where malloc() % 8 != 0 > as "oddball hosts". But we can't figure out what alignment malloc guarantees, on practical hosts. To say we assume a malloc alignment of 8 is much better than to say we assume one of alignof (max_align_t), which is false on many systems. > > 2. A Lisp object requires greater alignment than malloc() gives it. > > IIRC, there was at least one RISC architecture whose specification > > We don't need anything that obscure. Just use __int128 on x86 with glibc 2.24. > On that platform __int128's alignment is 16, malloc's is 8. 
Sorry, but I think a type that is actually used by Emacs is less obscure than __float128 (which I think you mean; __int128 doesn't exist on x86), never mind the question of whether the alignment of that should have been 16, since it works just fine misaligned (except when AC is set, but that's no longer x86-as-we-know-and-hate-it).

> > I'm not saying it's the best solution, but I would prefer simply
> > defining LISP_ALIGNMENT to be 8 to either patch.
>
> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> alignment (there's no need to align objects to 8 because the tags are at the
> high end).

How is it incorrect? Suboptimal, maybe, though there's a performance improvement in keeping things you access together in the same cache line. There's no need to align anything (non-SIMD) to anything on x86 without AC set, it's just good for performance; and that performance improvement applies whether or not Lisp_Objects are natively 64-bit or 2x32-bit.

^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-29 20:24 ` Paul Eggert 2020-05-29 21:01 ` Pip Cet @ 2020-05-30 5:50 ` Eli Zaretskii 1 sibling, 0 replies; 132+ messages in thread From: Eli Zaretskii @ 2020-05-30 5:50 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, monnier, pipcet > Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca > From: Paul Eggert <eggert@cs.ucla.edu> > Date: Fri, 29 May 2020 13:24:55 -0700 > > There are really two bugs here. > > 1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer > can point into the middle of (say) a pseudovector and not be > LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix > this bug in general, because such a pointer might not be GCALIGNMENT-aligned > either. This bug can cause crashes because it causes GC to think an object is > garbage when it's not garbage. > > 2. LISP_ALIGNMENT is too large on MinGW and some other platforms. > > The patch I sent earlier attempted to be the simplest patch that would fix the > bug you observed on MinGW, which is a special case of (1). It does not attempt > to fix all plausible cases of (1), nor does it address (2). > > We can fix these two bugs separately, by installing the attached patches into > emacs-27. The first patch fixes (1) and thus fixes the crash along with other > plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a > different way but does not fix the crash on other plausible platforms. (1) > probably has better performance than (2), though I doubt whether users will notice. Since (1) is for now purely theoretical (and rare even in that theoretical case), I'd like to see (2) applied to emacs-27. Let's do that soon, as I'd like to have another pretest in the near future. Thanks. ^ permalink raw reply [flat|nested] 132+ messages in thread
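Bug (1) as quoted above, where a filter of the form "address modulo alignment" discards a pointer into the middle of an object, can be illustrated with a toy example. This is a sketch, not the code in alloc.c: toy_object, toy_interior and toy_passes_filter are hypothetical names, and the offsets are chosen only to make the arithmetic visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object whose start is 16-byte aligned.  */
static _Alignas (16) unsigned char toy_object[32];

/* Stand-in for a conservative-GC pre-filter that rejects any
   candidate pointer whose address is not a multiple of ALIGNMENT.  */
static bool
toy_passes_filter (const void *p, size_t alignment)
{
  return (uintptr_t) p % alignment == 0;
}

/* A pointer OFFSET bytes into the object, such as the address of an
   element in the middle of a vector.  */
static const void *
toy_interior (size_t offset)
{
  return toy_object + offset;
}
```

Here toy_interior (8) is rejected by a 16-byte filter but accepted by an 8-byte one, which matches the MinGW case; toy_interior (4) is rejected by both, which is why lowering the alignment alone does not cover every interior pointer.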
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-28 19:33 ` Paul Eggert 2020-05-29 6:19 ` Eli Zaretskii @ 2020-05-29 8:25 ` Pip Cet 1 sibling, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-29 8:25 UTC (permalink / raw) To: Paul Eggert; +Cc: 41321, Stefan Monnier On Thu, May 28, 2020 at 7:33 PM Paul Eggert <eggert@cs.ucla.edu> wrote: > too, as in the attached patch. Are you sure you attached the correct file? This patch is identical to one you'd sent earlier, and which Eli criticized for being overly conservative on GCALIGNMENT==1 systems. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-25 6:40 ` Pip Cet 2020-05-25 11:28 ` Pip Cet @ 2020-05-25 15:14 ` Eli Zaretskii 2020-05-25 17:41 ` Pip Cet 1 sibling, 1 reply; 132+ messages in thread From: Eli Zaretskii @ 2020-05-25 15:14 UTC (permalink / raw) To: Pip Cet; +Cc: 41321, monnier > From: Pip Cet <pipcet@gmail.com> > Date: Mon, 25 May 2020 06:40:11 +0000 > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > > What are you actually planning to do? Given the fact that I'm the only one who sees these problems? Not much: I intend to continue running Emacs under GDB and collect data about the crashes until either I figure out what causes the crashes, or the crashes disappear (which would mean the problem was fixed indirectly by some other change). > I think we should work around the mingw bug on both the master and > emacs-27 branches. That depends on what the proposed solution or workaround will be. We need to see where the discussion of the alignment issue goes and what we decide to do about that. > What we should not do is encourage people to keep looking for another > Emacs bug based on the existing backtraces. Indeed, I'm posting the backtraces for the record; no one should feel compelled to study them unless they are interested. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-25 15:14 ` Eli Zaretskii @ 2020-05-25 17:41 ` Pip Cet 0 siblings, 0 replies; 132+ messages in thread From: Pip Cet @ 2020-05-25 17:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 41321, Stefan Monnier On Mon, May 25, 2020 at 3:14 PM Eli Zaretskii <eliz@gnu.org> wrote: > > From: Pip Cet <pipcet@gmail.com> > > Date: Mon, 25 May 2020 06:40:11 +0000 > > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca> > > > > What are you actually planning to do? > Not > much: I intend to continue running Emacs under GDB and collect data > about the crashes until either I figure out what causes the crashes, > or the crashes disappear (which would mean the problem was fixed > indirectly by some other change). (Or directly, of course. I still believe my "theory" about your bug is correct.) > > I think we should work around the mingw bug on both the master and > > emacs-27 branches. > > That depends on what the proposed solution or workaround will be. For emacs-27, reducing the alignment requirement in maybe_lisp_pointer: that will only make us check more pointers, not fewer, so while it is a GC change it's one that makes sense. For master, I'd consider setting LISP_ALIGNMENT to 8 on the mingw32 platform, where memory is already scarce. I don't trust the alleged performance hit of 20%, so we might have to collect some actual performance data. But we definitely need to make strings aligned to LISP_ALIGNMENT, one way or the other, because that's the original reason for maybe_mark_pointer. ^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 15:00 ` Pip Cet 2020-05-24 16:25 ` Eli Zaretskii @ 2020-05-24 19:00 ` Andy Moreton 2020-05-24 19:09 ` Pip Cet 1 sibling, 1 reply; 132+ messages in thread
From: Andy Moreton @ 2020-05-24 19:00 UTC (permalink / raw)
To: 41321

On Sun 24 May 2020, Pip Cet wrote:

> On Sun, May 24, 2020 at 2:24 PM Eli Zaretskii <eliz@gnu.org> wrote:
>> > From: Pip Cet <pipcet@gmail.com>
>> > Date: Sat, 23 May 2020 23:54:17 +0000
>> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
>> >
>> > I think I've worked it out: it's this mingw bug:
>> > https://sourceforge.net/p/mingw-w64/bugs/778/
>>
>> Thank you for working on this tricky problem.
>>
>> FTR, I don't use that flavor of MinGW.
>
> So your flavor is even more broken than what Debian ships? That's
> interesting, which flavor is it?

FYI, there are two separate projects:
  mingw.org: 32bit only.
  mingw-w64: 32bit and 64bit, using a different C runtime.

On my machine a simple test program shows:

--------------------------------------------------------------
project     gcc      cpu      alignof(max_align_t)
--------------------------------------------------------------
mingw.org   9.2.0    i686     16
mingw-w64   10.1.0   i686     16 (stdint.h before stddef.h)
                              8  (stdint.h after stddef.h)
mingw-w64   10.1.0   x86_64   16
--------------------------------------------------------------

This problem only appears with the 32bit mingw-w64 toolchain.

Eli uses the mingw.org toolchain. Linux distros initially used mingw.org, but switched to mingw-w64 cross compilers several years ago.

    AndyM

^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-24 19:00 ` Andy Moreton @ 2020-05-24 19:09 ` Pip Cet 0 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-24 19:09 UTC (permalink / raw)
To: Andy Moreton; +Cc: 41321

On Sun, May 24, 2020 at 7:01 PM Andy Moreton <andrewjmoreton@gmail.com> wrote:
> > So your flavor is even more broken than what Debian ships? That's
> > interesting, which flavor is it?
>
> FYI, there are two separate projects:
> mingw.org: 32bit only.
> mingw-w64: 32bit and 64bit, using a different C runtime.
>
> On my machine a simple test program shows:
>
> --------------------------------------------------------------
> project     gcc      cpu      alignof(max_align_t)
> --------------------------------------------------------------
> mingw.org   9.2.0    i686     16
> mingw-w64   10.1.0   i686     16 (stdint.h before stddef.h)
>                               8  (stdint.h after stddef.h)
> mingw-w64   10.1.0   x86_64   16
> --------------------------------------------------------------

Thanks!

> This problem only appears with the 32bit mingw-w64 toolchain.

FWIW, the problem is that the incorrect value of 16 is returned in some cases. All 32bit toolchains appear to be broken. I said that mingw.org was "more broken" than mingw-w64 because it _always_ returns the incorrect value, rather than doing so only for an unfortunate combination of #includes.

> Eli uses the mingw.org toolchain. Linux distros initially used
> mingw.org, but switched to mingw-w64 cross compilers several years ago.

I couldn't get the mingw.org toolchain to work at all...

^ permalink raw reply [flat|nested] 132+ messages in thread
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects 2020-05-22 7:22 ` Eli Zaretskii ` (3 preceding siblings ...) 2020-05-23 23:54 ` Pip Cet @ 2020-05-29 10:16 ` Eli Zaretskii 2020-05-29 10:34 ` Pip Cet 4 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-29 10:16 UTC (permalink / raw)
To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Date: Fri, 22 May 2020 10:22:56 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org
>
> > > I'm already running with such a breakpoint, let's how it will catch
> > > something. ^^^
> >
> > Should have been "hope". Sorry.
>
> It happened again, and now insert-file-contents wasn't involved, so I
> guess it's off the hook. The command which triggered the problem was
> self-insert-command, as shown in the backtrace below. The problem
> seems to be with handling overlays when buffer text changes.

One more segfault very similar to the last one I reported: it happened when calling report_overlay_modification due to text being inserted into a buffer. The backtrace and the debugging session are below. Noteworthy observations:

  . The buffer's overlay chain and the buffer's marker chain are both intact and valid.

  . The two markers, start_marker and end_marker, which are created by PRESERVE_START_END before calling before-change-functions, are NOT in the buffer's marker chain after run-hook-with-args returns. This most probably means GC was invoked while run-hook-with-args ran and decided to GC those 2 markers, which then unchains them via unchain_dead_markers.

  . last_marked[] doesn't seem to mention start_marker or end_marker, at least not in its last 470 slots:

      (gdb) find /g1 &last_marked[0], last_marked[last_marked_index-1], 0xa00000001ffac2c8
      Pattern not found.

    This seems to be supporting evidence that those two markers were GC'ed.

  . start_marker and end_marker encode pointers which are 8-byte aligned, not 16-byte aligned.
The values of the pointers are 0x1ffac2a8 and 0x1ffac2c8, as can be seen from the debug session. . There's nothing wrong with rvoe_arg.location; in the previous sessions we forgot to dereference it (it's a pointer to a Lisp object). Here's how it looks when shown correctly: (gdb) p rvoe_arg.location $14 = (Lisp_Object *) 0x15c9298 <globals+120> (gdb) p *rvoe_arg.location $15 = XIL(0xc00000001646b9b0) (gdb) xtype Lisp_Cons (gdb) xcar $16 = 0x30 (gdb) xsymbol $17 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48> "t" (gdb) p *rvoe_arg.location $18 = XIL(0xc00000001646b9b0) (gdb) xcdr $19 = 0xc00000001646b9d0 (gdb) xtype Lisp_Cons (gdb) xcar $20 = 0xd5c0 (gdb) xtype Lisp_Symbol (gdb) xsymbol $21 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720> "syntax-ppss-flush-cache" (gdb) p *rvoe_arg.location $22 = XIL(0xc00000001646b9b0) (gdb) xcdr $23 = 0xc00000001646b9d0 (gdb) xcdr $24 = 0x0 [...] (gdb) pp *rvoe_arg.location (t syntax-ppss-flush-cache) . There's nothing wrong with GDB's xtype command: it fails when a Lisp object encodes a pointer to invalid memory: (gdb) p start_marker $25 = XIL(0xa00000001ffac2a8) (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x1ffac2a8 (gdb) p/x start_marker $26 = 0xa00000001ffac2a8 (gdb) xgettype $26 (gdb) p $type $27 = Lisp_Vectorlike (gdb) xvectype $26 Cannot access memory at address 0x1ffac2a8 (gdb) p/x ((struct Lisp_Vector *) $26)->header.size warning: value truncated Cannot access memory at address 0x1ffac2a8 (gdb) p/x ((struct Lisp_Vector *) $26)->header warning: value truncated Cannot access memory at address 0x1ffac2a8 (gdb) p/x ((struct Lisp_Vector *) $26) warning: value truncated $35 = 0x1ffac2a8 (gdb) p/x end_marker $38 = 0xa00000001ffac2c8 (gdb) xtype Lisp_Vectorlike Cannot access memory at address 0x1ffac2a8 (gdb) p/x ((struct Lisp_Vector *)0x1ffac2c8)->header Cannot access memory at address 0x1ffac2c8 . 
Provisional conclusion: the two temporary markers created by signal_before_change were on the stack (see my other message with code disassembly), and were GC'ed as a side effect of running syntax-ppss-flush-cache via before-change-functions.

So we should see whether fixing the LISP_ALIGNMENT vs GCALIGNMENT discrepancy fixes this problem.

Here's the backtrace and the full debug session after the crash, with some omissions:

Thread 1 received signal SIGSEGV, Segmentation fault.
PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
1720 return PSEUDOVECTOR_TYPEP (XUNTAG (a, Lisp_Vectorlike,
(gdb) bt
#0 PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
#1 MARKERP (x=<optimized out>) at lisp.h:2618
#2 CHECK_MARKER (x=XIL(0xa00000001ffac2c8)) at marker.c:133
#3 0x010f073c in Fmarker_position (marker=XIL(0xa00000001ffac2c8)) at marker.c:452
#4 0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=276884, start_int=276884) at insdel.c:2179
#5 prepare_to_modify_buffer_1 (start=start@entry=276884, end=end@entry=276884, preserve_ptr=preserve_ptr@entry=0x0) at insdel.c:2007
#6 0x010ee27d in prepare_to_modify_buffer (start=276884, end=276884, preserve_ptr=preserve_ptr@entry=0x0) at insdel.c:2018
#7 0x010ee54d in insert_1_both ( string=0x1e3c9c08 " 2823D 26-May gdb-patches@sourceware.or [244] Re: [PATCH, testsuite] Fix some duplicate test names\n\r...", nchars=100, nbytes=100, inherit=false, prepare=true, before_markers=false) at insdel.c:896
#8 0x010ee5c5 in insert_1_both (string=<optimized out>, nchars=<optimized out>, nchars@entry=100, nbytes=<optimized out>, nbytes@entry=100, inherit=inherit@entry=false, prepare=prepare@entry=true, before_markers=before_markers@entry=false) at insdel.c:947
#9 0x01174188 in Fprinc (object=XIL(0x800000001e05f278), printcharfun=<optimized out>) at print.c:734
#10 0x0114fc5c in funcall_subr (subr=<optimized out>, numargs=<optimized out>, numargs@entry=2, args=<optimized out>, args@entry=0x82d9b8) at
eval.c:2869 #11 0x0114daed in Ffuncall (nargs=3, args=args@entry=0x82d9b0) at eval.c:2794 #12 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=4, args=<optimized out>, args@entry=0x82dde8) at bytecode.c:633 #13 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=4, arg_vector=arg_vector@entry=0x82dde8) at eval.c:2989 #14 0x0114da43 in Ffuncall (nargs=5, args=args@entry=0x82dde0) at eval.c:2808 #15 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=3, args=<optimized out>, args@entry=0x82e1b0) at bytecode.c:633 #16 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=3, arg_vector=arg_vector@entry=0x82e1b0) at eval.c:2989 #17 0x0114da43 in Ffuncall (nargs=4, args=args@entry=0x82e1a8) at eval.c:2808 #18 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=0, args=<optimized out>, args@entry=0x82e570) at bytecode.c:633 #19 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x82e570) at eval.c:2989 #20 0x0114da43 in Ffuncall (nargs=nargs@entry=1, args=args@entry=0x82e568) at eval.c:2808 #21 0x0114de2d in Fapply (nargs=2, args=0x82e568) at eval.c:2377 #22 0x0114daed in Ffuncall (nargs=3, args=args@entry=0x82e560) at eval.c:2794 #23 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=0, args=<optimized out>, args@entry=0x82e8c0) at bytecode.c:633 #24 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x82e8c0) at eval.c:2989 #25 0x0114da43 in Ffuncall (nargs=1, 
args=args@entry=0x82e8b8) at eval.c:2808 #26 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=3, args=<optimized out>, args@entry=0x82ed30) at bytecode.c:633 #27 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=3, arg_vector=arg_vector@entry=0x82ed30) at eval.c:2989 #28 0x0114da43 in Ffuncall (nargs=4, args=args@entry=0x82ed28) at eval.c:2808 #29 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=1, args=<optimized out>, args@entry=0x82f298) at bytecode.c:633 #30 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=1, arg_vector=arg_vector@entry=0x82f298) at eval.c:2989 #31 0x0114da43 in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x82f290) at eval.c:2808 #32 0x0114906d in Ffuncall_interactively (nargs=2, args=0x82f290) at callint.c:254 #33 0x0114daed in Ffuncall (nargs=nargs@entry=3, args=args@entry=0x82f288) at eval.c:2794 #34 0x0114df22 in Fapply (nargs=nargs@entry=3, args=args@entry=0x82f288) at eval.c:2381 #35 0x0114afbb in Fcall_interactively (function=XIL(0x5f2c790), record_flag=<optimized out>, keys=XIL(0xa00000000759f578)) at callint.c:342 #36 0x0114fc89 in funcall_subr (subr=<optimized out>, numargs=<optimized out>, numargs@entry=3, args=<optimized out>, args@entry=0x82f430) at eval.c:2872 #37 0x0114daed in Ffuncall (nargs=4, args=args@entry=0x82f428) at eval.c:2794 #38 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, nargs@entry=1, args=<optimized out>, args@entry=0x82f7b8) at bytecode.c:633 #39 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=1, arg_vector=arg_vector@entry=0x82f7b8) at eval.c:2989 #40 0x0114da43 in Ffuncall (nargs=nargs@entry=2, 
    args=args@entry=0x82f7b0) at eval.c:2808
#41 0x0114dc1c in call1 (fn=XIL(0x3f30), arg1=XIL(0x5f2c790)) at eval.c:2654
#42 0x010d0efe in command_loop_1 () at keyboard.c:1463
#43 0x0114ca0f in internal_condition_case (
    bfun=bfun@entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90),
    hfun=hfun@entry=0x10c5049 <cmd_error>) at eval.c:1355
#44 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
#45 0x0114c996 in internal_catch (tag=XIL(0xdfb0),
    func=func@entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116
#46 0x010bdb5d in command_loop () at keyboard.c:1070
#47 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714
#48 0x010c4f0c in Frecursive_edit () at keyboard.c:786
#49 0x0124a594 in main (argc=<optimized out>, argv=<optimized out>)
    at emacs.c:2054

Lisp Backtrace:
"princ" (0x82d9b8)
"rmail-new-summary-1" (0x82dde8)
"rmail-new-summary" (0x82e1b0)
"rmail-summary" (0x82e570)
"apply" (0x82e568)
"rmail-update-summary" (0x82e8c0)
"rmail-get-new-mail-1" (0x82ed30)
"rmail-get-new-mail" (0x82f298)
"funcall-interactively" (0x82f290)
"call-interactively" (0x82f430)
"command-execute" (0x82f7b8)
(gdb) fr 4
#4 0x010f073c in Fmarker_position (marker=XIL(0xa00000001ffac2c8))
    at marker.c:452
452 CHECK_MARKER (marker);
(gdb) up
#5 0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=276884,
    start_int=276884) at insdel.c:2179
2179 report_overlay_modification (FETCH_START, FETCH_END, 0,
(gdb) p current_buffer->overlays_before
$1 = (struct Lisp_Overlay *) 0x75ac520
(gdb) p *$
$2 = { header = { size = 1140854787 }, start = XIL(0xa0000000075ac4e0), end = XIL(0xa0000000075ac500), plist = XIL(0xc0000000077f2340), next = 0x0 }
(gdb) p/x $1->header.size
$3 = 0x44001003
(gdb) p current_buffer->name_
$4 = XIL(0x8000000007364540)
(gdb) xtype
Lisp_String
(gdb) xstring
$5 = (struct Lisp_String *) 0x7364540
"INBOX-summary"
(gdb) p current_buffer->overlays_before->start
$6 = XIL(0xa0000000075ac4e0)
(gdb) p *$
$7 = 1124081664
(gdb) p current_buffer->overlays_before->start
$8 = XIL(0xa0000000075ac4e0)
(gdb) xtype
Lisp_Vectorlike
PVEC_MARKER
(gdb) xmarker
$9 = (struct Lisp_Marker *) 0x75ac4e0
(gdb) p *$
$10 = { header = { size = 1124081664 }, buffer = 0x7519948, need_adjustment = 0, insertion_type = 0, next = 0x0, charpos = 1, bytepos = 1 }
(gdb) p current_buffer->overlays_before->next
$11 = (struct Lisp_Overlay *) 0x0
(gdb) p current_buffer->overlays_after
$12 = (struct Lisp_Overlay *) 0x0
(gdb) p rvoe_arg
$13 = { location = 0x15c9298 <globals+120>, errorp = false }
(gdb) p rvoe_arg.location
$14 = (Lisp_Object *) 0x15c9298 <globals+120>
(gdb) p *rvoe_arg.location
$15 = XIL(0xc00000001646b9b0)
(gdb) xtype
Lisp_Cons
(gdb) xcar
$16 = 0x30
(gdb) xsymbol
$17 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48>
"t"
(gdb) p *rvoe_arg.location
$18 = XIL(0xc00000001646b9b0)
(gdb) xcdr
$19 = 0xc00000001646b9d0
(gdb) xtype
Lisp_Cons
(gdb) xcar
$20 = 0xd5c0
(gdb) xtype
Lisp_Symbol
(gdb) xsymbol
$21 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720>
"syntax-ppss-flush-cache"
(gdb) p *rvoe_arg.location
$22 = XIL(0xc00000001646b9b0)
(gdb) xcdr
$23 = 0xc00000001646b9d0
(gdb) xcdr
$24 = 0x0
(gdb) p start_marker
$25 = XIL(0xa00000001ffac2a8)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) p/x start_marker
$26 = 0xa00000001ffac2a8
(gdb) xgettype $26
(gdb) p $type
$27 = Lisp_Vectorlike
(gdb) xvectype $26
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)->header.size
warning: value truncated
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)->header
warning: value truncated
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)
warning: value truncated
$35 = 0x1ffac2a8
(gdb) p/x $26
$36 = 0xa00000001ffac2a8
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2a8
A syntax error in expression, near `'.
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2a8)
$37 = 0x1ffac2a8
(gdb) p/x *((struct Lisp_Vector *)0x1ffac2a8)
Cannot access memory at address 0x1ffac2a8
(gdb) p/x end_marker
$38 = 0xa00000001ffac2c8
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2c8)->header
Cannot access memory at address 0x1ffac2c8
(gdb) p Vfirst_change_hook
$39 = XIL(0)
(gdb) p current_buffer->text->markers
$40 = (struct Lisp_Marker *) 0x76353a0
(gdb) p *$
$41 = { header = { size = 1124081664 }, buffer = 0x7519948, need_adjustment = 0, insertion_type = 0, next = 0x76353e0, charpos = 1, bytepos = 1 }
(gdb) p current_buffer->text->markers->next
$42 = (struct Lisp_Marker *) 0x76353e0
(gdb) p *$
$43 = { header = { size = 1124081664 }, buffer = 0x7519948, need_adjustment = 0, insertion_type = 0, next = 0x7635420, charpos = 1, bytepos = 1 }
(gdb) p current_buffer->text->markers->next->next
$44 = (struct Lisp_Marker *) 0x7635420
(gdb) p *$
$45 = { header = { size = 1124081664 }, buffer = 0x7519948, need_adjustment = 0, insertion_type = 0, next = 0x16b6a5d0, charpos = 1, bytepos = 1 }
(gdb) p current_buffer->text->markers->next->next->next
$46 = (struct Lisp_Marker *) 0x16b6a5d0
(gdb) p *$
$47 = { header = { size = 1124081664 }, buffer = 0x7519948, need_adjustment = 0, insertion_type = 0, next = 0x16b6a5b0, charpos = 1, bytepos = 1 }
(gdb) p/x start_marker
$98 = 0xa00000001ffac2c8
(gdb) pp *rvoe_arg.location
(t syntax-ppss-flush-cache)
(gdb) p last_mar
last_marked last_marked_index
(gdb) p last_marked_index
$99 = 498
(gdb) p last_marked[497]
$100 = XIL(0x439c370)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) find /g1 &last_marked[0], last_marked[last_marked_index-1], 0xa00000001ffac2a8
Pattern not found.
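
Editorial note: the header.size values printed in the session above (0x44001003 for the overlay, 1124081664 for the markers) can be decoded by hand. What follows is a minimal sketch in C, not code from Emacs. It assumes the Emacs 27 pseudovector header layout on a build with a 32-bit ptrdiff_t: a flag bit at 0x40000000, a 6-bit type code above bit 24, and two 12-bit size fields below it. The function names and the three PVEC_* values are assumptions for illustration; they agree with what xtype printed in the session (PVEC_MARKER for the marker), but check lisp.h of the exact build before relying on them.

```c
#include <stdint.h>

/* Assumed Emacs 27 layout of vectorlike_header.size with a 32-bit
   ptrdiff_t.  These constants are illustrative copies, not taken from
   the build that crashed.  */
enum {
  SKETCH_SIZE_BITS = 12,   /* number of Lisp_Object slots */
  SKETCH_REST_BITS = 12,   /* number of non-Lisp words after them */
  SKETCH_AREA_BITS = SKETCH_SIZE_BITS + SKETCH_REST_BITS
};

#define SKETCH_PSEUDOVECTOR_FLAG 0x40000000u
#define SKETCH_PVEC_TYPE_MASK    (0x3fu << SKETCH_AREA_BITS)

/* Assumed numbering of the first few pvec_type codes in Emacs 27.  */
enum sketch_pvec_type {
  SKETCH_PVEC_FREE = 1,
  SKETCH_PVEC_MARKER = 3,
  SKETCH_PVEC_OVERLAY = 4
};

/* Nonzero if the header marks a pseudovector rather than a plain vector.  */
static inline int
sketch_is_pseudovector (uint32_t size)
{
  return (size & SKETCH_PSEUDOVECTOR_FLAG) != 0;
}

/* The pseudovector type code stored in the header.  */
static inline unsigned
sketch_pvec_type_of (uint32_t size)
{
  return (size & SKETCH_PVEC_TYPE_MASK) >> SKETCH_AREA_BITS;
}

/* Number of Lisp_Object slots that the garbage collector traces.  */
static inline unsigned
sketch_lisp_slots (uint32_t size)
{
  return size & ((1u << SKETCH_SIZE_BITS) - 1);
}

/* Number of trailing non-Lisp words.  */
static inline unsigned
sketch_rest_words (uint32_t size)
{
  return (size >> SKETCH_SIZE_BITS) & ((1u << SKETCH_REST_BITS) - 1);
}
```

Under these assumptions, 0x44001003 decodes to type 4 with 3 Lisp slots (start, end, plist) and 1 trailing word (next), and 1124081664 (0x43002000) decodes to type 3 with no Lisp slots. A header whose type code decodes to 1 would be the PVEC_FREE case mentioned in the original report.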
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Pip Cet @ 2020-05-29 10:34 UTC
To: Eli Zaretskii; +Cc: Paul Eggert, 41321, Stefan Monnier

On Fri, May 29, 2020 at 10:16 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Fri, 22 May 2020 10:22:56 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: 41321@debbugs.gnu.org
> >
> > > > I'm already running with such a breakpoint, let's how it will catch
> > > > something. ^^^
> > >
> > > Should have been "hope". Sorry.
> >
> > It happened again, and now insert-file-contents wasn't involved, so I
> > guess it's off the hook. The command which triggered the problem was
> > self-insert-command, as shown in the backtrace below. The problem
> > seems to be with handling overlays when buffer text changes.
>
> One more segfault very similar to the last one I reported: it happened
> when calling report_overlay_modification due to text being inserted
> into a buffer.

Everything looks consistent with the bug I described.

> . There's nothing wrong with GDB's xtype command: it fails when a Lisp
>   object encodes a pointer to invalid memory:

(gdb) p last_marked[497]
$100 = XIL(0x439c370)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8

Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
and it's not at address 0x1ffac2a8.

So my suspicion remains that this is a gdb bug, and it appears to be a
reproducible one!

> . So we should
>   see whether fixing the LISP_ALIGNMENT vs GCALIGNMENT discrepancy
>   fixes this problem.

I concur.
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-29 10:55 UTC
To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 29 May 2020 10:34:20 +0000
> Cc: Paul Eggert <eggert@cs.ucla.edu>, Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
>
> > . There's nothing wrong with GDB's xtype command: it fails when a Lisp
> >   object encodes a pointer to invalid memory:
>
> (gdb) p last_marked[497]
> $100 = XIL(0x439c370)
> (gdb) xtype
> Lisp_Vectorlike
> Cannot access memory at address 0x1ffac2a8
>
> Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
> and it's not at address 0x1ffac2a8.
>
> So my suspicion remains that this is a gdb bug, and it appears to be a
> reproducible one!

There's no bug: the $size variable was not updated inside pvectype
because the 'set' command tried to access invalid memory. So the rest
is using the stale value of $size. Puff! no miracle and no bug.

You just don't need to assign too much importance to the address the
error message displays, it might not be the problematic address.
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Pip Cet @ 2020-05-29 11:47 UTC
To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier

On Fri, May 29, 2020 at 10:55 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 29 May 2020 10:34:20 +0000
> > Cc: Paul Eggert <eggert@cs.ucla.edu>, Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > > . There's nothing wrong with GDB's xtype command: it fails when a Lisp
> > >   object encodes a pointer to invalid memory:
> >
> > (gdb) p last_marked[497]
> > $100 = XIL(0x439c370)
> > (gdb) xtype
> > Lisp_Vectorlike
> > Cannot access memory at address 0x1ffac2a8
> >
> > Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
> > and it's not at address 0x1ffac2a8.
> >
> > So my suspicion remains that this is a gdb bug, and it appears to be a
> > reproducible one!
>
> There's no bug:

I believe there is.

> the $size variable was not updated inside pvectype
> because the 'set' command tried to access invalid memory.

Why would pvectype be called at all? xtype should have said
"Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
all. Feel free to try that, in a fresh GDB session:

p 0x439c370
xtype

> So the rest
> is using the stale value of $size. Puff! no miracle and no bug.

Which rest? There's no message after "Cannot access memory at address
0x1ffac2a8"

> You just don't need to assign too much importance to the address the
> error message displays, it might not be the problematic address.

Or there might not be a problematic address, because xtype is somehow
using the value of $ which it used when it encountered the initial bug
even for subsequent calls. It doesn't do that here.
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Eli Zaretskii @ 2020-05-29 13:52 UTC
To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 29 May 2020 11:47:46 +0000
> Cc: eggert@cs.ucla.edu, Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
>
> > There's no bug:
>
> I believe there is.
>
> > the $size variable was not updated inside pvectype
> > because the 'set' command tried to access invalid memory.
>
> Why would pvectype be called at all? xtype should have said
> "Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
> all.

Look at what xtype does, and you will see.
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
From: Pip Cet @ 2020-05-29 14:19 UTC
To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier

On Fri, May 29, 2020 at 1:53 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 29 May 2020 11:47:46 +0000
> > Cc: eggert@cs.ucla.edu, Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > > There's no bug:
> >
> > I believe there is.
> >
> > > the $size variable was not updated inside pvectype
> > > because the 'set' command tried to access invalid memory.
> >
> > Why would pvectype be called at all? xtype should have said
> > "Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
> > all.
>
> Look at what xtype does, and you will see.

So you think it's a bug in xtype?

The relevant definitions are:

define xtype
  xgettype $
  output $type
  echo \n
  if $type == Lisp_Vectorlike
    xvectype
  end
end

define xgettype
  if (CHECK_LISP_OBJECT_TYPE)
    set $bugfix = $arg0.i
  else
    set $bugfix = $arg0
  end
  set $type = (enum Lisp_Type) (USE_LSB_TAG ? (EMACS_INT) $bugfix & (1 << GCTYPEBITS) - 1 : (EMACS_UINT) $bugfix >> VALBITS)
end

Both look fine to me: xtype calls xgettype (not xvectype), which sets
$type to the type bits, then outputs them. But the bug must have
happened by then, because what's output is "Lisp_Vectorlike" even
though $ is a Lisp_Symbol.

I fail to see how xvectype and pvectype are relevant at all...
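
Editorial note: the tag arithmetic that xgettype performs can be checked outside GDB. The sketch below is a hypothetical stand-alone C rendering of the two branches of that expression. It is not Emacs code. It assumes 64-bit Lisp_Object words with GCTYPEBITS equal to 3 (so VALBITS is 61), and it assumes the Emacs 27 tag numbering for a build without USE_LSB_TAG (symbol 0, string 4, vectorlike 5, cons 6). Those tag values match what xtype printed for the XIL(...) values earlier in this thread, but they are not confirmed against the build in question.

```c
#include <stdint.h>

/* Assumed parameters: 64-bit Lisp_Object words and 3 tag bits.  The
   names are prefixed to make clear they are illustrative copies.  */
enum {
  SKETCH_GCTYPEBITS = 3,
  SKETCH_VALBITS = 64 - SKETCH_GCTYPEBITS
};

/* Assumed Emacs 27 tag numbering when USE_LSB_TAG is false.  */
enum sketch_lisp_type {
  SKETCH_LISP_SYMBOL = 0,
  SKETCH_LISP_STRING = 4,
  SKETCH_LISP_VECTORLIKE = 5,
  SKETCH_LISP_CONS = 6
};

/* Tag taken from the top bits, as in the USE_LSB_TAG == 0 branch.  */
static inline unsigned
sketch_msb_tag (uint64_t word)
{
  return (unsigned) (word >> SKETCH_VALBITS);
}

/* Tag taken from the low bits, as in the USE_LSB_TAG != 0 branch.  */
static inline unsigned
sketch_lsb_tag (uint64_t word)
{
  return (unsigned) (word & ((UINT64_C (1) << SKETCH_GCTYPEBITS) - 1));
}

/* The word with the top tag bits cleared; for pointer-carrying types
   on an MSB-tagged build this is the address GDB tries to read.  */
static inline uint64_t
sketch_untagged (uint64_t word)
{
  return word & ((UINT64_C (1) << SKETCH_VALBITS) - 1);
}
```

With these assumptions, 0xa00000001ffac2a8 has tag 5 (vectorlike) and untagged value 0x1ffac2a8, which is the address in the error message, whereas 0x439c370 has tag 0 under either branch. That is the arithmetic behind the claim that $100 should be reported as Lisp_Symbol.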