unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
@ 2020-05-16 10:33 Eli Zaretskii
  2020-05-16 16:33 ` Paul Eggert
  2020-05-17 10:56 ` Pip Cet
  0 siblings, 2 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-16 10:33 UTC (permalink / raw)
  To: 41321

I don't have a reproducible recipe, unfortunately.

What happens is that Emacs aborts a short time after reverting a
buffer (reverted because the file it is visiting was changed on disk).
So far, I've seen this in a C Mode buffer reverted because "git pull"
brought a modified version, and in an Info mode buffer reverted
because the manual was rebuilt after the Texinfo sources were
modified.  In the latter case I captured a backtrace, see below.

The problem seems to involve invalid markers, perhaps markers that were
unchained and put on the free list (witness the PVEC_FREE object that
caused the abort in the backtrace below, where Emacs seems to be
trying to display an error message about an invalid marker).

I don't think I saw such problems in Emacs 27.0.90, so I walked
through all the changes since then up to the 27.0.91 release, but didn't
see anything that could explain the problem.

Needless to say, this is a serious problem, so I'd like to ask
everyone to please run the latest pretest under a debugger and report
any similar problems with all the details they can provide.

Here's the backtrace and some additional information from the session
where it happened last:

Thread 1 hit Breakpoint 3, 0x77c36bb3 in msvcrt!abort ()
   from C:\WINDOWS\system32\msvcrt.dll
(gdb) bt
#0  0x77c36bb3 in msvcrt!abort () from C:\WINDOWS\system32\msvcrt.dll
#1  0x011cfdd8 in emacs_abort () at w32fns.c:10893
#2  0x01175f3a in print_vectorlike (obj=<optimized out>,
    printcharfun=XIL(0x30), escapeflag=escapeflag@entry=true,
    buf=buf@entry=0x82f07a "") at print.c:1830
#3  0x01172055 in print_object (obj=<optimized out>, printcharfun=XIL(0x30),
    escapeflag=true) at print.c:2148
#4  0x01172f04 in print (obj=<optimized out>, printcharfun=<optimized out>,
    escapeflag=<optimized out>, escapeflag@entry=true) at print.c:1147
#5  0x01173355 in Fprin1 (object=XIL(0xa00000001c9866d8),
    printcharfun=<optimized out>) at print.c:653
#6  0x0117483b in print_error_message (data=<optimized out>,
    stream=<optimized out>, context=<optimized out>, caller=<optimized out>)
    at print.c:979
#7  0x010c13c5 in Fcommand_error_default_function (
    data=XIL(0xc000000000ff92e0), context=XIL(0x80000000058e9118),
    sys_signal=XIL(0x5d72548)) at keyboard.c:1029
#8  0x0114fb99 in funcall_subr (subr=<optimized out>,
    numargs=<optimized out>, numargs@entry=3, args=<optimized out>,
    args@entry=0x82f498) at eval.c:2872
#9  0x0114d9fd in Ffuncall (nargs=4, args=0x82f490) at eval.c:2794
#10 0x0114dca3 in Fapply (nargs=<optimized out>, args=<optimized out>)
    at eval.c:2424
#11 0x0114d9fd in Ffuncall (nargs=3, args=args@entry=0x82f590) at eval.c:2794
#12 0x0118eaf7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs@entry=3,
    args=<optimized out>, args@entry=0x82f888) at bytecode.c:633
#13 0x0115125f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=3,
    arg_vector=arg_vector@entry=0x82f888) at eval.c:2989
#14 0x0114d953 in Ffuncall (nargs=nargs@entry=4, args=args@entry=0x82f880)
    at eval.c:2808
#15 0x01151d29 in call3 (fn=XIL(0xa000000005e00b20),
    arg1=XIL(0xc000000000ff92e0), arg2=XIL(0x80000000058e9118),
    arg3=XIL(0x5d72548)) at eval.c:2668
#16 0x010c5020 in cmd_error_internal (data=XIL(0xc000000000ff92e0),
    context=context@entry=0x82f92e "") at keyboard.c:984
#17 0x010c51e6 in cmd_error (data=XIL(0xc000000000ff92e0)) at keyboard.c:953
#18 0x0114c952 in internal_condition_case (
    bfun=bfun@entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90),
    hfun=hfun@entry=0x10c5049 <cmd_error>) at eval.c:1351
#19 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
#20 0x0114c8a6 in internal_catch (tag=XIL(0xdfb0),
    func=func@entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116
#21 0x010bdb5d in command_loop () at keyboard.c:1070
#22 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714
#23 0x010c4f0c in Frecursive_edit () at keyboard.c:786
#24 0x0124a4a4 in main (argc=<optimized out>, argv=<optimized out>)
    at emacs.c:2054

Lisp Backtrace:
"command-error-default-function" (0x82f498)
"apply" (0x82f598)
0x5e00b20 PVEC_COMPILED
(gdb) fr 2
#2  0x01175f3a in print_vectorlike (obj=<optimized out>,
    printcharfun=XIL(0x30), escapeflag=escapeflag@entry=true,
    buf=buf@entry=0x82f07a "") at print.c:1830
1830          emacs_abort ();
(gdb) fr 7
#7  0x010c13c5 in Fcommand_error_default_function (
    data=XIL(0xc000000000ff92e0), context=XIL(0x80000000058e9118),
    sys_signal=XIL(0x5d72548)) at keyboard.c:1029
1029          print_error_message (data, Qt, SSDATA (context), signal);
(gdb) p data
$1 = XIL(0xc000000000ff92e0)
(gdb) xtype
Lisp_Cons
(gdb) xcar
$2 = 0xfd80
(gdb) xtype
Lisp_Symbol
(gdb) xsym
xsymbol   xsymname
(gdb) xsymbol
$3 = (struct Lisp_Symbol *) 0x15d9f60 <lispsym+64896>
"wrong-type-argument"
(gdb) p data
$4 = XIL(0xc000000000ff92e0)
(gdb) xcdr
$5 = 0xc000000000ff9300
(gdb) xtype
Lisp_Cons
(gdb) xcar
$6 = 0x9810
(gdb) xtype
Lisp_Symbol
(gdb) xsymbol
$7 = (struct Lisp_Symbol *) 0x15d39f0 <lispsym+38928>
"markerp"
(gdb) p data
$8 = XIL(0xc000000000ff92e0)
(gdb) xcdr
$9 = 0xc000000000ff9300
(gdb) xcdr
$10 = 0xc000000000ff9310
(gdb) xtype
Lisp_Cons
(gdb) xcar
$11 = 0xa00000001c9866d8
(gdb) xtype
Lisp_Vectorlike
PVEC_FREE
(gdb) fr 17
#17 0x010c51e6 in cmd_error (data=XIL(0xc000000000ff92e0)) at keyboard.c:953
953       cmd_error_internal (data, macroerror);

In GNU Emacs 27.0.91 (build 1, i686-pc-mingw32)
 of 2020-04-18 built on HOME-C4E4A596F7
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --prefix=/d/usr --with-wide-int --with-modules 'CFLAGS=-O2
 -gdwarf-4 -g3''

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY W32NOTIFY ACL GNUTLS LIBXML2
HARFBUZZ ZLIB TOOLKIT_SCROLL_BARS MODULES THREADS JSON PDUMPER LCMS2 GMP

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1255

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs text-property-search time-date
subr-x seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs
cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win w32-vars
term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads w32notify w32 lcms2 multi-tty make-network-process
emacs)

Memory information:
((conses 16 50536 10936)
 (symbols 48 7172 1)
 (strings 16 18837 2268)
 (string-bytes 1 532938)
 (vectors 16 9527)
 (vector-slots 8 127687 7318)
 (floats 8 21 170)
 (intervals 40 254 84)
 (buffers 888 11))





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-16 10:33 bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects Eli Zaretskii
@ 2020-05-16 16:33 ` Paul Eggert
  2020-05-16 16:47   ` Eli Zaretskii
  2020-05-17 10:56 ` Pip Cet
  1 sibling, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-16 16:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321

I fooled around a bit with emacs-27 on Ubuntu 18.04.4 (compiled in 32-bit mode
--with-wide-int) and couldn't reproduce it. I'll keep trying.

Could you give more details about the failures you observed? That might help
attempts at reproducing. How did you revert your info buffer - was it by typing
"M-x revert-buffer"? Are you using auto-revert-mode? That sort of thing.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-16 16:33 ` Paul Eggert
@ 2020-05-16 16:47   ` Eli Zaretskii
  0 siblings, 0 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-16 16:47 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321

> Cc: 41321@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 16 May 2020 09:33:35 -0700
> 
> I fooled around a bit with emacs-27 on Ubuntu 18.04.4 (compiled in 32-bit mode
> --with-wide-int) and couldn't reproduce it. I'll keep trying.

Yes, I didn't succeed in reproducing it on purpose, either.  Not sure
why, maybe there's some other factor that is at work, e.g. how many
markers are there in the buffer.

> Could you give more details about the failures you observed? That might help
> attempts at reproducing. How did you revert your info buffer - was it by typing
> "M-x revert-buffer"? Are you using auto-revert-mode? That sort of thing.

Just "M-x revert-buffer RET" followed by 'y'.  I don't use
auto-revert-mode.

In the Git case, I would usually switch to a buffer visiting the file,
perhaps via "M-.", and Emacs would ask me whether to re-read the file
into its buffer, I'd say yes, and then I see an error about a bad
marker; the next command would abort Emacs.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-16 10:33 bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects Eli Zaretskii
  2020-05-16 16:33 ` Paul Eggert
@ 2020-05-17 10:56 ` Pip Cet
  2020-05-17 15:28   ` Eli Zaretskii
  1 sibling, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-17 10:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321

On Sat, May 16, 2020 at 10:34 AM Eli Zaretskii <eliz@gnu.org> wrote:
> So far, I've seen this in a C Mode buffer reverted because "git pull"
> brought a modified version, and in an Info mode buffer reverted
> because the manual was rebuilt after the Texinfo sources were
> modified.  In the latter case I captured a backtrace, see below.
>
> The problem seems to involve invalid markers, perhaps markers that were
> unchained and put on the free list

Even unchained markers shouldn't be put on the free list as long as
they're still reachable, so I suspect the problem is more likely to be
caused by that.

> (witness the PVEC_FREE object that
> caused the abort in the backtrace below, where Emacs seems to be
> trying to display an error message about an invalid marker).

What I would do next is run with a breakpoint on wrong_type_argument
(if that's impossible, change the code in CHECK_MARKER to abort upon
encountering a PVEC_FREE vector) to see where the reference to the
freed pseudovector came from. An undo list, maybe?
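
A minimal sketch of the second option, as a toy model and not Emacs's
real CHECK_MARKER: the check tells a freed object apart from one that
merely has the wrong type, so the stop happens at the first use of the
stale reference rather than later, while the error message is being
printed.  All toy_ names are hypothetical; the comment marks where a
real patch would call emacs_abort ().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for pseudovector subtypes (hypothetical names).  */
enum toy_pvec_type
{
  TOY_PVEC_NORMAL_VECTOR,
  TOY_PVEC_FREE,		/* object already returned to the free list */
  TOY_PVEC_MARKER,
  TOY_PVEC_OVERLAY
};

/* Toy object: only the subtype matters for this sketch.  */
struct toy_vectorlike
{
  enum toy_pvec_type type;
};

enum toy_check
{
  TOY_OK,			/* a live marker */
  TOY_WRONG_TYPE,		/* live object, but not a marker */
  TOY_USE_AFTER_FREE		/* reference to a freed object */
};

/* Classify V the way a debugging variant of a marker check might.  */
static enum toy_check
toy_check_marker (const struct toy_vectorlike *v)
{
  if (v->type == TOY_PVEC_FREE)
    return TOY_USE_AFTER_FREE;	/* a real patch would call emacs_abort () here */
  return v->type == TOY_PVEC_MARKER ? TOY_OK : TOY_WRONG_TYPE;
}
```

With a check like this, the backtrace at the abort would show the
caller that still holds the freed reference, which is the information
the original backtrace lacks.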






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-17 10:56 ` Pip Cet
@ 2020-05-17 15:28   ` Eli Zaretskii
  2020-05-17 15:57     ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-17 15:28 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 17 May 2020 10:56:28 +0000
> Cc: 41321@debbugs.gnu.org
> 
> What I would do next is run with a breakpoint on wrong_type_argument
> (if that's impossible, change the code in CHECK_MARKER to abort upon
> encountering a PVEC_FREE vector) to see where the reference to the
> freed pseudovector came from. An undo list, maybe?

I'm already running with such a breakpoint, let's how it will catch
something.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-17 15:28   ` Eli Zaretskii
@ 2020-05-17 15:57     ` Eli Zaretskii
  2020-05-22  7:22       ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-17 15:57 UTC (permalink / raw)
  To: pipcet; +Cc: 41321

> Date: Sun, 17 May 2020 18:28:04 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org
> 
> I'm already running with such a breakpoint, let's how it will catch
> something.                                        ^^^

Should have been "hope".  Sorry.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-17 15:57     ` Eli Zaretskii
@ 2020-05-22  7:22       ` Eli Zaretskii
  2020-05-22  8:35         ` Andrea Corallo
                           ` (4 more replies)
  0 siblings, 5 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-22  7:22 UTC (permalink / raw)
  To: pipcet, Stefan Monnier; +Cc: 41321

> Date: Sun, 17 May 2020 18:57:53 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org
> 
> > Date: Sun, 17 May 2020 18:28:04 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: 41321@debbugs.gnu.org
> > 
> > I'm already running with such a breakpoint, let's how it will catch
> > something.                                        ^^^
> 
> Should have been "hope".  Sorry.

It happened again, and now insert-file-contents wasn't involved, so I
guess it's off the hook.  The command which triggered the problem was
self-insert-command, as shown in the backtrace below.  The problem
seems to be with handling overlays when buffer text changes.

The backtrace below, as well as some tinkering with values of relevant
variables, indicate that the buffer has two overlays, both of which
point to invalid memory.  The crash happens here:

  /* Now run the before-change-functions if any.  */
  if (!NILP (Vbefore_change_functions))
    {
      rvoe_arg.location = &Vbefore_change_functions;
      rvoe_arg.errorp = 1;

      PRESERVE_VALUE;
      PRESERVE_START_END;

      /* Mark before-change-functions to be reset to nil in case of error.  */
      record_unwind_protect_ptr (reset_var_on_error, &rvoe_arg);

      /* Actually run the hook functions.  */
      CALLN (Frun_hook_with_args, Qbefore_change_functions,
	     FETCH_START, FETCH_END);

      /* There was no error: unarm the reset_on_error.  */
      rvoe_arg.errorp = 0;
    }

  if (buffer_has_overlays ())
    {
      PRESERVE_VALUE;
      report_overlay_modification (FETCH_START, FETCH_END, 0,  <<<<<<<<<<<<
				   FETCH_START, FETCH_END, Qnil);
    }

FETCH_END calls marker-position, and that segfaults because the marker
points to invalid memory, which was probably unmapped from the process
address space (so I guess this is w32-specific, as GNU systems don't
really return memory to the system).  The start_marker is also
invalid, it's just that FETCH_END is called first.

Since the previous call to before-change-functions already used the
same overlay markers, I suspect that the call to
before-change-functions caused the memory to be unmapped (perhaps due
to GC).  As you see below, the value of before-change-functions is

  (t syntax-ppss-flush-cache)

So the prime suspect is what happens when syntax-ppss-flush-cache
runs, and thus I CC Stefan.  The main question to answer now from my
POV is how come a marker on buffer position 3116 which was valid
before before-change-functions was called became invalid as a result of
some Lisp, in particular as a result of calling before-change-functions.

Here's the backtrace; ideas for further debugging are welcome.

  Thread 1 received signal SIGSEGV, Segmentation fault.
  PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
  1720          return PSEUDOVECTOR_TYPEP (XUNTAG (a, Lisp_Vectorlike,
  (gdb) bt
  #0  PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
  #1  MARKERP (x=<optimized out>) at lisp.h:2618
  #2  CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133
  #3  0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518))
      at marker.c:452
  #4  0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=3116,
      start_int=3116) at insdel.c:2179
  #5  prepare_to_modify_buffer_1 (start=start@entry=3116, end=end@entry=3116,
      preserve_ptr=preserve_ptr@entry=0x0) at insdel.c:2007
  #6  0x010ee27d in prepare_to_modify_buffer (start=3116, end=3116,
      preserve_ptr=preserve_ptr@entry=0x0) at insdel.c:2018
  #7  0x010ee54d in insert_1_both (string=string@entry=0x82ef1b "r",
      nchars=nchars@entry=1, nbytes=nbytes@entry=1, inherit=inherit@entry=true,
      prepare=prepare@entry=true, before_markers=before_markers@entry=false)
      at insdel.c:896
  #8  0x010ef005 in insert_1_both (before_markers=false, prepare=true,
      inherit=true, nbytes=1, nchars=1, string=0x82ef1b "r") at insdel.c:697
  #9  insert_and_inherit (string=string@entry=0x82ef1b "r",
      nbytes=nbytes@entry=1) at insdel.c:692
  #10 0x01107160 in internal_self_insert (c=114, n=<optimized out>)
      at cmds.c:477
  #11 0x01107804 in Fself_insert_command (n=make_fixnum(1), c=<optimized out>)
      at cmds.c:302
  #12 0x0114fb6c in funcall_subr (subr=<optimized out>,
      numargs=<optimized out>, numargs@entry=2, args=<optimized out>,
      args@entry=0x82f120) at eval.c:2869
  #13 0x0114d9fd in Ffuncall (nargs=nargs@entry=3, args=args@entry=0x82f118)
      at eval.c:2794
  #14 0x01148f7d in Ffuncall_interactively (nargs=3, args=0x82f118)
      at callint.c:254
  #15 0x0114d9fd in Ffuncall (nargs=4, args=0x82f110) at eval.c:2794
  #16 0x0114dca3 in Fapply (nargs=<optimized out>, nargs@entry=3,
      args=<optimized out>, args@entry=0x82f288) at eval.c:2424
  #17 0x0114aecb in Fcall_interactively (function=XIL(0x43b3350),
      record_flag=<optimized out>, keys=XIL(0xa000000016a31530))
      at callint.c:342
  #18 0x0114fb99 in funcall_subr (subr=<optimized out>,
      numargs=<optimized out>, numargs@entry=3, args=<optimized out>,
      args@entry=0x82f430) at eval.c:2872
  #19 0x0114d9fd in Ffuncall (nargs=4, args=args@entry=0x82f428) at eval.c:2794
  #20 0x0118eaf7 in exec_byte_code (bytestr=<optimized out>,
      vector=<optimized out>, maxdepth=<optimized out>,
      args_template=<optimized out>, nargs=<optimized out>, nargs@entry=1,
      args=<optimized out>, args@entry=0x82f7b8) at bytecode.c:633
  #21 0x0115125f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=1,
      arg_vector=arg_vector@entry=0x82f7b8) at eval.c:2989
  #22 0x0114d953 in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x82f7b0)
      at eval.c:2808
  #23 0x0114db2c in call1 (fn=XIL(0x3f30), arg1=XIL(0x43b3350)) at eval.c:2654
  #24 0x010d0efe in command_loop_1 () at keyboard.c:1463
  #25 0x0114c91f in internal_condition_case (
      bfun=bfun@entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90),
      hfun=hfun@entry=0x10c5049 <cmd_error>) at eval.c:1355
  #26 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
  #27 0x0114c8a6 in internal_catch (tag=XIL(0xdfb0),
      func=func@entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116
  #28 0x010bdb5d in command_loop () at keyboard.c:1070
  #29 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714
  #30 0x010c4f0c in Frecursive_edit () at keyboard.c:786
  #31 0x0124a4a4 in main (argc=<optimized out>, argv=<optimized out>)
      at emacs.c:2054

  Lisp Backtrace:
  "self-insert-command" (0x82f120)
  "funcall-interactively" (0x82f118)
  "call-interactively" (0x82f430)
  "command-execute" (0x82f7b8)
  (gdb) fr 3
  #3  CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133
  133       CHECK_TYPE (MARKERP (x), Qmarkerp, x);
  (gdb) up
  #4  0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518))
      at marker.c:452
  452       CHECK_MARKER (marker);
  (gdb) p marker
  $3 = XIL(0xa000000018ac0518)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac0518
  (gdb) p marker+0
  $4 = -6917529027227155176
  (gdb) p/x marker+0
  $5 = 0xa000000018ac0518
  (gdb) up
  #5  0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=3116,
      start_int=3116) at insdel.c:2179
  2179          report_overlay_modification (FETCH_START, FETCH_END, 0,
  (gdb) p Vbefore_change_functions
  $6 = XIL(0xc000000018dbef20)
  (gdb) xtype
  Lisp_Cons
  (gdb) xcar
  $7 = 0x30
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsymbol
  $8 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48>
  "t"
  (gdb) p Vbefore_change_functions
  $9 = XIL(0xc000000018dbef20)
  (gdb) xcdr
  $10 = 0xc000000018dbf410
  (gdb) xcar
  $11 = 0xd5c0
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsym
  xsymbol   xsymname
  (gdb) xsymbol
  $12 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720>
  "syntax-ppss-flush-cache"
  (gdb) p Vbefore_change_functions
  $13 = XIL(0xc000000018dbef20)
  (gdb) xcdr
  $14 = 0xc000000018dbf410
  (gdb) xcdr
  $15 = 0x0
  (gdb) p start
  $16 = <optimized out>
  (gdb) p start_int
  $17 = 3116
  (gdb) p end_int
  $18 = 3116
  (gdb) p start_marker
  $19 = XIL(0xa000000018ac04f8)
  (gdb) p end_marker
  $20 = XIL(0xa000000018ac0518)
  (gdb) p start_marker
  $21 = XIL(0xa000000018ac04f8)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p preserve_ptr
  $22 = (ptrdiff_t *) 0x0
  (gdb) p *(current_buffer->text->beg+3000)
  $23 = 115 's'
  (gdb) p *(current_buffer->text->beg+3000)@200
  $24 = "sense would then\nsuggest us that the feature should be extended to other means of\ndkispaying messages in the echo a", '\000' <repeats 84 times>
  (gdb) p *(current_buffer->text->beg+3116)
  $25 = 0 '\000'
  (gdb) p GPT
  $26 = 3116
  (gdb) p GPT_ADDR
  $27 = (unsigned char *) 0x7d80c2b ""
  (gdb) p current_buffer->overlays_before
  $28 = (struct Lisp_Overlay *) 0x170cb080
  (gdb) p $28->start
  $29 = XIL(0xa0000000170cb040)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p $28->next
  $30 = (struct Lisp_Overlay *) 0x13050320
  (gdb) p $28->next->start
  $31 = XIL(0xa000000016172310)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p current_buffer->overlays_after
  $32 = (struct Lisp_Overlay *) 0x0
  (gdb) p $28->next->next
  $33 = (struct Lisp_Overlay *) 0x0
  (gdb) p rvoe_arg.location
  $35 = (Lisp_Object *) 0x15c9298 <globals+120>
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p rvoe_arg.errorp
  $36 = false







* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22  7:22       ` Eli Zaretskii
@ 2020-05-22  8:35         ` Andrea Corallo
  2020-05-22 11:04           ` Eli Zaretskii
  2020-05-22 10:54         ` Eli Zaretskii
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 132+ messages in thread
From: Andrea Corallo @ 2020-05-22  8:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

> FETCH_END calls marker-position, and that segfaults because the marker
> points to invalid memory, which was probably unmapped from the process
> address space (so I guess this is w32-specific, as GNU systems don't
> really return memory to the system).  The start_marker is also
> invalid, it's just that FETCH_END is called first.
>
> Since the previous call to before-change-functions already used the
> same overlay markers, I suspect that the call to
> before-change-functions caused the memory to be unmapped (perhaps due
> to GC).  As you see below, the value of before-change-functions is
>
>   (t syntax-ppss-flush-cache)
>
> So the prime suspect is what happens when syntax-ppss-flush-cache
> runs, and thus I CC Stefan.  The main question to answer now from my
> POV is how come a marker on buffer position 3116 which was valid
> before before-change-functions was called became invalid as a result of
> some Lisp, in particular as a result of calling before-change-functions.
>
> Here's the backtrace; ideas for further debugging are welcome.

Hi Eli,

I'd be curious about the outcome if you had a look at your
'garbage_collect' assembly to investigate the possible relation with
41357, as suggested here:
https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html

Hope it helps

  Andrea

-- 
akrl@sdf.org






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22  7:22       ` Eli Zaretskii
  2020-05-22  8:35         ` Andrea Corallo
@ 2020-05-22 10:54         ` Eli Zaretskii
  2020-05-22 11:47         ` Pip Cet
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-22 10:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 41321, pipcet

> Date: Fri, 22 May 2020 10:22:56 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org
> 
> Since the previous call to before-change-functions already used the
> same overlay markers, I suspect that the call to
> before-change-functions caused the memory to be unmapped (perhaps due
> to GC).

FTR: I am now running the 27.0.91 pretest with the patch for bug#40661
applied.  It's a long shot, since the problem here is not with
pointers to buffer text, but I just want to be sure I didn't
rediscover a complicated way to reproduce that bug ;-)






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22  8:35         ` Andrea Corallo
@ 2020-05-22 11:04           ` Eli Zaretskii
  2020-05-22 12:55             ` Andrea Corallo
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-22 11:04 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 41321, monnier, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: pipcet@gmail.com, Stefan Monnier <monnier@iro.umontreal.ca>,
>         41321@debbugs.gnu.org
> Date: Fri, 22 May 2020 08:35:55 +0000
> 
> I'd be curious about the outcome if you had a look at your
> 'garbage_collect' assembly to investigate the possible relation with
> 41357, as suggested here:
> https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html

Sorry, I'm not sure I understand what you mean by the above.  Did you
mean whether I disassembled garbage_collect and looked at the code?
If so, the answer is NO, I didn't yet have time for that.

However, given the latest findings, I now doubt even more that the
issue you identified can have any relation to this problem.  As seen
by the backtrace I've shown in my last message, the buffer's overlay
list has invalid overlay objects at the point of the crash.  The 2
pointers to the overlay lists of a buffer are unconditionally marked
in mark_buffer, so I don't see how problems in GC with Lisp objects in
registers could interfere in this case.  Am I missing something?
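
The reachability argument above can be sketched as a toy model.  This
is not the real mark_buffer; the struct layouts and toy_ names are
hypothetical, and the only point illustrated is that overlays (and the
markers they hold) hanging off a buffer are reached from the buffer
itself, independently of any stack or register scanning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy marker: just a mark bit.  */
struct toy_marker
{
  bool marked;
};

/* Toy overlay: mark bit, its two markers, and the chain link.  */
struct toy_overlay
{
  bool marked;
  struct toy_marker *start;
  struct toy_marker *end;
  struct toy_overlay *next;
};

/* Toy buffer: the two overlay chains named in the thread.  */
struct toy_buffer
{
  struct toy_overlay *overlays_before;
  struct toy_overlay *overlays_after;
};

/* Mark every overlay on a chain together with its start/end markers.  */
static void
toy_mark_overlay_chain (struct toy_overlay *o)
{
  for (; o != NULL; o = o->next)
    {
      o->marked = true;
      o->start->marked = true;
      o->end->marked = true;
    }
}

/* Mark both chains unconditionally, as the message describes.  */
static void
toy_mark_buffer (struct toy_buffer *b)
{
  toy_mark_overlay_chain (b->overlays_before);
  toy_mark_overlay_chain (b->overlays_after);
}
```

In this model an overlay marker can only be swept if the buffer itself
is not marked or the chain was already damaged before marking started.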






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22  7:22       ` Eli Zaretskii
  2020-05-22  8:35         ` Andrea Corallo
  2020-05-22 10:54         ` Eli Zaretskii
@ 2020-05-22 11:47         ` Pip Cet
  2020-05-22 12:13           ` Eli Zaretskii
                             ` (2 more replies)
  2020-05-23 23:54         ` Pip Cet
  2020-05-29 10:16         ` Eli Zaretskii
  4 siblings, 3 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-22 11:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote:
>   (gdb) p current_buffer->overlays_before
>   $28 = (struct Lisp_Overlay *) 0x170cb080
>   (gdb) p $28->start
>   $29 = XIL(0xa0000000170cb040)
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8

Note that it didn't try to print $29, but the original invalid marker. In
particular, I believe 0x170cb040 is a pointer to a valid marker.
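
The address 0x170cb040 is what remains of $29 once the tag is stripped.
A minimal sketch of that decoding, assuming the layout a 32-bit
--with-wide-int build uses (64-bit words with 3 tag bits in the most
significant bits) and tag values as recalled from Emacs 27's lisp.h.
Both are assumptions to check against the source, and the toy_ names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 64-bit Lisp_Object words on a 32-bit build,
   with the type tag kept in the top TOY_TAG_BITS bits.  */
enum { TOY_TAG_BITS = 3, TOY_VAL_BITS = 64 - TOY_TAG_BITS };

/* Tag values assumed for the MSB-tag case (only those seen in the thread).  */
enum toy_type
{
  TOY_SYMBOL = 0,
  TOY_STRING = 4,
  TOY_VECTORLIKE = 5,
  TOY_CONS = 6
};

/* Extract the type tag from the most significant bits of WORD.  */
static inline unsigned
toy_tag (uint64_t word)
{
  return (unsigned) (word >> TOY_VAL_BITS);
}

/* Extract the 32-bit address from the low half of WORD.
   Meaningful for pointer-like types such as vectorlikes and conses.  */
static inline uint32_t
toy_pointer (uint64_t word)
{
  return (uint32_t) (word & UINT32_MAX);
}
```

Under these assumptions XIL(0xa0000000170cb040) decodes to a vectorlike
at 0x170cb040, and XIL(0xa000000018ac0518) to a vectorlike at
0x18ac0518, which agrees with the types xtype reported in the sessions
above.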

>   (gdb) p $28->next
>   $30 = (struct Lisp_Overlay *) 0x13050320
>   (gdb) p $28->next->start
>   $31 = XIL(0xa000000016172310)
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8

Same here.

If you could disassemble signal_before_change, we'd know whether
start_marker and end_marker live in callee-saved registers, and thus
whether this is likely to be Andrea's bug.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 11:47         ` Pip Cet
@ 2020-05-22 12:13           ` Eli Zaretskii
  2020-05-22 12:39             ` Pip Cet
  2020-05-22 12:32           ` Eli Zaretskii
  2020-05-29  9:51           ` Eli Zaretskii
  2 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-22 12:13 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 11:47:03 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote:
> >   (gdb) p current_buffer->overlays_before
> >   $28 = (struct Lisp_Overlay *) 0x170cb080
> >   (gdb) p $28->start
> >   $29 = XIL(0xa0000000170cb040)
> >   (gdb) xtype
> >   Lisp_Vectorlike
> >   Cannot access memory at address 0x18ac04f8
> 
> Note that it didn't try to print $29, but the original invalid marker.

Sorry, I don't follow.  "xtype" shows the type of the last result,
AFAIK, in this case the type of $29.  If this changed somehow, either
we have a bug in .gdbinit or I have been using GDB incorrectly for I
don't know how many years.

What am I missing?






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 11:47         ` Pip Cet
  2020-05-22 12:13           ` Eli Zaretskii
@ 2020-05-22 12:32           ` Eli Zaretskii
  2020-05-29  9:51           ` Eli Zaretskii
  2 siblings, 0 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-22 12:32 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 11:47:03 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote:
> >   (gdb) p current_buffer->overlays_before
> >   $28 = (struct Lisp_Overlay *) 0x170cb080
> >   (gdb) p $28->start
> >   $29 = XIL(0xa0000000170cb040)
> >   (gdb) xtype
> >   Lisp_Vectorlike
> >   Cannot access memory at address 0x18ac04f8
> 
> Note that it didn't try to print $29, but the original invalid marker. In
> particular, I believe 0x170cb040 is a pointer to a valid marker.
> 
> >   (gdb) p $28->next
> >   $30 = (struct Lisp_Overlay *) 0x13050320
> >   (gdb) p $28->next->start
> >   $31 = XIL(0xa000000016172310)
> >   (gdb) xtype
> >   Lisp_Vectorlike
> >   Cannot access memory at address 0x18ac04f8
> 
> Same here.
> 
> If you could disassemble signal_before_change, we'd know whether
> start_marker and end_marker live in callee-saved registers, and thus
> whether this is likely to be Andrea's bug.

Since $28 is neither start_marker nor end_marker, but the first
overlay on the buffer's overlay chain, how could it be affected by
whether start_marker or end_marker are in a callee-saved register?
What am I missing here?






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 12:13           ` Eli Zaretskii
@ 2020-05-22 12:39             ` Pip Cet
  2020-05-22 12:48               ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-22 12:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Fri, May 22, 2020 at 12:13 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 22 May 2020 11:47:03 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > >   (gdb) p current_buffer->overlays_before
> > >   $28 = (struct Lisp_Overlay *) 0x170cb080
> > >   (gdb) p $28->start
> > >   $29 = XIL(0xa0000000170cb040)
> > >   (gdb) xtype
> > >   Lisp_Vectorlike
> > >   Cannot access memory at address 0x18ac04f8
> >
> > Note that this didn't try to print $29, but the original invalid marker.
>
> Sorry, I don't follow.  "xtype" shows the type of the last result,
> AFAIK, in this case the type of $29.  If this changed somehow, either
> we have a bug in .gdbinit or I have been using GDB incorrectly for I
> don't know how many years.

I think it's most likely to be a GDB bug, and I can't reproduce it here.

But it's definitely trying to access memory at address 0x18ac04f8,
which corresponds to start_marker.

  (gdb) p rvoe_arg.location
  $35 = (Lisp_Object *) 0x15c9298 <globals+120>
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p rvoe_arg.errorp
  $36 = false

Surely rvoe_arg.location isn't a vectorlike, so that also points to
GDB not dealing with things correctly.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 12:39             ` Pip Cet
@ 2020-05-22 12:48               ` Eli Zaretskii
  2020-05-22 14:04                 ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-22 12:48 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 12:39:27 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> > Sorry, I don't follow.  "xtype" shows the type of the last result,
> > AFAIK, in this case the type of $29.  If this changed somehow, either
> > we have a bug in .gdbinit or I have been using GDB incorrectly for I
> > don't know how many years.
> 
> I think it's most likely to be a GDB bug, and I can't reproduce it here.
> 
> But it's definitely trying to access memory at address 0x18ac04f8,
> which corresponds to start_marker.

My interpretation of that equality was that both start_marker and the
buffer's overlay chain got invalidated because some code relocated
objects and unmapped the previously referenced memory, perhaps due to
GC.  I don't yet have an explanation for how this could happen, so
maybe this hypothesis is wrong.

>   (gdb) p rvoe_arg.location
>   $35 = (Lisp_Object *) 0x15c9298 <globals+120>
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8
>   (gdb) p rvoe_arg.errorp
>   $36 = false
> 
> Surely rvoe_arg.location isn't a vectorlike, so that also points to
> GDB not dealing with things correctly.

rvoe_arg.location should be a pointer to the value of
before-change-functions, so yes, it isn't supposed to be vectorlike.
But I very much doubt there's such a blatant bug in GDB: this is the
latest GDB 9.1, and I'm using these commands from .gdbinit all the
time.  I tend to think this is somehow part of the bug that caused the
crash.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 11:04           ` Eli Zaretskii
@ 2020-05-22 12:55             ` Andrea Corallo
  0 siblings, 0 replies; 132+ messages in thread
From: Andrea Corallo @ 2020-05-22 12:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Andrea Corallo <akrl@sdf.org>
>> Cc: pipcet@gmail.com, Stefan Monnier <monnier@iro.umontreal.ca>,
>>         41321@debbugs.gnu.org
>> Date: Fri, 22 May 2020 08:35:55 +0000
>> 
>> I'd be curious about the outcome if you had a look at your
>> 'garbage_collect' assembly to investigate the possible relation with
>> 41357, as suggested here
>> https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html
>
> Sorry, I'm not sure I understand what you mean by the above.  Did you
> mean whether I disassembled garbage_collect and looked at the code?

Yes, should be quick to see if callee-save regs are pushed.

> However, given the latest findings, I now doubt even more that the
> issue you identified can have any relation to this problem.  As seen
> by the backtrace I've shown in my last message, the buffer's overlay
> list has invalid overlay objects at the point of the crash.  The 2
> pointers to the overlay lists of a buffer are unconditionally marked
> in mark_buffer, so I don't see how problems in GC with Lisp objects in
> registers could interfere in this case.  Am I missing something?

Not that I'm aware of.  I'm no expert in the piece of code you are
looking at and haven't investigated it.  It was just a 'cheap' idea to
take a potential problem off the table.

  Andrea

-- 
akrl@sdf.org






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 12:48               ` Eli Zaretskii
@ 2020-05-22 14:04                 ` Pip Cet
  2020-05-22 14:26                   ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-22 14:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Fri, May 22, 2020 at 12:48 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 22 May 2020 12:39:27 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > > Sorry, I don't follow.  "xtype" shows the type of the last result,
> > > AFAIK, in this case the type of $29.  If this changed somehow, either
> > > we have a bug in .gdbinit or I have been using GDB incorrectly for I
> > > don't know how many years.
> >
> > I think it's most likely to be a GDB bug, and I can't reproduce it here.
> >
> > But it's definitely trying to access memory at address 0x18ac04f8,
> > which corresponds to start_marker.
>
> My interpretation of that equality was that both start_marker and the
> buffer's overlay chain got invalidated because some code relocated
> objects and unmapped the previously referenced memory, perhaps due to
> GC.  I don't yet have an explanation for how this could happen, so
> maybe this hypothesis is wrong.

I think it has to be, because the error message would then read
"Cannot access memory at address 0x170cb040", which is the only
address xvectype is supposed to look at.
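
To make the arithmetic explicit, here is a rough sketch of how such a
value would be decoded, assuming a wide-int build with the 3-bit type
tag in the top bits of a 64-bit word.  The helper names and the tag
constant are illustrative only; they are not copied from lisp.h or
.gdbinit.

```c
#include <stdint.h>

/* Assumed layout of an MSB-tagged, wide-int Lisp_Object: a 64-bit word
   whose top 3 bits hold the type tag.  Illustrative constants only.  */
enum { SKETCH_TAG_BITS = 3, SKETCH_VECTORLIKE_TAG = 5 };

/* Return the type tag stored in the top bits of WORD.  */
static unsigned
sketch_tag (uint64_t word)
{
  return (unsigned) (word >> (64 - SKETCH_TAG_BITS));
}

/* Return the low 32 bits of WORD, which would hold the object's
   address on a 32-bit host.  */
static uint32_t
sketch_address (uint64_t word)
{
  return (uint32_t) (word & 0xffffffffu);
}
```

Under those assumptions, 0xa0000000170cb040 decodes to a vectorlike tag
and the address 0x170cb040, so nothing derived from $29 would lead to
0x18ac04f8.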

> But I very much doubt there's such a blatant bug in GDB: this is the
> latest GDB 9.1, and I'm using these commands from .gdbinit all the
> time.  I tend to think this is somehow part of the bug that caused the
> crash.

I'm not sure how it could be. I don't think posting the disassembled
code for `signal_before_change' can hurt, since there's no easy way
for anyone else to reproduce it.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 14:04                 ` Pip Cet
@ 2020-05-22 14:26                   ` Eli Zaretskii
  2020-05-22 14:40                     ` Andrea Corallo
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-22 14:26 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 14:04:03 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> I don't think posting the disassembled code for
> `signal_before_change' can hurt, since there's no easy way for
> anyone else to reproduce it.

I see this on two different systems where Emacs was compiled with two
different versions of GCC.  So if you want to see the disassembly, any
32-bit GCC will do, I think.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 14:26                   ` Eli Zaretskii
@ 2020-05-22 14:40                     ` Andrea Corallo
  2020-05-22 19:03                       ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Andrea Corallo @ 2020-05-22 14:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, Pip Cet

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Pip Cet <pipcet@gmail.com>
>> Date: Fri, 22 May 2020 14:04:03 +0000
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
>> 
>> I don't think posting the disassembled code for
>> `signal_before_change' can hurt, since there's no easy way for
>> anyone else to reproduce it.
>
> I see this on two different systems where Emacs was compiled with two
> different versions of GCC.  So if you want to see the disassembly, any
> 32-bit GCC will do, I think.

I believe the triplet can make a difference, given that the calling
convention can change, no?  Also CFLAGS are clearly a factor.

-- 
akrl@sdf.org






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 14:40                     ` Andrea Corallo
@ 2020-05-22 19:03                       ` Eli Zaretskii
       [not found]                         ` <CAOqdjBdpU4U1NqErNH0idBmUxNeE3fL=2=KKpo9kbCM3DhW5gA@mail.gmail.com>
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-22 19:03 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 41321, monnier, pipcet

> From: Andrea Corallo <akrl@sdf.org>
> Cc: Pip Cet <pipcet@gmail.com>, 41321@debbugs.gnu.org,
>         monnier@iro.umontreal.ca
> Date: Fri, 22 May 2020 14:40:05 +0000
> 
> > I see this on two different systems where Emacs was compiled with two
> > different versions of GCC.  So if you want to see the disassembly, any
> > 32-bit GCC will do, I think.
> 
> I believe the triplet can make a difference, given that the calling
> convention can change, no?  Also CFLAGS are clearly a factor.

My CFLAGS are in my original report of this bug.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
       [not found]                         ` <CAOqdjBdpU4U1NqErNH0idBmUxNeE3fL=2=KKpo9kbCM3DhW5gA@mail.gmail.com>
@ 2020-05-23 17:58                           ` Andrea Corallo
  2020-05-23 22:37                             ` Stefan Monnier
  0 siblings, 1 reply; 132+ messages in thread
From: Andrea Corallo @ 2020-05-23 17:58 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

Pip Cet <pipcet@gmail.com> writes:

> I believe this isn't the problem we're looking for, but it might be
> related anyway.
>
> I'm seeing this in the assembler source code for insdel.c produced
> with the mingw cross compiler (i686-w64-mingw32-gcc-win32):
>
>     movl    60(%esp), %eax
>     movl    %eax, (%esp)
>     movl    72(%esp), %eax
>     movl    %eax, 4(%esp)
>     call    _Fmarker_position
>
> If I'm reading this correctly, it's of some concern for wide-int
> builds: the two 32-bit halves of a Lisp_Object are stored
> non-consecutively.
>
> Our stack marking doesn't catch that; at least, it doesn't for
> symbols, where the less-significant half isn't a valid pointer. For
> pseudovectors, things should still work...
>
> So I think we have a problem with such --wide-int builds in cases
> where a stack temporary holds an unpinned uninterned symbol while GC
> is called. Something like
>
> (prog1
>   (gensym)
>   (garbage-collect))
>
> might trigger it. No problem with gcc -m32 on GNU/Linux, for some reason.

Very interesting.  AFAIK there's no guarantee that the compiler spills
a DI reg to adjacent memory.  Also, reading the GC code, your
observation seems correct to me.

-- 
akrl@sdf.org






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-23 17:58                           ` Andrea Corallo
@ 2020-05-23 22:37                             ` Stefan Monnier
  2020-05-23 22:41                               ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Stefan Monnier @ 2020-05-23 22:37 UTC (permalink / raw)
  To: Andrea Corallo; +Cc: 41321, Pip Cet

>> If I'm reading this correctly, it's of some concern for wide-int
>> builds: the two 32-bit halves of a Lisp_Object are stored
>> non-consecutively.

This shouldn't be a problem: wide-int builds use MSB tagging, so all
Lisp_Objects which contain a pointer have their lowest 32bits exactly
identical to that pointer (and the higher 32bits just contain the tag).
So we'll find them in the stack even if the two halves are separate
simply because the pointer-part will be found like any other pointer.


        Stefan







* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-23 22:37                             ` Stefan Monnier
@ 2020-05-23 22:41                               ` Pip Cet
  2020-05-23 23:26                                 ` Stefan Monnier
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-23 22:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 41321, Andrea Corallo

On Sat, May 23, 2020 at 10:38 PM Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
> >> If I'm reading this correctly, it's of some concern for wide-int
> >> builds: the two 32-bit halves of a Lisp_Object are stored
> >> non-consecutively.
>
> This shouldn't be a problem: wide-int builds use MSB tagging, so all
> Lisp_Objects which contain a pointer have their lowest 32bits exactly
> identical to that pointer (and the higher 32bits just contain the tag).

As I said, I don't believe that's true for symbols. Qnil is always
binary 0, so we offset all symbols by the offset of lispsym.

> So we'll find them in the stack even if the two halves are separate
> simply because the pointer-part will be found like any other pointer.

Yes, that's what I meant to say when I said it should still work for
pseudovectors.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-23 22:41                               ` Pip Cet
@ 2020-05-23 23:26                                 ` Stefan Monnier
  0 siblings, 0 replies; 132+ messages in thread
From: Stefan Monnier @ 2020-05-23 23:26 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Andrea Corallo

>> This shouldn't be a problem: wide-int builds use MSB tagging, so all
>> Lisp_Objects which contain a pointer have their lowest 32bits exactly
>> identical to that pointer (and the higher 32bits just contain the tag).
> As I said, I don't believe that's true for symbols.  Qnil is always
> binary 0, so we offset all symbols by the offset of lispsym.

Oh, right, good point: I had completely forgotten about that "detail".
We should probably adjust our conservative stack scanning accordingly.
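
A minimal sketch of what such an adjustment might look like, assuming a
32-bit host where a symbol's low word is its address minus a table base.
All names and parameters here are hypothetical; this is not the code in
alloc.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low word of a symbol object, assuming symbols are encoded as an
   offset from a table base (LISPSYM_BASE) rather than as a raw
   address.  */
static uint32_t
sketch_symbol_low_word (uint32_t symbol_address, uint32_t lispsym_base)
{
  return symbol_address - lispsym_base;
}

/* Return true if WORD falls inside the half-open heap range
   [HEAP_START, HEAP_END), the way a conservative scanner might test a
   stack word.  */
static bool
sketch_in_heap (uint32_t word, uint32_t heap_start, uint32_t heap_end)
{
  return heap_start <= word && word < heap_end;
}

/* Possible adjustment: also test WORD + LISPSYM_BASE, so that the
   offset form of a symbol is recognized as well.  */
static bool
sketch_in_heap_adjusted (uint32_t word, uint32_t lispsym_base,
                         uint32_t heap_start, uint32_t heap_end)
{
  return (sketch_in_heap (word, heap_start, heap_end)
          || sketch_in_heap (word + lispsym_base, heap_start, heap_end));
}
```

With a made-up base of 0x01500000 and a symbol at 0x170cb000, the low
word is 0x15bcb000, which the plain range test misses and the adjusted
one catches.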


        Stefan







* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22  7:22       ` Eli Zaretskii
                           ` (2 preceding siblings ...)
  2020-05-22 11:47         ` Pip Cet
@ 2020-05-23 23:54         ` Pip Cet
  2020-05-24 14:24           ` Eli Zaretskii
  2020-05-29 10:16         ` Eli Zaretskii
  4 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-23 23:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 1003 bytes --]

On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz@gnu.org> wrote:
>   #0  PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
>   #1  MARKERP (x=<optimized out>) at lisp.h:2618
>   #2  CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133
>   #3  0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518))
>       at marker.c:452

I think I've worked it out: it's this mingw bug:
https://sourceforge.net/p/mingw-w64/bugs/778/

On mingw, if <stdint.h> is included before/instead of stddef.h,
alignof (max_align_t) == 16. However, as can be seen by the backtrace
above, Eli's malloc only returned an 8-byte-aligned block. That's not
normally a problem, because mark_maybe_object doesn't care about
alignment; but in conjunction with the gcc behavior change, we rely on
mark_maybe_pointer to mark the pointer, and it doesn't, because the
pointer is not aligned to a LISP_ALIGNMENT = 16-byte boundary.
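
As an illustration of that last step, here is a small stand-alone
sketch of the modulo test applied to the marker address from the
backtrace.  The helper name is made up; only the address and the two
alignment values come from this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if address P is a multiple of ALIGNMENT, mirroring the
   modulo test used by maybe_lisp_pointer.  */
static bool
sketch_aligned_to (uintptr_t p, uintptr_t alignment)
{
  return p % alignment == 0;
}
```

Calling sketch_aligned_to (0x18ac0518, 16) yields false, while
sketch_aligned_to (0x18ac0518, 8) yields true, which is why a 16-byte
test rejects that marker and an 8-byte test would accept it.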

Brute-force patch attached until we can work out how to fix this properly.

[-- Attachment #2: 0001-Accept-unaligned-pointers-in-maybe_lisp_pointer.patch --]
[-- Type: text/x-patch, Size: 839 bytes --]

From abb79bf33622b4e8407565ab8e82771b6a35945e Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Sat, 23 May 2020 23:51:55 +0000
Subject: [PATCH] Accept unaligned pointers in maybe_lisp_pointer

* src/alloc.c (maybe_lisp_pointer): Don't require pointers be aligned
  to a LISP_ALIGNMENT boundary, as this is false on mingw builds.
---
 src/alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/alloc.c b/src/alloc.c
index 1c6b664b22..86e81cd1f6 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4594,7 +4594,7 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
 static bool
 maybe_lisp_pointer (void *p)
 {
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
+  return (uintptr_t) p % GC_ALIGNMENT == 0;
 }
 
 /* If P points to Lisp data, mark that as live if it isn't already
-- 
2.27.0.rc0



* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-23 23:54         ` Pip Cet
@ 2020-05-24 14:24           ` Eli Zaretskii
  2020-05-24 15:00             ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-24 14:24 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 23 May 2020 23:54:17 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> I think I've worked it out: it's this mingw bug:
> https://sourceforge.net/p/mingw-w64/bugs/778/

Thank you for working on this tricky problem.

FTR, I don't use that flavor of MinGW.

> On mingw, if <stdint.h> is included before/instead of stddef.h,
> alignof (max_align_t) == 16.

The problem with the order of inclusion doesn't exist in my header
files, so alignof (max_align_t) is always 16.

> However, as can be seen by the backtrace
> above, Eli's malloc only returned an 8-byte-aligned block.

Isn't that strange?  Lisp data is allocated via lmalloc, AFAIK, and
lmalloc is supposed to guarantee LISP_ALIGNMENT alignment.  Or am I
missing something?

> That's not normally a problem, because mark_maybe_object doesn't
> care about alignment; but in conjunction with the gcc behavior
> change, we rely on mark_maybe_pointer to mark the pointer, and it
> doesn't, because the pointer is not aligned to a LISP_ALIGNMENT =
> 16-byte boundary.

I still very much doubt that this has anything to do with stack
marking during GC, since I've shown in my backtrace that
current_buffer->overlays_before points to an overlay with invalid
markers.  And GC always marks buffer's overlays (and thus their
markers), as can be seen in mark_buffer.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 14:24           ` Eli Zaretskii
@ 2020-05-24 15:00             ` Pip Cet
  2020-05-24 16:25               ` Eli Zaretskii
  2020-05-24 19:00               ` Andy Moreton
  0 siblings, 2 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-24 15:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Sun, May 24, 2020 at 2:24 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 23 May 2020 23:54:17 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > I think I've worked it out: it's this mingw bug:
> > https://sourceforge.net/p/mingw-w64/bugs/778/
>
> Thank you for working on this tricky problem.
>
> FTR, I don't use that flavor of MinGW.

So your flavor is even more broken than what Debian ships? That's
interesting, which flavor is it?

> > On mingw, if <stdint.h> is included before/instead of stddef.h,
> > alignof (max_align_t) == 16.
>
> The problem with the order of inclusion doesn't exist in my header
> files, so alignof (max_align_t) is always 16.

Okay, so that is our bug.

> > However, as can be seen by the backtrace
> > above, Eli's malloc only returned an 8-byte-aligned block.
>
> Isn't that strange?  Lisp data is allocated via lmalloc, AFAIK, and
> lmalloc is supposed to guarantee LISP_ALIGNMENT alignment.  Or am I
> missing something?

No, it relies on the compile-time constants and never checks.

The relevant code is:

enum { MALLOC_IS_LISP_ALIGNED = alignof (max_align_t) % LISP_ALIGNMENT == 0 };

static bool
laligned (void *p, size_t size)
{
  return (MALLOC_IS_LISP_ALIGNED || (intptr_t) p % LISP_ALIGNMENT == 0
          || size % LISP_ALIGNMENT != 0);
}

... so laligned is a constant "true" function on your machine, since
alignof (max_align_t) is 16 and LISP_ALIGNMENT is 16.

static void *
lmalloc (size_t size, bool clearit)
{
#ifdef USE_ALIGNED_ALLOC
  if (! MALLOC_IS_LISP_ALIGNED && size % LISP_ALIGNMENT == 0)
    {
      void *p = aligned_alloc (LISP_ALIGNMENT, size);
      if (clearit && p)
        memclear (p, size);
      return p;
    }
#endif

  while (true)
    {
      void *p = clearit ? calloc (1, size) : malloc (size);
      if (laligned (p, size))
        return p;
      free (p);
      size_t bigger = size + LISP_ALIGNMENT;
      if (size < bigger)
        size = bigger;
    }
}

That optimizes down to returning the malloc/calloc return value directly.

IOW, alloc.c relies on malloc() being max_align_t-aligned, and never
checks, not even in debug builds. That's something that needs to be
fixed, since broken-malloc environments such as yours exist.
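
To show the difference a run-time check would make, here is a rough
sketch that contrasts a predicate trusting the compile-time constant
with one that always inspects the pointer.  The SKETCH_ names and
sketch_ functions are hypothetical stand-ins, not a proposed patch.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_LISP_ALIGNMENT = 16 };
/* What the headers claim about malloc on the affected system.  */
enum { SKETCH_MALLOC_IS_LISP_ALIGNED = 1 };

/* Like laligned: short-circuits on the compile-time constant, so the
   pointer is never inspected when the constant is true.  */
static bool
sketch_laligned_trusting (void *p, size_t size)
{
  return (SKETCH_MALLOC_IS_LISP_ALIGNED
          || (uintptr_t) p % SKETCH_LISP_ALIGNMENT == 0
          || size % SKETCH_LISP_ALIGNMENT != 0);
}

/* Variant that ignores the constant and always checks the pointer.  */
static bool
sketch_laligned_checking (void *p, size_t size)
{
  return ((uintptr_t) p % SKETCH_LISP_ALIGNMENT == 0
          || size % SKETCH_LISP_ALIGNMENT != 0);
}

/* Return a pointer that is 8-byte but not 16-byte aligned, standing in
   for what a misbehaving malloc might return.  */
static void *
sketch_offset_by_8 (void)
{
  static alignas (16) unsigned char block[64];
  return block + 8;
}
```

For a 32-byte request at the pointer from sketch_offset_by_8, the
trusting variant reports the block as aligned and the checking variant
does not.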

> > That's not normally a problem, because mark_maybe_object doesn't
> > care about alignment; but in conjunction with the gcc behavior
> > change, we rely on mark_maybe_pointer to mark the pointer, and it
> > doesn't, because the pointer is not aligned to a LISP_ALIGNMENT =
> > 16-byte boundary.
>
> I still very much doubt that this has anything to do with stack
> marking during GC, since I've shown in my backtrace that
> current_buffer->overlays_before points to an overlay with invalid
> markers.

You haven't.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 15:00             ` Pip Cet
@ 2020-05-24 16:25               ` Eli Zaretskii
  2020-05-24 16:55                 ` Eli Zaretskii
  2020-05-24 19:00               ` Andy Moreton
  1 sibling, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-24 16:25 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 24 May 2020 15:00:36 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> > FTR, I don't use that flavor of MinGW.
> 
> So your flavor is even more broken than what Debian ships?

Why _more_ broken?

> That's interesting, which flavor is it?

mingw.org's MinGW.

> > Isn't that strange?  Lisp data is allocated via lmalloc, AFAIK, and
> > lmalloc is supposed to guarantee LISP_ALIGNMENT alignment.  Or am I
> > missing something?
> 
> No, it relies on the compile-time constants and never checks.

So that is the bug to fix, no?

> > I still very much doubt that this has anything to do with stack
> > marking during GC, since I've shown in my backtrace that
> > current_buffer->overlays_before points to an overlay with invalid
> > markers.
> 
> You haven't.

Of course, I have.







* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 16:25               ` Eli Zaretskii
@ 2020-05-24 16:55                 ` Eli Zaretskii
  2020-05-24 18:03                   ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-24 16:55 UTC (permalink / raw)
  To: pipcet; +Cc: 41321, monnier

> Date: Sun, 24 May 2020 19:25:14 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> 
> > > I still very much doubt that this has anything to do with stack
> > > marking during GC, since I've shown in my backtrace that
> > > current_buffer->overlays_before points to an overlay with invalid
> > > markers.
> > 
> > You haven't.
> 
> Of course, I have.

Here's how healthy overlays look in a healthy buffer:

  (gdb) p current_buffer->overlays_after
  $10 = (struct Lisp_Overlay *) 0x0
  (gdb) p current_buffer->overlays_before
  $11 = (struct Lisp_Overlay *) 0x7728258
  (gdb) p $11->start
  $12 = XIL(0xa000000007728218)
  (gdb) xtype
  Lisp_Vectorlike
  PVEC_MARKER
  (gdb) xmarker
  $13 = (struct Lisp_Marker *) 0x7728218
  (gdb) p *$
  $14 = {
    header = {
      size = 1124081664
    },
    buffer = 0x728fc38,
    need_adjustment = 0,
    insertion_type = 0,
    next = 0x765eae8,
    charpos = 13968,
    bytepos = 13968
  }
  (gdb) p $11->next
  $15 = (struct Lisp_Overlay *) 0x0

And here's a reminder from how the same looked in the session that
segfaulted:

  (gdb) p current_buffer->overlays_before
  $28 = (struct Lisp_Overlay *) 0x170cb080
  (gdb) p $28->start
  $29 = XIL(0xa0000000170cb040)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p $28->next
  $30 = (struct Lisp_Overlay *) 0x13050320
  (gdb) p $28->next->start
  $31 = XIL(0xa000000016172310)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p current_buffer->overlays_after
  $32 = (struct Lisp_Overlay *) 0x0
  (gdb) p $28->next->next
  $33 = (struct Lisp_Overlay *) 0x0

If you still claim that I didn't demonstrate that the buffer's overlay
chain got corrupted as part of the bug that caused the segfault,
please point out what I missed here.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 16:55                 ` Eli Zaretskii
@ 2020-05-24 18:03                   ` Pip Cet
  2020-05-24 18:40                     ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-24 18:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Sun, May 24, 2020 at 4:55 PM Eli Zaretskii <eliz@gnu.org> wrote:
> And here's a reminder from how the same looked in the session that
> segfaulted:
>
>   (gdb) p current_buffer->overlays_before
>   $28 = (struct Lisp_Overlay *) 0x170cb080
>   (gdb) p $28->start
>   $29 = XIL(0xa0000000170cb040)
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8

That should read "Cannot access memory at address 0x170cb040". It
doesn't. It doesn't tell you whether the memory at page 0x170cb000 is
mapped, because gdb, for whatever reason (a bug in .gdbinit, a bug in
gdb, some weird command entered at the gdb prompt before the
transcript started, or even, as you yourself suggested, somehow as the
result of the memory corruption that caused the crash), looked in the
wrong place.

Instead, it tells you that the page at 0x18ac0000 isn't mapped. Which we knew.

>   (gdb) p $28->next
>   $30 = (struct Lisp_Overlay *) 0x13050320
>   (gdb) p $28->next->start
>   $31 = XIL(0xa000000016172310)
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8

Same here. It should read "Cannot access memory at address 0x16172310".

> If you still claim that I didn't demonstrate that the buffer's overlay
> chain got corrupted

I do, of course. The message GDB prints simply does not say anything
problematic about the buffer's overlay chain.

> as part of the bug that caused the segfault,
> please point out what I missed here.

You omitted the third call to xtype, which was even more clearly
nonsensical: xtype was misbehaving. We don't know in which way it was
misbehaving. So there's no evidence either way.

FWIW, running into gdb bugs is something that happens to me almost on
a regular basis. There's no point reporting those, as there's
generally no response. In your case, you're in an unusual environment
with a rather large and complicated .gdbinit file which does very
strange things to avoid running into GDB bugs that we know about. All
that increases the likelihood of your encountering a gdb bug that no
one else has, or that has been reported but never responded to.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 18:03                   ` Pip Cet
@ 2020-05-24 18:40                     ` Eli Zaretskii
  2020-05-24 19:40                       ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-24 18:40 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 24 May 2020 18:03:57 +0000
> Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> 
> > If you still claim that I didn't demonstrate that the buffer's overlay
> > chain got corrupted
> 
> I do, of course. The message GDB prints simply does not say anything
> problematic about the buffer's overlay chain.
> 
> > as part of the bug that caused the segfault,
> > please point out what I missed here.
> 
> You omitted the third call to xtype, which was even more clearly
> nonsensical: xtype was misbehaving. We don't know in which way it was
> misbehaving. So there's no evidence either way.
> 
> FWIW, running into gdb bugs is something that happens to me almost on
> a regular basis. There's no point reporting those, as there's
> generally no response. In your case, you're in an unusual environment
> with a rather large and complicated .gdbinit file which does very
> strange things to avoid running into GDB bugs that we know about. All
> that increases the likelihood of your encountering a gdb bug that no
> one else has, or that has been reported but never responded to.

I don't buy this, sorry.  I use GDB every day in this very "unusual
environment", both when debugging Emacs and other programs.  The
probability of these being due to some bug in GDB or in .gdbinit
commands is very low, as I and others use them all the time.  It is
much more probable that the commands I've shown are signs of a real
trouble in Emacs and not in GDB.  I'm not willing to disregard what
those commands show me because they don't match your theory.  I prefer
facts.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 15:00             ` Pip Cet
  2020-05-24 16:25               ` Eli Zaretskii
@ 2020-05-24 19:00               ` Andy Moreton
  2020-05-24 19:09                 ` Pip Cet
  1 sibling, 1 reply; 132+ messages in thread
From: Andy Moreton @ 2020-05-24 19:00 UTC (permalink / raw)
  To: 41321

On Sun 24 May 2020, Pip Cet wrote:

> On Sun, May 24, 2020 at 2:24 PM Eli Zaretskii <eliz@gnu.org> wrote:
>> > From: Pip Cet <pipcet@gmail.com>
>> > Date: Sat, 23 May 2020 23:54:17 +0000
>> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
>> >
>> > I think I've worked it out: it's this mingw bug:
>> > https://sourceforge.net/p/mingw-w64/bugs/778/
>>
>> Thank you for working on this tricky problem.
>>
>> FTR, I don't use that flavor of MinGW.
>
> So your flavor is even more broken than what Debian ships? That's
> interesting, which flavor is it?

FYI, there are two separate projects:
  mingw.org: 32bit only.
  mingw-w64: 32bit and 64bit, using a different C runtime.

On my machine a simple test program shows:

--------------------------------------------------------------
  project     gcc     cpu   alignof(max_align_t)
--------------------------------------------------------------
mingw.org   9.2.0    i686   16
mingw-w64  10.1.0    i686   16 (stdint.h before stddef.h)
                             8 (stdint.h after  stddef.h)
mingw-w64  10.1.0  x86_64   16
--------------------------------------------------------------

This problem only appears with the 32bit mingw-w64 toolchain.
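
For readers who want to reproduce the table, a minimal sketch of such a
test might look like the following.  This is not the program used for
the measurements above; the helper names are made up for illustration,
and the include order shown is the "stdint.h before stddef.h" case.

```c
#include <stdint.h>    /* deliberately before stddef.h; swap to compare */
#include <stddef.h>
#include <stdalign.h>
#include <stdio.h>

/* Hypothetical helper: the alignment the compiler claims for
   max_align_t in this translation unit.  */
static unsigned
claimed_max_alignment (void)
{
  return (unsigned) alignof (max_align_t);
}

/* Print one row in the spirit of the table; LABEL is whatever the
   caller wants to call this toolchain.  */
static void
print_alignment_row (const char *label)
{
  printf ("%-24s alignof(max_align_t) = %u\n",
          label, claimed_max_alignment ());
}
```

Building the same file twice, once with the two leading #include lines
swapped, is one way to check whether the reported value depends on
include order for a given toolchain.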

Eli uses the mingw.org toolchain. Linux distros initially used
mingw.org, but switched to mingw-w64 cross compilers several years ago.

    AndyM

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 19:00               ` Andy Moreton
@ 2020-05-24 19:09                 ` Pip Cet
  0 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-24 19:09 UTC (permalink / raw)
  To: Andy Moreton; +Cc: 41321

On Sun, May 24, 2020 at 7:01 PM Andy Moreton <andrewjmoreton@gmail.com> wrote:
> > So your flavor is even more broken than what Debian ships? That's
> > interesting, which flavor is it?
>
> FYI, there are two separate projects:
>   mingw.org: 32bit only.
>   mingw-w64: 32bit and 64bit, using a different C runtime.
>
> On my machine a simple test program shows:
>
> --------------------------------------------------------------
>   project     gcc     cpu   alignof(max_align_t)
> --------------------------------------------------------------
> mingw.org   9.2.0    i686   16
> mingw-w64  10.1.0    i686   16 (stdint.h before stddef.h)
>                              8 (stdint.h after  stddef.h)
> mingw-w64  10.1.0  x86_64   16
> --------------------------------------------------------------

Thanks!

> This problem only appears with the 32bit mingw-w64 toolchain.

FWIW, the problem is that the incorrect value of 16 is returned in
some cases. All 32bit toolchains appear to be broken. I said that
mingw.org was "more broken" than mingw-w64 because it _always_ returns
the incorrect value, rather than doing so only for an unfortunate
combination of #includes.
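
A rough way to see the mismatch, assuming nothing beyond standard C, is
to compare what malloc actually returns with what the compiler claims.
The function names below are hypothetical, and a single allocation that
happens to be well aligned proves nothing; only a result that fails the
check is informative.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest power of two (capped at 4096) that divides the address
   malloc returns for NBYTES, or 0 if the allocation fails.  */
static size_t
observed_malloc_alignment (size_t nbytes)
{
  void *p = malloc (nbytes);
  if (!p)
    return 0;
  uintptr_t addr = (uintptr_t) p;
  size_t align = 1;
  while (align < 4096 && addr % (align * 2) == 0)
    align *= 2;
  free (p);
  return align;
}

/* Nonzero if this one allocation is aligned at least as strictly as
   alignof (max_align_t) promises.  */
static int
malloc_honors_max_align (size_t nbytes)
{
  size_t got = observed_malloc_alignment (nbytes);
  return got != 0 && got % alignof (max_align_t) == 0;
}
```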

> Eli uses the mingw.org toolchain. Linux distros initially used
> mingw.org, but switched to mingw-w64 cross compilers several years ago.

I couldn't get the mingw.org toolchain to work at all...

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 18:40                     ` Eli Zaretskii
@ 2020-05-24 19:40                       ` Pip Cet
  2020-05-25  2:30                         ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-24 19:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Sun, May 24, 2020 at 6:40 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sun, 24 May 2020 18:03:57 +0000
> > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> >
> > > If you still claim that I didn't demonstrate that the buffer's overlay
> > > chain got corrupted
> >
> > I do, of course. The message GDB prints simply does not say anything
> > problematic about the buffer's overlay chain.
> >
> > > as part of the bug that caused the segfault,
> > > please point out what I missed here.
> >
> > You omitted the third call to xtype, which was even more clearly
> > nonsensical: xtype was misbehaving. We don't know in which way it was
> > misbehaving. So there's no evidence either way.
> >
> > FWIW, running into gdb bugs is something that happens to me almost on
> > a regular basis. There's no point reporting those, as there's
> > generally no response. In your case, you're in an unusual environment
> > with a rather large and complicated .gdbinit file which does very
> > strange things to avoid running into GDB bugs that we know about. All
> > that increases the likelihood of your encountering a gdb bug that no
> > one else has, or that has been reported but never responded to.
>
> I don't buy this, sorry.

So you think there's a second bug, located in Emacs, which causes GDB,
which isn't supposed to be broken by anything the debuggee does, to be
broken and respond in nonsensical ways?

> I use GDB every day in this very "unusual
> environment", both when debugging Emacs and other programs.

And you've never run into GDB bugs?

> The
> probability of these being due to some bug in GDB or in .gdbinit
> commands is very low, as I and others use them all the time.

I'm perfectly willing to help you trace down this bug (in GDB or
.gdbinit; we've already found the bug in mingw and the one in Emacs)
if it serves any purpose, but I suspect you don't have the time.

But I can't conceive of an explanation in which a bug in Emacs could
cause a bug-free GDB to respond in the nonsensical way your last
invocation of xtype did.

> It is much more probable that the commands I've shown are signs of a real
> trouble in Emacs and not in GDB.

Are you saying the bug I've found isn't "a real trouble"? I'm curious
as to what trouble you're imagining.

> I'm not willing to disregard what
> those commands show me because they don't match your theory.

What they show you is that memory at a certain address, which they
helpfully specify, isn't mapped.

You conclude that memory at a totally different address isn't mapped,
even though GDB quite explicitly never says so.

That conclusion is invalid.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-24 19:40                       ` Pip Cet
@ 2020-05-25  2:30                         ` Eli Zaretskii
  2020-05-25  6:40                           ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-25  2:30 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 24 May 2020 19:40:09 +0000
> Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> 
> > I use GDB every day in this very "unusual
> > environment", both when debugging Emacs and other programs.
> 
> And you've never run into GDB bugs?

Not such blatant ones, no, and not lately.

> Are you saying the bug I've found isn't "a real trouble"?

I'm saying I'm not convinced that problem has anything to do with this
particular segfault.

> What they show you is that memory at a certain address, which they
> helpfully specify, isn't mapped.
> 
> You conclude that memory at a totally different address isn't mapped,
> even though GDB quite explicitly never says so.
> 
> That conclusion is invalid.

Your opinion, not mine, not yet anyway.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-25  2:30                         ` Eli Zaretskii
@ 2020-05-25  6:40                           ` Pip Cet
  2020-05-25 11:28                             ` Pip Cet
  2020-05-25 15:14                             ` Eli Zaretskii
  0 siblings, 2 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-25  6:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Mon, May 25, 2020 at 2:30 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sun, 24 May 2020 19:40:09 +0000
> > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> > What they show you is that memory at a certain address, which they
> > helpfully specify, isn't mapped.
> >
> > You conclude that memory at a totally different address isn't mapped,
> > even though GDB quite explicitly never says so.
> >
> > That conclusion is invalid.
> Your opinion, not mine, not yet anyway.

Maybe I'm approaching this the wrong way: What are you actually planning to do?

I think we should work around the mingw bug on both the master and
emacs-27 branches.

We should also fix the (symbol-related) Emacs bug before it bites us:
on both branches, unless we can get a mingw user to provide the output
of "disassemble Fprog1" (and a bunch of other functions). (OTOH, we've
already decided to keep crashable GC bugs on the emacs-27 branch).

And we should wait and see whether similar crashes keep happening.

What we should not do is encourage people to keep looking for another
Emacs bug based on the existing backtraces.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-25  6:40                           ` Pip Cet
@ 2020-05-25 11:28                             ` Pip Cet
  2020-05-25 14:53                               ` Eli Zaretskii
  2020-05-26  3:33                               ` Paul Eggert
  2020-05-25 15:14                             ` Eli Zaretskii
  1 sibling, 2 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-25 11:28 UTC (permalink / raw)
  To: Eli Zaretskii, eggert; +Cc: 41321, Stefan Monnier

On Mon, May 25, 2020 at 6:40 AM Pip Cet <pipcet@gmail.com> wrote:
> We should also fix the (symbol-related) Emacs bug before it bites us:
> on both branches, unless we can get a mingw user to provide the output
> of "disassemble Fprog1" (and a bunch of other functions). (OTOH, we've
> already decided to keep crashable GC bugs on the emacs-27 branch).

And I just noticed strings aren't aligned to LISP_ALIGNMENT on
x86_64-pc-linux-gnu.

I think we're going to have to weaken the maybe_lisp_pointer check to
check only for GCALIGNMENT.

The commit that introduced this problem, for what it's worth, is
967d2c55ef3908fd378e05b2a0070663ae45f6de

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-25 11:28                             ` Pip Cet
@ 2020-05-25 14:53                               ` Eli Zaretskii
  2020-05-25 15:12                                 ` Stefan Monnier
  2020-05-26  3:39                                 ` Paul Eggert
  2020-05-26  3:33                               ` Paul Eggert
  1 sibling, 2 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-25 14:53 UTC (permalink / raw)
  To: Pip Cet, eggert; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 25 May 2020 11:28:46 +0000
> Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> 
> And I just noticed strings aren't aligned to LISP_ALIGNMENT on
> x86_64-pc-linux-gnu.
> 
> I think we're going to have to weaken the maybe_lisp_pointer check to
> check only for GCALIGNMENT.

I tend to agree.

Paul, why did we move to max_align_t as the alignment requirement?
AFAIU, GCC enlarged that recently to allow for _Float128 type (at
least on 32-bit hosts), but do we really need that?

Also, what does this mean for stack-based Lisp objects?  AFAIU, we
previously required 8-byte alignment on 32-bit hosts (and on
MS-Windows we jump through some hoops to guarantee that in callbacks
of Windows APIs and in thread functions that manipulate Lisp objects).
Does the use of max_align_t mean that now stack-based Lisp objects
will need to have 16-byte alignment on 32-bit Windows?

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-25 14:53                               ` Eli Zaretskii
@ 2020-05-25 15:12                                 ` Stefan Monnier
  2020-05-26  3:39                                 ` Paul Eggert
  1 sibling, 0 replies; 132+ messages in thread
From: Stefan Monnier @ 2020-05-25 15:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, Pip Cet

>> I think we're going to have to weaken the maybe_lisp_pointer check to
>> check only for GCALIGNMENT.

Sounds about right: the only alignment we really need for Lisp_Objects
is the GCALIGNMENT that allows us to use the 3 LSB for tags.
src/alloc.c makes efforts to ensure this alignment and for some objects
(e.g. Lisp_Floats as well as (on 32bit hosts) Lisp_Cons cells) that's
the only alignment we can meaningfully impose since those objects are
only 64bit in size.
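
To make the tag argument concrete, here is a small sketch of why
8-byte alignment is exactly what a 3-bit tag needs.  The names and the
constant are stand-ins chosen for illustration, not the definitions in
lisp.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for GCALIGNMENT: 8 bytes, i.e. 3 free low bits.  */
enum { SKETCH_GCALIGNMENT = 8, SKETCH_TAG_MASK = SKETCH_GCALIGNMENT - 1 };

/* True if the low 3 bits of ADDR are zero and so free for a tag.  */
static bool
has_free_tag_bits (uintptr_t addr)
{
  return (addr & SKETCH_TAG_MASK) == 0;
}

/* Store TAG (0..7) in the low bits of an 8-byte-aligned address.  */
static uintptr_t
sketch_tag (uintptr_t addr, unsigned tag)
{
  return addr | (tag & SKETCH_TAG_MASK);
}

/* Recover the address and the tag from a tagged word.  */
static uintptr_t
sketch_untag (uintptr_t word)
{
  return word & ~(uintptr_t) SKETCH_TAG_MASK;
}

static unsigned
sketch_tag_of (uintptr_t word)
{
  return (unsigned) (word & SKETCH_TAG_MASK);
}
```

An address such as 0x1008 round-trips through tagging even though it
is not a multiple of 16, whereas 0x1004 would lose information because
bit 2 is already in use.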


        Stefan

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-25  6:40                           ` Pip Cet
  2020-05-25 11:28                             ` Pip Cet
@ 2020-05-25 15:14                             ` Eli Zaretskii
  2020-05-25 17:41                               ` Pip Cet
  1 sibling, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-25 15:14 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 25 May 2020 06:40:11 +0000
> Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> 
> What are you actually planning to do?

Given the fact that I'm the only one who sees these problems?  Not
much: I intend to continue running Emacs under GDB and collect data
about the crashes until either I figure out what causes the crashes,
or the crashes disappear (which would mean the problem was fixed
indirectly by some other change).

> I think we should work around the mingw bug on both the master and
> emacs-27 branches.

That depends on what the proposed solution or workaround will be.  We
need to see where the discussion of the alignment issue goes and what
we decide to do about that.

> What we should not do is encourage people to keep looking for another
> Emacs bug based on the existing backtraces.

Indeed, I'm posting the backtraces for the record; no one should feel
compelled to study them unless they are interested.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-25 15:14                             ` Eli Zaretskii
@ 2020-05-25 17:41                               ` Pip Cet
  0 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-25 17:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Mon, May 25, 2020 at 3:14 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Mon, 25 May 2020 06:40:11 +0000
> > Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> >
> > What are you actually planning to do?
> Not
> much: I intend to continue running Emacs under GDB and collect data
> about the crashes until either I figure out what causes the crashes,
> or the crashes disappear (which would mean the problem was fixed
> indirectly by some other change).

(Or directly, of course. I still believe my "theory" about your bug is correct.)

> > I think we should work around the mingw bug on both the master and
> > emacs-27 branches.
>
> That depends on what the proposed solution or workaround will be.

For emacs-27, reducing the alignment requirement in
maybe_lisp_pointer: that will only make us check more pointers, not
fewer, so while it is a GC change it's one that makes sense.
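
As an illustration of "more pointers, not fewer", the sketch below
counts how many words in a scanned range pass an alignment filter.  It
is a toy model with invented names, not the code in alloc.c, and the
two alignments are passed in as plain numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the words in WORDS[0..N-1] that an alignment filter of
   ALIGNMENT bytes would let through to the (more expensive) check of
   whether the word really points at a live object.  */
static size_t
count_candidates (const uintptr_t *words, size_t n, unsigned alignment)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (words[i] % alignment == 0)
      count++;
  return count;
}
```

Because every multiple of 16 is also a multiple of 8, anything the
stricter filter accepted is still accepted by the weaker one; the
weaker filter can only add candidates, such as an object that malloc
placed at an address that is 8 modulo 16.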

For master, I'd consider setting LISP_ALIGNMENT to 8 on the mingw32
platform, where memory is already scarce. I don't trust the alleged
performance hit of 20%, so we might have to collect some actual
performance data. But we definitely need to make strings aligned to
LISP_ALIGNMENT, one way or the other, because that's the original
reason for maybe_lisp_pointer.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-25 11:28                             ` Pip Cet
  2020-05-25 14:53                               ` Eli Zaretskii
@ 2020-05-26  3:33                               ` Paul Eggert
  2020-05-26  6:18                                 ` Pip Cet
  2020-05-26  6:46                                 ` Paul Eggert
  1 sibling, 2 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-26  3:33 UTC (permalink / raw)
  To: Pip Cet, Eli Zaretskii; +Cc: 41321, Stefan Monnier

On 5/25/20 4:28 AM, Pip Cet wrote:

> And I just noticed strings aren't aligned to LISP_ALIGNMENT on
> x86_64-pc-linux-gnu.

Could you explain? Strings are allocated via allocate_string -> lisp_malloc ->
lmalloc, and lmalloc is supposed to align to LISP_ALIGNMENT for strings just
like it does for other Lisp objects.

String data (struct sdata) is not Lisp-aligned, but it doesn't need to be.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-25 14:53                               ` Eli Zaretskii
  2020-05-25 15:12                                 ` Stefan Monnier
@ 2020-05-26  3:39                                 ` Paul Eggert
  1 sibling, 0 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-26  3:39 UTC (permalink / raw)
  To: Eli Zaretskii, Pip Cet; +Cc: 41321, monnier

On 5/25/20 7:53 AM, Eli Zaretskii wrote:

> why did we move to max_align_t as the alignment requirement?
> AFAIU, GCC enlarged that recently to allow for _Float128 type (at
> least on 32-bit hosts), but do we really need that?

Not on current glibc on any platform that I know, no. I was merely trying to
keep the code portable to platforms where (say) alignof (pthread_cond_t) == 16.
POSIX allows this, and this sort of thing is likely to happen somewhere in the
not-too-distant future, for performance reasons.

> Does the use of max_align_t mean that now stack-based Lisp objects
> will need to have 16-byte alignment on 32-bit Windows?

No, because we don't need to GC stack-based objects themselves (the stack will
reclaim them) and the GC finds everything they point to (as it scans the stack).

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-26  3:33                               ` Paul Eggert
@ 2020-05-26  6:18                                 ` Pip Cet
  2020-05-26  7:51                                   ` Paul Eggert
  2020-05-26  6:46                                 ` Paul Eggert
  1 sibling, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-26  6:18 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Tue, May 26, 2020 at 3:33 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/25/20 4:28 AM, Pip Cet wrote:
>
> > And I just noticed strings aren't aligned to LISP_ALIGNMENT on
> > x86_64-pc-linux-gnu.
>
> Could you explain? Strings are allocated via allocate_string -> lisp_malloc ->
> lmalloc, and lmalloc is supposed to align to LISP_ALIGNMENT for strings just
> like it does for other Lisp objects.

Sorry. You're right, the non-aligned strings aren't relevant for GC.

However, this is only because struct Lisp_String happens to have an
even number of words. If someone changes that, the old code would
break...

We're still going to have to deal with symbols on --wide-int builds
when the two halves of the wide int are saved non-consecutively.

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-26  3:33                               ` Paul Eggert
  2020-05-26  6:18                                 ` Pip Cet
@ 2020-05-26  6:46                                 ` Paul Eggert
  2020-05-26 15:17                                   ` Eli Zaretskii
  1 sibling, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-26  6:46 UTC (permalink / raw)
  To: Pip Cet, Eli Zaretskii; +Cc: 41321, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

On 5/25/20 8:33 PM, Paul Eggert wrote:
> On 5/25/20 4:28 AM, Pip Cet wrote:
> 
>> And I just noticed strings aren't aligned to LISP_ALIGNMENT on
>> x86_64-pc-linux-gnu.
> 
> Could you explain?

Oh, never mind, I figured it out. Sorry about the noise.

I installed the first attached patch to fix the bug on master (as a series of
commits, the leading ones not quite right unfortunately). This patch does what
you proposed, and also tightens up some of the related alignment checks.

I propose the second patch for emacs-27; it's limited to what you proposed,
namely, it weakens maybe_lisp_pointer to check only for GCALIGNMENT.

[-- Attachment #2: emacs.diff --]
[-- Type: text/x-patch, Size: 6907 bytes --]

diff --git a/src/alloc.c b/src/alloc.c
index d5a6d9167e..f8609398a3 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -104,6 +104,46 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2020 Free Software
 #include "w32heap.h"	/* for sbrk */
 #endif
 
+/* A type with alignment at least as large as any object that Emacs
+   allocates.  This is not max_align_t because some platforms (e.g.,
+   mingw) have buggy malloc implementations that do not align for
+   max_align_t.  This union contains types of all GCALIGNED_STRUCT
+   components visible here.  */
+union emacs_align_type
+{
+  struct frame frame;
+  struct Lisp_Bignum Lisp_Bignum;
+  struct Lisp_Bool_Vector Lisp_Bool_Vector;
+  struct Lisp_Char_Table Lisp_Char_Table;
+  struct Lisp_CondVar Lisp_CondVar;
+  struct Lisp_Finalizer Lisp_Finalizer;
+  struct Lisp_Float Lisp_Float;
+  struct Lisp_Hash_Table Lisp_Hash_Table;
+  struct Lisp_Marker Lisp_Marker;
+  struct Lisp_Misc_Ptr Lisp_Misc_Ptr;
+  struct Lisp_Mutex Lisp_Mutex;
+  struct Lisp_Overlay Lisp_Overlay;
+  struct Lisp_Sub_Char_Table Lisp_Sub_Char_Table;
+  struct Lisp_Subr Lisp_Subr;
+  struct Lisp_User_Ptr Lisp_User_Ptr;
+  struct Lisp_Vector Lisp_Vector;
+  struct terminal terminal;
+  struct thread_state thread_state;
+  struct window window;
+
+  /* Omit the following since they would require including process.h
+     etc.  In practice their alignments never exceed that of the
+     structs already listed.  */
+#if 0
+  struct Lisp_Module_Function Lisp_Module_Function;
+  struct Lisp_Process Lisp_Process;
+  struct save_window_data save_window_data;
+  struct scroll_bar scroll_bar;
+  struct xwidget_view xwidget_view;
+  struct xwidget xwidget;
+#endif
+};
+
 /* MALLOC_SIZE_NEAR (N) is a good number to pass to malloc when
    allocating a block of memory with size close to N bytes.
    For best results N should be a power of 2.
@@ -112,9 +152,9 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2020 Free Software
    adds sizeof (size_t) to SIZE for internal overhead, and then rounds
    up to a multiple of MALLOC_ALIGNMENT.  Emacs can improve
    performance a bit on GNU platforms by arranging for the resulting
-   size to be a power of two.  This heuristic is good for glibc 2.0
-   (1997) through at least glibc 2.31 (2020), and does not affect
-   correctness on other platforms.  */
+   size to be a power of two.  This heuristic is good for glibc 2.26
+   (2017) and later, and does not affect correctness on other
+   platforms.  */
 
 #define MALLOC_SIZE_NEAR(n) \
   (ROUNDUP (max (n, sizeof (size_t)), MALLOC_ALIGNMENT) - sizeof (size_t))
@@ -655,25 +695,19 @@ buffer_memory_full (ptrdiff_t nbytes)
 #define COMMON_MULTIPLE(a, b) \
   ((a) % (b) == 0 ? (a) : (b) % (a) == 0 ? (b) : (a) * (b))
 
-/* LISP_ALIGNMENT is the alignment of Lisp objects.  It must be at
-   least GCALIGNMENT so that pointers can be tagged.  It also must be
-   at least as strict as the alignment of all the C types used to
-   implement Lisp objects; since pseudovectors can contain any C type,
-   this is max_align_t.  On recent GNU/Linux x86 and x86-64 this can
-   often waste up to 8 bytes, since alignof (max_align_t) is 16 but
-   typical vectors need only an alignment of 8.  Although shrinking
-   the alignment to 8 would save memory, it cost a 20% hit to Emacs
-   CPU performance on Fedora 28 x86-64 when compiled with gcc -m32.  */
-enum { LISP_ALIGNMENT = alignof (union { max_align_t x;
+/* Alignment needed for memory blocks that are allocated via malloc
+   and that contain Lisp objects.  On typical hosts malloc already
+   aligns sufficiently, but extra work is needed on oddball hosts
+   where Emacs would crash if malloc returned a non-GCALIGNED pointer.  */
+enum { LISP_ALIGNMENT = alignof (union { union emacs_align_type x;
 					 GCALIGNED_UNION_MEMBER }) };
 verify (LISP_ALIGNMENT % GCALIGNMENT == 0);
 
 /* True if malloc (N) is known to return storage suitably aligned for
    Lisp objects whenever N is a multiple of LISP_ALIGNMENT.  In
    practice this is true whenever alignof (max_align_t) is also a
-   multiple of LISP_ALIGNMENT.  This works even for x86, where some
-   platform combinations (e.g., GCC 7 and later, glibc 2.25 and
-   earlier) have bugs where alignof (max_align_t) is 16 even though
+   multiple of LISP_ALIGNMENT.  This works even for buggy platforms
+   like MinGW circa 2020, where alignof (max_align_t) is 16 even though
    the malloc alignment is only 8, and where Emacs still works because
    it never does anything that requires an alignment of 16.  */
 enum { MALLOC_IS_LISP_ALIGNED = alignof (max_align_t) % LISP_ALIGNMENT == 0 };
@@ -4657,12 +4691,12 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
    collected, and false otherwise (i.e., false if it is easy to see
    that P cannot point to Lisp data that can be garbage collected).
    Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of LISP_ALIGNMENT.  */
+   are also multiples of GCALIGNMENT.  */
 
 static bool
 maybe_lisp_pointer (void *p)
 {
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
+  return (uintptr_t) p % GCALIGNMENT == 0;
 }
 
 /* If P points to Lisp data, mark that as live if it isn't already
@@ -4885,9 +4919,10 @@ test_setjmp (void)
    as a stack scan limit.  */
 typedef union
 {
-  /* Align the stack top properly.  Even if !HAVE___BUILTIN_UNWIND_INIT,
-     jmp_buf may not be aligned enough on darwin-ppc64.  */
-  max_align_t o;
+  /* Make sure stack_top and m_stack_bottom are properly aligned as GC
+     expects.  */
+  Lisp_Object o;
+  void *p;
 #ifndef HAVE___BUILTIN_UNWIND_INIT
   sys_jmp_buf j;
   char c;
diff --git a/src/lisp.h b/src/lisp.h
index 85bdc172b2..8bd83a888c 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -277,7 +277,8 @@ DEFINE_GDB_SYMBOL_END (VALMASK)
    allocation in a containing union that has GCALIGNED_UNION_MEMBER)
    and does not contain a GC-aligned struct or union, putting
    GCALIGNED_STRUCT after its closing '}' can help the compiler
-   generate better code.
+   generate better code.  Also, such structs should be added to the
+   emacs_align_type union in alloc.c.
 
    Although these macros are reasonably portable, they are not
    guaranteed on non-GCC platforms, as C11 does not require support
diff --git a/src/thread.c b/src/thread.c
index df1a705382..b638dd77f8 100644
--- a/src/thread.c
+++ b/src/thread.c
@@ -717,12 +717,17 @@ run_thread (void *state)
 {
   /* Make sure stack_top and m_stack_bottom are properly aligned as GC
      expects.  */
-  max_align_t stack_pos;
+  union
+  {
+    Lisp_Object o;
+    void *p;
+    char c;
+  } stack_pos;
 
   struct thread_state *self = state;
   struct thread_state **iter;
 
-  self->m_stack_bottom = self->stack_top = (char *) &stack_pos;
+  self->m_stack_bottom = self->stack_top = &stack_pos.c;
   self->thread_id = sys_thread_self ();
 
   if (self->thread_name)

[-- Attachment #3: 0001-Fix-aborts-due-to-GC-losing-pseudovectors.patch --]
[-- Type: text/x-patch, Size: 1191 bytes --]

From 7466aeeb4ac3dace283ecef00c2b38148b56b3b3 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 25 May 2020 23:39:37 -0700
Subject: [PATCH] Fix aborts due to GC losing pseudovectors

Problem reported by Eli Zaretskii (Bug#41321).
* src/alloc.c (maybe_lisp_pointer): Modulo GCALIGNMENT,
not modulo LISP_ALIGNMENT.  Master has a more-elaborate fix.
Do not merge to master.
---
 src/alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 1c6b664b22..c7a4a3ee86 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4589,12 +4589,12 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
    collected, and false otherwise (i.e., false if it is easy to see
    that P cannot point to Lisp data that can be garbage collected).
    Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of LISP_ALIGNMENT.  */
+   are also multiples of GCALIGNMENT.  */
 
 static bool
 maybe_lisp_pointer (void *p)
 {
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
+  return (uintptr_t) p % GCALIGNMENT == 0;
 }
 
 /* If P points to Lisp data, mark that as live if it isn't already
-- 
2.17.1

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-26  6:18                                 ` Pip Cet
@ 2020-05-26  7:51                                   ` Paul Eggert
  2020-05-26  8:27                                     ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-26  7:51 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 872 bytes --]

On 5/25/20 11:18 PM, Pip Cet wrote:

> However, this is only because struct Lisp_String happens to have an
> even number of words. If someone changes that, the old code would
> break...

No, because struct Lisp_String contains a GCALIGNED_UNION_MEMBER, so it is
always GC-aligned, and (for older compilers that don't support alignas (8)) this
is checked statically via 'verify (GCALIGNED (struct Lisp_String))'.

Now that I've looked at it, though, I see that I forgot to do something similar
with struct Lisp_Float, which has the same issue. Fixed by installing the
attached patch on master.
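
The idea can be sketched in plain C11 as follows.  The struct, the
aligned member and the constant are stand-ins invented for this
illustration; they only approximate what GCALIGNED_UNION_MEMBER and
'verify (GCALIGNED (...))' are described as doing above.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_GCALIGNMENT = 8 };   /* stand-in for GCALIGNMENT */

/* A float-like object.  On a 32-bit x86 ABI a double inside a struct
   may be only 4-byte aligned, so the extra union member is what
   forces the whole struct up to 8.  */
struct sketch_float
{
  union
  {
    double data;
    struct sketch_float *chain;
    /* Hypothetical stand-in for GCALIGNED_UNION_MEMBER.  */
    alignas (SKETCH_GCALIGNMENT) char gcaligned;
  } u;
};

/* Compile-time checks: both the alignment and the size (which is the
   array stride) must be multiples of the GC alignment.  */
_Static_assert (alignof (struct sketch_float) % SKETCH_GCALIGNMENT == 0,
                "sketch_float is not GC-aligned");
_Static_assert (sizeof (struct sketch_float) % SKETCH_GCALIGNMENT == 0,
                "arrays of sketch_float would break GC alignment");

/* Run-time view of the same property, for callers that want a value.  */
static bool
sketch_float_is_gcaligned (void)
{
  return alignof (struct sketch_float) % SKETCH_GCALIGNMENT == 0;
}
```

Putting the alignment on a union member rather than on the struct as a
whole keeps the requirement visible to compilers that lack a
struct-level alignment attribute, and the static assertion catches any
platform where the trick does not work.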

> We're still going to have to deal with symbols on --wide-int builds
> when the two halves of the wide int are saved non-consecutively.

Yes, I think that's the most pressing issue in this area. I will have to take a
break now, though, since I have sleep and other work to do.

[-- Attachment #2: 0001-Port-struct-Lisp_FLoat-to-oddball-platforms.patch --]
[-- Type: text/x-patch, Size: 1203 bytes --]

From cfd5e106c3c9334de93ccda0d65523476553fb1f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 26 May 2020 00:47:24 -0700
Subject: [PATCH] Port struct Lisp_FLoat to oddball platforms
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/lisp.h (struct Lisp_Float): Declare via
GCALIGNED_UNION_MEMBER, not via GCALIGNED_STRUCT, since alloc.c
creates these in arrays and GCALIGNED_STRUCT does not necessarily
suffice to align struct Lisp_Float when it’s used in an array.
This avoids undefined behavior on oddball machines where
sizeof (struct Lisp_Float) is not a multiple of 8 and the compiler
does not support __attribute__ ((aligned 8)).
---
 src/lisp.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/lisp.h b/src/lisp.h
index 8bd83a888c..f5d581a2f1 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -2801,8 +2801,10 @@ XBUFFER_OBJFWD (lispfwd a)
     {
       double data;
       struct Lisp_Float *chain;
+      GCALIGNED_UNION_MEMBER
     } u;
-  } GCALIGNED_STRUCT;
+  };
+verify (GCALIGNED (struct Lisp_Float));
 
 INLINE bool
 (FLOATP) (Lisp_Object x)
-- 
2.17.1



* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-26  7:51                                   ` Paul Eggert
@ 2020-05-26  8:27                                     ` Pip Cet
  0 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-26  8:27 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Tue, May 26, 2020 at 7:51 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/25/20 11:18 PM, Pip Cet wrote:
> > However, this is only because struct Lisp_String happens to have an
> > even number of words. If someone changes that, the old code would
> > break...
>
> No, because struct Lisp_String contains a GCALIGNED_UNION_MEMBER, so it is
> always GC-aligned, and (for older compilers that don't support alignas (8)) this
> is checked statically via 'verify (GCALIGNED (struct Lisp_String))'.

As I said, this was specific to the old code, where LISP_ALIGNMENT,
not GCALIGNMENT, was used by maybe_lisp_pointer. Things should be fine
now (apart from the issue below)!

> Now that I've looked at it, though, I see that I forgot to do something similar
> with struct Lisp_Float, which has the same issue. Fixed by installing the
> attached patch on master.

LGTM.

> > We're still going to have to deal with symbols on --wide-int builds
> > when the two halves of the wide int are saved non-consecutively.
>
> Yes, I think that's the most pressing issue in this area.
> I will have to take a
> break now, though, since I have sleep and other work to do.

Thanks for all the patches and comments!






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-26  6:46                                 ` Paul Eggert
@ 2020-05-26 15:17                                   ` Eli Zaretskii
  2020-05-26 22:49                                     ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-26 15:17 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> From: Paul Eggert <eggert@cs.ucla.edu>
> Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Mon, 25 May 2020 23:46:02 -0700
> 
> I propose the second patch for emacs-27; it's limited to what you proposed,
> namely, it weakens maybe_lisp_pointer to check only for GCALIGNMENT.
> 
>  static bool
>  maybe_lisp_pointer (void *p)
>  {
> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
> +  return (uintptr_t) p % GCALIGNMENT == 0;
>  }

On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look
right (or maybe I'm missing something).
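
A small sketch of why an alignment of 1 makes the check vacuous. The helper names are hypothetical; only the modulus test itself is taken from the quoted patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the modulus test from maybe_lisp_pointer,
   with the alignment made a parameter.  */
bool
demo_maybe_aligned (uintptr_t address, uintptr_t alignment)
{
  return address % alignment == 0;
}

/* Count how many of the N consecutive addresses starting at BASE
   pass the filter.  */
int
demo_count_candidates (uintptr_t base, int n, uintptr_t alignment)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    count += demo_maybe_aligned (base + i, alignment);
  return count;
}
```

With an alignment of 8 the filter rejects seven of every eight addresses; with an alignment of 1 it accepts all of them, so it no longer filters anything.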






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-26 15:17                                   ` Eli Zaretskii
@ 2020-05-26 22:49                                     ` Paul Eggert
  2020-05-27 15:26                                       ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-26 22:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

[-- Attachment #1: Type: text/plain, Size: 982 bytes --]

On 5/26/20 8:17 AM, Eli Zaretskii wrote:
>>  static bool
>>  maybe_lisp_pointer (void *p)
>>  {
>> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
>> +  return (uintptr_t) p % GCALIGNMENT == 0;
>>  }
> On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look
> right (or maybe I'm missing something).

Good point; I'd neglected that. I.e., on !USE_LSB_TAG systems the proposed
emacs-27 patch is overly-conservative, as it causes maybe_lisp_pointer to always
return true. Although this hurts GC performance it doesn't affect correctness
and the patch does fix a crash on USE_LSB_TAG systems, so it (or something like
it) is needed for emacs-27.

I installed the attached patch into master to fix the !USE_LSB_TAG performance
issue you raised.  This patch does not fix crashes; it's merely a performance tweak.

I am planning on looking into related crashes for Lisp_Symbol next. Perhaps we
should wait on that before worrying about what exact patch should go into emacs-27.

[-- Attachment #2: 0001-Tweak-GC-performance-if-USE_LSB_TAG.patch --]
[-- Type: text/x-patch, Size: 2104 bytes --]

From 7a83c4f66cb945d43dcaf8c37f4af1334d34f501 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 26 May 2020 15:47:59 -0700
Subject: [PATCH] Tweak GC performance if !USE_LSB_TAG

Performance issue reported by Eli Zaretskii (Bug#41321#149).
* src/alloc.c (GC_OBJECT_ALIGNMENT_MINIMUM): New constant.
(maybe_lisp_pointer): Use it instead of GCALIGNMENT.
---
 src/alloc.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index f8609398a3..e241b9933a 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4687,16 +4687,33 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
     mark_maybe_object (*array);
 }
 
+/* A lower bound on the alignment of Lisp objects that need marking.
+   Although 1 is safe, higher values speed up mark_maybe_pointer.
+   If USE_LSB_TAG, this value is typically GCALIGNMENT; otherwise,
+   it's determined by the natural alignment of Lisp structs.
+   All vectorlike objects have alignment at least that of union
+   vectorlike_header and it's unlikely they all have alignment greater,
+   so use the union as a safe and likely-accurate standin for
+   vectorlike objects.  */
+
+enum { GC_OBJECT_ALIGNMENT_MINIMUM
+         = max (GCALIGNMENT,
+		min (alignof (union vectorlike_header),
+		     min (min (alignof (struct Lisp_Cons),
+			       alignof (struct Lisp_Float)),
+			  min (alignof (struct Lisp_String),
+			       alignof (struct Lisp_Symbol))))) };
+
 /* Return true if P might point to Lisp data that can be garbage
    collected, and false otherwise (i.e., false if it is easy to see
    that P cannot point to Lisp data that can be garbage collected).
    Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of GCALIGNMENT.  */
+   are also multiples of GC_OBJECT_ALIGNMENT_MINIMUM.  */
 
 static bool
 maybe_lisp_pointer (void *p)
 {
-  return (uintptr_t) p % GCALIGNMENT == 0;
+  return (uintptr_t) p % GC_OBJECT_ALIGNMENT_MINIMUM == 0;
 }
 
 /* If P points to Lisp data, mark that as live if it isn't already
-- 
2.17.1



* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-26 22:49                                     ` Paul Eggert
@ 2020-05-27 15:26                                       ` Eli Zaretskii
  2020-05-27 16:58                                         ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-27 15:26 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 26 May 2020 15:49:24 -0700
> 
> > On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look
> > right (or maybe I'm missing something).
> 
> Good point; I'd neglected that. I.e., on !USE_LSB_TAG systems the proposed
> emacs-27 patch is overly-conservative, as it causes maybe_lisp_pointer to always
> return true. Although this hurts GC performance it doesn't affect correctness
> and the patch does fix a crash on USE_LSB_TAG systems, so it (or something like
> it) is needed for emacs-27.

We used to rely on 8-byte alignment on those systems, and I don't see
any reason to stop relying on that and punish those systems'
performance.  What would we gain?






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 15:26                                       ` Eli Zaretskii
@ 2020-05-27 16:58                                         ` Paul Eggert
  2020-05-27 17:33                                           ` Eli Zaretskii
                                                             ` (2 more replies)
  0 siblings, 3 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-27 16:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/27/20 8:26 AM, Eli Zaretskii wrote:
> We used to rely on 8-byte alignment on those systems, and I don't see
> any reason to stop relying on that and punish those systems'
> performance.  What would we gain?

In looking into this more, it appears that the maybe_lisp_pointer idea is wrong,
in that compilers can make pointers into a Lisp object while losing the address
of the original object (and we've seen them do this) and there's no guarantee
that these sub-pointers are GCALIGNED. This sort of failure should be quite rare
but can cause crashes such as the one you observed. I am looking into a fix and
plan to apply it to master (I've already installed fixes for some minor
glitches I observed on the way); we can then talk about what to do with emacs-27.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 16:58                                         ` Paul Eggert
@ 2020-05-27 17:33                                           ` Eli Zaretskii
  2020-05-27 17:53                                             ` Paul Eggert
  2020-05-27 17:57                                           ` Pip Cet
  2020-05-28 18:27                                           ` Eli Zaretskii
  2 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-27 17:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 27 May 2020 09:58:11 -0700
> 
> In looking into this more, it appears that the maybe_lisp_pointer idea is wrong,
> in that compilers can make pointers into a Lisp object while losing the address
> of the original object (and we've seen them do this) and there's no guarantee
> that these sub-pointers are GCALIGNED.

Sorry, I don't follow: what do you mean by "losing the address of the
original object" in this case?  Can you show an example?






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 17:33                                           ` Eli Zaretskii
@ 2020-05-27 17:53                                             ` Paul Eggert
  2020-05-27 18:24                                               ` Eli Zaretskii
  2020-05-28  2:43                                               ` Stefan Monnier
  0 siblings, 2 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-27 17:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/27/20 10:33 AM, Eli Zaretskii wrote:
> Sorry, I don't follow: what do you mean by "losing the address of the
> original object" in this case?  Can you show an example?

The source code says

   for (i = 0; i < size; i++)
      foo (AREF (obj, i));

This is the last reference to obj, so the compiler reuses the register R holding
obj, and has that register R contain &XVECTOR (obj)->contents[0], &XVECTOR
(obj)->contents[1], etc. each time through the loop, and transforms the call
into foo (*R) as an optimization. When foo calls the garbage collector,
maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp
object: it points somewhere into the middle of a Lisp object and R's value is
not GC-aligned.

We've seen compilers do that.
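
To make the misalignment concrete, here is a sketch with a hypothetical layout modeled on a platform with 4-byte words (it is not the real struct Lisp_Vector): the object itself is 8-byte aligned, but the addresses of its slots alternate between aligned and unaligned.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector layout: a 4-byte header followed by 4-byte
   slots, with the whole object aligned to 8 bytes.  */
struct demo_vector
{
  alignas (8) uint32_t header;
  uint32_t contents[4];
};

/* Return true if the address of slot I, which is what register R
   would hold inside the loop, is a multiple of 8.  */
bool
demo_slot_is_gc_aligned (size_t i)
{
  static struct demo_vector v;
  return (uintptr_t) &v.contents[i] % 8 == 0;
}
```

Slot 0 sits at offset 4, so a register holding its address fails an 8-byte alignment filter even though it points into a live object; slot 1 sits at offset 8 and happens to pass.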






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 16:58                                         ` Paul Eggert
  2020-05-27 17:33                                           ` Eli Zaretskii
@ 2020-05-27 17:57                                           ` Pip Cet
  2020-05-27 18:39                                             ` Paul Eggert
  2020-05-28 18:27                                           ` Eli Zaretskii
  2 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-27 17:57 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Wed, May 27, 2020 at 4:58 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/27/20 8:26 AM, Eli Zaretskii wrote:
> > We used to rely on 8-byte alignment on those systems, and I don't see
> > any reason to stop relying on that and punish those systems'
> > performance.  What would we gain?
>
> In looking into this more, it appears that the maybe_lisp_pointer idea is wrong,
> in that compilers can make pointers into a Lisp object while losing the address
> of the original object (and we've seen them do this) and there's no guarantee
> that these sub-pointers are GCALIGNED.

Do you know of anything like this happening on 64-bit systems? Because
I think it doesn't; Emacs GC does rely, and has always relied since
GCPRO was removed, on compilers being sensible about what they put on
the stack. There's no guarantee in the C standard that that's true,
but there never will be.

> This sort of failure should be quite rare
> but can cause crashes such as the one you observed.

I'm pretty sure we figured out the crash that Eli observed. It's not
anything that involved, just a Lisp_Object being stored
non-consecutively and simultaneously being misaligned for the purposes
of maybe_lisp_pointer.
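
A sketch of the "stored non-consecutively" half of that explanation, under the assumption that a 64-bit value is looked for as two adjacent 32-bit words, low half first. The function names and values are hypothetical, and this is not how alloc.c is written; it only shows why a split value escapes a scan for the whole value, leaving the pointer-based check (and hence the alignment filter) as the only remaining chance to find it:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scanner: look for VALUE stored as two adjacent 32-bit
   words, low half first (an assumption of this sketch).  */
bool
demo_find_adjacent_halves (const uint32_t *words, size_t n, uint64_t value)
{
  uint32_t lo = (uint32_t) value;
  uint32_t hi = (uint32_t) (value >> 32);
  for (size_t i = 0; i + 1 < n; i++)
    if (words[i] == lo && words[i + 1] == hi)
      return true;
  return false;
}

/* Model two spill layouts for the same 64-bit value: halves stored
   together, or halves separated by unrelated words.  */
bool
demo_scan_spill (bool adjacent)
{
  const uint64_t value = UINT64_C (0x00000001deadbee8);
  uint32_t lo = (uint32_t) value;
  uint32_t hi = (uint32_t) (value >> 32);
  uint32_t together[4] = { 7, lo, hi, 9 };
  uint32_t apart[4] = { lo, 7, 9, hi };
  return demo_find_adjacent_halves (adjacent ? together : apart, 4, value);
}
```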

> I am looking into a fix and
> plan to apply it to master (I've already installed fixes for some minor
> glitches I observed on the way); we can then talk about what to do with emacs-27.

I may be out of line, but I think it's rash to change things like
that, even on master, with no opportunity for prior discussion. This
isn't a minor bug, or a spelling fix: it's a fundamental change in
what we expect from our C compiler and how GC works. In particular, I
don't see how you plan to solve it without treating any pointer that
points even in the vicinity of a valid lisp object as keeping that
object alive.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 17:53                                             ` Paul Eggert
@ 2020-05-27 18:24                                               ` Eli Zaretskii
  2020-05-27 18:39                                                 ` Paul Eggert
  2020-05-28  2:43                                               ` Stefan Monnier
  1 sibling, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-27 18:24 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 27 May 2020 10:53:22 -0700
> 
> The source code says
> 
>    for (i = 0; i < size; i++)
>       foo (AREF (obj, i));
> 
> This is the last reference to obj, so the compiler reuses the register R holding
> obj, and has that register R contain &XVECTOR (obj)->contents[0], &XVECTOR
> (obj)->contents[1], etc. each time through the loop, and transforms the call
> into foo (*R) as an optimization. When foo calls the garbage collector,
> maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp
> object: it points somewhere into the middle of a Lisp object and R's value is
> not GC-aligned.

For this to cause trouble, you'd need to arrange for no other
reference to obj, neither anywhere else up the callstack, nor from
another object we will mark.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 17:57                                           ` Pip Cet
@ 2020-05-27 18:39                                             ` Paul Eggert
  2020-05-27 18:56                                               ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-27 18:39 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

On 5/27/20 10:57 AM, Pip Cet wrote:

> Do you know of anything like this happening on 64-bit systems?

I think it's unlikely on 64-bit systems; it'd happen only on platforms where
alignof (void *) < 8, such as x86.

> Emacs GC does rely, and has always relied since
> GCPRO was removed, on compilers being sensible about what they put on
> the stack.

This isn't merely an issue about what compilers put into the stack; it's also
an issue of what's in registers. There may not be any pointer in the stack that
points into the Lisp object. And compilers are not always "sensible" about
temps; they may cache &P->x into a temp with no copy of P anywhere.

> I'm pretty sure we figured out the crash that Eli observed. It's not
> anything that involved, just a Lisp_Object being stored
> non-consecutively and simultaneously being misaligned for the purposes
> of maybe_lisp_pointer.

Not sure what the point is here. None of this is "that involved". We can have
pointers into Lisp objects, pointers that are not aligned for the purposes of
maybe_lisp_pointer. Emacs should follow all of them, not just the one that Eli
happened to observe.

> I don't see how you plan to solve it without treating any pointer that
> points even in the vicinity of a valid lisp object as keeping that
> object alive.
Yes, of course. Any pointer that points somewhere within a Lisp object (in the C
sense) should count as pointing to the object. If memory serves, we already
treat pointers that way in some places; unfortunately we're not doing it
consistently.

But I take your point; I'll post the change here before committing to master.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 18:24                                               ` Eli Zaretskii
@ 2020-05-27 18:39                                                 ` Paul Eggert
  0 siblings, 0 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-27 18:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/27/20 11:24 AM, Eli Zaretskii wrote:
> For this to cause trouble, you'd need to arrange for no other
> reference to obj, neither anywhere else up the callstack, nor from
> another object we will mark.

Yes, that's right. It's unlikely, but it does happen and we've seen it happen in
the past.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 18:39                                             ` Paul Eggert
@ 2020-05-27 18:56                                               ` Pip Cet
  2020-05-28  1:21                                                 ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-27 18:56 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Wed, May 27, 2020 at 6:39 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/27/20 10:57 AM, Pip Cet wrote:
>
> > Do you know of anything like this happening on 64-bit systems?
>
> I think it's unlikely on 64-bit systems; it'd happen only on platforms where
> alignof (void *) < 8, such as x86.
>
> > Emacs GC does rely, and has always relied since
> > GCPRO was removed, on compilers being sensible about what they put on
> > the stack.
>
> This isn't merely an issue about what compilers put into the stack; it's also
> an issue of what's in registers. There may not be any pointer in the stack that
> points into the Lisp object. And compilers are not always "sensible" about
> temps; they may cache &P->x into a temp with no copy of P anywhere.

Or they may cache &P->x + 1, and use negative offsets to access it.
That used to be the most efficient way of accessing arrays on some
machines. We simply can't cater to that.

Think about code like:

Lisp_Object reverse (Lisp_Object vector)
{
  ptrdiff_t count = ASIZE (vector);
  Lisp_Object new_vector = make_nil_vector (count);
  /* P starts one past the last element of VECTOR.  */
  Lisp_Object *p = aref_addr (vector, count);
  Lisp_Object *q = XVECTOR (new_vector)->contents;
  while (count--)
    {
      garbage_collect ();
      *q++ = *--p;
    }
  return new_vector;
}

(which is what many compilers would generate from more sensible code).
On the first iteration, p points to a totally different vector, or
some random other object, but it still needs to keep its vector alive.

So, at the very least, we need to always keep the immediately
preceding object alive if we go that way.

> > I'm pretty sure we figured out the crash that Eli observed. It's not
> > anything that involved, just a Lisp_Object being stored
> > non-consecutively and simultaneously being misaligned for the purposes
> > of maybe_lisp_pointer.
>
> Not sure what the point is here. None of this is "that involved". We can have
> pointers into Lisp objects, pointers that are not aligned for the purposes of
> maybe_lisp_pointer. Emacs should follow all of them, not just the one that Eli
> happened to observe.

Or pointers past them, and that's a significant overhead because it
usually means two objects are being kept alive by one reference.

> > I don't see how you plan to solve it without treating any pointer that
> > points even in the vicinity of a valid lisp object as keeping that
> > object alive.

> Yes, of course.

I didn't mean just "within the object", I did mean "in the vicinity".
With prefetch instructions, it's quite likely the compiler concludes
it's easiest to prefetch something 256 bytes ahead of where it
actually makes the access, then make the actual access relative to
that address...

> Any pointer that points somewhere within a Lisp object (in the C
> sense) should count as pointing to the object.

The C standard explicitly allows pointers (and that's C pointers) to
point one past the end of an allocated array, I believe.
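
A sketch of the consequence, using a hypothetical two-object "heap" and a half-open containment test (start <= p < start + size). None of these names come from alloc.c:

```c
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_OBJECT_SIZE = 16 };

/* Two hypothetical objects laid out back to back.  */
static char demo_heap[2 * DEMO_OBJECT_SIZE];

/* Half-open containment test: START <= P < START + SIZE.  */
bool
demo_points_within (const char *start, size_t size, const char *p)
{
  return start <= p && p < start + size;
}

/* Return the index (0 or 1) of the object that the half-open test
   attributes byte offset OFFSET to, or -1 if it matches neither.  */
int
demo_owner_of_offset (size_t offset)
{
  const char *p = demo_heap + offset;
  for (int i = 0; i < 2; i++)
    if (demo_points_within (demo_heap + i * DEMO_OBJECT_SIZE,
                            DEMO_OBJECT_SIZE, p))
      return i;
  return -1;
}
```

A pointer one past the end of object 0 (offset 16) is attributed to object 1, and one past the end of object 1 is attributed to nothing, so a "within the object" rule alone does not keep the preceding object alive.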

> If memory serves, we already
> treat pointers that way in some places; unfortunately we're not doing it
> consistently.

Yes, we do.

> But I take your point; I'll post the change here before committing to master.

I'm sorry, I misunderstood. If you want to fix only pointers within
objects, that is quite a small change, but I believe it is incomplete.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 18:56                                               ` Pip Cet
@ 2020-05-28  1:21                                                 ` Paul Eggert
  2020-05-28  6:31                                                   ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-28  1:21 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

On 5/27/20 11:56 AM, Pip Cet wrote:

> So, at the very least, we need to always keep the immediately
> preceding object alive if we go that way.

Yes, I'm assuming that. I'll check that the code is doing that (if it isn't
doing it already).

> that's a significant overhead because it
> usually means two objects are being kept alive by one reference.

For Lisp_Objects with nonzero tags this shouldn't be an issue, since the tags
mean the pointers won't tie down two objects. For Lisp_Symbols (whose tags are
zero) it is an issue; also for untagged pointers to the start of objects.

I'll measure how much overhead is involved in my usual 'make compile-always'
benchmark. If it's not that much then we'll be OK. I'm hoping that's the case.
If not, there are some more measures we can take.

> With prefetch instructions, it's quite likely the compiler concludes
> it's easiest to prefetch something 256 bytes ahead of where it
> actually makes the access, then make the actual access relative to
> that address...

I wouldn't worry about that; it's so unlikely that it's not a practical concern.
"Some C optimizers may lose the last undisguised pointer to a memory object as a
consequence of clever optimizations. This has almost never been observed in
practice." <https://github.com/ivmai/bdwgc> As I understand it, the times "in
practice" that Hans-J. Boehm was talking about were for C code deliberately
designed to fool the compiler / GC combination.

I think it unlikely that a modern compiler would break all the code out there
that uses conservative GC.

(Besides, if that stuff really were of practical concern we'd have to give up on
conservative GC entirely. :-)






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 17:53                                             ` Paul Eggert
  2020-05-27 18:24                                               ` Eli Zaretskii
@ 2020-05-28  2:43                                               ` Stefan Monnier
  2020-05-28  7:27                                                 ` Eli Zaretskii
  1 sibling, 1 reply; 132+ messages in thread
From: Stefan Monnier @ 2020-05-28  2:43 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, pipcet

> (obj)->contents[1], etc. each time through the loop, and transforms the call
> into foo (*R) as an optimization. When foo calls the garbage collector,
> maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp
> object: it points somewhere into the middle of a Lisp object and R's value is
> not GC-aligned.

Indeed, basically `maybe_lisp_pointer` goes against the effort we've put
into replacing `live_string_p` with `live_string_holding` (i.e. to
recognize anything that points into any part of a Lisp_String so as to
prevent collecting it).


        Stefan







* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28  1:21                                                 ` Paul Eggert
@ 2020-05-28  6:31                                                   ` Pip Cet
  2020-05-28  7:47                                                     ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-28  6:31 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Thu, May 28, 2020 at 1:21 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/27/20 11:56 AM, Pip Cet wrote:
>
> > So, at the very least, we need to always keep the immediately
> > preceding object alive if we go that way.
>
> Yes, I'm assuming that. I'll check that the code is doing that (if it isn't
> doing it already).

Okay, that makes sense.

> > that's a significant overhead because it
> > usually means two objects are being kept alive by one reference.
>
> For Lisp_Objects with nonzero tags this shouldn't be an issue, since the tags
> mean the pointers won't tie down two objects.

On USE_LSB_TAG systems, you're correct.

> I'll measure how much overhead is involved in my usual 'make compile-always'
> benchmark. If it's not that much then we'll be OK. I'm hoping that's the case.
> If not, there are some more measures we can take.

I suspect that garbage collection is only slowed down significantly
when there are large objects on the stack; that happens when GC
happens during redisplay, for example. (All the more reason to make
the 'struct it' stack heap-allocated, as I'd proposed).

> > With prefetch instructions, it's quite likely the compiler concludes
> > it's easiest to prefetch something 256 bytes ahead of where it
> > actually makes the access, then make the actual access relative to
> > that address...
>
> I wouldn't worry about that; it's so unlikely that it's not a practical concern.

Fingers crossed.

> "Some C optimizers may lose the last undisguised pointer to a memory object as a
> consequence of clever optimizations. This has almost never been observed in
> practice." <https://github.com/ivmai/bdwgc> As I understand it, the times "in
> practice" that Hans-J. Boehm was talking about were for C code deliberately
> designed to fool the compiler / GC combination.
>
> I think it unlikely that a modern compiler would break all the code out there
> that uses conservative GC.
>
> (Besides, if that stuff really were of practical concern we'd have to give up on
> conservative GC entirely. :-)

I hope you're right, in that compilers will support GC better before
they move on to clever optimizations that break it :-)

(I'm not sure what the current state is of "real" GC support in LLVM;
I'm pretty sure not much has happened in GCC.)






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28  2:43                                               ` Stefan Monnier
@ 2020-05-28  7:27                                                 ` Eli Zaretskii
  2020-05-28  7:41                                                   ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-28  7:27 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eggert, 41321, pipcet

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@gmail.com,  41321@debbugs.gnu.org
> Date: Wed, 27 May 2020 22:43:52 -0400
> 
> Indeed, basically `maybe_lisp_pointer` goes against the effort we've put
> into replacing `live_string_p` with `live_string_holding` (i.e. to
> recognize anything that points into any part of a Lisp_String so as to
> prevent collecting it).

You are suggesting that we go back to using live_string_p?






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28  7:27                                                 ` Eli Zaretskii
@ 2020-05-28  7:41                                                   ` Paul Eggert
  2020-05-28 13:30                                                     ` Stefan Monnier
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-28  7:41 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: 41321, pipcet

On 5/28/20 12:27 AM, Eli Zaretskii wrote:
>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Date: Wed, 27 May 2020 22:43:52 -0400
>>
>> Indeed, basically `maybe_lisp_pointer` goes against the effort we've put
>> into replacing `live_string_p` with `live_string_holding` (i.e. to
>> recognize anything that points into any part of a Lisp_String so as to
>> prevent collecting it).
> 
> You are suggesting that we go back to using live_string_p?

I think he's saying just the opposite: namely, that maybe_lisp_pointer is a
mistake, in that it goes against the (solid) reasons we've replaced some calls
to live_string_p with calls to live_string_holding.

After looking into it I agree. I'll propose a patch shortly that does away with
maybe_lisp_pointer.
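
For anyone unfamiliar with the distinction, a rough sketch of "exact" versus "holding" lookups over a block of fixed-size slots. The slot size, slot count, and function names are hypothetical, and this is a simplification rather than the actual alloc.c code:

```c
#include <stddef.h>

/* Hypothetical block of fixed-size object slots.  */
enum { DEMO_SLOT_SIZE = 24, DEMO_SLOTS = 4 };

/* "Holding" lookup: return the offset of the start of the slot that
   contains byte offset OFFSET, or -1 if OFFSET is outside the block.  */
ptrdiff_t
demo_slot_holding (ptrdiff_t offset)
{
  if (offset < 0 || offset >= DEMO_SLOT_SIZE * DEMO_SLOTS)
    return -1;
  return offset - offset % DEMO_SLOT_SIZE;
}

/* "Exact" lookup: accept OFFSET only if it is the start of a slot.  */
ptrdiff_t
demo_slot_exact (ptrdiff_t offset)
{
  return demo_slot_holding (offset) == offset ? offset : -1;
}
```

An interior offset such as 30 is rejected by the exact lookup but resolved to slot start 24 by the holding lookup. An alignment pre-filter applied before the holding lookup would throw such interior pointers away again, which is the conflict described above.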







* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28  6:31                                                   ` Pip Cet
@ 2020-05-28  7:47                                                     ` Paul Eggert
  2020-05-28  8:11                                                       ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-28  7:47 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 1317 bytes --]

On 5/27/20 11:31 PM, Pip Cet wrote:
> I hope you're right, in that compilers will support GC better before
> they move on to clever optimizations that break it :-)

After looking into it, I decided it wasn't worth the hassle of treating pointers
just past the end of a Lisp object as pointing into the object. Although such
pointers can exist, I can't think of a realistic-with-today's-compilers scenario
at the machine level where (1) a pointer like that will exist, (2) no pointers
into the middle or start of the object will exist, and (3) the object might be
accessed later. In contrast we have seen scenarios with pointers into the middle
of Lisp objects.

With that in mind, attached is a proposed patch to master that I hope deals with
some of the more-serious problems mentioned so far in this thread, in particular
the problem with Lisp_Object representations of symbols being split into two
registers in a --with-wide-int build. I haven't tested this as much as I'd like,
but I need to turn my attention to sleep and work and so this is a good place to
broadcast a checkpoint.

This patch doesn't address the LISP_ALIGNMENT issues you mentioned, both in
lisp.h and in the pdumper; I can work on that soon, I think.

PS. Thanks for helping bring this problem to our attention; it's been fun to
look into it.

[-- Attachment #2: 0001-Fix-crashes-due-to-misidentified-pointers.patch --]
[-- Type: text/x-patch, Size: 5798 bytes --]

From 023344217e05d2a23b5e8157da2f9aea16a5df78 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 28 May 2020 00:11:08 -0700
Subject: [PATCH] Fix crashes due to misidentified pointers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem reported by Pip Cet (Bug#41321#74, Bug#41321#80)
A compiler can create temporaries that point somewhere into a Lisp
object but are not GCALIGNED, and these temporaries may be the
only thing that addresses the object.  So, if any value points
within an object, treat the object as being addressed.  However,
do not worry about pointers that point just past the end of an
object, as these do not seem to be a problem in practice and
attempting to worry about them would complicate and slow the code.
* src/alloc.c (live_float_p): Don’t insist that the offset
be aligned properly for a float, since it might be tagged
or offset.
(GC_OBJECT_ALIGNMENT_MINIMUM, maybe_lisp_pointer):
Remove.  All uses removed.
(mark_maybe_pointer): New arg SYMBOL_ONLY.  All callers changed.
Don’t insist on pointers being aligned.
Align pointers before doing pdumper checks on them, and
before giving them to make_lisp_ptr.
(mark_memory): Do not use mark_maybe_object here.
Instead, use mark_maybe_pointer alone; that suffices.
Also look for offsets from lispsym, to mark symbols more
reliably.
---
 src/alloc.c | 65 +++++++++++++++--------------------------------------
 1 file changed, 18 insertions(+), 47 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index e241b9933a..b1d45dbb33 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4560,7 +4560,6 @@ live_float_p (struct mem_node *m, void *p)
       /* P must point to the start of a Lisp_Float and not be
 	 one of the unused cells in the current float block.  */
       return (offset >= 0
-	      && offset % sizeof b->floats[0] == 0
 	      && offset < (FLOAT_BLOCK_SIZE * sizeof b->floats[0])
 	      && (b != float_block
 		  || offset / sizeof b->floats[0] < float_block_index));
@@ -4687,54 +4686,25 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
     mark_maybe_object (*array);
 }
 
-/* A lower bound on the alignment of Lisp objects that need marking.
-   Although 1 is safe, higher values speed up mark_maybe_pointer.
-   If USE_LSB_TAG, this value is typically GCALIGNMENT; otherwise,
-   it's determined by the natural alignment of Lisp structs.
-   All vectorlike objects have alignment at least that of union
-   vectorlike_header and it's unlikely they all have alignment greater,
-   so use the union as a safe and likely-accurate standin for
-   vectorlike objects.  */
-
-enum { GC_OBJECT_ALIGNMENT_MINIMUM
-         = max (GCALIGNMENT,
-		min (alignof (union vectorlike_header),
-		     min (min (alignof (struct Lisp_Cons),
-			       alignof (struct Lisp_Float)),
-			  min (alignof (struct Lisp_String),
-			       alignof (struct Lisp_Symbol))))) };
-
-/* Return true if P might point to Lisp data that can be garbage
-   collected, and false otherwise (i.e., false if it is easy to see
-   that P cannot point to Lisp data that can be garbage collected).
-   Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of GC_OBJECT_ALIGNMENT_MINIMUM.  */
-
-static bool
-maybe_lisp_pointer (void *p)
-{
-  return (uintptr_t) p % GC_OBJECT_ALIGNMENT_MINIMUM == 0;
-}
-
 /* If P points to Lisp data, mark that as live if it isn't already
-   marked.  */
+   marked.  If SYMBOL_ONLY, mark it only if it is a symbol.  */
 
 static void
-mark_maybe_pointer (void *p)
+mark_maybe_pointer (void *p, bool symbol_only)
 {
+  char *cp = p;
   struct mem_node *m;
 
 #ifdef USE_VALGRIND
   VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p));
 #endif
 
-  if (!maybe_lisp_pointer (p))
-    return;
-
   if (pdumper_object_p (p))
     {
+      p = cp - (uintptr_t) cp % GCALIGNMENT;
       int type = pdumper_find_object_type (p);
-      if (pdumper_valid_object_type_p (type))
+      if (pdumper_valid_object_type_p (type)
+	  && (type == Lisp_Symbol || !symbol_only))
         mark_object (type == Lisp_Symbol
                      ? make_lisp_symbol (p)
                      : make_lisp_ptr (p, type));
@@ -4755,11 +4725,13 @@ mark_maybe_pointer (void *p)
 	  break;
 
 	case MEM_TYPE_CONS:
-	  obj = live_cons_holding (m, p);
+	  if (!symbol_only)
+	    obj = live_cons_holding (m, p);
 	  break;
 
 	case MEM_TYPE_STRING:
-	  obj = live_string_holding (m, p);
+	  if (!symbol_only)
+	    obj = live_string_holding (m, p);
 	  break;
 
 	case MEM_TYPE_SYMBOL:
@@ -4767,13 +4739,14 @@ mark_maybe_pointer (void *p)
 	  break;
 
 	case MEM_TYPE_FLOAT:
-	  if (live_float_p (m, p))
-	    obj = make_lisp_ptr (p, Lisp_Float);
+	  if (!symbol_only && live_float_p (m, p))
+	    obj = make_lisp_ptr (cp - (uintptr_t) cp % GCALIGNMENT, Lisp_Float);
 	  break;
 
 	case MEM_TYPE_VECTORLIKE:
 	case MEM_TYPE_VECTOR_BLOCK:
-	  obj = live_vector_holding (m, p);
+	  if (!symbol_only)
+	    obj = live_vector_holding (m, p);
 	  break;
 
 	default:
@@ -4830,12 +4803,10 @@ mark_memory (void const *start, void const *end)
 
   for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT)
     {
-      mark_maybe_pointer (*(void *const *) pp);
-
-      verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0);
-      if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT
-	  || (uintptr_t) pp % alignof (Lisp_Object) == 0)
-	mark_maybe_object (*(Lisp_Object const *) pp);
+      char *p = *(char *const *) pp;
+      mark_maybe_pointer (p, false);
+      p += (intptr_t) lispsym;
+      mark_maybe_pointer (p, true);
     }
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28  7:47                                                     ` Paul Eggert
@ 2020-05-28  8:11                                                       ` Pip Cet
  0 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-28  8:11 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Thu, May 28, 2020 at 7:47 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/27/20 11:31 PM, Pip Cet wrote:
> > I hope you're right, in that compilers will support GC better before
> > they move on to clever optimizations that break it :-)
>
> After looking into it, I decided it wasn't worth the hassle of treating pointers
> just past the end of a Lisp object as pointing into the object. Although such
> pointers can exist, I can't think of a realistic-with-today's-compilers scenario
> at the machine level where (1) a pointer like that will exist, (2) no pointers
> into the middle or start of the object will exist, and (3) the object might be
> accessed later. In contrast we have seen scenarios with pointers into the middle
> of Lisp objects.

Okay. I was about to write that I'd concluded the same thing, after
failing to come up with an example other than that hypothetical
Freverse implementation.

> With that in mind, attached is a proposed patch to master that I hope deals with
> some of the more-serious problems mentioned so far in this thread, in particular
> the problem with Lisp_Object representations of symbols being split into two
> registers in a --with-wide-int build. I haven't tested this as much as I'd like,
> but I need to turn my attention to sleep and work and so this is a good place to
> broadcast a checkpoint.

Thanks! Looks great generally, though I confess I haven't checked what
would happen in a (hypothetical?) !USE_LSB_TAG 64-bit case.

+      if (!symbol_only && live_float_p (m, p))
+        obj = make_lisp_ptr (cp - (uintptr_t) cp % GCALIGNMENT, Lisp_Float);
       break;

I'm not sure about this code, though: it assumes that GCALIGNMENT ==
sizeof (struct Lisp_Float).
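
To make the concern concrete, here is a minimal sketch, not Emacs
code.  It assumes a hypothetical cell type (toy_cell) that is twice as
large as a hypothetical alignment (TOY_GCALIGNMENT); all names are made
up for the example.  Rounding an interior pointer down to the alignment
then lands in the middle of the cell, whereas rounding the offset down
to a multiple of the cell size finds the start of the cell.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment; stands in for GCALIGNMENT.  */
enum { TOY_GCALIGNMENT = 8 };

/* Hypothetical cell that is twice as large as the alignment.  */
struct toy_cell { char bytes[2 * TOY_GCALIGNMENT]; };

/* Hypothetical block of cells, aligned to TOY_GCALIGNMENT.  */
static alignas (TOY_GCALIGNMENT) struct toy_cell toy_block[4];

/* Round P down to a multiple of TOY_GCALIGNMENT.  This finds the
   start of the cell only if the cell size equals the alignment.  */
static void *
toy_round_to_alignment (void *p)
{
  char *cp = p;
  return cp - (uintptr_t) cp % TOY_GCALIGNMENT;
}

/* Round P, which must point into toy_block, down to the start of the
   cell that contains it, using the cell size rather than the
   alignment.  */
static void *
toy_round_to_cell (void *p)
{
  uintptr_t offset = (uintptr_t) p - (uintptr_t) toy_block;
  assert (offset < sizeof toy_block);
  return ((char *) toy_block
	  + offset / sizeof toy_block[0] * sizeof toy_block[0]);
}
```

With a pointer 12 bytes into toy_block[1], toy_round_to_cell returns
&toy_block[1], while toy_round_to_alignment returns an address 8 bytes
into that cell.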

> PS. Thanks for helping bring this problem to our attention; it's been fun to
> look into it.

I agree. I'll certainly continue looking for bugs and working on
Emacs, but at this point I'm unsure it's worth it to actually share
such work with anyone. But that doesn't really belong here.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28  7:41                                                   ` Paul Eggert
@ 2020-05-28 13:30                                                     ` Stefan Monnier
  2020-05-28 14:28                                                       ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Stefan Monnier @ 2020-05-28 13:30 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, pipcet

>> You are suggesting that we go back to using live_string_p?
> I think he's saying just the opposite: namely, that maybe_lisp_pointer is a
> mistake, in that it goes against the (solid) reasons we've replaced some calls
> to live_string_p with calls to live_string_holding.
> After looking into it I agree. I'll propose a patch shortly that does away with
> maybe_lisp_pointer.

Exactly.  More specifically, `maybe_lisp_pointer` tries to filter out
false positives but does it based on the assumption that we should only
accept numbers that look like pointers to the beginning of
a Lisp_Object.

If we still want to try and filter out false positives we need to do it
more carefully by considering what is the smallest alignment possible
for a pointer to an internal field of a Lisp_Object.

And if this least alignment is not the same for all Lisp_Objects, then
this test should likely be moved to the respective `live_<foo>_holding`.

I suspect that for vectorlike objects, the least alignment is 1 because
of some `char` or `bool` fields in some of the pseudovectors.
Of course, we could do better by checking for "false positives" after
checking the specific kind of vectorlike object (so as to use
a different least-alignment check for objects that contain `char`s
than for those that contain only `int`s, for example).
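
As an illustration of the difference, here is a minimal sketch, not
Emacs code.  The type toy_object, the constant TOY_ALIGNMENT and both
functions are hypothetical stand-ins: toy_maybe_pointer filters on
alignment alone, in the spirit of maybe_lisp_pointer, and toy_holding
does a range check, in the spirit of the live_<foo>_holding functions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object alignment.  */
enum { TOY_ALIGNMENT = 8 };

/* Hypothetical stand-in for a Lisp object with byte-sized fields, so
   that pointers to its fields can have alignment 1.  */
struct toy_object
{
  int64_t header;
  char flags[8];
};

static alignas (TOY_ALIGNMENT) struct toy_object toy;

/* Alignment-only filter: accept P only if it could be the start of an
   object.  */
static bool
toy_maybe_pointer (void const *p)
{
  return (uintptr_t) p % TOY_ALIGNMENT == 0;
}

/* Range check: return OBJ if P points anywhere into *OBJ, otherwise
   NULL.  A pointer just past the end of *OBJ does not count.  */
static struct toy_object *
toy_holding (struct toy_object *obj, void const *p)
{
  assert (obj != NULL);
  uintptr_t start = (uintptr_t) obj;
  uintptr_t addr = (uintptr_t) p;
  return start <= addr && addr - start < sizeof *obj ? obj : NULL;
}
```

Here toy_maybe_pointer rejects &toy.flags[1] even though toy_holding
shows that it points into toy, so an object reachable only through
such a pointer would be missed by the alignment-only filter.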


        Stefan






^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28 13:30                                                     ` Stefan Monnier
@ 2020-05-28 14:28                                                       ` Pip Cet
  2020-05-28 16:24                                                         ` Stefan Monnier
  2020-05-29  9:43                                                         ` Pip Cet
  0 siblings, 2 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-28 14:28 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Paul Eggert, 41321

On Thu, May 28, 2020 at 1:30 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >> You are suggesting that we go back to using live_string_p?
> > I think he's saying just the opposite: namely, that maybe_lisp_pointer is a
> > mistake, in that it goes against the (solid) reasons we've replaced some calls
> > to live_string_p with calls to live_string_holding.
> > After looking into it I agree. I'll propose a patch shortly that does away with
> > maybe_lisp_pointer.
>
> Exactly.  More specifically, `maybe_lisp_pointer` tries to filter out
> false positives but does it based on the assumption that we should only
> accept numbers that look like pointers to the beginning of
> a Lisp_Object.
>
> If we still want to try and filter out false positives we need to do it
> more carefully by considering what is the smallest alignment possible
> for a pointer to an internal field of a Lisp_Object.
>
> And if this least alignment is not the same for all Lisp_Objects, then
> this test should likely be moved to the respective `live_<foo>_holding`.

But at that point, we already have walked the rbtree, which is
probably the main performance problem.

My suggestion is instead to put MEM_TYPE_SYMBOL blocks into the rbtree
twice, once at their proper address and once at the lispsym-based
offset.

We could then look up each pointer precisely once, though sometimes
the blocks might overlap and we'd end up marking two objects for one
pointer.

But that would lead to overlapping rbtree entries, and that requires
some extra code which wouldn't be exercised very often... still, I
think it might be worth doing, particularly since there are relatively
few symbol blocks on most systems.
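
For context, here is a minimal sketch, not Emacs code, of why each
word currently needs two lookups.  It assumes that a symbol is
represented by its offset from a base array, as with lispsym; the
arrays toy_lispsym and toy_heap_symbols and all functions are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_symbol { char payload[32]; };

/* Hypothetical stand-ins for lispsym and for one symbol block.  */
static struct toy_symbol toy_lispsym[4];
static struct toy_symbol toy_heap_symbols[4];

/* Encode S as an offset from toy_lispsym.  The subtraction is done on
   unsigned integers, so it wraps around instead of overflowing.  */
static uintptr_t
toy_encode (struct toy_symbol *s)
{
  assert (s != NULL);
  return (uintptr_t) s - (uintptr_t) toy_lispsym;
}

/* Return the symbol in toy_heap_symbols that contains address ADDR,
   or NULL if there is none.  */
static struct toy_symbol *
toy_symbol_holding (uintptr_t addr)
{
  uintptr_t base = (uintptr_t) toy_heap_symbols;
  if (addr < base || addr - base >= sizeof toy_heap_symbols)
    return NULL;
  return &toy_heap_symbols[(addr - base) / sizeof toy_heap_symbols[0]];
}

/* Treat WORD first as a plain address and then as a symbol offset,
   and return the first symbol found, or NULL.  */
static struct toy_symbol *
toy_scan_word (uintptr_t word)
{
  struct toy_symbol *s = toy_symbol_holding (word);
  if (!s)
    s = toy_symbol_holding (word + (uintptr_t) toy_lispsym);
  return s;
}
```

Registering the block a second time at its adjusted address would
let a single lookup answer both questions, which is the point of the
suggestion above.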

> I suspect that for vectorlike objects, the least alignment is 1 because
> of some `char` or `bool` fields in some of the pseudovectors.
> Of course, we could do better by checking for "false positives" after
> checking the specific kind of vectorlike object (so as to use
> a different least-alignment-check for those objects that contains
> `char`s than for those who only contain `int`s, for example).

I think the point of maybe_lisp_pointer wasn't to mark fewer objects,
it was to look up fewer pointers in the rbtree. I might be wrong.

On 64-bit systems with ASLR, at least, it's quite unlikely that we'd
see something that looks like a valid pointer into a Lisp object and
still be able to conclude, based on its offset or alignment, that it
isn't one...





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28 14:28                                                       ` Pip Cet
@ 2020-05-28 16:24                                                         ` Stefan Monnier
  2020-05-29  9:43                                                         ` Pip Cet
  1 sibling, 0 replies; 132+ messages in thread
From: Stefan Monnier @ 2020-05-28 16:24 UTC (permalink / raw)
  To: Pip Cet; +Cc: Paul Eggert, 41321

> But at that point, we already have walked the rbtree, which is
> probably the main performance problem.

Indeed, maybe_lisp_pointer can avoid this cost, but I was more concerned
with the risk of increasing the number of objects kept live because of
false positives (i.e. a random integer/float/younameit that happens to
look like it's pointing into the object).

> I think the point of maybe_lisp_pointer wasn't to mark fewer objects,
> it was to look up fewer pointers in the rbtree.  I might be wrong.

You might be right.


        Stefan






^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-27 16:58                                         ` Paul Eggert
  2020-05-27 17:33                                           ` Eli Zaretskii
  2020-05-27 17:57                                           ` Pip Cet
@ 2020-05-28 18:27                                           ` Eli Zaretskii
  2020-05-28 19:33                                             ` Paul Eggert
  2 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-28 18:27 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 27 May 2020 09:58:11 -0700
> 
> we can then talk about what to do with emacs-27.

After thinking about this some, I think the only sensible thing to do
on emacs-27 is to return to the 8-byte alignment test in GC for 32-bit
MinGW builds.  That is, replace max_align_t with just 8 in the
definition of LISP_ALIGNMENT in that case.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28 18:27                                           ` Eli Zaretskii
@ 2020-05-28 19:33                                             ` Paul Eggert
  2020-05-29  6:19                                               ` Eli Zaretskii
  2020-05-29  8:25                                               ` Pip Cet
  0 siblings, 2 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-28 19:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

[-- Attachment #1: Type: text/plain, Size: 856 bytes --]

On 5/28/20 11:27 AM, Eli Zaretskii wrote:
>> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
>> From: Paul Eggert <eggert@cs.ucla.edu>
>> Date: Wed, 27 May 2020 09:58:11 -0700
>>
>> we can then talk about what to do with emacs-27.
> 
> After thinking about this some, I think the only sensible thing to do
> on emacs-27 is to return to 8-byte alignment test in GC for 32-bit
> MinGW builds.  That is, replace max_align_t with just 8 in the
> definition of LISP_ALIGNMENT in that case.

Exactly the same problem can occur on other x86 platforms (e.g., GNU/Linux
with GCC 7 or later and glibc 2.25 or earlier), because these platforms also
have the bug that malloc can return a pointer that is 8 modulo 16 even though
alignof (max_align_t) is 16.  So I suggest doing the replacement for those
platforms too, as in the attached patch.
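
For anyone who wants to check a given platform, here is a minimal
sketch of a probe.  The helper name malloc_remainder is made up for
the example, and a single allocation proves nothing in general; it can
only show that a nonzero remainder is possible.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return the remainder of a fresh malloc (NBYTES) result modulo
   ALIGNMENT, or -1 if the allocation fails.  ALIGNMENT must be
   nonzero.  */
static int
malloc_remainder (size_t nbytes, size_t alignment)
{
  assert (alignment != 0);
  void *p = malloc (nbytes);
  if (!p)
    return -1;
  int remainder = (int) ((uintptr_t) p % alignment);
  free (p);
  return remainder;
}
```

Calling malloc_remainder (24, alignof (max_align_t)) on an affected
platform may return 8 rather than 0.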

[-- Attachment #2: 0001-Fix-aborts-due-to-GC-losing-pseudovectors.patch --]
[-- Type: text/x-patch, Size: 1191 bytes --]

From b3501c978f315d980f7a26481989725d63953558 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 28 May 2020 12:27:27 -0700
Subject: [PATCH] Fix aborts due to GC losing pseudovectors

Problem reported by Eli Zaretskii (Bug#41321).
* src/alloc.c (maybe_lisp_pointer): Modulo GCALIGNMENT,
not modulo LISP_ALIGNMENT.  Master has a more-elaborate fix.
Do not merge to master.
---
 src/alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 1c6b664b22..c7a4a3ee86 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4589,12 +4589,12 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
    collected, and false otherwise (i.e., false if it is easy to see
    that P cannot point to Lisp data that can be garbage collected).
    Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of LISP_ALIGNMENT.  */
+   are also multiples of GCALIGNMENT.  */
 
 static bool
 maybe_lisp_pointer (void *p)
 {
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
+  return (uintptr_t) p % GCALIGNMENT == 0;
 }
 
 /* If P points to Lisp data, mark that as live if it isn't already
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28 19:33                                             ` Paul Eggert
@ 2020-05-29  6:19                                               ` Eli Zaretskii
  2020-05-29 20:24                                                 ` Paul Eggert
  2020-05-29  8:25                                               ` Pip Cet
  1 sibling, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-29  6:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 28 May 2020 12:33:10 -0700
> 
> > After thinking about this some, I think the only sensible thing to do
> > on emacs-27 is to return to 8-byte alignment test in GC for 32-bit
> > MinGW builds.  That is, replace max_align_t with just 8 in the
> > definition of LISP_ALIGNMENT in that case.
> 
> Exactly the same problem can occur for other x86 platforms (e.g., GNU/Linux, GCC
> 7-and-later, glibc 2.25-and-earlier), because these other platforms also have
> the bug that malloc can return a pointer that is 8 modulo 16 even though alignof
> (max_align_t) is 16.  So I suggest doing the replacement for those platforms
> too, as in the attached patch.

I'm okay with doing this on other platforms, but...

>  static bool
>  maybe_lisp_pointer (void *p)
>  {
> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
> +  return (uintptr_t) p % GCALIGNMENT == 0;
>  }

...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound
right to me: by keeping the current value of LISP_ALIGNMENT, we
basically declare that Lisp objects shall be aligned on that boundary,
whereas that isn't really the case.  Why not change the value of
LISP_ALIGNMENT instead?





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28 19:33                                             ` Paul Eggert
  2020-05-29  6:19                                               ` Eli Zaretskii
@ 2020-05-29  8:25                                               ` Pip Cet
  1 sibling, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-29  8:25 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Thu, May 28, 2020 at 7:33 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> too, as in the attached patch.

Are you sure you attached the correct file? This patch is identical to
one you'd sent earlier, and which Eli criticized for being overly
conservative on GCALIGNMENT==1 systems.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-28 14:28                                                       ` Pip Cet
  2020-05-28 16:24                                                         ` Stefan Monnier
@ 2020-05-29  9:43                                                         ` Pip Cet
  2020-05-29 18:31                                                           ` Paul Eggert
  1 sibling, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-29  9:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Paul Eggert, 41321

[-- Attachment #1: Type: text/plain, Size: 2013 bytes --]

On Thu, May 28, 2020 at 2:28 PM Pip Cet <pipcet@gmail.com> wrote:
> My suggestion is instead to put MEM_TYPE_SYMBOL blocks into the rbtree
> twice, once at their proper address and once at the lispsym-based
> offset.
>
> We could then look up each pointer precisely once, though sometimes
> the blocks might overlap and we'd end up marking two objects for one
> pointer.
>
> But that would lead to overlapping rbtree entries, and that requires
> some extra code which wouldn't be exercised very often... still, I
> think it might be worth doing, particularly since there are relatively
> few symbol blocks on most systems.

Okay, here's some initial code that does that. It's a little tricky,
because real addresses and symbol offsets can overlap arbitrarily and
become mapped and unmapped in any order. The basic idea is that symbol
offsets are marked two ways:
1. an overlaps_with_symbols flag on a "normal" memory node
2. a mem node type of MEM_TYPE_SYMBOL_ADJUSTED

(2) implies (1), but not the other way around. There's only one flag
per normal memory node, which is true if any of the addresses in the
node are also valid symbol offsets. MEM_TYPE_SYMBOL_ADJUSTED nodes
have start and end addresses that do not necessarily correspond to
symbol blocks or even symbols; their length is arbitrary.

When we insert or delete memory nodes, we perform the obvious
operations to keep MEM_TYPE_SYMBOL_ADJUSTED blocks accurate: i.e.,
when a MEM_TYPE_SYMBOL_ADJUSTED node is split by an
intervening/overlapping normal node, we insert one or two new
MEM_TYPE_SYMBOL_ADJUSTED nodes to cover the remaining offsets, and set
the overlaps_with_symbols flag on the normal node, to cover those,
etc.

As I said, the code is tricky (i.e. might contain bugs that can only
be discovered through extensive testing on 32-bit systems), and it
complicates what should be generic functions for the rbtree
implementation, so this is probably a 32-bit optimization that is too
late because 32-bit systems are no longer that relevant...
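
The splitting step on its own can be sketched as follows.  This is a
simplified illustration, not the code in the attached patch; it uses a
hypothetical toy_range type for half-open address ranges and leaves
out the rbtree and the overlaps_with_symbols bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end).  */
struct toy_range { uintptr_t start, end; };

/* Store into OUT the parts of OLD that are not covered by CUT, and
   return how many parts there are (0, 1 or 2).  OLD plays the role of
   an adjusted-symbol node, CUT that of an overlapping normal node.  */
static int
toy_split (struct toy_range old, struct toy_range cut,
	   struct toy_range out[2])
{
  assert (old.start < old.end && cut.start < cut.end);
  int n = 0;

  /* No overlap: OLD survives unchanged.  */
  if (cut.end <= old.start || old.end <= cut.start)
    {
      out[n++] = old;
      return n;
    }

  /* Part of OLD to the left of CUT, if any.  */
  if (old.start < cut.start)
    out[n++] = (struct toy_range) { old.start, cut.start };

  /* Part of OLD to the right of CUT, if any.  */
  if (cut.end < old.end)
    out[n++] = (struct toy_range) { cut.end, old.end };

  return n;
}
```

Splitting [100, 200) by [120, 150) leaves [100, 120) and [150, 200);
in the scheme described above, those remainders would become the new
MEM_TYPE_SYMBOL_ADJUSTED nodes.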

[-- Attachment #2: 0001-snapshot.patch --]
[-- Type: text/x-patch, Size: 8603 bytes --]

From 246493425f01fc6876ed2222fd4c1806dc0e12f1 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Fri, 29 May 2020 09:40:36 +0000
Subject: [PATCH] snapshot

---
 src/alloc.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 199 insertions(+), 1 deletion(-)

diff --git a/src/alloc.c b/src/alloc.c
index e241b9933a..65cbacbe87 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -475,6 +475,7 @@ no_sanitize_memcpy (void *dest, void const *src, size_t size)
   MEM_TYPE_CONS,
   MEM_TYPE_STRING,
   MEM_TYPE_SYMBOL,
+  MEM_TYPE_SYMBOL_ADJUSTED,
   MEM_TYPE_FLOAT,
   /* Since all non-bool pseudovectors are small enough to be
      allocated from vector blocks, this memory type denotes
@@ -534,6 +535,12 @@ deadp (Lisp_Object x)
   /* Start and end of allocated region.  */
   void *start, *end;
 
+  /* Whether any symbol blocks are known to exist whose adjusted
+     offsets fall in this region.  If only symbol offsets in this
+     region are valid, type == MEM_TYPE_SYMBOL_ADJUSTED, but this
+     flag will also be true.  */
+  bool overlaps_with_symbols;
+
   /* Node color.  */
   enum {MEM_BLACK, MEM_RED} color;
 
@@ -981,6 +988,17 @@ record_xmalloc (size_t size)
   return p;
 }
 
+static void *
+adjust_symbol (void *ptr)
+{
+  return (void *)((uintptr_t) ptr - (uintptr_t) &lispsym);
+}
+
+static void *
+unadjust_symbol (void *ptr)
+{
+  return (void *)((uintptr_t) ptr + (uintptr_t) &lispsym);
+}
 
 /* Like malloc but used for allocating Lisp data.  NBYTES is the
    number of bytes to allocate, TYPE describes the intended use of the
@@ -1023,6 +1041,9 @@ lisp_malloc (size_t nbytes, bool clearit, enum mem_type type)
 #ifndef GC_MALLOC_CHECK
   if (val && type != MEM_TYPE_NON_LISP)
     mem_insert (val, (char *) val + nbytes, type);
+  if (val && type == MEM_TYPE_SYMBOL)
+    mem_insert (adjust_symbol (val), (char *) adjust_symbol (val) + nbytes,
+		MEM_TYPE_SYMBOL_ADJUSTED)->overlaps_with_symbols = true;
 #endif
 
   MALLOC_UNBLOCK_INPUT;
@@ -1259,6 +1280,9 @@ lisp_align_malloc (size_t nbytes, enum mem_type type)
 #ifndef GC_MALLOC_CHECK
   if (type != MEM_TYPE_NON_LISP)
     mem_insert (val, (char *) val + nbytes, type);
+  if (val && type == MEM_TYPE_SYMBOL)
+    mem_insert (adjust_symbol (val), (char *) adjust_symbol (val) + nbytes,
+		MEM_TYPE_SYMBOL_ADJUSTED)->overlaps_with_symbols = true;
 #endif
 
   MALLOC_UNBLOCK_INPUT;
@@ -4073,6 +4097,36 @@ mem_init (void)
   mem_root = MEM_NIL;
 }
 
+/* Value is a pointer to the first mem node not to start before START.
+   Value is MEM_NIL if there is no such node.  */
+
+static struct mem_node *
+mem_find_next (void *start)
+{
+  struct mem_node *p, *parent;
+
+  p = mem_root;
+  parent = p;
+  while (p != MEM_NIL)
+    {
+      if (start >= p->end)
+	{
+	  p = p->right;
+	}
+      else if (start <= p->start)
+	{
+	  parent = p;
+	  p = p->left;
+	}
+      else
+	return p;
+    }
+
+  if (start <= parent->start)
+    return parent;
+
+  return MEM_NIL;
+}
 
 /* Value is a pointer to the mem_node containing START.  Value is
    MEM_NIL if there is no node in the tree containing START.  */
@@ -4119,9 +4173,42 @@ mem_insert (void *start, void *end, enum mem_type type)
   while (c != MEM_NIL)
     {
       parent = c;
-      c = start < c->start ? c->left : c->right;
+      if (start < c->end && c->start < end)
+	break;
+      if (start < c->start)
+	c = c->left;
+      else if (end >= c->end)
+	c = c->right;
+      else
+	break;
     }
 
+  if (parent && parent->end > start && parent->start < end)
+    {
+      void *old_start = parent->start;
+      void *old_end = parent->end;
+      enum mem_type old_type = parent->type;
+      if (type == MEM_TYPE_SYMBOL_ADJUSTED
+	  && old_type != MEM_TYPE_SYMBOL_ADJUSTED)
+	{
+	  if (start < old_start)
+	    mem_insert (start, old_start, type)->overlaps_with_symbols = true;
+	  if (old_end < end)
+	    mem_insert (old_end, end, type)->overlaps_with_symbols = true;
+	}
+      else
+	{
+	  eassert (parent->type == MEM_TYPE_SYMBOL_ADJUSTED);
+	  mem_delete (parent);
+	  parent = mem_insert (start, end, type);
+	  if (old_start < start)
+	    mem_insert (old_start, start, old_type)->overlaps_with_symbols = true;
+	  if (old_end > end)
+	    mem_insert (end, old_end, old_type)->overlaps_with_symbols = true;
+	}
+      parent->overlaps_with_symbols = true;
+      return parent;
+    }
   /* Create a new node.  */
 #ifdef GC_MALLOC_CHECK
   x = malloc (sizeof *x);
@@ -4136,6 +4223,7 @@ mem_insert (void *start, void *end, enum mem_type type)
   x->parent = parent;
   x->left = x->right = MEM_NIL;
   x->color = MEM_RED;
+  x->overlaps_with_symbols = false;
 
   /* Insert it as child of PARENT or install it as root.  */
   if (parent)
@@ -4301,12 +4389,92 @@ mem_rotate_right (struct mem_node *x)
     x->parent = y;
 }
 
+/* Set the overlaps_with_symbols flag based on MEM_TYPE_SYMBOL
+   blocks.  */
+
+static void
+mem_set_overlaps_with_symbols (struct mem_node *x)
+{
+  x->overlaps_with_symbols = false;
+
+  for (void *p = unadjust_symbol (x->start);
+       p < unadjust_symbol (x->end);)
+    {
+      struct mem_node *y = mem_find_next (p);
+      p = y->end;
+      if (y->start >= x->end)
+	break;
+      if (y->type == MEM_TYPE_SYMBOL)
+	{
+	  x->overlaps_with_symbols = true;
+	  return;
+	}
+    }
+}
 
 /* Delete node Z from the tree.  If Z is null or MEM_NIL, do nothing.  */
 
 static void
 mem_delete (struct mem_node *z)
 {
+  if (z->overlaps_with_symbols)
+    {
+      void *z_start = z->start;
+      void *z_end = z->end;
+      z->overlaps_with_symbols = false;
+      mem_delete (z);
+      /* Find all the symbol blocks that intersected with z, and add
+	 them to the rbtree.  */
+      for (void *unadjusted_start = unadjust_symbol (z_start);
+	   unadjusted_start < unadjust_symbol (z_end);)
+	{
+	  struct mem_node *x = mem_find_next (unadjusted_start);
+	  unadjusted_start = x->end;
+
+	  if (x == MEM_NIL)
+	    break;
+
+	  if (x->start > unadjust_symbol (z_end))
+	    break;
+
+	  if (x->type == MEM_TYPE_SYMBOL)
+	    {
+	      mem_insert (max (z_start, adjust_symbol (x->start)),
+			  min (z_end, adjust_symbol (x->end)),
+			  MEM_TYPE_SYMBOL_ADJUSTED)
+		->overlaps_with_symbols = true;
+	    }
+	}
+      return;
+    }
+  if (z->type == MEM_TYPE_SYMBOL)
+    {
+      for (void *adjusted_start = adjust_symbol (z->start);
+	   adjusted_start < adjust_symbol (z->end);)
+	{
+	  struct mem_node *x = mem_find_next (adjusted_start);
+	  adjusted_start = x->end;
+	  if (x->type == MEM_TYPE_SYMBOL_ADJUSTED)
+	    {
+	      void *x_start = x->start;
+	      void *x_end = x->end;
+	      mem_delete (x);
+	      if (x_start < adjust_symbol (z->start))
+		mem_insert (x_start, adjust_symbol (z->start),
+			    MEM_TYPE_SYMBOL_ADJUSTED)->overlaps_with_symbols
+		  = true;
+	      if (x_end > adjust_symbol (z->end))
+		mem_insert (adjust_symbol (z->end), x_end,
+			    MEM_TYPE_SYMBOL_ADJUSTED)->overlaps_with_symbols
+		  = true;
+	    }
+	  else
+	    {
+	      eassert (x->overlaps_with_symbols);
+	      mem_set_overlaps_with_symbols (x);
+	    }
+	}
+    }
   struct mem_node *x, *y;
 
   if (!z || z == MEM_NIL)
@@ -4342,6 +4510,7 @@ mem_delete (struct mem_node *z)
       z->start = y->start;
       z->end = y->end;
       z->type = y->type;
+      z->overlaps_with_symbols = y->overlaps_with_symbols;
     }
 
   if (y->color == MEM_BLACK)
@@ -4766,6 +4935,10 @@ mark_maybe_pointer (void *p)
 	  obj = live_symbol_holding (m, p);
 	  break;
 
+	case MEM_TYPE_SYMBOL_ADJUSTED:
+	  /* handled below */
+	  break;
+
 	case MEM_TYPE_FLOAT:
 	  if (live_float_p (m, p))
 	    obj = make_lisp_ptr (p, Lisp_Float);
@@ -4782,6 +4955,18 @@ mark_maybe_pointer (void *p)
 
       if (!NILP (obj))
 	mark_object (obj);
+
+      if (m->overlaps_with_symbols)
+	{
+	  obj = Qnil;
+	  p = unadjust_symbol (p);
+	  m = mem_find (p);
+	  if (m != MEM_NIL
+	      && m->type == MEM_TYPE_SYMBOL)
+	    obj = live_symbol_holding (m, unadjust_symbol (p));
+	  if (!NILP (obj))
+	    mark_object (obj);
+	}
     }
 }
 
@@ -7077,6 +7262,19 @@ sweep_symbols (void)
           /* Unhook from the free list.  */
           symbol_free_list = sblk->symbols[0].u.s.next;
           lisp_free (sblk);
+	  void *p = adjust_symbol (sblk);
+	  while (true)
+	    {
+	      struct mem_node *m = mem_find_next (p);
+	      if (m->start >=
+		  adjust_symbol (sblk + 1))
+		break;
+	      p = m->end;
+	      mem_set_overlaps_with_symbols (m);
+	      if (m->type == MEM_TYPE_SYMBOL_ADJUSTED
+		  && !m->overlaps_with_symbols)
+		mem_delete (m);
+	    }
         }
       else
         {
-- 
2.27.0.rc0



* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22 11:47         ` Pip Cet
  2020-05-22 12:13           ` Eli Zaretskii
  2020-05-22 12:32           ` Eli Zaretskii
@ 2020-05-29  9:51           ` Eli Zaretskii
  2020-05-29 10:00             ` Pip Cet
  2 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-29  9:51 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 22 May 2020 11:47:03 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> If you could disassemble signal_before_change, we'd know whether
> start_marker and end_marker live in callee-saved registers, and thus
> whether this is likely to be Andrea's bug.

signal_before_change cannot be disassembled because it's inlined.
Disassembling its caller, prepare_to_modify_buffer_1, seems to
indicate that start_marker and end_marker are pushed onto the stack
when they are returned by copy-marker, and taken from there when we
later call marker-position (which segfaults):

2163          PRESERVE_START_END;
   0x010ed99e <+834>:   mov    0x58(%esp),%eax
   0x010ed9a2 <+838>:   or     0x4c(%esp),%eax
   0x010ed9a6 <+842>:   je     0x10edd77 <prepare_to_modify_buffer_1+1819>
   0x010ed9ac <+848>:   mov    0x44(%esp),%ecx
   0x010ed9b0 <+852>:   or     0x38(%esp),%ecx
   0x010ed9b4 <+856>:   je     0x10edf90 <prepare_to_modify_buffer_1+2356>
   0x010edd77 <+1819>:  movl   $0x0,0x8(%esp)
   0x010edd7f <+1827>:  movl   $0x0,0xc(%esp)
   0x010edd87 <+1835>:  mov    0x50(%esp),%eax
   0x010edd8b <+1839>:  mov    0x54(%esp),%edx
   0x010edd8f <+1843>:  mov    %eax,(%esp)
   0x010edd92 <+1846>:  mov    %edx,0x4(%esp)
   0x010edd96 <+1850>:  call   0x10f15a5 <Fcopy_marker>
   0x010edd9b <+1855>:  mov    %eax,0x4c(%esp)   <<<<<<<<<<<<<<<<<<<<<
   0x010edd9f <+1859>:  mov    %edx,0x58(%esp)   <<<<<<<<<<<<<<<<<<<<<
   0x010edda3 <+1863>:  mov    0x44(%esp),%ecx
   0x010edda7 <+1867>:  or     0x38(%esp),%ecx
   0x010eddab <+1871>:  jne    0x10ede59 <prepare_to_modify_buffer_1+2045>
   0x010eddb1 <+1877>:  movl   $0x0,0x8(%esp)
   0x010eddb9 <+1885>:  movl   $0x0,0xc(%esp)
   0x010eddc1 <+1893>:  mov    %esi,(%esp)
   0x010eddc4 <+1896>:  mov    %edi,0x4(%esp)
   0x010eddc8 <+1900>:  call   0x10f15a5 <Fcopy_marker>
   0x010eddcd <+1905>:  mov    %eax,0x38(%esp)   <<<<<<<<<<<<<<<<<<<<
   0x010eddd1 <+1909>:  mov    %edx,0x44(%esp)   <<<<<<<<<<<<<<<<<<<<
   0x010edf90 <+2356>:  movl   $0x0,0x8(%esp)
   0x010edf98 <+2364>:  movl   $0x0,0xc(%esp)
   0x010edfa0 <+2372>:  mov    %esi,(%esp)
   0x010edfa3 <+2375>:  mov    %edi,0x4(%esp)
   0x010edfa7 <+2379>:  call   0x10f15a5 <Fcopy_marker>
   0x010edfac <+2384>:  mov    %eax,0x38(%esp)
   0x010edfb0 <+2388>:  mov    %edx,0x44(%esp)
   [...]
2179          report_overlay_modification (FETCH_START, FETCH_END, 0,
   0x010eda5f <+1027>:  mov    0x44(%esp),%eax
   0x010eda63 <+1031>:  or     0x38(%esp),%eax
   0x010eda67 <+1035>:  jne    0x10edd20 <prepare_to_modify_buffer_1+1732>
   0x010eda6d <+1041>:  mov    0x58(%esp),%ecx
   0x010eda71 <+1045>:  or     0x4c(%esp),%ecx
   0x010eda75 <+1049>:  jne    0x10edf1e <prepare_to_modify_buffer_1+2242>
   0x010eda7b <+1055>:  mov    %esi,0x68(%esp)
   0x010eda7f <+1059>:  mov    %edi,0x6c(%esp)
   0x010eda83 <+1063>:  mov    0x50(%esp),%eax
   0x010eda87 <+1067>:  mov    0x54(%esp),%edx
   0x010eda8b <+1071>:  mov    %eax,0x60(%esp)
   0x010eda8f <+1075>:  mov    %edx,0x64(%esp)
   0x010eda93 <+1079>:  movl   $0x0,0x24(%esp)
   0x010eda9b <+1087>:  movl   $0x0,0x28(%esp)
   0x010edaa3 <+1095>:  mov    0x68(%esp),%eax
   0x010edaa7 <+1099>:  mov    0x6c(%esp),%edx
   0x010edaab <+1103>:  mov    %eax,0x1c(%esp)
   0x010edaaf <+1107>:  mov    %edx,0x20(%esp)
   0x010edab3 <+1111>:  mov    0x60(%esp),%eax
   0x010edab7 <+1115>:  mov    0x64(%esp),%edx
   0x010edabb <+1119>:  mov    %eax,0x14(%esp)
   0x010edabf <+1123>:  mov    %edx,0x18(%esp)
   0x010edac3 <+1127>:  movl   $0x0,0x10(%esp)
   0x010edacb <+1135>:  mov    %esi,0x8(%esp)
   0x010edacf <+1139>:  mov    %edi,0xc(%esp)
   0x010edad3 <+1143>:  mov    0x50(%esp),%eax
   0x010edad7 <+1147>:  mov    0x54(%esp),%edx
   0x010edadb <+1151>:  mov    %eax,(%esp)
   0x010edade <+1154>:  mov    %edx,0x4(%esp)
   0x010edae2 <+1158>:  call   0x10e76ea <report_overlay_modification>
   0x010edd20 <+1732>:  mov    0x38(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edd24 <+1736>:  mov    %eax,(%esp)
   0x010edd27 <+1739>:  mov    0x44(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edd2b <+1743>:  mov    %eax,0x4(%esp)
   0x010edd2f <+1747>:  call   0x10f072a <Fmarker_position>
   0x010edd34 <+1752>:  mov    %eax,0x68(%esp)
   0x010edd38 <+1756>:  mov    %edx,0x6c(%esp)
   0x010edd3c <+1760>:  mov    0x58(%esp),%eax 
   0x010edd40 <+1764>:  or     0x4c(%esp),%eax
   0x010edd44 <+1768>:  jne    0x10edeba <prepare_to_modify_buffer_1+2142>
   0x010edd4a <+1774>:  mov    0x38(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<
   0x010edd4e <+1778>:  mov    %eax,(%esp)
   0x010edd51 <+1781>:  mov    0x44(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<
   0x010edd55 <+1785>:  mov    %eax,0x4(%esp)
   0x010edd59 <+1789>:  call   0x10f072a <Fmarker_position>
   0x010edd5e <+1794>:  mov    %eax,%esi
   0x010edd60 <+1796>:  mov    %edx,%edi
   0x010edd62 <+1798>:  mov    0x50(%esp),%eax
   0x010edd66 <+1802>:  mov    0x54(%esp),%edx
   0x010edd6a <+1806>:  mov    %eax,0x60(%esp)
   0x010edd6e <+1810>:  mov    %edx,0x64(%esp)
   0x010edd72 <+1814>:  jmp    0x10eda93 <prepare_to_modify_buffer_1+1079>
   0x010edeba <+2142>:  mov    0x4c(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edebe <+2146>:  mov    %eax,(%esp)
   0x010edec1 <+2149>:  mov    0x58(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edec5 <+2153>:  mov    %eax,0x4(%esp)
   0x010edec9 <+2157>:  call   0x10f072a <Fmarker_position>
   0x010edece <+2162>:  mov    %eax,0x60(%esp)
   0x010eded2 <+2166>:  mov    %edx,0x64(%esp)
   0x010eded6 <+2170>:  mov    0x38(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010ededa <+2174>:  mov    %eax,(%esp)
   0x010ededd <+2177>:  mov    0x44(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edee1 <+2181>:  mov    %eax,0x4(%esp)
   0x010edee5 <+2185>:  call   0x10f072a <Fmarker_position>
   0x010edeea <+2190>:  mov    %eax,%esi
   0x010edeec <+2192>:  mov    %edx,%edi
   0x010edeee <+2194>:  mov    0x4c(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edef2 <+2198>:  mov    %eax,(%esp)
   0x010edef5 <+2201>:  mov    0x58(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edef9 <+2205>:  mov    %eax,0x4(%esp)
   0x010edefd <+2209>:  call   0x10f072a <Fmarker_position>
   0x010edf02 <+2214>:  mov    %eax,0x50(%esp)
   0x010edf06 <+2218>:  mov    %edx,0x54(%esp)
   0x010edf0a <+2222>:  jmp    0x10eda93 <prepare_to_modify_buffer_1+1079>
   0x010edf1e <+2242>:  mov    0x4c(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edf22 <+2246>:  mov    %eax,(%esp)
   0x010edf25 <+2249>:  mov    0x58(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edf29 <+2253>:  mov    %eax,0x4(%esp)
   0x010edf2d <+2257>:  call   0x10f072a <Fmarker_position>
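
The repeated mov/or/je (or jne) triples in the listing above look like
nil tests on 64-bit Lisp_Object values that this 32-bit build keeps in
two separate 32-bit stack slots: OR-ing the two halves yields zero
only when the whole value is zero.  A minimal C sketch of that idiom
follows, assuming nil is the all-zero word (as the XIL(0) values in
this thread suggest).  The names split_object, split_is_nil and
split_join are hypothetical and do not come from the Emacs sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit Lisp_Object held as two 32-bit
   stack slots, e.g. 0x4c(%esp) and 0x58(%esp) in the disassembly.  */
struct split_object
{
  uint32_t lo;   /* half returned in %eax */
  uint32_t hi;   /* half returned in %edx */
};

/* Equivalent of "mov one half,%eax; or other half,%eax; je ...":
   the value is nil (assumed to be all-zero bits) only if both
   halves are zero.  */
static bool
split_is_nil (struct split_object o)
{
  return (o.lo | o.hi) == 0;
}

/* Reassemble the full 64-bit value from its two halves.  */
static uint64_t
split_join (struct split_object o)
{
  return ((uint64_t) o.hi << 32) | o.lo;
}
```

With lo = 0x1ffac2a8 and hi = 0xa0000000, split_join gives
0xa00000001ffac2a8, the start_marker value seen in the crash session.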






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29  9:51           ` Eli Zaretskii
@ 2020-05-29 10:00             ` Pip Cet
  0 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-29 10:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, Stefan Monnier

On Fri, May 29, 2020 at 9:51 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 22 May 2020 11:47:03 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > If you could disassemble signal_before_change, we'd know whether
> > start_marker and end_marker live in callee-saved registers, and thus
> > whether this is likely to be Andrea's bug.
>
> signal_before_change cannot be disassembled because it's inlined.

Sorry. On my system, gdb does the right thing if I enter "disassemble
signal_before_change".

> Disassembling its caller, prepare_to_modify_buffer_1, seems to
> indicate that start_marker and end_marker are pushed onto the stack
> when they are returned by copy-marker, and taken from there when we
> later call marker-position (which segfaults):

That's my reading as well.

>    0x010edd96 <+1850>:  call   0x10f15a5 <Fcopy_marker>
>    0x010edd9b <+1855>:  mov    %eax,0x4c(%esp)   <<<<<<<<<<<<<<<<<<<<<
>    0x010edd9f <+1859>:  mov    %edx,0x58(%esp)   <<<<<<<<<<<<<<<<<<<<<

As you can see, the stack positions aren't consecutive: the
Lisp_Object is split between bytes 0x58..0x5b(%esp) and bytes
0x4c..0x4f(%esp).

>    0x010eddc8 <+1900>:  call   0x10f15a5 <Fcopy_marker>
>    0x010eddcd <+1905>:  mov    %eax,0x38(%esp)   <<<<<<<<<<<<<<<<<<<<
>    0x010eddd1 <+1909>:  mov    %edx,0x44(%esp)   <<<<<<<<<<<<<<<<<<<<

Same here.

So we know (from your backtrace) these objects aren't 16-byte-aligned,
and we know your GC won't mark them because they're
discontinuously-stored and max_align_t has an alignment of 16 on your
system. We also know the only reference to them is on the stack.
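
To make the argument concrete, here is a toy model of a conservative
stack scan.  It is a simplified sketch and not the Emacs
implementation: the function names are hypothetical, and the only
assumptions are that a Lisp_Object is recognized when its two 32-bit
halves are adjacent, and that a bare pointer is accepted only when it
is a multiple of the required alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: report whether any 64-bit value formed from two
   ADJACENT 32-bit stack words equals the wanted Lisp_Object.  */
static bool
scan_finds_object (const uint32_t *stack, size_t nwords, uint64_t wanted)
{
  for (size_t i = 0; i + 1 < nwords; i++)
    {
      uint64_t candidate = ((uint64_t) stack[i + 1] << 32) | stack[i];
      if (candidate == wanted)
        return true;
    }
  return false;
}

/* Treat each single word as a raw pointer, accepting it only if it
   equals PTR and is a multiple of ALIGNMENT.  */
static bool
scan_finds_pointer (const uint32_t *stack, size_t nwords,
                    uint32_t ptr, uint32_t alignment)
{
  for (size_t i = 0; i < nwords; i++)
    if (stack[i] == ptr && stack[i] % alignment == 0)
      return true;
  return false;
}
```

With the low half 0x1ffac2a8 in word 0 and the high half 0xa0000000 in
word 3 (12 bytes apart, like 0x4c(%esp) and 0x58(%esp)),
scan_finds_object returns false for 0xa00000001ffac2a8.
scan_finds_pointer for 0x1ffac2a8 returns false with alignment 16 and
true with alignment 8.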






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-22  7:22       ` Eli Zaretskii
                           ` (3 preceding siblings ...)
  2020-05-23 23:54         ` Pip Cet
@ 2020-05-29 10:16         ` Eli Zaretskii
  2020-05-29 10:34           ` Pip Cet
  4 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-29 10:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Date: Fri, 22 May 2020 10:22:56 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org
> 
> > > I'm already running with such a breakpoint, let's how it will catch
> > > something.                                        ^^^
> > 
> > Should have been "hope".  Sorry.
> 
> It happened again, and now insert-file-contents wasn't involved, so I
> guess it's off the hook.  The command which triggered the problem was
> self-insert-command, as shown in the backtrace below.  The problem
> seems to be with handling overlays when buffer text changes.

One more segfault very similar to the last one I reported: it happened
when calling report_overlay_modification due to text being inserted
into a buffer.

The backtrace and the debugging session are below.  Noteworthy
observations:

. The buffer's overlay chain and the buffer's marker chain are both
  intact and valid.

. The two markers, start_marker and end_marker, which are created by
  PRESERVE_START_END before calling before-change-functions, are NOT
  in the buffer's marker chain after run-hook-with-args returns.  This
  most probably means GC was invoked while run-hook-with-args ran and
  decided to GC those 2 markers, which then unchains them via
  unchain_dead_markers.

. last_marked[] doesn't seem to mention start_marker or end_marker, at
  least not in its last 470 slots:

    (gdb) find /g1 &last_marked[0], last_marked[last_marked_index-1], 0xa00000001ffac2c8
    Pattern not found.

  This seems to be supporting evidence that those two markers were
  GC'ed.

. start_marker and end_marker encode pointers which are 8-byte
  aligned, not 16-byte aligned.  The values of the pointers are
  0x1ffac2a8 and 0x1ffac2c8, as can be seen from the debug session.

. There's nothing wrong with rvoe_arg.location; in the previous
  sessions we forgot to dereference it (it's a pointer to a Lisp
  object).  Here's how it looks when shown correctly:

    (gdb) p rvoe_arg.location
    $14 = (Lisp_Object *) 0x15c9298 <globals+120>
    (gdb) p *rvoe_arg.location
    $15 = XIL(0xc00000001646b9b0)
    (gdb) xtype
    Lisp_Cons
    (gdb) xcar
    $16 = 0x30
    (gdb) xsymbol
    $17 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48>
    "t"
    (gdb) p *rvoe_arg.location
    $18 = XIL(0xc00000001646b9b0)
    (gdb) xcdr
    $19 = 0xc00000001646b9d0
    (gdb) xtype
    Lisp_Cons
    (gdb) xcar
    $20 = 0xd5c0
    (gdb) xtype
    Lisp_Symbol
    (gdb) xsymbol
    $21 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720>
    "syntax-ppss-flush-cache"
    (gdb) p *rvoe_arg.location
    $22 = XIL(0xc00000001646b9b0)
    (gdb) xcdr
    $23 = 0xc00000001646b9d0
    (gdb) xcdr
    $24 = 0x0
    [...]
    (gdb) pp *rvoe_arg.location
    (t syntax-ppss-flush-cache)

. There's nothing wrong with GDB's xtype command: it fails when a Lisp
  object encodes a pointer to invalid memory:

    (gdb) p start_marker
    $25 = XIL(0xa00000001ffac2a8)
    (gdb) xtype
    Lisp_Vectorlike
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x start_marker
    $26 = 0xa00000001ffac2a8
    (gdb) xgettype $26
    (gdb) p $type
    $27 = Lisp_Vectorlike
    (gdb) xvectype $26
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x ((struct Lisp_Vector *) $26)->header.size
    warning: value truncated
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x ((struct Lisp_Vector *) $26)->header
    warning: value truncated
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x ((struct Lisp_Vector *) $26)
    warning: value truncated
    $35 = 0x1ffac2a8
    (gdb) p/x end_marker
    $38 = 0xa00000001ffac2c8
    (gdb) xtype
    Lisp_Vectorlike
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x ((struct Lisp_Vector *)0x1ffac2c8)->header
    Cannot access memory at address 0x1ffac2c8

. Provisional conclusion: the two temporary markers created by
  signal_before_change were on the stack (see my other message with
  code disassembly), and were GC'ed as a side effect of running
  syntax-ppss-flush-cache via before-change-functions.  So we should
  see whether fixing the LISP_ALIGNMENT vs GCALIGNMENT discrepancy
  fixes this problem.
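
The alignment observation above can be checked mechanically.  The
sketch below assumes that the type tag occupies the top 3 bits of the
64-bit word, which is what the XIL values in this session suggest, so
the pointer is whatever remains in the low 61 bits.  The names
assumed_untag and aligned_to are hypothetical helpers, not Emacs
functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the tag sits in the top 3 bits of a 64-bit word, so
   the pointer is what remains in the low 61 bits.  */
#define ASSUMED_VALUE_MASK ((UINT64_C (1) << 61) - 1)

/* Strip the assumed tag bits and return the encoded address.  */
static uint64_t
assumed_untag (uint64_t word)
{
  return word & ASSUMED_VALUE_MASK;
}

/* True if ADDR is a multiple of ALIGNMENT.  */
static bool
aligned_to (uint64_t addr, unsigned alignment)
{
  return addr % alignment == 0;
}
```

Applied to 0xa00000001ffac2a8 and 0xa00000001ffac2c8, assumed_untag
yields 0x1ffac2a8 and 0x1ffac2c8.  Both are multiples of 8 but not of
16 (each leaves a remainder of 8).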

Here's the backtrace and the full debug session after the crash, with
some omissions:

Thread 1 received signal SIGSEGV, Segmentation fault.
PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
1720          return PSEUDOVECTOR_TYPEP (XUNTAG (a, Lisp_Vectorlike,
(gdb) bt
#0  PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
#1  MARKERP (x=<optimized out>) at lisp.h:2618
#2  CHECK_MARKER (x=XIL(0xa00000001ffac2c8)) at marker.c:133
#3  0x010f073c in Fmarker_position (marker=XIL(0xa00000001ffac2c8))
    at marker.c:452
#4  0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=276884,
    start_int=276884) at insdel.c:2179
#5  prepare_to_modify_buffer_1 (start=start@entry=276884,
    end=end@entry=276884, preserve_ptr=preserve_ptr@entry=0x0)
    at insdel.c:2007
#6  0x010ee27d in prepare_to_modify_buffer (start=276884, end=276884,
    preserve_ptr=preserve_ptr@entry=0x0) at insdel.c:2018
#7  0x010ee54d in insert_1_both (
    string=0x1e3c9c08 " 2823D 26-May  gdb-patches@sourceware.or [244] Re: [PATCH, testsuite] Fix some duplicate test names\n\r...",
    nchars=100, nbytes=100, inherit=false, prepare=true, before_markers=false)
    at insdel.c:896
#8  0x010ee5c5 in insert_1_both (string=<optimized out>,
    nchars=<optimized out>, nchars@entry=100, nbytes=<optimized out>,
    nbytes@entry=100, inherit=inherit@entry=false,
    prepare=prepare@entry=true, before_markers=before_markers@entry=false)
    at insdel.c:947
#9  0x01174188 in Fprinc (object=XIL(0x800000001e05f278),
    printcharfun=<optimized out>) at print.c:734
#10 0x0114fc5c in funcall_subr (subr=<optimized out>,
    numargs=<optimized out>, numargs@entry=2, args=<optimized out>,
    args@entry=0x82d9b8) at eval.c:2869
#11 0x0114daed in Ffuncall (nargs=3, args=args@entry=0x82d9b0) at eval.c:2794
#12 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs@entry=4,
    args=<optimized out>, args@entry=0x82dde8) at bytecode.c:633
#13 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=4,
    arg_vector=arg_vector@entry=0x82dde8) at eval.c:2989
#14 0x0114da43 in Ffuncall (nargs=5, args=args@entry=0x82dde0) at eval.c:2808
#15 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs@entry=3,
    args=<optimized out>, args@entry=0x82e1b0) at bytecode.c:633
#16 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=3,
    arg_vector=arg_vector@entry=0x82e1b0) at eval.c:2989
#17 0x0114da43 in Ffuncall (nargs=4, args=args@entry=0x82e1a8) at eval.c:2808
#18 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs@entry=0,
    args=<optimized out>, args@entry=0x82e570) at bytecode.c:633
#19 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=0,
    arg_vector=arg_vector@entry=0x82e570) at eval.c:2989
#20 0x0114da43 in Ffuncall (nargs=nargs@entry=1, args=args@entry=0x82e568)
    at eval.c:2808
#21 0x0114de2d in Fapply (nargs=2, args=0x82e568) at eval.c:2377
#22 0x0114daed in Ffuncall (nargs=3, args=args@entry=0x82e560) at eval.c:2794
#23 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs@entry=0,
    args=<optimized out>, args@entry=0x82e8c0) at bytecode.c:633
#24 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=0,
    arg_vector=arg_vector@entry=0x82e8c0) at eval.c:2989
#25 0x0114da43 in Ffuncall (nargs=1, args=args@entry=0x82e8b8) at eval.c:2808
#26 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs@entry=3,
    args=<optimized out>, args@entry=0x82ed30) at bytecode.c:633
#27 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=3,
    arg_vector=arg_vector@entry=0x82ed30) at eval.c:2989
#28 0x0114da43 in Ffuncall (nargs=4, args=args@entry=0x82ed28) at eval.c:2808
#29 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs@entry=1,
    args=<optimized out>, args@entry=0x82f298) at bytecode.c:633
#30 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=1,
    arg_vector=arg_vector@entry=0x82f298) at eval.c:2989
#31 0x0114da43 in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x82f290)
    at eval.c:2808
#32 0x0114906d in Ffuncall_interactively (nargs=2, args=0x82f290)
    at callint.c:254
#33 0x0114daed in Ffuncall (nargs=nargs@entry=3, args=args@entry=0x82f288)
    at eval.c:2794
#34 0x0114df22 in Fapply (nargs=nargs@entry=3, args=args@entry=0x82f288)
    at eval.c:2381
#35 0x0114afbb in Fcall_interactively (function=XIL(0x5f2c790),
    record_flag=<optimized out>, keys=XIL(0xa00000000759f578))
    at callint.c:342
#36 0x0114fc89 in funcall_subr (subr=<optimized out>,
    numargs=<optimized out>, numargs@entry=3, args=<optimized out>,
    args@entry=0x82f430) at eval.c:2872
#37 0x0114daed in Ffuncall (nargs=4, args=args@entry=0x82f428) at eval.c:2794
#38 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs@entry=1,
    args=<optimized out>, args@entry=0x82f7b8) at bytecode.c:633
#39 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs@entry=1,
    arg_vector=arg_vector@entry=0x82f7b8) at eval.c:2989
#40 0x0114da43 in Ffuncall (nargs=nargs@entry=2, args=args@entry=0x82f7b0)
    at eval.c:2808
#41 0x0114dc1c in call1 (fn=XIL(0x3f30), arg1=XIL(0x5f2c790)) at eval.c:2654
#42 0x010d0efe in command_loop_1 () at keyboard.c:1463
#43 0x0114ca0f in internal_condition_case (
    bfun=bfun@entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90),
    hfun=hfun@entry=0x10c5049 <cmd_error>) at eval.c:1355
#44 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
#45 0x0114c996 in internal_catch (tag=XIL(0xdfb0),
    func=func@entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116
#46 0x010bdb5d in command_loop () at keyboard.c:1070
#47 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714
#48 0x010c4f0c in Frecursive_edit () at keyboard.c:786
#49 0x0124a594 in main (argc=<optimized out>, argv=<optimized out>)
    at emacs.c:2054

Lisp Backtrace:
"princ" (0x82d9b8)
"rmail-new-summary-1" (0x82dde8)
"rmail-new-summary" (0x82e1b0)
"rmail-summary" (0x82e570)
"apply" (0x82e568)
"rmail-update-summary" (0x82e8c0)
"rmail-get-new-mail-1" (0x82ed30)
"rmail-get-new-mail" (0x82f298)
"funcall-interactively" (0x82f290)
"call-interactively" (0x82f430)
"command-execute" (0x82f7b8)
(gdb) fr 4
#4  0x010f073c in Fmarker_position (marker=XIL(0xa00000001ffac2c8))
    at marker.c:452
452       CHECK_MARKER (marker);
(gdb) up
#5  0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=276884,
    start_int=276884) at insdel.c:2179
2179          report_overlay_modification (FETCH_START, FETCH_END, 0,
(gdb) p current_buffer->overlays_before
$1 = (struct Lisp_Overlay *) 0x75ac520
(gdb) p *$
$2 = {
  header = {
    size = 1140854787
  },
  start = XIL(0xa0000000075ac4e0),
  end = XIL(0xa0000000075ac500),
  plist = XIL(0xc0000000077f2340),
  next = 0x0
}
(gdb) p/x $1->header.size
$3 = 0x44001003
(gdb) p current_buffer->name_
$4 = XIL(0x8000000007364540)
(gdb) xtype
Lisp_String
(gdb) xstring
$5 = (struct Lisp_String *) 0x7364540
"INBOX-summary"
(gdb) p current_buffer->overlays_before->start
$6 = XIL(0xa0000000075ac4e0)
(gdb) p *$
$7 = 1124081664
(gdb) p current_buffer->overlays_before->start
$8 = XIL(0xa0000000075ac4e0)
(gdb) xtype
Lisp_Vectorlike
PVEC_MARKER
(gdb) xmarker
$9 = (struct Lisp_Marker *) 0x75ac4e0
(gdb) p *$
$10 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x0,
  charpos = 1,
  bytepos = 1
}
(gdb) p current_buffer->overlays_before->next
$11 = (struct Lisp_Overlay *) 0x0
(gdb) p current_buffer->overlays_after
$12 = (struct Lisp_Overlay *) 0x0
(gdb) p rvoe_arg
$13 = {
  location = 0x15c9298 <globals+120>,
  errorp = false
}
(gdb) p rvoe_arg.location
$14 = (Lisp_Object *) 0x15c9298 <globals+120>
(gdb) p *rvoe_arg.location
$15 = XIL(0xc00000001646b9b0)
(gdb) xtype
Lisp_Cons
(gdb) xcar
$16 = 0x30
(gdb) xsymbol
$17 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48>
"t"
(gdb) p *rvoe_arg.location
$18 = XIL(0xc00000001646b9b0)
(gdb) xcdr
$19 = 0xc00000001646b9d0
(gdb) xtype
Lisp_Cons
(gdb) xcar
$20 = 0xd5c0
(gdb) xtype
Lisp_Symbol
(gdb) xsymbol
$21 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720>
"syntax-ppss-flush-cache"
(gdb) p *rvoe_arg.location
$22 = XIL(0xc00000001646b9b0)
(gdb) xcdr
$23 = 0xc00000001646b9d0
(gdb) xcdr
$24 = 0x0
(gdb) p start_marker
$25 = XIL(0xa00000001ffac2a8)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) p/x start_marker
$26 = 0xa00000001ffac2a8
(gdb) xgettype $26
(gdb) p $type
$27 = Lisp_Vectorlike
(gdb) xvectype $26
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)->header.size
warning: value truncated
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)->header
warning: value truncated
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)
warning: value truncated
$35 = 0x1ffac2a8
(gdb) p/x $26
$36 = 0xa00000001ffac2a8
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2a8
A syntax error in expression, near `'.
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2a8)
$37 = 0x1ffac2a8
(gdb) p/x *((struct Lisp_Vector *)0x1ffac2a8)
Cannot access memory at address 0x1ffac2a8
(gdb) p/x end_marker
$38 = 0xa00000001ffac2c8
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2c8)->header
Cannot access memory at address 0x1ffac2c8
(gdb) p Vfirst_change_hook
$39 = XIL(0)
(gdb) p current_buffer->text->markers
$40 = (struct Lisp_Marker *) 0x76353a0
(gdb) p *$
$41 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x76353e0,
  charpos = 1,
  bytepos = 1
}
(gdb) p current_buffer->text->markers->next
$42 = (struct Lisp_Marker *) 0x76353e0
(gdb) p *$
$43 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x7635420,
  charpos = 1,
  bytepos = 1
}
(gdb) p current_buffer->text->markers->next->next
$44 = (struct Lisp_Marker *) 0x7635420
(gdb) p *$
$45 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x16b6a5d0,
  charpos = 1,
  bytepos = 1
}
(gdb) p current_buffer->text->markers->next->next->next
$46 = (struct Lisp_Marker *) 0x16b6a5d0
(gdb) p *$
$47 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x16b6a5b0,
  charpos = 1,
  bytepos = 1
}
(gdb) p/x start_marker
$98 = 0xa00000001ffac2c8
(gdb) pp *rvoe_arg.location
(t syntax-ppss-flush-cache)
(gdb) p last_mar
last_marked        last_marked_index
(gdb) p last_marked_index
$99 = 498
(gdb) p last_marked[497]
$100 = XIL(0x439c370)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) find /g1 &last_marked[0], last_marked[last_marked_index-1], 0xa00000001ffac2a8
Pattern not found.
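
The manual walk through current_buffer->text->markers above amounts to
a linked-list membership test.  A small sketch of that test follows.
model_marker is a stand-in with only a next field, not the real
struct Lisp_Marker, and marker_in_chain is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a marker: only the chain link is modelled.  */
struct model_marker
{
  struct model_marker *next;
};

/* Walk the chain starting at HEAD and report whether M is on it.  */
static bool
marker_in_chain (const struct model_marker *head,
                 const struct model_marker *m)
{
  for (const struct model_marker *p = head; p != NULL; p = p->next)
    if (p == m)
      return true;
  return false;
}
```

A marker that was unchained, as start_marker and end_marker appear to
have been here, makes this test return false even though a stale
reference to it may still exist elsewhere.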






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 10:16         ` Eli Zaretskii
@ 2020-05-29 10:34           ` Pip Cet
  2020-05-29 10:55             ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-29 10:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Paul Eggert, 41321, Stefan Monnier

On Fri, May 29, 2020 at 10:16 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > Date: Fri, 22 May 2020 10:22:56 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: 41321@debbugs.gnu.org
> >
> > > > I'm already running with such a breakpoint, let's how it will catch
> > > > something.                                        ^^^
> > >
> > > Should have been "hope".  Sorry.
> >
> > It happened again, and now insert-file-contents wasn't involved, so I
> > guess it's off the hook.  The command which triggered the problem was
> > self-insert-command, as shown in the backtrace below.  The problem
> > seems to be with handling overlays when buffer text changes.
>
> One more segfault very similar to the last one I reported: it happened
> when calling report_overlay_modification due to text being inserted
> into a buffer.

Everything looks consistent with the bug I described.

> . There's nothing wrong with GDB's xtype command: it fails when a Lisp
>   object encodes a pointer to invalid memory:

(gdb) p last_marked[497]
$100 = XIL(0x439c370)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8

Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
and it's not at address 0x1ffac2a8.

So my suspicion remains that this is a gdb bug, and it appears to be a
reproducible one!
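
For reference, the tag arithmetic behind that claim can be sketched in
C.  This assumes the tag is the top 3 bits of the 64-bit word.  That
layout is inferred from this session, where values reported as
Lisp_Vectorlike start with 0xa, Lisp_Cons with 0xc and Lisp_String
with 0x8, while symbols such as 0x30 and 0xd5c0 have zero top bits.
assumed_tag is a hypothetical helper, not an Emacs function.

```c
#include <stdint.h>

/* Assumption: 61 value bits, with the type tag in the 3 bits above.  */
enum { ASSUMED_VALBITS = 61 };

/* Return the assumed 3-bit tag of a 64-bit Lisp word.  */
static unsigned
assumed_tag (uint64_t word)
{
  return (unsigned) (word >> ASSUMED_VALBITS);
}
```

Under that assumption, 0x439c370 has tag 0, the same as the symbols in
the session, while 0xa00000001ffac2a8 has tag 5, 0xc00000001646b9b0
has tag 6 and 0x8000000007364540 has tag 4.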

> . So we should
>   see whether fixing the LISP_ALIGNMENT vs GCALIGNMENT discrepancy
>   fixes this problem.

I concur.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 10:34           ` Pip Cet
@ 2020-05-29 10:55             ` Eli Zaretskii
  2020-05-29 11:47               ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-29 10:55 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 29 May 2020 10:34:20 +0000
> Cc: Paul Eggert <eggert@cs.ucla.edu>, Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> > . There's nothing wrong with GDB's xtype command: it fails when a Lisp
> >   object encodes a pointer to invalid memory:
> 
> (gdb) p last_marked[497]
> $100 = XIL(0x439c370)
> (gdb) xtype
> Lisp_Vectorlike
> Cannot access memory at address 0x1ffac2a8
> 
> Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
> and it's not at address 0x1ffac2a8.
> 
> So my suspicion remains that this is a gdb bug, and it appears to be a
> reproducible one!

There's no bug: the $size variable was not updated inside pvectype
because the 'set' command tried to access invalid memory.  So the rest
is using the stale value of $size.  Puff! no miracle and no bug.

You just don't need to assign too much importance to the address the
error message displays, it might not be the problematic address.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 10:55             ` Eli Zaretskii
@ 2020-05-29 11:47               ` Pip Cet
  2020-05-29 13:52                 ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-29 11:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier

On Fri, May 29, 2020 at 10:55 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 29 May 2020 10:34:20 +0000
> > Cc: Paul Eggert <eggert@cs.ucla.edu>, Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > > . There's nothing wrong with GDB's xtype command: it fails when a Lisp
> > >   object encodes a pointer to invalid memory:
> >
> > (gdb) p last_marked[497]
> > $100 = XIL(0x439c370)
> > (gdb) xtype
> > Lisp_Vectorlike
> > Cannot access memory at address 0x1ffac2a8
> >
> > Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
> > and it's not at address 0x1ffac2a8.
> >
> > So my suspicion remains that this is a gdb bug, and it appears to be a
> > reproducible one!
>
> There's no bug:

I believe there is.

> the $size variable was not updated inside pvectype
> because the 'set' command tried to access invalid memory.

Why would pvectype be called at all? xtype should have said
"Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
all.

Feel free to try that, in a fresh GDB session:

p 0x439c370
xtype

> So the rest
> is using the stale value of $size.  Puff! no miracle and no bug.

Which rest? There's no message after "Cannot access memory at address
0x1ffac2a8"

> You just don't need to assign too much importance to the address the
> error message displays, it might not be the problematic address.

Or there might not be a problematic address, because xtype is somehow
using the value of $ which it used when it encountered the initial bug
even for subsequent calls. It doesn't do that here.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 11:47               ` Pip Cet
@ 2020-05-29 13:52                 ` Eli Zaretskii
  2020-05-29 14:19                   ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-29 13:52 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 29 May 2020 11:47:46 +0000
> Cc: eggert@cs.ucla.edu, Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> 
> > There's no bug:
> 
> I believe there is.
> 
> > the $size variable was not updated inside pvectype
> > because the 'set' command tried to access invalid memory.
> 
> Why would pvectype be called at all? xtype should have said
> "Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
> all.

Look at what xtype does, and you will see.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 13:52                 ` Eli Zaretskii
@ 2020-05-29 14:19                   ` Pip Cet
  0 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-29 14:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier

On Fri, May 29, 2020 at 1:53 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 29 May 2020 11:47:46 +0000
> > Cc: eggert@cs.ucla.edu, Stefan Monnier <monnier@iro.umontreal.ca>, 41321@debbugs.gnu.org
> >
> > > There's no bug:
> >
> > I believe there is.
> >
> > > the $size variable was not updated inside pvectype
> > > because the 'set' command tried to access invalid memory.
> >
> > Why would pvectype be called at all? xtype should have said
> > "Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
> > all.
>
> Look at what xtype does, and you will see.

So you think it's a bug in xtype?

The relevant definitions are:

define xtype
  xgettype $
  output $type
  echo \n
  if $type == Lisp_Vectorlike
    xvectype
  end
end

define xgettype
  if (CHECK_LISP_OBJECT_TYPE)
    set $bugfix = $arg0.i
  else
    set $bugfix = $arg0
  end
  set $type = (enum Lisp_Type) (USE_LSB_TAG ? (EMACS_INT) $bugfix & (1 << GCTYPEBITS) - 1 : (EMACS_UINT) $bugfix >> VALBITS)
end

Both look fine to me: xtype calls xgettype (not xvectype), which sets
$type to the type bits, then outputs them. But the bug must have
happened by then, because what's output is "Lisp_Vectorlike" even
though $ is a Lisp_Symbol. I fail to see how xvectype and pvectype are
relevant at all...
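
For illustration only, the tag extraction that xgettype performs can be
sketched in C.  This is not the gdb or Emacs code: the names are
hypothetical, and the values GCTYPEBITS = 3, Lisp_Symbol = 0 and
Lisp_Vectorlike = 5 are assumptions modelled on Emacs 27's lisp.h that
should be checked against the build in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modelled on Emacs 27's lisp.h; check your own build.  */
enum { GCTYPEBITS = 3, VALBITS = 64 - GCTYPEBITS };
enum sketch_lisp_type { SKETCH_SYMBOL = 0, SKETCH_VECTORLIKE = 5 };

/* Mirror of the expression in xgettype: with LSB tags the type lives in
   the low GCTYPEBITS bits, otherwise in the bits above VALBITS.  */
static int
sketch_type_of (uint64_t word, bool use_lsb_tag)
{
  int type = (int) (use_lsb_tag
                    ? word & ((UINT64_C (1) << GCTYPEBITS) - 1)
                    : word >> VALBITS);
  /* Sanity check: a tag always fits in GCTYPEBITS bits.  */
  assert (type >= 0 && type < (1 << GCTYPEBITS));
  return type;
}
```

Under these assumptions, sketch_type_of (0x439c370, ...) yields the
symbol tag with either tagging scheme, which is why "Lisp_Vectorlike"
looks wrong for that value.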





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29  9:43                                                         ` Pip Cet
@ 2020-05-29 18:31                                                           ` Paul Eggert
  2020-05-29 18:37                                                             ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-29 18:31 UTC (permalink / raw)
  To: Pip Cet, Stefan Monnier; +Cc: 41321

On 5/29/20 2:43 AM, Pip Cet wrote:
> As I said, the code is tricky (i.e. might contain bugs that can only
> be discovered through extensive testing on 32-bit systems), and it
> complicates what should be generic functions for the rbtree
> implementation, so this is probably a 32-bit optimization that is too
> late because 32-bit systems are no longer that relevant...

At least at first, it may make more sense to keep the red-black trees as-is, and
to look up what appear to be symbol-tagged pointers twice, once as-is (to find
any kind of object) and once offset by '(char *) lispsym - Lisp_Symbol' (to find
only symbols). Although a bit slower, this won't require any changes to the
rbtree code so it's cleaner. We can then time the optimization you have in mind,
to see whether it's worth doing.
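
A minimal sketch of the two-lookup idea, for illustration only.  A
linear search stands in for the red-black tree, and every name here
(sketch_block, sketch_find, symbol_base) is hypothetical rather than
taken from alloc.c; symbol_base plays the role of the 'lispsym' offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one node of the allocator's block tree.  */
struct sketch_block
{
  uintptr_t start, end;   /* half-open range [start, end) */
  bool holds_symbols;
};

/* Linear search standing in for the red-black tree lookup.  */
static const struct sketch_block *
sketch_find (const struct sketch_block *blocks, size_t n, uintptr_t addr)
{
  assert (blocks != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (blocks[i].start <= addr && addr < blocks[i].end)
      return &blocks[i];
  return NULL;
}

/* Look the word up twice: once as a plain pointer, once with the
   symbol base added, accepting the second hit only for symbol blocks.  */
static bool
sketch_word_is_live (const struct sketch_block *blocks, size_t n,
                     uintptr_t word, uintptr_t symbol_base)
{
  if (sketch_find (blocks, n, word))
    return true;
  const struct sketch_block *b = sketch_find (blocks, n, word + symbol_base);
  return b && b->holds_symbols;
}
```

The second lookup is the extra cost mentioned above; the tree code
itself is left untouched.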





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 18:31                                                           ` Paul Eggert
@ 2020-05-29 18:37                                                             ` Pip Cet
  2020-05-29 19:32                                                               ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-29 18:37 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Fri, May 29, 2020 at 6:31 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/29/20 2:43 AM, Pip Cet wrote:
> > As I said, the code is tricky (i.e. might contain bugs that can only
> > be discovered through extensive testing on 32-bit systems), and it
> > complicates what should be generic functions for the rbtree
> > implementation, so this is probably a 32-bit optimization that is too
> > late because 32-bit systems are no longer that relevant...
>
> At least at first, it may make more sense to keep the red-black trees as-is, and
> to look up what appear to be symbol-tagged pointers twice, once as-is (to find
> any kind of object) and once offset by '(char *) lispsym - Lisp_Symbol' (to find
> only symbols).

Having had some time to think about this, I agree. I'm certainly not
very confident in that code.

But the main reason is that it's not an optimization in all
circumstances: if you have a very large vector, and a symbol block
aliasing it as symbol offsets goes away, you have to search for other
symbol blocks with that property, which might take a long time.

However, I wonder what you mean by "what appear to be symbol-tagged
pointers"? Surely we need to look up all pointers twice, no matter
what their tag is, since they might be a reference to something inside
the struct Lisp_Symbol.

Of course, on 64-bit machines, this line of code would usually save us
the trouble:

  if (start < min_heap_address || start > max_heap_address)
    return MEM_NIL;

So that's another reason to leave the code as it is for now.

> Although a bit slower, this won't require any changes to the
> rbtree code so it's cleaner.

> We can then time the optimization you have in mind, to see whether it's worth doing.

... or something simpler that might actually work better :-)





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 18:37                                                             ` Pip Cet
@ 2020-05-29 19:32                                                               ` Paul Eggert
  2020-05-29 19:37                                                                 ` Pip Cet
  2020-05-29 20:26                                                                 ` Stefan Monnier
  0 siblings, 2 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-29 19:32 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

On 5/29/20 11:37 AM, Pip Cet wrote:
> if you have a very large vector, and a symbol block
> aliasing it as symbol offsets goes away, you have to search for other
> symbol blocks with that property, which might take a long time.

It shouldn't be that bad, because when you are worrying about symbols offset by
'lispsym', you need to look only for symbol blocks; it won't matter if these
values appear to point into a vector because you won't follow them in that case.

> However, I wonder what you mean by "what appear to be symbol-tagged
> pointers"? Surely we need to look up all pointers twice, no matter
> what their tag is, since they might be a reference to something inside
> the struct Lisp_Symbol.

What I was trying to say is that if a pointer lacks the symbol tag, then we
needn't worry about it being offset by 'lispsym'. These pointers need to be
looked up only once, even if they happen to be pointers into a struct
Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that
is a symbol, and add a small offset to it without also adding 'lispsym'.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 19:32                                                               ` Paul Eggert
@ 2020-05-29 19:37                                                                 ` Pip Cet
  2020-05-29 20:26                                                                 ` Stefan Monnier
  1 sibling, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-29 19:37 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Fri, May 29, 2020 at 7:32 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/29/20 11:37 AM, Pip Cet wrote:
> > if you have a very large vector, and a symbol block
> > aliasing it as symbol offsets goes away, you have to search for other
> > symbol blocks with that property, which might take a long time.
>
> It shouldn't be that bad, because when you are worrying about symbols offset by
> 'lispsym', you need to look only for symbol blocks; it won't matter if these
> values appear to point into a vector because you won't follow them in that case.

You mean it shouldn't be that bad with the existing code? You're probably right.

It would have been very bad with the code I posted though, so best ignore that.

> > However, I wonder what you mean by "what appear to be symbol-tagged
> > pointers"? Surely we need to look up all pointers twice, no matter
> > what their tag is, since they might be a reference to something inside
> > the struct Lisp_Symbol.
>
> What I was trying to say is that if a pointer lacks the symbol tag, then we
> needn't worry about it being offset by 'lispsym'. These pointers need to be
> looked up only once, even if they happen to be pointers into a struct
> Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that
> is a symbol, and add a small offset to it without also adding 'lispsym'.

Oh! You're right, of course. How silly of me not to realize.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29  6:19                                               ` Eli Zaretskii
@ 2020-05-29 20:24                                                 ` Paul Eggert
  2020-05-29 21:01                                                   ` Pip Cet
  2020-05-30  5:50                                                   ` Eli Zaretskii
  0 siblings, 2 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-29 20:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

[-- Attachment #1: Type: text/plain, Size: 1570 bytes --]

On 5/28/20 11:19 PM, Eli Zaretskii wrote:
>> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
>> +  return (uintptr_t) p % GCALIGNMENT == 0;
>>  }
> ...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound
> right to me: by keeping the current value of LISP_ALIGNMENT, we
> basically declare that Lisp objects shall be aligned on that boundary,
> whereas that isn't really the case.  Why not change the value of
> LISP_ALIGNMENT instead?

There are really two bugs here.

1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer
can point into the middle of (say) a pseudovector and not be
LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix
this bug in general, because such a pointer might not be GCALIGNMENT-aligned
either. This bug can cause crashes because it causes GC to think an object is
garbage when it's not garbage.

2. LISP_ALIGNMENT is too large on MinGW and some other platforms.

The patch I sent earlier attempted to be the simplest patch that would fix the
bug you observed on MinGW, which is a special case of (1). It does not attempt
to fix all plausible cases of (1), nor does it address (2).

We can fix these two bugs separately, by installing the attached patches into
emacs-27. The first patch fixes (1) and thus fixes the crash along with other
plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a
different way but does not fix the crash on other plausible platforms. (1)
probably has better performance than (2), though I doubt whether users will notice.
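
To make the two failure modes concrete, here is a small sketch of the
filter with the alignment passed as a parameter.  The function name and
the addresses used with it are hypothetical; only the modulo test itself
comes from maybe_lisp_pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as maybe_lisp_pointer, with the alignment passed in so
   that different values can be compared.  */
static bool
sketch_passes_filter (uintptr_t p, uintptr_t alignment)
{
  assert (alignment != 0);
  return p % alignment == 0;
}
```

With an object that starts at the hypothetical address 0x1008, a filter
of 16 rejects the pointer while a filter of 8 accepts it, which
corresponds to (2).  A pointer to 0x100c, four bytes into that object,
is rejected even by a filter of 8, which corresponds to (1).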

[-- Attachment #2: 0001-Remove-maybe_lisp_pointer.patch --]
[-- Type: text/x-patch, Size: 1669 bytes --]

From 2c0bac868a7aefe7dafd2362cce42a7d3738319f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 29 May 2020 12:56:16 -0700
Subject: [PATCH 1/2] =?UTF-8?q?Remove=20=E2=80=98maybe=5Flisp=5Fpointer?=
 =?UTF-8?q?=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

It’s an invalid optimization, since pointers can address the
middle of Lisp_Object data.
* src/alloc.c (maybe_lisp_pointer): Remove.  Only use removed.
Do not merge to master, as we’ll put in a better fix there.
---
 src/alloc.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 1c6b664b22..b8382aca5b 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
     mark_maybe_object (*array);
 }
 
-/* Return true if P might point to Lisp data that can be garbage
-   collected, and false otherwise (i.e., false if it is easy to see
-   that P cannot point to Lisp data that can be garbage collected).
-   Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of LISP_ALIGNMENT.  */
-
-static bool
-maybe_lisp_pointer (void *p)
-{
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
-}
-
 /* If P points to Lisp data, mark that as live if it isn't already
    marked.  */
 
@@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p)
   VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p));
 #endif
 
-  if (!maybe_lisp_pointer (p))
-    return;
-
   if (pdumper_object_p (p))
     {
       int type = pdumper_find_object_type (p);
-- 
2.17.1


[-- Attachment #3: 0002-Don-t-overalign-Lisp-objects.patch --]
[-- Type: text/x-patch, Size: 3666 bytes --]

From f620b5b802bf2afad033c7cc7856a71fd28b2c13 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 29 May 2020 13:02:32 -0700
Subject: [PATCH 2/2] =?UTF-8?q?Don=E2=80=99t=20overalign=20Lisp=20objects?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Backport from master.
* src/alloc.c (union emacs_align_type):
New type, used for LISP_ALIGNMENT.
(LISP_ALIGNMENT): Use it instead of max_align_t.
---
 src/alloc.c | 55 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 45 insertions(+), 10 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index b8382aca5b..48e96863db 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -104,6 +104,46 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2020 Free Software
 #include "w32heap.h"	/* for sbrk */
 #endif
 
+/* A type with alignment at least as large as any object that Emacs
+   allocates.  This is not max_align_t because some platforms (e.g.,
+   mingw) have buggy malloc implementations that do not align for
+   max_align_t.  This union contains types of all GCALIGNED_STRUCT
+   components visible here.  */
+union emacs_align_type
+{
+  struct frame frame;
+  struct Lisp_Bignum Lisp_Bignum;
+  struct Lisp_Bool_Vector Lisp_Bool_Vector;
+  struct Lisp_Char_Table Lisp_Char_Table;
+  struct Lisp_CondVar Lisp_CondVar;
+  struct Lisp_Finalizer Lisp_Finalizer;
+  struct Lisp_Float Lisp_Float;
+  struct Lisp_Hash_Table Lisp_Hash_Table;
+  struct Lisp_Marker Lisp_Marker;
+  struct Lisp_Misc_Ptr Lisp_Misc_Ptr;
+  struct Lisp_Mutex Lisp_Mutex;
+  struct Lisp_Overlay Lisp_Overlay;
+  struct Lisp_Sub_Char_Table Lisp_Sub_Char_Table;
+  struct Lisp_Subr Lisp_Subr;
+  struct Lisp_User_Ptr Lisp_User_Ptr;
+  struct Lisp_Vector Lisp_Vector;
+  struct terminal terminal;
+  struct thread_state thread_state;
+  struct window window;
+
+  /* Omit the following since they would require including process.h
+     etc.  In practice their alignments never exceed that of the
+     structs already listed.  */
+#if 0
+  struct Lisp_Module_Function Lisp_Module_Function;
+  struct Lisp_Process Lisp_Process;
+  struct save_window_data save_window_data;
+  struct scroll_bar scroll_bar;
+  struct xwidget_view xwidget_view;
+  struct xwidget xwidget;
+#endif
+};
+
 #ifdef DOUG_LEA_MALLOC
 
 /* Specify maximum number of areas to mmap.  It would be nice to use a
@@ -636,16 +676,11 @@ buffer_memory_full (ptrdiff_t nbytes)
 #define COMMON_MULTIPLE(a, b) \
   ((a) % (b) == 0 ? (a) : (b) % (a) == 0 ? (b) : (a) * (b))
 
-/* LISP_ALIGNMENT is the alignment of Lisp objects.  It must be at
-   least GCALIGNMENT so that pointers can be tagged.  It also must be
-   at least as strict as the alignment of all the C types used to
-   implement Lisp objects; since pseudovectors can contain any C type,
-   this is max_align_t.  On recent GNU/Linux x86 and x86-64 this can
-   often waste up to 8 bytes, since alignof (max_align_t) is 16 but
-   typical vectors need only an alignment of 8.  Although shrinking
-   the alignment to 8 would save memory, it cost a 20% hit to Emacs
-   CPU performance on Fedora 28 x86-64 when compiled with gcc -m32.  */
-enum { LISP_ALIGNMENT = alignof (union { max_align_t x;
+/* Alignment needed for memory blocks that are allocated via malloc
+   and that contain Lisp objects.  On typical hosts malloc already
+   aligns sufficiently, but extra work is needed on oddball hosts
+   where Emacs would crash if malloc returned a non-GCALIGNED pointer.  */
+enum { LISP_ALIGNMENT = alignof (union { union emacs_align_type x;
 					 GCALIGNED_UNION_MEMBER }) };
 verify (LISP_ALIGNMENT % GCALIGNMENT == 0);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 19:32                                                               ` Paul Eggert
  2020-05-29 19:37                                                                 ` Pip Cet
@ 2020-05-29 20:26                                                                 ` Stefan Monnier
  2020-05-29 20:40                                                                   ` Paul Eggert
  2020-05-30  5:51                                                                   ` Eli Zaretskii
  1 sibling, 2 replies; 132+ messages in thread
From: Stefan Monnier @ 2020-05-29 20:26 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Pip Cet

> What I was trying to say is that if a pointer lacks the symbol tag, then we
> needn't worry about it being offset by 'lispsym'. These pointers need to be
> looked up only once, even if they happen to be pointers into a struct
> Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that
> is a symbol, and add a small offset to it without also adding 'lispsym'.

I don't think that's true.

The original problematic case is for wide-int where a 64bit Lisp_Object
containing a symbol is split into a 32bit tag saying "this is a symbol"
and a 32bit pointer to which an offset has been added.

So when we encounter a 32bit word on the stack, it may be a "plain
pointer" or it may be the 32bit of a pointer to a symbol with an
offset applied but we can't tell which it is because we don't have the
tag at that point.
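
A rough sketch of that split, for illustration only.  It assumes a
64-bit Lisp_Object with 3 tag bits at the top (VALBITS = 61), and the
helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_VALBITS = 61 };  /* assumed: 64-bit word, 3 tag bits on top */

/* Build a wide-int style object: tag in the top bits, 32-bit payload
   (a pointer or a symbol offset) in the low bits.  */
static uint64_t
sketch_make (unsigned tag, uint32_t payload)
{
  assert (tag < 8);  /* the tag must fit in 3 bits */
  return ((uint64_t) tag << SKETCH_VALBITS) | payload;
}

/* The two 32-bit words as they might appear separately on the stack.  */
static uint32_t
sketch_low_half (uint64_t obj)
{
  return (uint32_t) obj;
}

static uint32_t
sketch_high_half (uint64_t obj)
{
  return (uint32_t) (obj >> 32);
}
```

Two objects with different tags but the same payload have identical low
halves, so a scanner that sees only the low word cannot recover the tag.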


        Stefan "looking forward to bignums replacing wide-ints"






^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 20:26                                                                 ` Stefan Monnier
@ 2020-05-29 20:40                                                                   ` Paul Eggert
  2020-05-30  5:54                                                                     ` Eli Zaretskii
  2020-05-30  5:51                                                                   ` Eli Zaretskii
  1 sibling, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-29 20:40 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 41321, Pip Cet

On 5/29/20 1:26 PM, Stefan Monnier wrote:

> The original problematic case is for wide-int where a 64bit Lisp_Object
> containing a symbol is split into a 32bit tag saying "this is a symbol"
> and a 32bit pointer to which an offset has been added.
> 
> So when we encounter a 32bit word on the stack, it may be a "plain
> pointer" or it may be the 32bit of a pointer to a symbol with an
> offset applied but we can't tell which it is because we don't have the
> tag at that point.

Oh, you're right. Thanks, I was thinking only of the USE_LSB_TAG case.

For the !USE_LSB_TAG case, we should check whether the word is aligned for
'struct Lisp_Symbol', not whether it has the Lisp_Symbol tag, when deciding
quickly whether to add 'lispsym' and then do the second rbtree lookup. Something
like this:

  (USE_LSB_TAG
   ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol
   : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0)

I'll fold this idea into the next iteration of the patch I'm working on.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 20:24                                                 ` Paul Eggert
@ 2020-05-29 21:01                                                   ` Pip Cet
  2020-05-30  5:58                                                     ` Eli Zaretskii
  2020-05-30 16:31                                                     ` Paul Eggert
  2020-05-30  5:50                                                   ` Eli Zaretskii
  1 sibling, 2 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-29 21:01 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Fri, May 29, 2020 at 8:24 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/28/20 11:19 PM, Eli Zaretskii wrote:
> >> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
> >> +  return (uintptr_t) p % GCALIGNMENT == 0;
> >>  }
> > ...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound
> > right to me: by keeping the current value of LISP_ALIGNMENT, we
> > basically declare that Lisp objects shall be aligned on that boundary,
> > whereas that isn't really the case.  Why not change the value of
> > LISP_ALIGNMENT instead?
>
> There are really two bugs here.
>
> 1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer
> can point into the middle of (say) a pseudovector and not be
> LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix
> this bug in general, because such a pointer might not be GCALIGNMENT-aligned
> either. This bug can cause crashes because it causes GC to think an object is
> garbage when it's not garbage.
>
> 2. LISP_ALIGNMENT is too large on MinGW and some other platforms.
>
> The patch I sent earlier attempted to be the simplest patch that would fix the
> bug you observed on MinGW, which is a special case of (1).

I'm not convinced. I think Eli only observed (2). There were no
pointers into the middle of pseudovectors in his backtrace or
disassembly...

> It does not attempt
> to fix all plausible cases of (1), nor does it address (2).

It does address (2). It doesn't address all cases of (1).

> We can fix these two bugs separately, by installing the attached patches into
> emacs-27. The first patch fixes (1) and thus fixes the crash along with other
> plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a
> different way but does not fix the crash on other plausible platforms. (1)
> probably has better performance than (2), though I doubt whether users will notice.

(1) says:
It’s an invalid optimization, since pointers can address the
middle of Lisp_Object data.

That may be true (we still haven't observed it), but it's not what
happened in Eli's case: in that case, the "pointer" was actually the
lower half of a Lisp_Object, so it pointed at the beginning of a
struct Lisp_Vector. That just happened to be misaligned.

(2) has this comment:
+/* Alignment needed for memory blocks that are allocated via malloc
+   and that contain Lisp objects.  On typical hosts malloc already
+   aligns sufficiently, but extra work is needed on oddball hosts
+   where Emacs would crash if malloc returned a non-GCALIGNED pointer.  */

I can't make sense of that comment. It describes two problems that
don't happen, and omits the problem that does happen.
1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.
2. A Lisp object requires greater alignment than malloc() gives it.
IIRC, there was at least one RISC architecture whose specification
supported atomic operations only on the first word in each
32-byte-aligned block, but that's such a rare case (and wasn't true
for the silicon implementations, I seem to recall) that it seems silly
to worry about it today.

I'm not saying it's the best solution, but I would prefer simply
defining LISP_ALIGNMENT to be 8 to either patch.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 20:24                                                 ` Paul Eggert
  2020-05-29 21:01                                                   ` Pip Cet
@ 2020-05-30  5:50                                                   ` Eli Zaretskii
  1 sibling, 0 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30  5:50 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 29 May 2020 13:24:55 -0700
> 
> There are really two bugs here.
> 
> 1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer
> can point into the middle of (say) a pseudovector and not be
> LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix
> this bug in general, because such a pointer might not be GCALIGNMENT-aligned
> either. This bug can cause crashes because it causes GC to think an object is
> garbage when it's not garbage.
> 
> 2. LISP_ALIGNMENT is too large on MinGW and some other platforms.
> 
> The patch I sent earlier attempted to be the simplest patch that would fix the
> bug you observed on MinGW, which is a special case of (1). It does not attempt
> to fix all plausible cases of (1), nor does it address (2).
> 
> We can fix these two bugs separately, by installing the attached patches into
> emacs-27. The first patch fixes (1) and thus fixes the crash along with other
> plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a
> different way but does not fix the crash on other plausible platforms. (1)
> probably has better performance than (2), though I doubt whether users will notice.

Since (1) is for now purely theoretical (and rare even in that
theoretical case), I'd like to see (2) applied to emacs-27.  Let's do
that soon, as I'd like to have another pretest in the near future.

Thanks.





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 20:26                                                                 ` Stefan Monnier
  2020-05-29 20:40                                                                   ` Paul Eggert
@ 2020-05-30  5:51                                                                   ` Eli Zaretskii
  2020-05-30 14:26                                                                     ` Stefan Monnier
  1 sibling, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30  5:51 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: eggert, 41321, pipcet

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Pip Cet <pipcet@gmail.com>,  Eli Zaretskii <eliz@gnu.org>,
>   41321@debbugs.gnu.org
> Date: Fri, 29 May 2020 16:26:59 -0400
> 
>         Stefan "looking forward to bignums replacing wide-ints"

Why? So that Emacs could be slower still?





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 20:40                                                                   ` Paul Eggert
@ 2020-05-30  5:54                                                                     ` Eli Zaretskii
  2020-05-30 17:52                                                                       ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30  5:54 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: Pip Cet <pipcet@gmail.com>, Eli Zaretskii <eliz@gnu.org>,
>  41321@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Fri, 29 May 2020 13:40:33 -0700
> 
>   (USE_LSB_TAG
>    ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol
>    : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0)

I don't understand how this will work, given that a Lisp object on the
stack can be pushed as 2 non-contiguous 32-bit words.  Can you
explain?





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 21:01                                                   ` Pip Cet
@ 2020-05-30  5:58                                                     ` Eli Zaretskii
  2020-05-30  7:19                                                       ` Pip Cet
  2020-05-30 16:31                                                     ` Paul Eggert
  1 sibling, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30  5:58 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Fri, 29 May 2020 21:01:39 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, 41321@debbugs.gnu.org, 
> 	Stefan Monnier <monnier@iro.umontreal.ca>
> 
> (2) has this comment:
> +/* Alignment needed for memory blocks that are allocated via malloc
> +   and that contain Lisp objects.  On typical hosts malloc already
> +   aligns sufficiently, but extra work is needed on oddball hosts
> +   where Emacs would crash if malloc returned a non-GCALIGNED pointer.  */
> 
> I can't make sense of that comment. It describes two problems that
> don't happen, and omits the problem that does happen.
> 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.
> 2. A Lisp object requires greater alignment than malloc() gives it.
> IIRC, there was at least one RISC architecture whose specification
> supported atomic operations only on the first word in each
> 32-byte-aligned block, but that's such a rare case (and wasn't true
> for the silicon implementations, I seem to recall) that it seems silly
> to worry about it today.
> 
> I'm not saying it's the best solution, but I would prefer simply
> defining LISP_ALIGNMENT to be 8 to either patch.

I agree, but patch 2 basically does that, so I'm okay with saying "8"
in so many words.

Btw, can someone remind me why we started requiring non-default
alignment from Lisp objects?

Also, given the fact that in the crashing case the 2 32-bit parts of a
Lisp object were pushed onto the stack non-contiguously, will fixing
the alignment alone cause those Lisp objects to be marked?





^ permalink raw reply	[flat|nested] 132+ messages in thread

* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30  5:58                                                     ` Eli Zaretskii
@ 2020-05-30  7:19                                                       ` Pip Cet
  2020-05-30  9:08                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-30  7:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier

On Sat, May 30, 2020 at 5:58 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Fri, 29 May 2020 21:01:39 +0000
> > Cc: Eli Zaretskii <eliz@gnu.org>, 41321@debbugs.gnu.org,
> >       Stefan Monnier <monnier@iro.umontreal.ca>
> >
> > (2) has this comment:
> > +/* Alignment needed for memory blocks that are allocated via malloc
> > +   and that contain Lisp objects.  On typical hosts malloc already
> > +   aligns sufficiently, but extra work is needed on oddball hosts
> > +   where Emacs would crash if malloc returned a non-GCALIGNED pointer.  */
> >
> > I can't make sense of that comment. It describes two problems that
> > don't happen, and omits the problem that does happen.
> > 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.
> > 2. A Lisp object requires greater alignment than malloc() gives it.
> > IIRC, there was at least one RISC architecture whose specification
> > supported atomic operations only on the first word in each
> > 32-byte-aligned block, but that's such a rare case (and wasn't true
> > for the silicon implementations, I seem to recall) that it seems silly
> > to worry about it today.
> >
> > I'm not saying it's the best solution, but I would prefer simply
> > defining LISP_ALIGNMENT to be 8 to either patch.
>
> I agree, but patch 2 basically does that, so I'm okay with saying "8"
> in so many words.

Okay.

> Btw, can someone remind me why we started requiring non-default
> alignment from Lisp objects?

max_align_t was changed to include a float128 type, and
alignof(float128) == 16 on x86, even though virtually all x86 systems
are configured to allow unaligned accesses.
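
For anyone who wants to check this on their own host, a minimal C11
sketch follows.  The helper name max_alignment is made up for this
example; only alignof and max_align_t come from the standard headers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical helper: report the alignment the C implementation
   claims for max_align_t.  The value is platform-dependent, which is
   the whole point of the discussion.  */
static size_t
max_alignment (void)
{
  return alignof (max_align_t);
}
```

The result depends on the compiler and C library, so no particular
value is claimed here; the only portable guarantees are that it is a
power of two and at least as large as the alignment of any scalar type.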

If I understand Paul's concerns correctly, he believes it's possible a
system will once again come into use in which atomic accesses only
work for offsets aligned to, say, 32 bytes. Since pthread variables
require atomic accesses, such a platform would see weird crashes if a
pthread inside a Lisp_Vector wasn't aligned to 32 bytes.

Of course, it remains to be seen/checked whether any such system would
actually define max_align_t to have an alignment of 32, since it
covers only primitive types.

> Also, given the fact that in the crashing case the 2 32-bit parts of a
> Lisp object were pushed onto the stack non-contiguously, will fixing
> the alignment alone cause those Lisp objects to be marked?

Yes. The lower 32-bit part was ignored because its value wasn't
16-byte aligned, not because its stack location wasn't 8-byte aligned.
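
To make the distinction concrete, here is a minimal sketch (not the
actual alloc.c code; the function name and the sample value are made
up for illustration) of a filter that rejects a candidate word based
on its value rather than on where it sits on the stack:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kind of check maybe_lisp_pointer
   performs: a candidate word is considered further only if its value
   is a multiple of ALIGNMENT.  */
static bool
passes_alignment_filter (uintptr_t value, uintptr_t alignment)
{
  return value % alignment == 0;
}
```

With a required alignment of 16, a value such as 0x1008 is rejected
even though it is a valid 8-byte-aligned address; with a required
alignment of 8 it is accepted.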






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30  7:19                                                       ` Pip Cet
@ 2020-05-30  9:08                                                         ` Eli Zaretskii
  2020-05-30 11:06                                                           ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30  9:08 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 30 May 2020 07:19:18 +0000
> Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, 
> 	Stefan Monnier <monnier@iro.umontreal.ca>
> 
> > Btw, can someone remind me why we started requiring non-default
> > alignment from Lisp objects?
> 
> max_align_t was changed to include a float128 type, and
> alignof(float128) == 16 on x86, even though virtually all x86 systems
> are configured to allow unaligned accesses.

I understand that part, but my question was why, even before the
change in max_align_t, did we start requiring 8-byte alignment on
systems where that is not automatically guaranteed?

> If I understand Paul's concerns correctly, he believes it's possible a
> system will once again come into use in which atomic accesses only
> work for offsets aligned to, say, 32 bytes. Since pthread variables
> require atomic accesses, such a platform would see weird crashes if a
> pthread inside a Lisp_Vector wasn't aligned to 32 bytes.

So this alignment requirement is only due to pthreads being used?  But
MinGW doesn't use pthreads.

> > Also, given the fact that in the crashing case the 2 32-bit parts of a
> > Lisp object were pushed onto the stack non-contiguously, will fixing
> > the alignment alone cause those Lisp objects to be marked?
> 
> Yes. The lower 32-bit part was ignored because its value wasn't
> 16-byte aligned, not because its stack location wasn't 8-byte aligned.

Right, but I'm talking about marking.  AFAIU, when scanning the stack
finds a value that looks like a Lisp object, we mark that object.  If
the two 32-bit parts of the object are non-contiguous, will we be able
to recognize such an object, and will we be able to mark it correctly,
and if so, how?  IOW, don't we need the upper 32-bit (which encodes
the object type) for the purposes of marking it?






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30  9:08                                                         ` Eli Zaretskii
@ 2020-05-30 11:06                                                           ` Pip Cet
  2020-05-30 11:31                                                             ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-30 11:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier

On Sat, May 30, 2020 at 9:08 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 30 May 2020 07:19:18 +0000
> > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org,
> >       Stefan Monnier <monnier@iro.umontreal.ca>
> >
> > > Btw, can someone remind me why we started requiring non-default
> > > alignment from Lisp objects?
> >
> > max_align_t was changed to include a float128 type, and
> > alignof(float128) == 16 on x86, even though virtually all x86 systems
> > are configured to allow unaligned accesses.
>
> I understand that part, but my question was why, even before the
> change in max_align_t, did we start requiring 8-byte alignment on
> systems where that is not automatically guaranteed?

I don't know. As I said, I think that was always buggy on pdumper
systems, though the bug was very subtle. My guess is it predates
pdumper, at which time it was a valid optimization.

> > If I understand Paul's concerns correctly, he believes it's possible a
> > system will once again come into use in which atomic accesses only
> > work for offsets aligned to, say, 32 bytes. Since pthread variables
> > require atomic accesses, such a platform would see weird crashes if a
> > pthread inside a Lisp_Vector wasn't aligned to 32 bytes.
>
> So this alignment requirement is only due to pthreads being used?

I'm not sure what you're asking. Obviously there are systems on which
unaligned accesses will fault or be very slow indeed, so we need to
make sure, say, pure space allocations are aligned somehow. That
requires a LISP_ALIGNMENT of 8. Everything beyond that is only for
performance, pthreads, and SIMD types.

> > > Also, given the fact that in the crashing case the 2 32-bit parts of a
> > > Lisp object were pushed onto the stack non-contiguously, will fixing
> > > the alignment alone cause those Lisp objects to be marked?
> >
> > Yes. The lower 32-bit part was ignored because its value wasn't
> > 16-byte aligned, not because its stack location wasn't 8-byte aligned.
>
> Right, but I'm talking about marking.  AFAIU, when scanning the stack
> finds a value that looks like a Lisp object, we mark that object.

And if we find a value that looks like a pointer to a Lisp structure,
as the lower half of a non-symbol Lisp_Object does, we mark the
corresponding Lisp object.

> If
> the two 32-bit parts of the object are non-contiguous, will we be able
> to recognize such an object, and will we be able to mark it correctly,
> and if so, how?  IOW, don't we need the upper 32-bit (which encodes
> the object type) for the purposes of marking it?

For everything but symbols, we don't, mark_maybe_pointer called on the
low 32 bits suffices. For symbols, mark_maybe_pointer needs to be
changed to also check the pointer at <low 32-bit word> + &lispsym.
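
A rough sketch of the idea, assuming symbols are represented as byte
offsets from the start of a static table.  The names toy_symbol,
toy_lispsym, sym_to_offset and offset_to_sym are invented for this
example and are not the real lisp.h definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct Lisp_Symbol and lispsym.  */
struct toy_symbol { int fields[4]; };
static struct toy_symbol toy_lispsym[8];

/* A symbol is encoded as its byte offset from the table start, so the
   first symbol encodes as zero.  */
static uintptr_t
sym_to_offset (struct toy_symbol *s)
{
  return (uintptr_t) s - (uintptr_t) toy_lispsym;
}

/* A conservative scanner that sees only the low word must add the
   table address back before it can treat the word as a pointer.  */
static struct toy_symbol *
offset_to_sym (uintptr_t offset)
{
  return (struct toy_symbol *) (offset + (uintptr_t) toy_lispsym);
}
```

The low word by itself is therefore not an address; it becomes one
only after the table base is added, which is why the raw word needs
that extra check.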






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 11:06                                                           ` Pip Cet
@ 2020-05-30 11:31                                                             ` Eli Zaretskii
  2020-05-30 13:29                                                               ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 11:31 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 30 May 2020 11:06:52 +0000
> Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, 
> 	Stefan Monnier <monnier@iro.umontreal.ca>
> 
> > I understand that part, but my question was why, even before the
> > change in max_align_t, did we start requiring 8-byte alignment on
> > systems where that is not automatically guaranteed?
> 
> I don't know. As I said, I think that was always buggy on pdumper
> systems, though the bug was very subtle. My guess is it predates
> pdumper, at which time it was a valid optimization.

How is pdumper involved here?

> > So this alignment requirement is only due to pthreads being used?
> 
> I'm not sure what you're asking. Obviously there are systems on which
> unaligned accesses will fault or be very slow indeed, so we need to
> make sure, say, pure space allocations are aligned somehow. That
> requires a LISP_ALIGNMENT of 8. Everything beyond that is only for
> performance, pthreads, and SIMD types.

If the system guarantees 4-byte alignment from malloc (and/or a
similar alignment of the runtime C stack), then using that doesn't
trigger problems related to unaligned accesses, right?  So let me
rephrase: why isn't 4-byte alignment "good enough" for us on systems
where malloc and the runtime stack are guaranteed to be thus aligned?

> > If
> > the two 32-bit parts of the object are non-contiguous, will we be able
> > to recognize such an object, and will we be able to mark it correctly,
> > and if so, how?  IOW, don't we need the upper 32-bit (which encodes
> > the object type) for the purposes of marking it?
> 
> For everything but symbols, we don't, mark_maybe_pointer called on the
> low 32 bits suffices. For symbols, mark_maybe_pointer needs to be
> changed to also check the pointer at <low 32-bit word> + &lispsym.

Right, that's what I thought.  So this issue also has to be fixed on
emacs-27 in order for us to provide a stable Emacs 27.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 11:31                                                             ` Eli Zaretskii
@ 2020-05-30 13:29                                                               ` Pip Cet
  2020-05-30 16:32                                                                 ` Eli Zaretskii
  2020-05-30 18:04                                                                 ` Paul Eggert
  0 siblings, 2 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-30 13:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 2652 bytes --]

On Sat, May 30, 2020 at 11:31 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 30 May 2020 11:06:52 +0000
> > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org,
> >       Stefan Monnier <monnier@iro.umontreal.ca>
> >
> > > I understand that part, but my question was why, even before the
> > > change in max_align_t, did we start requiring 8-byte alignment on
> > > systems where that is not automatically guaranteed?
> >
> > I don't know. As I said, I think that was always buggy on pdumper
> > systems, though the bug was very subtle. My guess is it predates
> > pdumper, at which time it was a valid optimization.
>
> How is pdumper involved here?

See the pdumper issue I described above. I can't imagine this being a
significant bug, because it needs the sole surviving reference to a
pdumper object to be on the stack, while simultaneously being the key
in a weak-key hash table cell...

> > > So this alignment requirement is only due to pthreads being used?
> >
> > I'm not sure what you're asking. Obviously there are systems on which
> > unaligned accesses will fault or be very slow indeed, so we need to
> > make sure, say, pure space allocations are aligned somehow. That
> > requires a LISP_ALIGNMENT of 8. Everything beyond that is only for
> > performance, pthreads, and SIMD types.
>
> If the system guarantees 4-byte alignment from malloc (and/or a
> similar alignment of the runtime C stack), then using that doesn't
> trigger problems related to unaligned accesses, right?  So let me
> rephrase: why isn't 4-byte alignment "good enough" for us on systems
> where malloc and the runtime stack are guaranteed to be thus aligned?

(The runtime stack isn't relevant, as far as I can tell, since we walk
that in 4-byte steps on such systems anyway.)

You're correct that on such a system, we could get away with a
LISP_ALIGNMENT of 4, but a LISP_ALIGNMENT of 8 wouldn't hurt either.

> > > If
> > > the two 32-bit parts of the object are non-contiguous, will we be able
> > > to recognize such an object, and will we be able to mark it correctly,
> > > and if so, how?  IOW, don't we need the upper 32-bit (which encodes
> > > the object type) for the purposes of marking it?
> >
> > For everything but symbols, we don't, mark_maybe_pointer called on the
> > low 32 bits suffices. For symbols, mark_maybe_pointer needs to be
> > changed to also check the pointer at <low 32-bit word> + &lispsym.
>
> Right, that's what I thought.  So this issue also has to be fixed on
> emacs-27 in order for us to provide a stable Emacs 27.

I'm surprised, but glad that you think so. Patch for emacs-27 attached.

[-- Attachment #2: 0001-Be-more-aggressive-in-marking-objects-during-GC-bug-.patch --]
[-- Type: text/x-patch, Size: 2649 bytes --]

From 35d50e6108c6edbac93e80aa1b9998dc6ac19054 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Sat, 30 May 2020 13:23:24 +0000
Subject: [PATCH] Be more aggressive in marking objects during GC (bug#41321)

* src/alloc.c (maybe_lisp_pointer): Remove.
  (mark_memory): Mark 32-bit words that might be the only reference
  to a Lisp_Symbol.
---
 src/alloc.c | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 1c6b664b22..3938cdf054 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
     mark_maybe_object (*array);
 }
 
-/* Return true if P might point to Lisp data that can be garbage
-   collected, and false otherwise (i.e., false if it is easy to see
-   that P cannot point to Lisp data that can be garbage collected).
-   Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of LISP_ALIGNMENT.  */
-
-static bool
-maybe_lisp_pointer (void *p)
-{
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
-}
-
 /* If P points to Lisp data, mark that as live if it isn't already
    marked.  */
 
@@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p)
   VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p));
 #endif
 
-  if (!maybe_lisp_pointer (p))
-    return;
-
   if (pdumper_object_p (p))
     {
       int type = pdumper_find_object_type (p);
@@ -4715,12 +4700,22 @@ mark_memory (void const *start, void const *end)
 
   for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT)
     {
-      mark_maybe_pointer (*(void *const *) pp);
-
-      verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0);
-      if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT
-	  || (uintptr_t) pp % alignof (Lisp_Object) == 0)
-	mark_maybe_object (*(Lisp_Object const *) pp);
+      uintptr_t offset = (uintptr_t) *(void *const *) pp;
+      mark_maybe_pointer ((void *) offset);
+      /* On 32-bit --with-wide-int systems, the two halves of a
+	 Lisp_Object may be stored non-contiguously.  Therefore, we
+	 need to recognize the lower 32 bits of a Lisp_Object encoding
+	 a symbol, and since Qnil is binary zero, that requires adding
+	 &lispsym.  */
+      if (GC_POINTER_ALIGNMENT < sizeof (Lisp_Object))
+	mark_maybe_pointer ((void *) (offset + (uintptr_t) &lispsym));
+      else
+	{
+	  verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0);
+	  if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT
+	      || (uintptr_t) pp % alignof (Lisp_Object) == 0)
+	    mark_maybe_object (*(Lisp_Object const *) pp);
+	}
     }
 }
 
-- 
2.27.0.rc0



* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30  5:51                                                                   ` Eli Zaretskii
@ 2020-05-30 14:26                                                                     ` Stefan Monnier
  0 siblings, 0 replies; 132+ messages in thread
From: Stefan Monnier @ 2020-05-30 14:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, pipcet

>>         Stefan "looking forward to bignums replacing wide-ints"
> Why? so that Emacs could be slower still?

Well, if performance is a serious problem, then maybe "bignums replacing
wide-ints" will never happen.  IOW the above assumes that we can make
them work as fast if not faster (more specifically, using bignums
should(!?) result in better performance in buffers <512MB, while it will
indeed likely result in worse performance in buffers bigger than that).


        Stefan







* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-29 21:01                                                   ` Pip Cet
  2020-05-30  5:58                                                     ` Eli Zaretskii
@ 2020-05-30 16:31                                                     ` Paul Eggert
  2020-05-30 16:42                                                       ` Eli Zaretskii
  2020-05-30 16:53                                                       ` Pip Cet
  1 sibling, 2 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 16:31 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

On 5/29/20 2:01 PM, Pip Cet wrote:

> (1) says:
> It’s an invalid optimization, since pointers can address the
> middle of Lisp_Object data.
> 
> That may be true (we still haven't observed it),

I observed it earlier, in code that iterated through a Lisp vector; at the
machine level the only pointer was into the middle of that vector. Addresses of
Lisp_Vector elements are not GCALIGNED on x86 and other platforms.
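
As a toy illustration (the struct below is a made-up layout with a
4-byte header and 4-byte elements, roughly what a 32-bit build without
wide ints would have; it is not the real struct Lisp_Vector), element
addresses need not be multiples of 8 even when the vector itself is:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector layout: one 4-byte header word followed by
   4-byte elements.  */
struct toy_vector
{
  uint32_t header;
  uint32_t contents[4];
};

/* Byte offset of element I from the start of the vector.  */
static size_t
element_offset (size_t i)
{
  return offsetof (struct toy_vector, contents) + i * sizeof (uint32_t);
}
```

If the vector starts at an 8-byte boundary, every other element sits
at an address that is 4 modulo 8, so a loop pointer into the contents
would fail a multiple-of-8 filter about half the time.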

> but it's not what
> happened in Eli's case:

Yes, that's right. That is, the patch for (1) fixed not only Eli's case, but
other plausible cases.

> 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.

Although that's true of all current Emacs porting targets as far as I know, I'd
rather not hardwire this into the code, as neither POSIX nor the C standard
requires it. This is why the comment refers to platforms where malloc() % 8 != 0
as "oddball hosts".

> 2. A Lisp object requires greater alignment than malloc() gives it.
> IIRC, there was at least one RISC architecture whose specification

We don't need anything that obscure. Just use __int128 on x86 with glibc 2.24.
On that platform __int128's alignment is 16, malloc's is 8.

> I'm not saying it's the best solution, but I would prefer simply
> defining LISP_ALIGNMENT to be 8 to either patch.

That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
alignment (there's no need to align objects to 8 because the tags are at the
high end).
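
A small sketch of why low-bit tagging is what ties object alignment to
8 (the names TAG_BITS, tag_low and untag_low are invented for this
example, and the 3-bit tag width is an assumption):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 3 tag bits stored in the low end.  */
enum { TAG_BITS = 3, TAG_MASK = (1 << TAG_BITS) - 1 };

/* Store TAG in the low bits of ADDR.  This is lossless only if ADDR
   is a multiple of 8, i.e. its low three bits are already zero.  */
static uintptr_t
tag_low (uintptr_t addr, unsigned tag)
{
  return addr | tag;
}

/* Recover the address by clearing the tag bits.  */
static uintptr_t
untag_low (uintptr_t word)
{
  return word & ~(uintptr_t) TAG_MASK;
}
```

An 8-byte-aligned address survives the round trip, while an address
that is only 4-byte-aligned does not.  When the tag lives in the high
bits instead, nothing in the encoding forces the low bits to be zero.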






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 13:29                                                               ` Pip Cet
@ 2020-05-30 16:32                                                                 ` Eli Zaretskii
  2020-05-30 16:36                                                                   ` Pip Cet
  2020-05-30 18:04                                                                 ` Paul Eggert
  1 sibling, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 16:32 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 30 May 2020 13:29:33 +0000
> Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, 
> 	Stefan Monnier <monnier@iro.umontreal.ca>
> 
> > > > So this alignment requirement is only due to pthreads being used?
> > >
> > > I'm not sure what you're asking. Obviously there are systems on which
> > > unaligned accesses will fault or be very slow indeed, so we need to
> > > make sure, say, pure space allocations are aligned somehow. That
> > > requires a LISP_ALIGNMENT of 8. Everything beyond that is only for
> > > performance, pthreads, and SIMD types.
> >
> > If the system guarantees 4-byte alignment from malloc (and/or a
> > similar alignment of the runtime C stack), then using that doesn't
> > trigger problems related to unaligned accesses, right?  So let me
> > rephrase: why isn't 4-byte alignment "good enough" for us on systems
> > where malloc and the runtime stack are guaranteed to be thus aligned?
> 
> (The runtime stack isn't relevant, as far as I can tell, since we walk
> that in 4-byte steps on such systems anyway.)

I think it might be relevant for stack-based Lisp objects (if we keep
requiring that Lisp objects are 8-byte aligned on 32-bit platforms).

> You're correct that on such a system, we could get away with a
> LISP_ALIGNMENT of 4, but a LISP_ALIGNMENT of 8 wouldn't hurt either.

That's for sure.  I just wondered why we started requiring 8-byte
alignment back when we did.  Perhaps someone still remembers the
reason.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 16:32                                                                 ` Eli Zaretskii
@ 2020-05-30 16:36                                                                   ` Pip Cet
  2020-05-30 16:45                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-30 16:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, Stefan Monnier

On Sat, May 30, 2020 at 4:32 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 30 May 2020 13:29:33 +0000
> > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org,
> >       Stefan Monnier <monnier@iro.umontreal.ca>
> > (The runtime stack isn't relevant, as far as I can tell, since we walk
> > that in 4-byte steps on such systems anyway.)
>
> I think it might be relevant for stack-based Lisp objects (if we keep
> requiring that Lisp objects are 8-byte aligned on 32-bit platforms).

We should never mark stack-based Lisp objects, no matter how
well-aligned they are!






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 16:31                                                     ` Paul Eggert
@ 2020-05-30 16:42                                                       ` Eli Zaretskii
  2020-05-30 17:06                                                         ` Paul Eggert
  2020-05-30 16:53                                                       ` Pip Cet
  1 sibling, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 16:42 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: Eli Zaretskii <eliz@gnu.org>, 41321@debbugs.gnu.org,
>  Stefan Monnier <monnier@iro.umontreal.ca>
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 30 May 2020 09:31:49 -0700
> 
> > I'm not saying it's the best solution, but I would prefer simply
> > defining LISP_ALIGNMENT to be 8 to either patch.
> 
> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> alignment (there's no need to align objects to 8 because the tags are at the
> high end).

I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.
What am I missing?






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 16:36                                                                   ` Pip Cet
@ 2020-05-30 16:45                                                                     ` Eli Zaretskii
  0 siblings, 0 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 16:45 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, 41321, monnier

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 30 May 2020 16:36:31 +0000
> Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org, 
> 	Stefan Monnier <monnier@iro.umontreal.ca>
> 
> On Sat, May 30, 2020 at 4:32 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > > From: Pip Cet <pipcet@gmail.com>
> > > Date: Sat, 30 May 2020 13:29:33 +0000
> > > Cc: eggert@cs.ucla.edu, 41321@debbugs.gnu.org,
> > >       Stefan Monnier <monnier@iro.umontreal.ca>
> > > (The runtime stack isn't relevant, as far as I can tell, since we walk
> > > that in 4-byte steps on such systems anyway.)
> >
> > I think it might be relevant for stack-based Lisp objects (if we keep
> > requiring that Lisp objects are 8-byte aligned on 32-bit platforms).
> 
> We should never mark stack-based Lisp objects, no matter how
> well-aligned they are!

But we do require them to be aligned, at least in the current
codebase.  We actually had crashes in the past when the Windows build
didn't force GCC to align stack on 8-byte boundary in callback
functions.  I don't remember if this was related to GC or not, but the
requirement is definitely there.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 16:31                                                     ` Paul Eggert
  2020-05-30 16:42                                                       ` Eli Zaretskii
@ 2020-05-30 16:53                                                       ` Pip Cet
  1 sibling, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-30 16:53 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Sat, May 30, 2020 at 4:31 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/29/20 2:01 PM, Pip Cet wrote:
> > (1) says:
> > It’s an invalid optimization, since pointers can address the
> > middle of Lisp_Object data.
> >
> > That may be true (we still haven't observed it),
>
> I observed it earlier, in code that iterated through a Lisp vector;

Sorry, I must have missed that.

> at the
> machine level the only pointer was into the middle of that vector. Addresses of
> Lisp_Vector elements are not GCALIGNED on x86 and other platforms.

True.

> > 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.
>
> Although that's true of all current Emacs porting targets as far as I know, I'd
> rather not hardwire this into the code, as neither POSIX nor the C standard
> requires it. This is why the comment refers to platforms where malloc() % 8 != 0
> as "oddball hosts".

But we can't figure out what alignment malloc guarantees, on practical
hosts. To say we assume a malloc alignment of 8 is much better than to
say we assume one of alignof (max_align_t), which is false on many
systems.

> > 2. A Lisp object requires greater alignment than malloc() gives it.
> > IIRC, there was at least one RISC architecture whose specification
>
> We don't need anything that obscure. Just use __int128 on x86 with glibc 2.24.
> On that platform __int128's alignment is 16, malloc's is 8.

Sorry, but I think a type that is actually used by Emacs is less
obscure than __float128 (which I think you mean; __int128 doesn't
exist on x86), never mind the question of whether the alignment of that
should have been 16, since it works just fine misaligned (except when
AC is set, but that's no longer x86-as-we-know-and-hate-it).

> > I'm not saying it's the best solution, but I would prefer simply
> > defining LISP_ALIGNMENT to be 8 to either patch.
>
> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> alignment (there's no need to align objects to 8 because the tags are at the
> high end).

How is it incorrect? Suboptimal, maybe, though there's a performance
improvement keeping things you access together in the same cache line.

There's no need to align anything (non-SIMD) to anything on x86
without AC set, it's just good for performance; and that performance
improvement applies whether or not Lisp_Objects are natively 64-bit or
2x32-bit.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 16:42                                                       ` Eli Zaretskii
@ 2020-05-30 17:06                                                         ` Paul Eggert
  2020-05-30 17:22                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 17:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/30/20 9:42 AM, Eli Zaretskii wrote:
>> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
>> alignment (there's no need to align objects to 8 because the tags are at the
>> high end).
> I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.

That's true for your platform, since alignof (max_align_t) == 8 on your
platform. But neither the C standard nor POSIX guarantees that alignof
(max_align_t) is 8. Admittedly these days one would have to look hard to find a
platform where alignof (max_align_t) is 4 or less.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 17:06                                                         ` Paul Eggert
@ 2020-05-30 17:22                                                           ` Eli Zaretskii
  2020-05-30 18:12                                                             ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 17:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 30 May 2020 10:06:35 -0700
> 
> On 5/30/20 9:42 AM, Eli Zaretskii wrote:
> >> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> >> alignment (there's no need to align objects to 8 because the tags are at the
> >> high end).
> > I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.
> 
> That's true for your platform, since alignof (max_align_t) == 8 on your
> platform.

No, it's 16.  And I don't understand what that has to do with
LISP_ALIGNMENT on the master branch, since we all but removed
max_align_t from there.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30  5:54                                                                     ` Eli Zaretskii
@ 2020-05-30 17:52                                                                       ` Paul Eggert
  2020-05-30 18:11                                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 17:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/29/20 10:54 PM, Eli Zaretskii wrote:
>>    (USE_LSB_TAG
>>     ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol
>>     : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0)
> I don't understand how this will work, given that a Lisp object on the
> stack can be pushed as 2 non-contiguous 32-bit words.  Can you
> explain?

On a --with-wide-int host where !USE_LSB_TAG, the above test will work 
correctly on the low-order word of a Lisp object that is a symbol, 
because ((uintptr_t) word % alignof (struct Lisp_Symbol) == 0) must be 
true on such a word.

The test is only for symbols; it's not for other Lisp objects.
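
A minimal sketch of the quoted test, for illustration only. TOY_GCALIGNMENT, TOY_LISP_SYMBOL_TAG and struct toy_symbol are stand-ins I am assuming here for GCALIGNMENT, the Lisp_Symbol tag and struct Lisp_Symbol; the real definitions live in lisp.h and may differ.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values, assumed for this sketch only.  */
enum { TOY_GCALIGNMENT = 8, TOY_LISP_SYMBOL_TAG = 0 };
static_assert ((TOY_GCALIGNMENT & (TOY_GCALIGNMENT - 1)) == 0,
               "alignment must be a power of 2");

/* Stand-in for struct Lisp_Symbol; only its alignment matters.  */
struct toy_symbol { void *name; void *value; };

/* Cheap filter: false means WORD cannot be (the low-order word of)
   a symbol; true means it still has to be looked up properly.  */
static bool
word_may_be_symbol (uintptr_t word, bool use_lsb_tag)
{
  return (use_lsb_tag
          ? word % TOY_GCALIGNMENT == TOY_LISP_SYMBOL_TAG
          : word % alignof (struct toy_symbol) == 0);
}
```

A true result only means the word survives the filter; it says nothing about whether the word really refers to a live symbol.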






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 13:29                                                               ` Pip Cet
  2020-05-30 16:32                                                                 ` Eli Zaretskii
@ 2020-05-30 18:04                                                                 ` Paul Eggert
  2020-05-30 18:12                                                                   ` Pip Cet
                                                                                     ` (2 more replies)
  1 sibling, 3 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 18:04 UTC (permalink / raw)
  To: Pip Cet, Eli Zaretskii; +Cc: 41321, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 780 bytes --]

On 5/30/20 6:29 AM, Pip Cet wrote:

> I'm surprised, but glad that you think so. Patch for emacs-27 attached.
> 

That patch is on the right track but it's not clear whether it will 
cause GC to fail to mark some objects that it should, both because it 
omits mark_maybe_object on platforms like x86 --with-wide-int where 
alignof (void *) < sizeof (Lisp_Object), and because it skips 
mark_maybe_pointer on more-typical platforms where alignof (void *) == 
sizeof (Lisp_Object).

For emacs-27 I propose the attached, more-conservative patch instead. 
This is a backport of part of a patch I've been working on for master. 
As part of that effort I've found some other obscure GC-related bugs 
that we've been lucky to avoid; this patch focuses only on the area Eli 
encountered.

[-- Attachment #2: 0001-Be-more-aggressive-in-marking-objects-during-GC.patch --]
[-- Type: text/x-patch, Size: 2116 bytes --]

From 55dbbc828346aa5aca8c56c2813baa66fdaf7e08 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 30 May 2020 10:10:02 -0700
Subject: [PATCH] Be more aggressive in marking objects during GC
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Simplified version of a patch from Pip Cet (Bug#41321#299).
* src/alloc.c (maybe_lisp_pointer): Remove.  All uses removed.
(mark_memory): Also look at the pointer offset by ‘lispsym’,
for symbols.
---
 src/alloc.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 1c6b664b22..d5a0f0aa9d 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
     mark_maybe_object (*array);
 }
 
-/* Return true if P might point to Lisp data that can be garbage
-   collected, and false otherwise (i.e., false if it is easy to see
-   that P cannot point to Lisp data that can be garbage collected).
-   Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of LISP_ALIGNMENT.  */
-
-static bool
-maybe_lisp_pointer (void *p)
-{
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
-}
-
 /* If P points to Lisp data, mark that as live if it isn't already
    marked.  */
 
@@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p)
   VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p));
 #endif
 
-  if (!maybe_lisp_pointer (p))
-    return;
-
   if (pdumper_object_p (p))
     {
       int type = pdumper_find_object_type (p);
@@ -4715,7 +4700,10 @@ mark_memory (void const *start, void const *end)
 
   for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT)
     {
-      mark_maybe_pointer (*(void *const *) pp);
+      char *p = *(char *const *) pp;
+      mark_maybe_pointer (p);
+      p += (intptr_t) lispsym;
+      mark_maybe_pointer (p);
 
       verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0);
       if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT
-- 
2.17.1



* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 17:52                                                                       ` Paul Eggert
@ 2020-05-30 18:11                                                                         ` Eli Zaretskii
  2020-05-30 18:17                                                                           ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 18:11 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: monnier@iro.umontreal.ca, pipcet@gmail.com, 41321@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 30 May 2020 10:52:07 -0700
> 
> On 5/29/20 10:54 PM, Eli Zaretskii wrote:
> >>    (USE_LSB_TAG
> >>     ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol
> >>     : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0)
> > I don't understand how this will work, given that a Lisp object on the
> > stack can be pushed as 2 non-contiguous 32-bit words.  Can you
> > explain?
> 
> On a --with-wide-int host where !USE_LSB_TAG, the above test will work 
> correctly on the low-order word of a Lisp object that is a symbol, 
> because ((uintptr_t) word % alignof (struct Lisp_Symbol) == 0) must be 
> true on such a word.
> 
> The test is only for symbols; it's not for other Lisp objects.

So any pointer whose alignment is the same as 'struct Lisp_Symbol'
will pass the test, regardless of the tag bits?  That's basically most
of the struct pointers on those architectures, no?






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:04                                                                 ` Paul Eggert
@ 2020-05-30 18:12                                                                   ` Pip Cet
  2020-05-30 18:16                                                                   ` Eli Zaretskii
  2020-05-30 18:39                                                                   ` Pip Cet
  2 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-30 18:12 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Sat, May 30, 2020 at 6:04 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> For emacs-27 I propose the attached, more-conservative patch instead.

More conservative is good! So, yes, I prefer your patch.

> This is a backport of part of a patch I've been working on for master.
> As part of that effort I've found some other obscure GC-related bugs
> that we've been lucky to avoid; this patch focuses only on the area Eli
> encountered.

Looking forward to hearing about those :-)






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 17:22                                                           ` Eli Zaretskii
@ 2020-05-30 18:12                                                             ` Paul Eggert
  2020-05-30 18:21                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 18:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/30/20 10:22 AM, Eli Zaretskii wrote:
>>>> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
>>>> alignment (there's no need to align objects to 8 because the tags are at the
>>>> high end).
>>> I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.
>> That's true for your platform, since alignof (max_align_t) == 8 on your
>> platform.
> No, it's 16.  And I don't understand what that has to do with
> LISP_ALIGNMENT on the master branch, since we all but removed
> max_align_t from there.

Oh, I thought you were talking about the emacs-27 branch which is still 
using max_align_t.

You're right that LISP_ALIGNMENT is 8 on your platform on the master 
branch. However, my comment "That's not correct for !USE_LSB_TAG ..." 
(Bug#41321#305) was responding to Pip Cet's earlier comment "I would 
prefer simply defining LISP_ALIGNMENT to be 8" (Bug#41321#272) which was 
talking about the emacs-27 branch.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:04                                                                 ` Paul Eggert
  2020-05-30 18:12                                                                   ` Pip Cet
@ 2020-05-30 18:16                                                                   ` Eli Zaretskii
  2020-05-30 18:45                                                                     ` Paul Eggert
  2020-05-30 18:39                                                                   ` Pip Cet
  2 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 18:16 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: 41321@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 30 May 2020 11:04:55 -0700
> 
> For emacs-27 I propose the attached, more-conservative patch instead. 
> This is a backport of part of a patch I've been working on for master. 
> As part of that effort I've found some other obscure GC-related bugs 
> that we've been lucky to avoid; this patch focuses only on the area Eli 
> encountered.

Please explain in comments why we are marking one more pointer in the
loop.  Also, I don't think I understand why this solves all of the
problems we were discussing; is this in addition to another patch that
you propose for emacs-27?

Thanks.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:11                                                                         ` Eli Zaretskii
@ 2020-05-30 18:17                                                                           ` Paul Eggert
  0 siblings, 0 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 18:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/30/20 11:11 AM, Eli Zaretskii wrote:
> So any pointer whose alignment is the same as 'struct Lisp_Symbol'
> will pass the test, regardless of the tag bits?  That's basically most
> of the struct pointers on those architectures, no?

Yes, pretty much.

This is an inevitable consequence of the problem at hand. For aligned 
pointers we must consult the red-black tree no matter what solution we 
pick, because the compiler may have aligned a pointer for us.

Just to make sure we're on the same page here: this stuff is only about 
how to improve performance (compared to the patch proposed for emacs-27 
in Bug#41321#332) by doing fast checks on words before giving them to 
the red-black search.
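
The idea of a fast check in front of the tree search can be sketched as follows. This is a toy model under stated assumptions: toy_find is a binary search over a sorted array that merely stands in for the red-black tree lookup (mem_find in alloc.c), and toy_blocks is made-up data. Which cheap checks are actually safe is exactly what this thread is debating, so the alignment filter here is illustrative, not a recommendation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A live block covers the half-open range [start, end).  */
struct toy_block { uintptr_t start, end; };

/* Made-up sample data: sorted, non-overlapping blocks.  */
static struct toy_block const toy_blocks[] = {
  { 0x1000, 0x1040 },
  { 0x2000, 0x2100 },
};

/* Stand-in for the expensive tree search: return the block that
   contains WORD, or NULL if there is none.  */
static struct toy_block const *
toy_find (struct toy_block const *blocks, size_t n, uintptr_t word)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (word < blocks[mid].start)
        hi = mid;
      else if (word >= blocks[mid].end)
        lo = mid + 1;
      else
        return &blocks[mid];
    }
  return NULL;
}

/* Reject WORD cheaply if it is not a multiple of ALIGNMENT; only
   words that pass are handed to the search.  */
static bool
toy_maybe_live (struct toy_block const *blocks, size_t n,
                uintptr_t word, uintptr_t alignment)
{
  assert (alignment != 0);
  if (word % alignment != 0)
    return false;
  return toy_find (blocks, n, word) != NULL;
}
```

Note that a filter of this shape is precisely what goes wrong if the chosen alignment is larger than what the allocator really delivers: valid pointers are then rejected before the search ever sees them.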






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:12                                                             ` Paul Eggert
@ 2020-05-30 18:21                                                               ` Eli Zaretskii
  2020-05-30 19:14                                                                 ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 18:21 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 30 May 2020 11:12:49 -0700
> 
> On 5/30/20 10:22 AM, Eli Zaretskii wrote:
> >>>> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> >>>> alignment (there's no need to align objects to 8 because the tags are at the
> >>>> high end).
> >>> I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.
> >> That's true for your platform, since alignof (max_align_t) == 8 on your
> >> platform.
> > No, it's 16.  And I don't understand what that has to do with
> > LISP_ALIGNMENT on the master branch, since we all but removed
> > max_align_t from there.
> 
> Oh, I thought you were talking about the emacs-27 branch which is still 
> using max_align_t.
> 
> You're right that LISP_ALIGNMENT is 8 on your platform on the master 
> branch. However, my comment "That's not correct for !USE_LSB_TAG ..." 
> (Bug#41321#305) was responding to Pip Cet's earlier comment "I would 
> prefer simply defining LISP_ALIGNMENT to be 8" (Bug#41321#272) which was 
> talking about the emacs-27 branch.

I'm still confused, because on current emacs-27, both LISP_ALIGNMENT
and alignof(max_align_t) are 16 in my builds.  And I still don't
understand why using LISP_ALIGNMENT of 8 is not right in this case (on
emacs-27).






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:04                                                                 ` Paul Eggert
  2020-05-30 18:12                                                                   ` Pip Cet
  2020-05-30 18:16                                                                   ` Eli Zaretskii
@ 2020-05-30 18:39                                                                   ` Pip Cet
  2020-05-30 18:57                                                                     ` Paul Eggert
  2 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-30 18:39 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

On Sat, May 30, 2020 at 6:04 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/30/20 6:29 AM, Pip Cet wrote:
> > I'm surprised, but glad that you think so. Patch for emacs-27 attached.
> That patch is on the right track but it's not clear whether it will
> cause GC to fail to mark some objects that it should, both because it
> omits mark_maybe_object on platforms like x86 --with-wide-int where
> alignof (void *) < sizeof (Lisp_Object), and because it skips
> mark_maybe_pointer on more-typical platforms where alignof (void *) ==
> sizeof (Lisp_Object).

I've thought about this for a while, but I fail to see the problem
with my patch. mark_maybe_object is unnecessary on x86
--with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary
on platforms that don't rip apart our precious Lisp_Objects. The other
call to mark_maybe_pointer isn't skipped.

I still think we ought to use yours (and accept a ~25% performance
penalty in this particular loop on Eli's platform), but include a
comment like the one I had in mine. It might hide further bugs, but
that's probably what we want to do on emacs-27.

Proposed patch attached.

[-- Attachment #2: 0001-Be-more-aggressive-in-marking-objects-during-GC-bug-.patch --]
[-- Type: text/x-patch, Size: 2222 bytes --]

From 047fc04af8f9d6b6e4587ee88d573dac4292eeb0 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Sat, 30 May 2020 13:23:24 +0000
Subject: [PATCH] Be more aggressive in marking objects during GC (bug#41321)

* src/alloc.c (maybe_lisp_pointer): Remove.
  (mark_memory): Mark 32-bit words that might be the only reference
  to a Lisp_Symbol.
---
 src/alloc.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 1c6b664b22..14f75a2259 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
     mark_maybe_object (*array);
 }
 
-/* Return true if P might point to Lisp data that can be garbage
-   collected, and false otherwise (i.e., false if it is easy to see
-   that P cannot point to Lisp data that can be garbage collected).
-   Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of LISP_ALIGNMENT.  */
-
-static bool
-maybe_lisp_pointer (void *p)
-{
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
-}
-
 /* If P points to Lisp data, mark that as live if it isn't already
    marked.  */
 
@@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p)
   VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p));
 #endif
 
-  if (!maybe_lisp_pointer (p))
-    return;
-
   if (pdumper_object_p (p))
     {
       int type = pdumper_find_object_type (p);
@@ -4715,7 +4700,14 @@ mark_memory (void const *start, void const *end)
 
   for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT)
     {
-      mark_maybe_pointer (*(void *const *) pp);
+      uintptr_t offset = (uintptr_t) *(void *const *) pp;
+      mark_maybe_pointer ((void *) offset);
+      /* On 32-bit --with-wide-int systems, the two halves of a
+	 Lisp_Object may be stored non-contiguously.  Therefore, we
+	 need to recognize the lower 32 bits of a Lisp_Object encoding
+	 a symbol, and since Qnil is binary zero, that requires adding
+	 &lispsym.  */
+      mark_maybe_pointer ((void *) (offset + (uintptr_t) lispsym));
 
       verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0);
       if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT
-- 
2.27.0.rc0



* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:16                                                                   ` Eli Zaretskii
@ 2020-05-30 18:45                                                                     ` Paul Eggert
  0 siblings, 0 replies; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 18:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

[-- Attachment #1: Type: text/plain, Size: 1238 bytes --]

On 5/30/20 11:16 AM, Eli Zaretskii wrote:

> Please explain in comments why we are marking one more pointer in the
> loop.

Sure. I'm attaching the revised patch proposed for emacs-27. This is 
very similar to what Pip Cet just proposed in Bug#41321#353, but the 
code is simpler with fewer casts (and I like my comment better :-).

> I don't think I understand why this solves all of the
> problems we were discussing; is this in addition to another patch that
> you propose for emacs-27?

This replaces all the patches that I proposed for emacs-27 in this 
thread. Although this patch doesn't solve all the problems we have been 
discussing, it does solve the urgent ones:

* The problem you observed on MinGW for markers; it can also occur for 
many other object types. This problem can cause the GC to incorrectly 
reclaim storage for objects, causing the usual disasters.

* The similar problem that Pip Cet noted for symbols.

The patch does not solve less-urgent problems we've talked about, such 
as over-alignment of Lisp objects on MinGW (this is a relatively minor 
performance issue), or the more-obscure and unlikely GC bugs that we've 
been living with for a while (which I haven't had the time to think 
through entirely).
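
To picture why the attached patch tries each stack word twice, here is a toy model, not the real alloc.c code. It assumes a symbol reference may be stored as a byte offset from the start of a symbol table (toy_lispsym stands in for lispsym), so a scanner that sees only that one word must add the table address back before testing it. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_NSYMS = 4 };
struct toy_symbol { int marked; };
static struct toy_symbol toy_lispsym[TOY_NSYMS];

/* Disguise SYM as a byte offset from the table start, roughly as
   the patch comment says make_lisp_symbol does.  */
static uintptr_t
toy_symbol_offset (struct toy_symbol const *sym)
{
  assert (toy_lispsym <= sym && sym < toy_lispsym + TOY_NSYMS);
  return (uintptr_t) ((char const *) sym - (char const *) toy_lispsym);
}

/* Mark the symbol that starts exactly at ADDR, if there is one.  */
static bool
toy_mark_maybe_pointer (uintptr_t addr)
{
  uintptr_t base = (uintptr_t) toy_lispsym;
  if (addr < base || addr - base >= sizeof toy_lispsym)
    return false;
  if ((addr - base) % sizeof (struct toy_symbol) != 0)
    return false;
  toy_lispsym[(addr - base) / sizeof (struct toy_symbol)].marked = 1;
  return true;
}

/* Try each word as a plain pointer, then again with the table
   address added.  Return the number of successful marks.  */
static int
toy_mark_memory (uintptr_t const *words, size_t n)
{
  int hits = 0;
  for (size_t i = 0; i < n; i++)
    {
      hits += toy_mark_maybe_pointer (words[i]);
      hits += toy_mark_maybe_pointer (words[i] + (uintptr_t) toy_lispsym);
    }
  return hits;
}
```

In this model a word holding toy_symbol_offset (&toy_lispsym[2]) is found only by the second call, while a word holding the plain address of a symbol is found by the first.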

[-- Attachment #2: 0001-Be-more-aggressive-in-marking-objects-during-GC.patch --]
[-- Type: text/x-patch, Size: 2447 bytes --]

From 6a474a55e68a2bada13db69d4099a4b2de7b1271 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 30 May 2020 10:10:02 -0700
Subject: [PATCH] Be more aggressive in marking objects during GC
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Simplified version of a patch from Pip Cet (Bug#41321#299).
* src/alloc.c (maybe_lisp_pointer): Remove.  All uses removed.
(mark_memory): Also look at the pointer offset by ‘lispsym’,
for symbols.
---
 src/alloc.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index 1c6b664b22..568fee666f 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -4585,18 +4585,6 @@ mark_maybe_objects (Lisp_Object const *array, ptrdiff_t nelts)
     mark_maybe_object (*array);
 }
 
-/* Return true if P might point to Lisp data that can be garbage
-   collected, and false otherwise (i.e., false if it is easy to see
-   that P cannot point to Lisp data that can be garbage collected).
-   Symbols are implemented via offsets not pointers, but the offsets
-   are also multiples of LISP_ALIGNMENT.  */
-
-static bool
-maybe_lisp_pointer (void *p)
-{
-  return (uintptr_t) p % LISP_ALIGNMENT == 0;
-}
-
 /* If P points to Lisp data, mark that as live if it isn't already
    marked.  */
 
@@ -4609,9 +4597,6 @@ mark_maybe_pointer (void *p)
   VALGRIND_MAKE_MEM_DEFINED (&p, sizeof (p));
 #endif
 
-  if (!maybe_lisp_pointer (p))
-    return;
-
   if (pdumper_object_p (p))
     {
       int type = pdumper_find_object_type (p);
@@ -4715,7 +4700,16 @@ mark_memory (void const *start, void const *end)
 
   for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT)
     {
-      mark_maybe_pointer (*(void *const *) pp);
+      char *p = *(char *const *) pp;
+      mark_maybe_pointer (p);
+
+      /* Unmask any struct Lisp_Symbol pointer that make_lisp_symbol
+	 previously disguised by adding the address of 'lispsym'.
+	 On a host with 32-bit pointers and 64-bit Lisp_Objects,
+	 a Lisp_Object might be split into registers saved into
+	 non-adjacent words and P might be the low-order word's value.  */
+      p += (intptr_t) lispsym;
+      mark_maybe_pointer (p);
 
       verify (alignof (Lisp_Object) % GC_POINTER_ALIGNMENT == 0);
       if (alignof (Lisp_Object) == GC_POINTER_ALIGNMENT
-- 
2.17.1



* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:39                                                                   ` Pip Cet
@ 2020-05-30 18:57                                                                     ` Paul Eggert
  2020-05-30 19:06                                                                       ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 18:57 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

On 5/30/20 11:39 AM, Pip Cet wrote:
> I fail to see the problem
> with my patch. mark_maybe_object is unnecessary on x86
> --with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary
> on platforms that don't rip apart our precious Lisp_Objects. The other
> call to mark_maybe_pointer isn't skipped.

The other alloc.c code is inconsistent with respect to the 
live_*_holding versus live_*_p functions. There is no live_float_holding 
function, which means we're relying entirely on mark_maybe_object to 
find roots that contain Lisp floats. So it's dicey that your earlier 
(Bug#41321#299) patch skips the call to mark_maybe_object on some platforms.

I've been working on improving this for master.
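
The distinction between the two families of functions can be sketched with a toy model. The names toy_live_float_holding and toy_live_float_p are hypothetical (the message above notes that emacs-27 has no live_float_holding), and TOY_FLOAT_TAG is an assumed stand-in for a tag added to a pointer, not a value taken from lisp.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_float { double data; };
enum { TOY_FLOAT_TAG = 7, TOY_NFLOATS = 8 };
static_assert (TOY_FLOAT_TAG < sizeof (struct toy_float),
               "a tagged pointer must stay inside its object");

static struct toy_float toy_floats[TOY_NFLOATS];

/* "Holding" style: return the object that contains ADDR, even if
   ADDR points into its interior; NULL if no object contains it.  */
static struct toy_float *
toy_live_float_holding (uintptr_t addr)
{
  uintptr_t base = (uintptr_t) toy_floats;
  if (addr < base || addr - base >= sizeof toy_floats)
    return NULL;
  return &toy_floats[(addr - base) / sizeof (struct toy_float)];
}

/* "_p" style: true only if ADDR is the exact start of an object.  */
static bool
toy_live_float_p (uintptr_t addr)
{
  struct toy_float *f = toy_live_float_holding (addr);
  return f != NULL && (uintptr_t) f == addr;
}
```

With an exact-start check alone, an address such as (uintptr_t) &toy_floats[3] + TOY_FLOAT_TAG is rejected, whereas the holding variant still maps it back to toy_floats[3].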






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:57                                                                     ` Paul Eggert
@ 2020-05-30 19:06                                                                       ` Pip Cet
  2020-05-30 21:27                                                                         ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-30 19:06 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Sat, May 30, 2020 at 6:57 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 5/30/20 11:39 AM, Pip Cet wrote:
> > I fail to see the problem
> > with my patch. mark_maybe_object is unnecessary on x86
> > --with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary
> > on platforms that don't rip apart our precious Lisp_Objects. The other
> > call to mark_maybe_pointer isn't skipped.
>
> The other alloc.c code is inconsistent with respect to the
> live_*_holding versus live_*_p functions. There is no live_float_holding
> function,

Indeed. There's just live_float_p.

> which means we're relying entirely on mark_maybe_object to
> find roots that contain Lisp floats.

No, we're not. There's code in mark_maybe_pointer to handle the float
case, by calling live_float_p.

Is it misaligned pointers into floats you're worried about?

> So it's dicey that your earlier
> (Bug#41321#299) patch skips the call to mark_maybe_object on some platforms.

I still fail to see how.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 18:21                                                               ` Eli Zaretskii
@ 2020-05-30 19:14                                                                 ` Paul Eggert
  2020-05-30 19:33                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 19:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/30/20 11:21 AM, Eli Zaretskii wrote:

> on current emacs-27, both LISP_ALIGNMENT
> and alignof(max_align_t) are 16 in my builds.  And I still don't
> understand why using LISP_ALIGNMENT of 8 is not right in this case (on
> emacs-27).

You're right that LISP_ALIGNMENT is 16 on your host on the emacs-27 
branch, because alignof (max_align_t) is 16 there. And you're also right 
that setting LISP_ALIGNMENT to be 8 on your host would fix the marker 
bug you observed there, because it would work around your host's bug 
where malloc returns a pointer that is not a multiple of 
alignof(max_align_t). However, C and POSIX allow platforms where 
LISP_ALIGNMENT should be greater than 8, or (if !USE_LSB_TAG) should be 
less than 8, so I'd be leery about changing LISP_ALIGNMENT on any host 
that doesn't have your host's idiosyncrasies. And that specific 
workaround should not be needed anyway if we install the emacs-27 patch 
that I have most-recently suggested (or Pip Cet's very-similar recent 
patch), since this patch solves the problem in a more-general way that 
should help to prevent more bugs like this one.
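
Whether a host's malloc returns blocks that are multiples of alignof (max_align_t) can be probed with a sketch like the one below. It assumes a C11 compiler; count_underaligned and PROBE_MAX are hypothetical names, and the result depends entirely on the C library in use, so no expected value is implied.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { PROBE_MAX = 64 };

/* Hypothetical probe (not in Emacs): allocate up to COUNT blocks of
   SIZE bytes and return how many addresses are not multiples of
   alignof (max_align_t).  */
static int
count_underaligned (size_t size, int count)
{
  void *blocks[PROBE_MAX];
  int allocated = 0, bad = 0;

  assert (0 <= count && count <= PROBE_MAX);
  for (; allocated < count; allocated++)
    {
      void *p = malloc (size);
      if (!p)
        break;
      blocks[allocated] = p;
      if ((uintptr_t) p % alignof (max_align_t) != 0)
        bad++;
    }

  /* Free only after probing, so the allocator cannot hand back the
     same block again and again.  */
  while (allocated > 0)
    free (blocks[--allocated]);
  return bad;
}
```

A nonzero result for a size of at least alignof (max_align_t) would indicate the kind of host behavior described above.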






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 19:14                                                                 ` Paul Eggert
@ 2020-05-30 19:33                                                                   ` Eli Zaretskii
  2020-05-30 22:18                                                                     ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-30 19:33 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 30 May 2020 12:14:14 -0700
> 
> On 5/30/20 11:21 AM, Eli Zaretskii wrote:
> 
> > on current emacs-27, both LISP_ALIGNMENT
> > and alignof(max_align_t) are 16 in my builds.  And I still don't
> > understand why using LISP_ALIGNMENT of 8 is not right in this case (on
> > emacs-27).
> 
> You're right that LISP_ALIGNMENT is 16 on your host on the emacs-27 
> branch, because alignof (max_align_t) is 16 there. And you're also right 
> that setting LISP_ALIGNMENT to be 8 on your host would fix the marker 
> bug you observed there, because it would work around your host's bug 
> where malloc returns a pointer that is not a multiple of 
> alignof(max_align_t). However, C and POSIX allow platforms where 
> LISP_ALIGNMENT should be greater than 8, or (if !USE_LSB_TAG) should be 
> less than 8, so I'd be leery about changing LISP_ALIGNMENT on any host 
> that doesn't have your host's idiosyncrasies.

Posix may require it, but do we actually know of any supported
important platforms where this happens?  If not, let's worry about the
more general fix on master, where we still have time to try various
solutions, and settle for a simpler and easier fix on emacs-27.

> And that specific workaround should not be needed anyway if we
> install the emacs-27 patch that I have most-recently suggested (or
> Pip Cet's very-similar recent patch), since this patch solves the
> problem in a more-general way that should help to prevent more bugs
> like this one.

But your proposal is also less efficient, isn't it?   If so, its more
general nature comes at a price.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 19:06                                                                       ` Pip Cet
@ 2020-05-30 21:27                                                                         ` Paul Eggert
  2020-05-30 21:49                                                                           ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 21:27 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

On 5/30/20 12:06 PM, Pip Cet wrote:

> Is it misaligned pointers into floats you're worried about?

Yes, and it's plausible there will be pointers misaligned because 
Lisp_Float has been added to them.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 21:27                                                                         ` Paul Eggert
@ 2020-05-30 21:49                                                                           ` Pip Cet
  2020-05-30 22:23                                                                             ` Paul Eggert
  0 siblings, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-05-30 21:49 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Sat, May 30, 2020 at 9:27 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> > Is it misaligned pointers into floats you're worried about?
>
> Yes, and it's plausible there will be pointers misaligned because
> Lisp_Float has been added to them.

Sorry for being dense, but I still don't understand. This is on
!USE_LSB_TAG machines, where Lisp_Float does not affect the
representation of the lower 32 bits. On USE_LSB_TAG machines, the
other code path is taken.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 19:33                                                                   ` Eli Zaretskii
@ 2020-05-30 22:18                                                                     ` Paul Eggert
  2020-05-31 15:48                                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 22:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 41321, monnier, pipcet

On 5/30/20 12:33 PM, Eli Zaretskii wrote:

> Posix may require it, but do we actually know of any supported
> important platforms where this happens?

That depends on what the question is. If the question is "Are there 
platforms where the lost-marker bug occurs?", then no, we don't know of 
any supported important platforms. But if the question is "Are there 
platforms where LISP_ALIGNMENT should be some value other than 8?", then 
yes, LISP_ALIGNMENT should be 4 on Ubuntu 18.04.3 i386 when Emacs is 
configured --with-wide-int (I just tested this, and it is indeed 4 on 
that platform in the Emacs master branch). This is because on this 
platform Lisp objects have a native alignment of 4, and this platform is 
!USE_LSB_TAG so the presence of tag bits imposes no extra alignment 
requirement.
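
As a rough illustration of that reasoning (a sketch only, not the Emacs
definition of LISP_ALIGNMENT; every sketch_ name and the 3-bit tag width
are assumptions made for this example), the required alignment can be
thought of as the native alignment of the object, raised to the tag
alignment only when tags live in the low bits:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a --with-wide-int Lisp word: 64 bits.  */
typedef int64_t sketch_lisp_word;

/* Assumption for this sketch: 3 tag bits, so low-bit tagging needs
   objects aligned to 1 << 3 = 8 bytes.  */
enum { SKETCH_TAG_BITS = 3, SKETCH_TAG_ALIGNMENT = 1 << SKETCH_TAG_BITS };

/* Return the alignment a conservative scanner could rely on.
   USE_LSB_TAG is nonzero when tags are kept in the low pointer bits.  */
static size_t
sketch_lisp_alignment (int use_lsb_tag)
{
  size_t native = alignof (sketch_lisp_word);

  /* Alignments are powers of two; check that before comparing.  */
  assert (native != 0 && (native & (native - 1)) == 0);

  /* Low-bit tags force at least the tag alignment; otherwise only
     the native alignment of the object applies.  */
  if (use_lsb_tag && native < SKETCH_TAG_ALIGNMENT)
    return SKETCH_TAG_ALIGNMENT;
  return native;
}
```

On an i386 build the native alignment of a 64-bit integer is commonly
4, so sketch_lisp_alignment (0) would give 4 there, while
sketch_lisp_alignment (1) gives at least 8 on any platform.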

> let's worry about the
> more general fix on master, where we still have time to try various
> solutions, and settle for a simpler and easier fix on emacs-27.

Yes, that's what we're trying to do, and it's what the latest patch 
does; Pip Cet and I proposed very similar variants of it.

> But your proposal is also less efficient, isn't it?   If so, its more
> general nature comes at a price.

Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround 
(which is not correct as noted above but which fixes the lost-marker 
bug), the proposed patch is about a 1% CPU-time hit in my usual 
benchmark (make compile-always) on a 32-bit platform compiled with 
--with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). We 
can surely speed this up with some cost in complexity (that's what I was 
working on on the master branch), but for emacs-27 I thought that 
reliability took precedence over 1% performance improvements.

I expect that most of the performance hit is not due to the 
LISP_ALIGNMENT thing, it's due to the "you have to check pointers three 
times" thing. In my master-branch draft I'm working on getting this down 
to "you have to check pointers an average of 1+epsilon times" for some 
suitable value of epsilon. I don't know yet what epsilon will be. But 
anyway, this is only about improving that 1% performance hit.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 21:49                                                                           ` Pip Cet
@ 2020-05-30 22:23                                                                             ` Paul Eggert
  2020-05-30 22:54                                                                               ` Pip Cet
  0 siblings, 1 reply; 132+ messages in thread
From: Paul Eggert @ 2020-05-30 22:23 UTC (permalink / raw)
  To: Pip Cet; +Cc: 41321, Stefan Monnier

On 5/30/20 2:49 PM, Pip Cet wrote:
> On Sat, May 30, 2020 at 9:27 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
>>> Is it misaligned pointers into floats you're worried about?
>>
>> Yes, and it's plausible there will be pointers misaligned because
>> Lisp_Float has been added to them.
> 
> Sorry for being dense, but I still don't understand. This is on
> !LSB_TAG machines, where Lisp_Float does not affect the representation
> of the lower 32 bits. On LSB_TAG machines, the other code path is
> taken.
> 

Oh, I see I am being the dense one. I was thinking based on some of my 
master-branch improvements. One option is to do away with 
mark_maybe_object entirely, so that one needn't deal with looking at 
each part of the stack twice (this is for efficiency).

In emacs-27 the patch you proposed earlier is probably OK, though I 
haven't had time to think through all the possibilities.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 22:23                                                                             ` Paul Eggert
@ 2020-05-30 22:54                                                                               ` Pip Cet
  0 siblings, 0 replies; 132+ messages in thread
From: Pip Cet @ 2020-05-30 22:54 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, Stefan Monnier

On Sat, May 30, 2020 at 10:23 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> Oh, I see I am being the dense one. I was thinking based on some of my
> master-branch improvements. One option is to do away with
> mark_maybe_object entirely, so that one needn't deal with looking at
> each part of the stack twice (this is for efficiency).

Yes, I thought you'd already done that on master. I must not have been
keeping up with the patches.

Much as I like thinking about putting symbols in the rbtree twice and
walking it smartly to retrieve up to two overlapping nodes, I suspect
there are much easier ways of fixing this, at least on 64-bit
architectures. We could make sure, for example, that all symbol blocks
come after lispsym in memory, and store lispsym - address in the
Lisp_Object. Those values would then fall outside the 48-bit space of
actually valid x86_64 addresses, so we could get away with
mark_maybe_pointer (word < 0 ? lispsym - word : word) on that
architecture.
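
To make the arithmetic concrete, here is a minimal sketch of that
encoding (hypothetical helper names, not code from any patch; it
assumes every encoded symbol lives strictly after the lispsym base, and
it ignores symbols stored inside lispsym itself):

```c
#include <assert.h>
#include <stdint.h>

/* Encode a symbol address as LISPSYM_ADDR - SYMBOL_ADDR.  With the
   symbol placed after lispsym, the result is always negative.  */
static intptr_t
sketch_encode_symbol (intptr_t lispsym_addr, intptr_t symbol_addr)
{
  /* Precondition taken from the proposal: symbol blocks come after
     lispsym in memory.  */
  assert (symbol_addr > lispsym_addr);
  return lispsym_addr - symbol_addr;
}

/* Map a stack word to the one address worth handing to the
   conservative marker: negative words are treated as encoded
   symbols, everything else as a plain pointer.  */
static intptr_t
sketch_word_to_pointer (intptr_t lispsym_addr, intptr_t word)
{
  return word < 0 ? lispsym_addr - word : word;
}
```

Decoding works because lispsym - (lispsym - address) == address, so a
single call per word would suffice under these assumptions.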

> In emacs-27 the patch you proposed earlier is probably OK, though I
> haven't had time to think through all the possibilities.

I was just curious. I think we should go with your latest patch.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-30 22:18                                                                     ` Paul Eggert
@ 2020-05-31 15:48                                                                       ` Eli Zaretskii
  2020-06-01 14:48                                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-05-31 15:48 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 41321, monnier, pipcet

> Cc: pipcet@gmail.com, 41321@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sat, 30 May 2020 15:18:53 -0700
> 
> On 5/30/20 12:33 PM, Eli Zaretskii wrote:
> 
> > Posix may require it, but do we actually know of any supported
> > important platforms where this happens?
> 
> > But your proposal is also less efficient, isn't it?   If so, its more
> > general nature comes at a price.
> 
> Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround 
> (which is not correct as noted above but which fixes the lost-marker 
> bug), the proposed patch is about a 1% CPU-time hit in my usual 
> benchmark (make compile-always) on a 32-bit platform compiled with 
> --with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). We 
> can surely speed this up with some cost in complexity (that's what I was 
> working on on the master branch), but for emacs-27 I thought that 
> reliability took precedence over 1% performance improvements.
> 
> I expect that most of the performance hit is not due to the 
> LISP_ALIGNMENT thing, it's due to the "you have to check pointers three 
> times" thing. In my master-branch draft I'm working on getting this down 
> to "you have to check pointers an average of 1+epsilon times" for some 
> suitable value of epsilon. I don't know yet what epsilon will be. But 
> anyway, this is only about improving that 1% performance hit.

OK, then let's get this change into emacs-27, and thanks.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-05-31 15:48                                                                       ` Eli Zaretskii
@ 2020-06-01 14:48                                                                         ` Eli Zaretskii
  2020-09-27 14:39                                                                           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 132+ messages in thread
From: Eli Zaretskii @ 2020-06-01 14:48 UTC (permalink / raw)
  To: eggert; +Cc: 41321, monnier, pipcet

> Date: Sun, 31 May 2020 18:48:28 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 41321@debbugs.gnu.org, monnier@iro.umontreal.ca, pipcet@gmail.com
> 
> > Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround 
> > (which is not correct as noted above but which fixes the lost-marker 
> > bug), the proposed patch is about a 1% CPU-time hit in my usual 
> > benchmark (make compile-always) on a 32-bit platform compiled with 
> > --with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). We 
> > can surely speed this up with some cost in complexity (that's what I was 
> > working on on the master branch), but for emacs-27 I thought that 
> > reliability took precedence over 1% performance improvements.
> > 
> > I expect that most of the performance hit is not due to the 
> > LISP_ALIGNMENT thing, it's due to the "you have to check pointers three 
> > times" thing. In my master-branch draft I'm working on getting this down 
> > to "you have to check pointers an average of 1+epsilon times" for some 
> > suitable value of epsilon. I don't know yet what epsilon will be. But 
> > anyway, this is only about improving that 1% performance hit.
> 
> OK, then let's get this change into emacs-27, and thanks.

FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes
in commit  68b6dad1d8e22fe700871c9a5a18da3dd496cc8a.






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-06-01 14:48                                                                         ` Eli Zaretskii
@ 2020-09-27 14:39                                                                           ` Lars Ingebrigtsen
  2020-09-27 14:45                                                                             ` Pip Cet
  2020-09-27 15:16                                                                             ` Eli Zaretskii
  0 siblings, 2 replies; 132+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-27 14:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eggert, 41321, monnier, pipcet

Eli Zaretskii <eliz@gnu.org> writes:

>> OK, then let's get this change into emacs-27, and thanks.
>
> FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes
> in commit  68b6dad1d8e22fe700871c9a5a18da3dd496cc8a.

I've just lightly skimmed this thread, but does this mean that the bug
was fixed and this bug report can be closed?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-09-27 14:39                                                                           ` Lars Ingebrigtsen
@ 2020-09-27 14:45                                                                             ` Pip Cet
  2020-09-27 15:02                                                                               ` Lars Ingebrigtsen
  2020-09-27 15:16                                                                             ` Eli Zaretskii
  1 sibling, 1 reply; 132+ messages in thread
From: Pip Cet @ 2020-09-27 14:45 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: eggert, 41321, Stefan Monnier

On Sun, Sep 27, 2020 at 2:40 PM Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
> >> OK, then let's get this change into emacs-27, and thanks.
> >
> > FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes
> > in commit  68b6dad1d8e22fe700871c9a5a18da3dd496cc8a.
>
> I've just lightly skimmed this thread, but does this mean that the bug
> was fixed and this bug report can be closed?

I believe it can be, yes, though I'm not sure I ever managed to
convince Eli that the bug I found was the bug he was seeing...

(Sorry for not getting to the other bug reports, BTW, I'm incredibly
busy with family business right now.)






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-09-27 14:45                                                                             ` Pip Cet
@ 2020-09-27 15:02                                                                               ` Lars Ingebrigtsen
  0 siblings, 0 replies; 132+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-27 15:02 UTC (permalink / raw)
  To: Pip Cet; +Cc: eggert, 41321, Stefan Monnier

Pip Cet <pipcet@gmail.com> writes:

> (Sorry for not getting to the other bug reports, BTW, I'm incredibly
> busy with family business right now.)

Sure; no problem.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector objects
  2020-09-27 14:39                                                                           ` Lars Ingebrigtsen
  2020-09-27 14:45                                                                             ` Pip Cet
@ 2020-09-27 15:16                                                                             ` Eli Zaretskii
  1 sibling, 0 replies; 132+ messages in thread
From: Eli Zaretskii @ 2020-09-27 15:16 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: eggert, 41321-done, monnier, pipcet

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: eggert@cs.ucla.edu,  41321@debbugs.gnu.org,  monnier@iro.umontreal.ca,
>   pipcet@gmail.com
> Date: Sun, 27 Sep 2020 16:39:51 +0200
> 
> > FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes
> > in commit  68b6dad1d8e22fe700871c9a5a18da3dd496cc8a.
> 
> I've just lightly skimmed this thread, but does this mean that the bug
> was fixed and this bug report can be closed?

Yes, done.







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
