unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* MPS: unable to build due to assertion violation in igc_dump_check_object_starts
@ 2024-07-23 12:58 Eli Zaretskii
  2024-07-23 14:10 ` Gerd Möllmann
  2024-07-23 16:28 ` Pip Cet
  0 siblings, 2 replies; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-23 12:58 UTC (permalink / raw)
  To: Pip Cet, Helmut Eller, Gerd Möllmann; +Cc: emacs-devel

It looks like this:

  Finding pointers to doc strings...
  Finding pointers to doc strings...done
  Dumping under the name bootstrap-emacs.pdmp
  Dumping fingerprint: fcd9b0097e8172ebf127ea1aea4dcd2e42d173dce96acd73db4b8b68ab5222e3
  cold user data: 7b4c68
  heap end: 7b8ee0

  igc.c:4820: Emacs fatal error: assertion failed: end == p

If I compile igc.c with -O0 (otherwise everything in
igc_dump_check_object_starts is "optimized-out"), I see this:

  Thread 1 hit Breakpoint 1, emacs_abort () at w32fns.c:11335
  11335   {
  (gdb) up
  #1  0x00eb5540 in terminate_due_to_signal (sig=sig@entry=22,
      backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:470
  470       emacs_raise (sig);
  (gdb)
  #2  0x00f2a81a in die (msg=0x14f8051 <i_fwd+4017> "end == p",
      file=0x14f79c8 <i_fwd+2344> "igc.c", line=4820) at alloc.c:8400
  8400      terminate_due_to_signal (SIGABRT, INT_MAX);
  (gdb)
  #3  0x00fee189 in igc_dump_check_object_starts (relocs=0x11ce31a3,
      dump_base=0x1ab94020, hot_start=0x1ab94098, hot_end=0x1b1a31e8,
      cold_start=0x1b1c4020, heap_end=0x1b33ce88) at igc.c:4820
  4820              eassert (end == p);
  (gdb) l
  4815              EMACS_INT end_off = XFIXNUM (XCAR (XCDR (r)));
  4816              mps_addr_t start = (uint8_t *) dump_base + start_off;
  4817              mps_addr_t end = (uint8_t *) dump_base + end_off;
  4818              eassert (start == p);
  4819              p = dflt_skip (p);
  4820              eassert (end == p);
  4821            }
  4822        }
  4823      eassert (NILP (relocs));
  4824    }
  (gdb) p i
  $1 = 1
  (gdb) p/x region
  $2 = {start = 0x1b1c4020, end = 0x1b33ce88}
  (gdb) p end
  $3 = (mps_addr_t) 0x1b26ac80
  (gdb) p p
  $4 = (mps_addr_t) 0x4af05d90
  (gdb) p start
  $5 = (mps_addr_t) 0x1b26ac70
  (gdb)

So it looks like dflt_skip(0x1b26ac70) yields some bogus value, but I
have no idea what it means or where to look to find the reason(s).

I've been unable to build the branch since 3 days ago.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 12:58 MPS: unable to build due to assertion violation in igc_dump_check_object_starts Eli Zaretskii
@ 2024-07-23 14:10 ` Gerd Möllmann
  2024-07-23 14:26   ` Eli Zaretskii
  2024-07-23 16:28 ` Pip Cet
  1 sibling, 1 reply; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 14:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, Helmut Eller, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> It looks like this:
>
>   Finding pointers to doc strings...
>   Finding pointers to doc strings...done
>   Dumping under the name bootstrap-emacs.pdmp
>   Dumping fingerprint: fcd9b0097e8172ebf127ea1aea4dcd2e42d173dce96acd73db4b8b68ab5222e3
>   cold user data: 7b4c68
>   heap end: 7b8ee0
>
>   igc.c:4820: Emacs fatal error: assertion failed: end == p
>
> If I compile igc.c with -O0 (otherwise everything in
> igc_dump_check_object_starts is "optimized-out"), I see this:
>
>   Thread 1 hit Breakpoint 1, emacs_abort () at w32fns.c:11335
>   11335   {
>   (gdb) up
>   #1  0x00eb5540 in terminate_due_to_signal (sig=sig@entry=22,
>       backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:470
>   470       emacs_raise (sig);
>   (gdb)
>   #2  0x00f2a81a in die (msg=0x14f8051 <i_fwd+4017> "end == p",
>       file=0x14f79c8 <i_fwd+2344> "igc.c", line=4820) at alloc.c:8400
>   8400      terminate_due_to_signal (SIGABRT, INT_MAX);
>   (gdb)
>   #3  0x00fee189 in igc_dump_check_object_starts (relocs=0x11ce31a3,
>       dump_base=0x1ab94020, hot_start=0x1ab94098, hot_end=0x1b1a31e8,
>       cold_start=0x1b1c4020, heap_end=0x1b33ce88) at igc.c:4820
>   4820              eassert (end == p);
>   (gdb) l
>   4815              EMACS_INT end_off = XFIXNUM (XCAR (XCDR (r)));
>   4816              mps_addr_t start = (uint8_t *) dump_base + start_off;
>   4817              mps_addr_t end = (uint8_t *) dump_base + end_off;
>   4818              eassert (start == p);
>   4819              p = dflt_skip (p);
>   4820              eassert (end == p);
>   4821            }
>   4822        }
>   4823      eassert (NILP (relocs));
>   4824    }
>   (gdb) p i
>   $1 = 1
>   (gdb) p/x region
>   $2 = {start = 0x1b1c4020, end = 0x1b33ce88}
>   (gdb) p end
>   $3 = (mps_addr_t) 0x1b26ac80
>   (gdb) p p
>   $4 = (mps_addr_t) 0x4af05d90
>   (gdb) p start
>   $5 = (mps_addr_t) 0x1b26ac70
>   (gdb)
>
> So it looks like dflt_skip(0x1b26ac70) yields some bogus value, but I
> have no idea what it means or where to look to find the reason(s).

This function checks the consistency of the pdump when it is written. It
should be the case that the two regions of the dump (hot and cold)
contain igc_headers that correspond to the relocs that the pdump
contains.

With i == 1 we are currently in the cold region. It should be the case
that the igc_headers contained in the cold region are traversable in the
same way as if the objects in the region had been allocated from MPS.
That is, we start at the start of the regioni with the first objects,
then dftl_skip from there we reach the second object, dflt_skip from
there takes us to the third object and so on.

In the failing case, dflt_skip from p does not take us to the next
object, as far as the relocs say, So either the igc_header at p is
somehow wrong or the reloc entry from relocs is wrong.

Maybe one can see from the igc_header at p what object type was dumped
there? There should be some 32-bit dependency somewhwere since I don't
see that here on 64 bits.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 14:10 ` Gerd Möllmann
@ 2024-07-23 14:26   ` Eli Zaretskii
  2024-07-23 14:40     ` Eli Zaretskii
  2024-07-23 14:42     ` Pip Cet
  0 siblings, 2 replies; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-23 14:26 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, eller.helmut, emacs-devel

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Pip Cet <pipcet@protonmail.com>,  Helmut Eller <eller.helmut@gmail.com>,
>   emacs-devel@gnu.org
> Date: Tue, 23 Jul 2024 16:10:42 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > So it looks like dflt_skip(0x1b26ac70) yields some bogus value, but I
> > have no idea what it means or where to look to find the reason(s).
> 
> This function checks the consistency of the pdump when it is written. It
> should be the case that the two regions of the dump (hot and cold)
> contain igc_headers that correspond to the relocs that the pdump
> contains.
> 
> With i == 1 we are currently in the cold region. It should be the case
> that the igc_headers contained in the cold region are traversable in the
> same way as if the objects in the region had been allocated from MPS.
> That is, we start at the start of the regioni with the first objects,
> then dftl_skip from there we reach the second object, dflt_skip from
> there takes us to the third object and so on.
> 
> In the failing case, dflt_skip from p does not take us to the next
> object, as far as the relocs say, So either the igc_header at p is
> somehow wrong or the reloc entry from relocs is wrong.
> 
> Maybe one can see from the igc_header at p what object type was dumped
> there? There should be some 32-bit dependency somewhwere since I don't
> see that here on 64 bits.

I don't really understand what I'm doing, but does the below help?

  Thread 1 hit Breakpoint 1, emacs_abort () at w32fns.c:11335
  11335   {
  (gdb) up
  #1  0x00eb5540 in terminate_due_to_signal (sig=sig@entry=22,
      backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:470
  470       emacs_raise (sig);
  (gdb)
  #2  0x00f2a81a in die (msg=0x14f8051 <i_fwd+4017> "end == p",
      file=0x14f79c8 <i_fwd+2344> "igc.c", line=4820) at alloc.c:8400
  8400      terminate_due_to_signal (SIGABRT, INT_MAX);
  (gdb)
  #3  0x00fee189 in igc_dump_check_object_starts (relocs=0x11d32853,
      dump_base=0x1aa41020, hot_start=0x1aa41098, hot_end=0x1b055c70,
      cold_start=0x1b081020, heap_end=0x1b1f9e98) at igc.c:4820
  4820              eassert (end == p);
  (gdb) p obj_size(start)
  $1 = 801749580
  (gdb) p igc_header_nwords(start)
  $2 = 200437395
  (gdb) p header_tag(start)
  $3 = 0
  (gdb) p header_nwords(start)
  $4 = 200437395
  (gdb) p *(struct igc_header *)start
  $5 = {v = 1721744118644157000}
  (gdb) p/x *(struct igc_header *)start
  $6 = {v = 0x17e4dd2759e72a48}
  (gdb) p relocs
  $7 = (Lisp_Object) 0x11d32853
  (gdb) source .gdbinit
  SIGINT is used by the debugger.
  Are you sure you want to change it? (y or n) [answered Y; input not from terminal]
  Environment variable "DISPLAY" not defined.
  Environment variable "TERM" not defined.
  Breakpoint 2 at 0xeb54e0: file emacs.c, line 432.
  (gdb) p relocs
  $8 = XIL(0x11d32853)
  (gdb) xtype
  Lisp_Cons
  (gdb) xcar
  $9 = 0x11a5eda3
  (gdb) xtype
  Lisp_Cons
  (gdb) xcar
  $10 = 0x1b9b062
  (gdb) xtype
  Lisp_Int0
  (gdb) xint
  $11 = 7236632
  (gdb) p r
  $12 = XIL(0x11a5ed33)
  (gdb) xtype
  Lisp_Cons
  (gdb) xcar
  $13 = 0x1b9b022
  (gdb) xtype
  Lisp_Int0
  (gdb) xint
  $14 = 7236616
  (gdb) pp r
  (7236616 7236632)

So relocs seem to be pairs of numbers, and 'r', the one that seems to
cause the problem, looks okay to me.  But is the igc_header okay?

Let me know if I can collect more data.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 14:26   ` Eli Zaretskii
@ 2024-07-23 14:40     ` Eli Zaretskii
  2024-07-23 14:52       ` Gerd Möllmann
  2024-07-23 14:42     ` Pip Cet
  1 sibling, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-23 14:40 UTC (permalink / raw)
  To: gerd.moellmann; +Cc: pipcet, eller.helmut, emacs-devel

> Date: Tue, 23 Jul 2024 17:26:29 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: pipcet@protonmail.com, eller.helmut@gmail.com, emacs-devel@gnu.org
> 
>   #3  0x00fee189 in igc_dump_check_object_starts (relocs=0x11d32853,
>       dump_base=0x1aa41020, hot_start=0x1aa41098, hot_end=0x1b055c70,
>       cold_start=0x1b081020, heap_end=0x1b1f9e98) at igc.c:4820
>   4820              eassert (end == p);
>   (gdb) p obj_size(start)
>   $1 = 801749580
>   (gdb) p igc_header_nwords(start)
>   $2 = 200437395
>   (gdb) p header_tag(start)
>   $3 = 0
>   (gdb) p header_nwords(start)
>   $4 = 200437395
>   (gdb) p *(struct igc_header *)start
>   $5 = {v = 1721744118644157000}
>   (gdb) p/x *(struct igc_header *)start
>   $6 = {v = 0x17e4dd2759e72a48}

And note that the next region after the one that causes the abort
seems to be perfectly normal:

  #3  0x00fee189 in igc_dump_check_object_starts (relocs=0x11ce31a3,
      dump_base=0x1a9d5020, hot_start=0x1a9d5098, hot_end=0x1afe41e8,
      cold_start=0x1b005020, heap_end=0x1b17de88) at igc.c:4820
  4820              eassert (end == p);
  (gdb) ptype struct igc_header
  type = struct igc_header {
      uint64_t v;
  }
  (gdb) p obj_size(end)
  $1 = 24
  (gdb) p header_tag(end)
  $2 = 1
  (gdb) p IGC_TAG_EXTHDR+0
  $4 = 2
  (gdb) p *(struct igc_header *)end
  $5 = {v = 55835531293}
  (gdb) p/x *(struct igc_header *)end
  $6 = {v = 0xd000e981d}
  (gdb)

So some relocation in the pdump corresponds to a corrupted bloc or
something?



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 14:26   ` Eli Zaretskii
  2024-07-23 14:40     ` Eli Zaretskii
@ 2024-07-23 14:42     ` Pip Cet
  1 sibling, 0 replies; 32+ messages in thread
From: Pip Cet @ 2024-07-23 14:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Gerd Möllmann, eller.helmut, emacs-devel

On Tuesday, July 23rd, 2024 at 14:26, Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Gerd Möllmann gerd.moellmann@gmail.com
> 
> > Cc: Pip Cet pipcet@protonmail.com, Helmut Eller eller.helmut@gmail.com,
> > emacs-devel@gnu.org
> > Date: Tue, 23 Jul 2024 16:10:42 +0200
> > 
> > Eli Zaretskii eliz@gnu.org writes:
> > 
> > > So it looks like dflt_skip(0x1b26ac70) yields some bogus value, but I
> > > have no idea what it means or where to look to find the reason(s).
> > 
> > This function checks the consistency of the pdump when it is written. It
> > should be the case that the two regions of the dump (hot and cold)
> > contain igc_headers that correspond to the relocs that the pdump
> > contains.
> > 
> > With i == 1 we are currently in the cold region. It should be the case
> > that the igc_headers contained in the cold region are traversable in the
> > same way as if the objects in the region had been allocated from MPS.
> > That is, we start at the start of the regioni with the first objects,
> > then dftl_skip from there we reach the second object, dflt_skip from
> > there takes us to the third object and so on.
> > 
> > In the failing case, dflt_skip from p does not take us to the next
> > object, as far as the relocs say, So either the igc_header at p is
> > somehow wrong or the reloc entry from relocs is wrong.
> > 
> > Maybe one can see from the igc_header at p what object type was dumped
> > there? There should be some 32-bit dependency somewhwere since I don't
> > see that here on 64 bits.
> 
> I don't really understand what I'm doing, but does the below help?

That looks very helpful indeed. Thanks for testing this, I need to get a mingw development environment somehow...

> Thread 1 hit Breakpoint 1, emacs_abort () at w32fns.c:11335
> 11335 {
> (gdb) up
> #1 0x00eb5540 in terminate_due_to_signal (sig=sig@entry=22,
> backtrace_limit=backtrace_limit@entry=2147483647) at emacs.c:470
> 470 emacs_raise (sig);
> (gdb)
> #2 0x00f2a81a in die (msg=0x14f8051 <i_fwd+4017> "end == p",
> 
> file=0x14f79c8 <i_fwd+2344> "igc.c", line=4820) at alloc.c:8400
> 
> 8400 terminate_due_to_signal (SIGABRT, INT_MAX);
> (gdb)
> #3 0x00fee189 in igc_dump_check_object_starts (relocs=0x11d32853,
> dump_base=0x1aa41020, hot_start=0x1aa41098, hot_end=0x1b055c70,
> cold_start=0x1b081020, heap_end=0x1b1f9e98) at igc.c:4820
> 4820 eassert (end == p);
> (gdb) p obj_size(start)
> $1 = 801749580
> (gdb) p igc_header_nwords(start)
> $2 = 200437395
> (gdb) p header_tag(start)
> $3 = 0
> (gdb) p header_nwords(start)
> $4 = 200437395
> (gdb) p *(struct igc_header *)start
> $5 = {v = 1721744118644157000}
> (gdb) p/x *(struct igc_header *)start
> $6 = {v = 0x17e4dd2759e72a48}
> (gdb) p relocs
> $7 = (Lisp_Object) 0x11d32853
> (gdb) source .gdbinit
> SIGINT is used by the debugger.
> Are you sure you want to change it? (y or n) [answered Y; input not from terminal]
> Environment variable "DISPLAY" not defined.
> Environment variable "TERM" not defined.
> Breakpoint 2 at 0xeb54e0: file emacs.c, line 432.
> (gdb) p relocs
> $8 = XIL(0x11d32853)
> (gdb) xtype
> Lisp_Cons
> (gdb) xcar
> $9 = 0x11a5eda3
> (gdb) xtype
> Lisp_Cons
> (gdb) xcar
> $10 = 0x1b9b062
> (gdb) xtype
> Lisp_Int0
> (gdb) xint
> $11 = 7236632
> (gdb) p r
> $12 = XIL(0x11a5ed33)
> (gdb) xtype
> Lisp_Cons
> (gdb) xcar
> $13 = 0x1b9b022
> (gdb) xtype
> Lisp_Int0
> (gdb) xint
> $14 = 7236616
> (gdb) pp r
> (7236616 7236632)
> 
> So relocs seem to be pairs of numbers, and 'r', the one that seems to
> cause the problem, looks okay to me. But is the igc_header okay?

Definitely not.  No valid igc_header is divisible by four.

> Let me know if I can collect more data.

Will do once I have an idea.

Thanks again, and sorry for the trouble.

Pip



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 14:40     ` Eli Zaretskii
@ 2024-07-23 14:52       ` Gerd Möllmann
  2024-07-23 15:43         ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 14:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, eller.helmut, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Tue, 23 Jul 2024 17:26:29 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: pipcet@protonmail.com, eller.helmut@gmail.com, emacs-devel@gnu.org
>> 
>>   #3  0x00fee189 in igc_dump_check_object_starts (relocs=0x11d32853,
>>       dump_base=0x1aa41020, hot_start=0x1aa41098, hot_end=0x1b055c70,
>>       cold_start=0x1b081020, heap_end=0x1b1f9e98) at igc.c:4820
>>   4820              eassert (end == p);
>>   (gdb) p obj_size(start)
>>   $1 = 801749580
>>   (gdb) p igc_header_nwords(start)
>>   $2 = 200437395
>>   (gdb) p header_tag(start)
>>   $3 = 0
>>   (gdb) p header_nwords(start)
>>   $4 = 200437395
>>   (gdb) p *(struct igc_header *)start
>>   $5 = {v = 1721744118644157000}
>>   (gdb) p/x *(struct igc_header *)start
>>   $6 = {v = 0x17e4dd2759e72a48}
>
> And note that the next region after the one that causes the abort
> seems to be perfectly normal:
>
>   #3  0x00fee189 in igc_dump_check_object_starts (relocs=0x11ce31a3,
>       dump_base=0x1a9d5020, hot_start=0x1a9d5098, hot_end=0x1afe41e8,
>       cold_start=0x1b005020, heap_end=0x1b17de88) at igc.c:4820
>   4820              eassert (end == p);
>   (gdb) ptype struct igc_header
>   type = struct igc_header {
>       uint64_t v;
>   }
>   (gdb) p obj_size(end)
>   $1 = 24
>   (gdb) p header_tag(end)
>   $2 = 1
>   (gdb) p IGC_TAG_EXTHDR+0
>   $4 = 2
>   (gdb) p *(struct igc_header *)end
>   $5 = {v = 55835531293}
>   (gdb) p/x *(struct igc_header *)end
>   $6 = {v = 0xd000e981d}
>   (gdb)
>
> So some relocation in the pdump corresponds to a corrupted bloc or
> something?

Could you do

  p header_type(start)

please, that should show the IGC_OBJ_xy type of the offending header

  p header_nwords(start)

is the, presumably wrong, size in words, but I think we have already
seen it's too large.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 14:52       ` Gerd Möllmann
@ 2024-07-23 15:43         ` Eli Zaretskii
  2024-07-23 16:29           ` Gerd Möllmann
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-23 15:43 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, eller.helmut, emacs-devel

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  eller.helmut@gmail.com,  emacs-devel@gnu.org
> Date: Tue, 23 Jul 2024 16:52:42 +0200
> 
> Could you do
> 
>   p header_type(start)
> 
> please, that should show the IGC_OBJ_xy type of the offending header
> 
>   p header_nwords(start)
> 
> is the, presumably wrong, size in words, but I think we have already
> seen it's too large.

Compliance!

  (gdb) p header_type(start)
  $7 = 58
  (gdb) p (enum igc_obj_type)header_type(start)
  $8 = 58
  (gdb) p IGC_OBJ_NUM_TYPES+0
  $9 = 31
  (gdb) p header_nwords(start)
  $10 = 200437507



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 12:58 MPS: unable to build due to assertion violation in igc_dump_check_object_starts Eli Zaretskii
  2024-07-23 14:10 ` Gerd Möllmann
@ 2024-07-23 16:28 ` Pip Cet
  1 sibling, 0 replies; 32+ messages in thread
From: Pip Cet @ 2024-07-23 16:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Helmut Eller, Gerd Möllmann, emacs-devel

On Tuesday, July 23rd, 2024 at 12:58, Eli Zaretskii <eliz@gnu.org> wrote:
> (gdb) p end
> $3 = (mps_addr_t) 0x1b26ac80
> (gdb) p p
> $4 = (mps_addr_t) 0x4af05d90
> (gdb) p start
> $5 = (mps_addr_t) 0x1b26ac70
> (gdb)

That's a very small object, maybe a dumped bignum?

I can't help but notice that the value of the "header" is 1721744118644157000, which looks suspiciously like a decimal multiple of the Unix time.

Pip



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 15:43         ` Eli Zaretskii
@ 2024-07-23 16:29           ` Gerd Möllmann
  2024-07-23 18:00             ` Eli Zaretskii
  0 siblings, 1 reply; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 16:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, eller.helmut, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  eller.helmut@gmail.com,  emacs-devel@gnu.org
>> Date: Tue, 23 Jul 2024 16:52:42 +0200
>> 
>> Could you do
>> 
>>   p header_type(start)
>> 
>> please, that should show the IGC_OBJ_xy type of the offending header
>> 
>>   p header_nwords(start)
>> 
>> is the, presumably wrong, size in words, but I think we have already
>> seen it's too large.
>
> Compliance!
>
>   (gdb) p header_type(start)
>   $7 = 58
>   (gdb) p (enum igc_obj_type)header_type(start)
>   $8 = 58
>   (gdb) p IGC_OBJ_NUM_TYPES+0
>   $9 = 31
>   (gdb) p header_nwords(start)
>   $10 = 200437507

So the igc_header is completely kaputt.

But we can try to use the value of start_off at the time the assertion
is triggered. The list containing the start/end offsets should be
created here:

  static void
  dump_igc_finish_obj (struct dump_context *ctx)
  {
    if (ctx->flags.dump_object_contents)
      {
        char *base = (char *) ctx->buf + ctx->igc_base_offset;
        char *end = (char *) ctx->buf + ctx->offset;
        eassert (end > base);
        char *should_end = igc_dump_finish_obj (ctx->igc_obj_dumped, ctx->igc_type, base, end);
        eassert (should_end >= end);
        dump_write_zero (ctx, should_end - end);
        if (ctx->flags.record_object_starts)
          dump_push (&ctx->igc_object_starts,
                     list2 (dump_off_to_lisp (ctx->igc_base_offset),
                            dump_off_to_lisp (ctx->offset)));
Here                 ^^^^^^

      }

    ctx->igc_obj_dumped = NULL;
    ctx->igc_type = IGC_OBJ_INVALID;
    ctx->igc_base_offset = -1;
  }
  # endif // HAVE_MPS

If I'm not completely mistaken, start_off when asserting should ==
ctx->igc_base_offset at some point whle dumping. Could you perhaps put a
conditional breakpoint there? We could then go up the stack and see
which dump_xy function that is.

Maybe a short overview what's happening here, as an orientation. In
pdumper object dumping happens by first calling dump_start_object, then
dumping the fields or whatever that object contains, and finally calling
dump_finish_object. For MPS, dump_start_object writes an empty
igc_header, and dump_finish_object fills out the igc_header, since we
then know the size of the object and so on. It also records the object
start/end in ctx->igc_object_starts, which is sort of a remnant from the
itmes that the dump was a root, and not in MPS. But the object starts
are apparently still helpful.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 16:29           ` Gerd Möllmann
@ 2024-07-23 18:00             ` Eli Zaretskii
  2024-07-23 18:37               ` Eli Zaretskii
  2024-07-23 18:42               ` Gerd Möllmann
  0 siblings, 2 replies; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-23 18:00 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, eller.helmut, emacs-devel

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  eller.helmut@gmail.com,  emacs-devel@gnu.org
> Date: Tue, 23 Jul 2024 18:29:56 +0200
> 
> But we can try to use the value of start_off at the time the assertion
> is triggered. The list containing the start/end offsets should be
> created here:
> 
>   static void
>   dump_igc_finish_obj (struct dump_context *ctx)
>   {
>     if (ctx->flags.dump_object_contents)
>       {
>         char *base = (char *) ctx->buf + ctx->igc_base_offset;
>         char *end = (char *) ctx->buf + ctx->offset;
>         eassert (end > base);
>         char *should_end = igc_dump_finish_obj (ctx->igc_obj_dumped, ctx->igc_type, base, end);
>         eassert (should_end >= end);
>         dump_write_zero (ctx, should_end - end);
>         if (ctx->flags.record_object_starts)
>           dump_push (&ctx->igc_object_starts,
>                      list2 (dump_off_to_lisp (ctx->igc_base_offset),
>                             dump_off_to_lisp (ctx->offset)));
> Here                 ^^^^^^
> 
>       }
> 
>     ctx->igc_obj_dumped = NULL;
>     ctx->igc_type = IGC_OBJ_INVALID;
>     ctx->igc_base_offset = -1;
>   }
>   # endif // HAVE_MPS
> 
> If I'm not completely mistaken, start_off when asserting should ==
> ctx->igc_base_offset at some point whle dumping. Could you perhaps put a
> conditional breakpoint there?

That was not a good idea: I waited for an hour, but the breakpoint
didn't break and temacs didn't finish.  I guess we severely
underestimate how many objects pdumper dumps, and thus how many times
this breakpoint triggers and GDB needs to examine the values.

So instead I added a source-level condition:

      if (ctx->igc_base_offset == THAT_SPECIFIC_VALUE)
	emacs_abort ();

and ran with a breakpoint in emacs_abort.  That did the trick.

> We could then go up the stack and see which dump_xy function that
> is.

Here, let me know what more you want to see:

  Thread 1 hit Breakpoint 1, emacs_abort () at w32fns.c:11335
  11335   {
  (gdb) up
  #1  0x00c9febf in dump_igc_finish_obj (ctx=0x777eb38, ctx@entry=0x777eb3c)
      at pdumper.c:935
  935             emacs_abort ();
  (gdb) bt
  #0  emacs_abort () at w32fns.c:11335
  #1  0x00c9febf in dump_igc_finish_obj (ctx=0x777eb38, ctx@entry=0x777eb3c)
      at pdumper.c:935
  #2  0x00ca5a3f in dump_cold_string (string=<optimized out>, ctx=0x777eb3c)
      at pdumper.c:3560
  #3  dump_drain_cold_data (ctx=0x777eb3c) at pdumper.c:3714
  #4  Fdump_emacs_portable (filename=<optimized out>,
      track_referrers=<optimized out>) at pdumper.c:4514
  #5  0x00ccae36 in eval_sub (form=0xbb2ac73) at eval.c:2630
  #6  0x00ccac41 in eval_sub (form=0xbb2abb3) at eval.c:2578
  #7  0x00ccafc3 in Fprogn (body=0xbb2ae6b) at eval.c:452
  #8  0x00ccc068 in Flet (args=<optimized out>) at lisp.h:1546
  #9  0x00ccac41 in eval_sub (form=0xbb2aacb) at eval.c:2578
  #10 0x00ccc916 in Funwind_protect (args=0xbb2ae7b) at lisp.h:1546
  #11 0x00ccac41 in eval_sub (form=0xbb2aabb) at eval.c:2578
  #12 0x00ccafc3 in Fprogn (body=0x0) at eval.c:452
  #13 0x00ccc068 in Flet (args=<optimized out>) at lisp.h:1546
  #14 0x00ccac41 in eval_sub (form=0xbb2aa8b) at eval.c:2578
  #15 0x00ccafc3 in Fprogn (body=0xbb2b93b) at eval.c:452
  #16 0x00ccc068 in Flet (args=<optimized out>) at lisp.h:1546
  #17 0x00ccac41 in eval_sub (form=0xbb2a52b) at eval.c:2578
  #18 0x00ccac41 in eval_sub (form=0xbb2a50b) at eval.c:2578
  #19 0x00ccafc3 in Fprogn (body=0x0) at eval.c:452
  #20 0x00ccac41 in eval_sub (form=form@entry=0xbb29ff3) at eval.c:2578
  #21 0x00d06f34 in readevalloop (readcharfun=readcharfun@entry=0x60c0,
      infile0=infile0@entry=0x777f638, sourcename=sourcename@entry=0xa848184,
      printflag=printflag@entry=false, unibyte=unibyte@entry=0x0,
      readfun=readfun@entry=0x0, start=start@entry=0x0, end=<optimized out>,
      end@entry=0x0) at lread.c:2537
  #22 0x00d0798b in Fload (file=<optimized out>, noerror=0x0, nomessage=0x0,
      nosuffix=0x0, must_suffix=<optimized out>) at lisp.h:1214
  #23 0x00ccadd3 in eval_sub (form=form@entry=0xa847e4b) at eval.c:2641
  #24 0x00cccf0d in Feval (form=0xa847e4b, lexical=lexical@entry=0x20)
      at eval.c:2486
  #25 0x00c25d51 in top_level_2 () at lisp.h:1214
  #26 0x00cc528c in internal_condition_case (
      bfun=bfun@entry=0xc25cf3 <top_level_2>, handlers=handlers@entry=0x60,
      hfun=hfun@entry=0xc2f806 <cmd_error>) at eval.c:1633
  #27 0x00c26460 in top_level_1 (ignore=0x0) at lisp.h:1214
  #28 0x00cc51a6 in internal_catch (tag=tag@entry=0xc520,
      func=func@entry=0xc26437 <top_level_1>, arg=arg@entry=0x0) at eval.c:1312
  #29 0x00c25b0f in command_loop () at lisp.h:1214
  #30 0x00c2f3c4 in recursive_edit_1 () at keyboard.c:765
  #31 0x00c2f6b2 in Frecursive_edit () at keyboard.c:848
  #32 0x00e75745 in main (argc=<optimized out>, argv=<optimized out>)
      at emacs.c:2646
  (gdb) up
  #2  0x00ca5a3f in dump_cold_string (string=<optimized out>, ctx=0x777eb3c)
      at pdumper.c:3560
  3560      dump_igc_finish_obj (ctx);
  (gdb) p data
  $1 = (struct Lisp_String_Data *) 0xb52b200
  (gdb) p *$
  $2 = {gc_header = {v = 245077557021}, data = 0xb52b208 "ÅÆ!\210Ç\030\t.\017"}
  (gdb) p/x *$1
  $3 = {gc_header = {v = 0x390fc2bf1d}, data = 0xb52b208}
  (gdb)



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:00             ` Eli Zaretskii
@ 2024-07-23 18:37               ` Eli Zaretskii
  2024-07-23 18:40                 ` Pip Cet
  2024-07-23 18:50                 ` Gerd Möllmann
  2024-07-23 18:42               ` Gerd Möllmann
  1 sibling, 2 replies; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-23 18:37 UTC (permalink / raw)
  To: gerd.moellmann, pipcet; +Cc: eller.helmut, emacs-devel

> Date: Tue, 23 Jul 2024 21:00:04 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: pipcet@protonmail.com, eller.helmut@gmail.com, emacs-devel@gnu.org
> 
>   #1  0x00c9febf in dump_igc_finish_obj (ctx=0x777eb38, ctx@entry=0x777eb3c)
>       at pdumper.c:935
>   935             emacs_abort ();
>   (gdb) bt
>   #0  emacs_abort () at w32fns.c:11335
>   #1  0x00c9febf in dump_igc_finish_obj (ctx=0x777eb38, ctx@entry=0x777eb3c)
>       at pdumper.c:935
>   #2  0x00ca5a3f in dump_cold_string (string=<optimized out>, ctx=0x777eb3c)
>       at pdumper.c:3560
>   #3  dump_drain_cold_data (ctx=0x777eb3c) at pdumper.c:3714
>   #4  Fdump_emacs_portable (filename=<optimized out>,
>       track_referrers=<optimized out>) at pdumper.c:4514
>   #5  0x00ccae36 in eval_sub (form=0xbb2ac73) at eval.c:2630
>   #6  0x00ccac41 in eval_sub (form=0xbb2abb3) at eval.c:2578
>   #7  0x00ccafc3 in Fprogn (body=0xbb2ae6b) at eval.c:452
>   #8  0x00ccc068 in Flet (args=<optimized out>) at lisp.h:1546
>   #9  0x00ccac41 in eval_sub (form=0xbb2aacb) at eval.c:2578
>   #10 0x00ccc916 in Funwind_protect (args=0xbb2ae7b) at lisp.h:1546
>   #11 0x00ccac41 in eval_sub (form=0xbb2aabb) at eval.c:2578
>   #12 0x00ccafc3 in Fprogn (body=0x0) at eval.c:452
>   #13 0x00ccc068 in Flet (args=<optimized out>) at lisp.h:1546
>   #14 0x00ccac41 in eval_sub (form=0xbb2aa8b) at eval.c:2578
>   #15 0x00ccafc3 in Fprogn (body=0xbb2b93b) at eval.c:452
>   #16 0x00ccc068 in Flet (args=<optimized out>) at lisp.h:1546
>   #17 0x00ccac41 in eval_sub (form=0xbb2a52b) at eval.c:2578
>   #18 0x00ccac41 in eval_sub (form=0xbb2a50b) at eval.c:2578
>   #19 0x00ccafc3 in Fprogn (body=0x0) at eval.c:452
>   #20 0x00ccac41 in eval_sub (form=form@entry=0xbb29ff3) at eval.c:2578
>   #21 0x00d06f34 in readevalloop (readcharfun=readcharfun@entry=0x60c0,
>       infile0=infile0@entry=0x777f638, sourcename=sourcename@entry=0xa848184,
>       printflag=printflag@entry=false, unibyte=unibyte@entry=0x0,
>       readfun=readfun@entry=0x0, start=start@entry=0x0, end=<optimized out>,
>       end@entry=0x0) at lread.c:2537
>   #22 0x00d0798b in Fload (file=<optimized out>, noerror=0x0, nomessage=0x0,
>       nosuffix=0x0, must_suffix=<optimized out>) at lisp.h:1214
>   #23 0x00ccadd3 in eval_sub (form=form@entry=0xa847e4b) at eval.c:2641
>   #24 0x00cccf0d in Feval (form=0xa847e4b, lexical=lexical@entry=0x20)
>       at eval.c:2486
>   #25 0x00c25d51 in top_level_2 () at lisp.h:1214
>   #26 0x00cc528c in internal_condition_case (
>       bfun=bfun@entry=0xc25cf3 <top_level_2>, handlers=handlers@entry=0x60,
>       hfun=hfun@entry=0xc2f806 <cmd_error>) at eval.c:1633
>   #27 0x00c26460 in top_level_1 (ignore=0x0) at lisp.h:1214
>   #28 0x00cc51a6 in internal_catch (tag=tag@entry=0xc520,
>       func=func@entry=0xc26437 <top_level_1>, arg=arg@entry=0x0) at eval.c:1312
>   #29 0x00c25b0f in command_loop () at lisp.h:1214
>   #30 0x00c2f3c4 in recursive_edit_1 () at keyboard.c:765
>   #31 0x00c2f6b2 in Frecursive_edit () at keyboard.c:848
>   #32 0x00e75745 in main (argc=<optimized out>, argv=<optimized out>)
>       at emacs.c:2646
>   (gdb) up
>   #2  0x00ca5a3f in dump_cold_string (string=<optimized out>, ctx=0x777eb3c)
>       at pdumper.c:3560
>   3560      dump_igc_finish_obj (ctx);
>   (gdb) p data
>   $1 = (struct Lisp_String_Data *) 0xb52b200
>   (gdb) p *$
>   $2 = {gc_header = {v = 245077557021}, data = 0xb52b208 "ÅÆ!\210Ç\030\t.\017"}
>   (gdb) p/x *$1
>   $3 = {gc_header = {v = 0x390fc2bf1d}, data = 0xb52b208}
>   (gdb)

And:

  (gdb) fr 1
  #1  0x00c9febf in dump_igc_finish_obj (ctx=0x777eb38, ctx@entry=0x777eb3c)
      at pdumper.c:935
  935             emacs_abort ();
  (gdb) p ctx->igc_type
  $5 = IGC_OBJ_STRING_DATA
  (gdb) p should_end
  $6 = <optimized out>
  (gdb) p ctx->igc_obj_dumped
  $7 = (void *) 0xb52b200
  (gdb) p igc_header_type(ctx->igc_obj_dumped)
  $8 = IGC_OBJ_STRING_DATA
  (gdb) p obj_size(ctx->igc_obj_dumped)
  $9 = 112

So this is IGC_OBJ_STRING_DATA, but why doesn't igc_dump_finish_obj do
its job in this case?  What are we missing?



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:37               ` Eli Zaretskii
@ 2024-07-23 18:40                 ` Pip Cet
  2024-07-23 18:48                   ` Gerd Möllmann
  2024-07-24  8:01                   ` Helmut Eller
  2024-07-23 18:50                 ` Gerd Möllmann
  1 sibling, 2 replies; 32+ messages in thread
From: Pip Cet @ 2024-07-23 18:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, eller.helmut, emacs-devel

On Tuesday, July 23rd, 2024 at 18:37, Eli Zaretskii <eliz@gnu.org> wrote:
> So this is IGC_OBJ_STRING_DATA, but why doesn't igc_dump_finish_obj do
> its job in this case? What are we missing?

I'm pretty sure it's the bignum thing, currently testing a fix. I underestimated the complexity of how bignums are dumped, and wrongly assumed that on 64-bit systems, we would be dumping bignums; it seems we don't do so anymore. (BIGNUM_DATA, unlike the bignum PVEC, doesn't have a header; it should, and if I add bignums to the dump manually things work now...).

Pip



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:00             ` Eli Zaretskii
  2024-07-23 18:37               ` Eli Zaretskii
@ 2024-07-23 18:42               ` Gerd Möllmann
  2024-07-23 18:49                 ` Eli Zaretskii
  1 sibling, 1 reply; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 18:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, eller.helmut, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> So instead I added a source-level condition:
>
>       if (ctx->igc_base_offset == THAT_SPECIFIC_VALUE)
> 	emacs_abort ();
>
> and ran with a breakpoint in emacs_abort.  That did the trick.

Yeah, conditional breakpoints can be pretty slow. I sometimes put an
igc_break () call somewhere instead of the abort, so that I can continue
debugging from there on. With a breakpoint on igc_break of course.

>
>> We could then go up the stack and see which dump_xy function that
>> is.
>
> Here, let me know what more you want to see:
>
>   Thread 1 hit Breakpoint 1, emacs_abort () at w32fns.c:11335
>   11335   {
>   (gdb) up
>   #1  0x00c9febf in dump_igc_finish_obj (ctx=0x777eb38, ctx@entry=0x777eb3c)
>       at pdumper.c:935
>   935             emacs_abort ();
>   (gdb) bt
>   #0  emacs_abort () at w32fns.c:11335
>   #1  0x00c9febf in dump_igc_finish_obj (ctx=0x777eb38, ctx@entry=0x777eb3c)
>       at pdumper.c:935
>   #2  0x00ca5a3f in dump_cold_string (string=<optimized out>, ctx=0x777eb3c)
>       at pdumper.c:3560

Looks to me like dump_cold_string is the culprit, somehow. That's

  static void
  dump_cold_string (struct dump_context *ctx, Lisp_Object string)
  {
    /* Dump string contents.  */
    dump_off string_offset = dump_recall_object (ctx, string);
    eassert (string_offset > 0);
    if (SBYTES (string) > DUMP_OFF_MAX - 1)
      error ("string too large");
    dump_off total_size = ptrdiff_t_to_dump_off (SBYTES (string) + 1);
    eassert (total_size > 0);

  # ifdef HAVE_MPS
    struct Lisp_String_Data *data = (struct Lisp_String_Data *)
      (XSTRING (string)->u.s.data - sizeof (*data));
    dump_align_output (ctx, DUMP_ALIGNMENT);
    dump_igc_start_obj (ctx, IGC_OBJ_STRING_DATA, data);
    dump_remember_fixup_ptr_raw
      (ctx,
       string_offset + dump_offsetof (struct Lisp_String, u.s.data),
       ctx->offset + sizeof (*data));
    dump_write (ctx, data, sizeof (*data) + total_size);
  # else
    dump_remember_fixup_ptr_raw
      (ctx,
       string_offset + dump_offsetof (struct Lisp_String, u.s.data),
       ctx->offset);
    dump_write (ctx, XSTRING (string)->u.s.data, total_size);
  # endif
  # ifdef HAVE_MPS
    dump_igc_finish_obj (ctx);
  # endif
  }

We are inside of the dump_igc_finish_obj. I see nothing immediately
suspicious there, hm. 

Can you try to look at that header? It is at offset ctx->igc_base_offset
from ctx->buf. Something like

  p header_type ((char *) ctx->buf + ctx->igc_base_offset)

Is it already broken? Depending on where exactly you emacs_abort it
either should still be zeroed or contain something valid like type
IGC_OBJ_STRING_DATA.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:40                 ` Pip Cet
@ 2024-07-23 18:48                   ` Gerd Möllmann
  2024-07-23 19:12                     ` Pip Cet
  2024-07-24  8:01                   ` Helmut Eller
  1 sibling, 1 reply; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 18:48 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, eller.helmut, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> On Tuesday, July 23rd, 2024 at 18:37, Eli Zaretskii <eliz@gnu.org> wrote:
>> So this is IGC_OBJ_STRING_DATA, but why doesn't igc_dump_finish_obj do
>> its job in this case? What are we missing?
>
> I'm pretty sure it's the bignum thing, currently testing a fix. I
> underestimated the complexity of how bignums are dumped, and wrongly
> assumed that on 64-bit systems, we would be dumping bignums; it seems
> we don't do so anymore. (BIGNUM_DATA, unlike the bignum PVEC, doesn't
> have a header; it should, and if I add bignums to the dump manually
> things work now...).
>
> Pip

Ok, good to hear :-).



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:42               ` Gerd Möllmann
@ 2024-07-23 18:49                 ` Eli Zaretskii
  2024-07-23 18:59                   ` Gerd Möllmann
  0 siblings, 1 reply; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-23 18:49 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, eller.helmut, emacs-devel

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  eller.helmut@gmail.com,  emacs-devel@gnu.org
> Date: Tue, 23 Jul 2024 20:42:56 +0200
> 
> We are inside of the dump_igc_finish_obj. I see nothing immediately
> suspicious there, hm. 
> 
> Can you try to look at that header? It is at offset ctx->igc_base_offset
> from ctx->buf. Something like
> 
>   p header_type ((char *) ctx->buf + ctx->igc_base_offset)
> 
> Is it already broken? Depending on where exactly you emacs_abort it
> either should still be zeroed or contain something valid like type
> IGC_OBJ_STRING_DATA.

I added emacs_abort here:

  if (ctx->flags.dump_object_contents)
    {
      char *base = (char *) ctx->buf + ctx->igc_base_offset;
      char *end = (char *) ctx->buf + ctx->offset;
      if (ctx->igc_base_offset == 0x6e6c08)
	emacs_abort ();  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      eassert (end > base);
      char *should_end = igc_dump_finish_obj (ctx->igc_obj_dumped, ctx->igc_type, base, end);
      eassert (should_end >= end);
      dump_write_zero (ctx, should_end - end);
      if (ctx->flags.record_object_starts)
	dump_push (&ctx->igc_object_starts,
		   list2 (dump_off_to_lisp (ctx->igc_base_offset),
			  dump_off_to_lisp (ctx->offset)));
    }

And:

  (gdb) p header_type ((char *) ctx->buf + ctx->igc_base_offset)
  $10 = IGC_OBJ_STRING_DATA

This is _before_ igc_dump_finish_obj was called.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:37               ` Eli Zaretskii
  2024-07-23 18:40                 ` Pip Cet
@ 2024-07-23 18:50                 ` Gerd Möllmann
  1 sibling, 0 replies; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 18:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, eller.helmut, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> And:
>
>   (gdb) fr 1
>   #1  0x00c9febf in dump_igc_finish_obj (ctx=0x777eb38, ctx@entry=0x777eb3c)
>       at pdumper.c:935
>   935             emacs_abort ();
>   (gdb) p ctx->igc_type
>   $5 = IGC_OBJ_STRING_DATA
>   (gdb) p should_end
>   $6 = <optimized out>
>   (gdb) p ctx->igc_obj_dumped
>   $7 = (void *) 0xb52b200
>   (gdb) p igc_header_type(ctx->igc_obj_dumped)
>   $8 = IGC_OBJ_STRING_DATA
>   (gdb) p obj_size(ctx->igc_obj_dumped)
>   $9 = 112
>
> So this is IGC_OBJ_STRING_DATA, but why doesn't igc_dump_finish_obj do
> its job in this case?  What are we missing?

Maybe it does its job, and it's later overwritten? Depending on where
the emacs_abort is in the function, maybe we can see that. See my other mail.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:49                 ` Eli Zaretskii
@ 2024-07-23 18:59                   ` Gerd Möllmann
  0 siblings, 0 replies; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 18:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, eller.helmut, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  eller.helmut@gmail.com,  emacs-devel@gnu.org
>> Date: Tue, 23 Jul 2024 20:42:56 +0200
>> 
>> We are inside of the dump_igc_finish_obj. I see nothing immediately
>> suspicious there, hm. 
>> 
>> Can you try to look at that header? It is at offset ctx->igc_base_offset
>> from ctx->buf. Something like
>> 
>>   p header_type ((char *) ctx->buf + ctx->igc_base_offset)
>> 
>> Is it already broken? Depending on where exactly you emacs_abort it
>> either should still be zeroed or contain something valid like type
>> IGC_OBJ_STRING_DATA.
>
> I added emacs_abort here:
>
>   if (ctx->flags.dump_object_contents)
>     {
>       char *base = (char *) ctx->buf + ctx->igc_base_offset;
>       char *end = (char *) ctx->buf + ctx->offset;
>       if (ctx->igc_base_offset == 0x6e6c08)
> 	emacs_abort ();  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>       eassert (end > base);
>       char *should_end = igc_dump_finish_obj (ctx->igc_obj_dumped, ctx->igc_type, base, end);
>       eassert (should_end >= end);
>       dump_write_zero (ctx, should_end - end);
>       if (ctx->flags.record_object_starts)
> 	dump_push (&ctx->igc_object_starts,
> 		   list2 (dump_off_to_lisp (ctx->igc_base_offset),
> 			  dump_off_to_lisp (ctx->offset)));
>     }
>
> And:
>
>   (gdb) p header_type ((char *) ctx->buf + ctx->igc_base_offset)
>   $10 = IGC_OBJ_STRING_DATA
>
> This is _before_ igc_dump_finish_obj was called.

I can't explain that ATM. The start function

  static void
  dump_igc_start_obj (struct dump_context *ctx, enum igc_obj_type type,
                      const void *in)
  {
    eassert (ctx->igc_type == IGC_OBJ_INVALID);
    eassert (ctx->igc_obj_dumped == NULL);
    eassert (ctx->offset % DUMP_ALIGNMENT == 0);
    ctx->igc_obj_dumped = (void *) in;
    ctx->igc_type = type;
    ctx->igc_base_offset = ctx->offset;
    if (ctx->flags.dump_object_contents &&
        (type == IGC_OBJ_DUMPED_BYTES ||
         type == IGC_OBJ_DUMPED_CODE_SPACE_MASKS))
      {
        /* This saving of obj_offset is Because of an assertion in
           dump_write. */
        dump_off obj_offset = ctx->obj_offset;
        ctx->obj_offset = 0;
        dump_write_zero (ctx, igc_header_size ());
        ctx->obj_offset = obj_offset;
      }
  }

doesn't write the type, unless I'm blind, and the finish function
doesn't either before the emacs_abort. Hm.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:48                   ` Gerd Möllmann
@ 2024-07-23 19:12                     ` Pip Cet
  2024-07-23 19:25                       ` Gerd Möllmann
  0 siblings, 1 reply; 32+ messages in thread
From: Pip Cet @ 2024-07-23 19:12 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, eller.helmut, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1090 bytes --]

On Tuesday, July 23rd, 2024 at 18:48, Gerd Möllmann <gerd.moellmann@gmail.com> wrote:
> Pip Cet pipcet@protonmail.com writes:
> 
> > On Tuesday, July 23rd, 2024 at 18:37, Eli Zaretskii eliz@gnu.org wrote:
> > 
> > > So this is IGC_OBJ_STRING_DATA, but why doesn't igc_dump_finish_obj do
> > > its job in this case? What are we missing?
> > 
> > I'm pretty sure it's the bignum thing, currently testing a fix. I
> > underestimated the complexity of how bignums are dumped, and wrongly
> > assumed that on 64-bit systems, we would be dumping bignums; it seems
> > we don't do so anymore. (BIGNUM_DATA, unlike the bignum PVEC, doesn't
> > have a header; it should, and if I add bignums to the dump manually
> > things work now...).
> > 
> > Pip
> 
> 
> Ok, good to hear :-).

Here's a patch which includes both a reproducer for the bug (defining a global bignum in loadup.el which then gets dumped) and a tentative fix.

If bignum dumping is indeed broken on all architectures, including just the change to loadup.el should reproduce the bug even on 64-bit archs.

Pip

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-dump-bignums-properly.patch --]
[-- Type: text/x-patch; name=0001-dump-bignums-properly.patch, Size: 3440 bytes --]

From 8e53493c06cd798476a79238523d566582a0430a Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@protonmail.com>
Date: Tue, 23 Jul 2024 18:57:34 +0000
Subject: [PATCH] dump bignums properly

---
 lisp/loadup.el | 2 ++
 src/alloc.c    | 2 +-
 src/bignum.c   | 3 +++
 src/igc.c      | 6 ++----
 src/pdumper.c  | 6 ++++--
 5 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/lisp/loadup.el b/lisp/loadup.el
index 6d1e13f44bf..df178b3bb2f 100644
--- a/lisp/loadup.el
+++ b/lisp/loadup.el
@@ -184,6 +184,8 @@
   (file-error
    (load "ldefs-boot.el")))
 
+(setq foo (lsh 1 92))
+
 (let ((new (make-hash-table :test #'equal)))
   ;; Now that loaddefs has populated definition-prefixes, purify its contents.
   (maphash (lambda (k v) (puthash (purecopy k) (purecopy v) new))
diff --git a/src/alloc.c b/src/alloc.c
index 320a5adaf0b..3c10a1b605f 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -6118,7 +6118,7 @@ make_pure_float (double num)
 /* Value is a bignum object with value VALUE allocated from pure
    space.  */
 
-static Lisp_Object
+Lisp_Object
 make_pure_bignum (Lisp_Object value)
 {
   mpz_t const *n = xbignum_val (value);
diff --git a/src/bignum.c b/src/bignum.c
index 1fe195d78ea..57e92599cc6 100644
--- a/src/bignum.c
+++ b/src/bignum.c
@@ -48,6 +48,9 @@ xfree_for_gmp (void *ptr, size_t ignore)
   xfree (ptr);
 }
 
+extern Lisp_Object
+make_pure_bignum (Lisp_Object value);
+
 void
 init_bignum (void)
 {
diff --git a/src/igc.c b/src/igc.c
index ad5f7cc3e7a..d1cdce82f98 100644
--- a/src/igc.c
+++ b/src/igc.c
@@ -4766,8 +4766,7 @@ igc_dump_finish_obj (void *client, enum igc_obj_type type,
 	igc_assert (base + obj_size (h) >= end);
 	if (type != IGC_OBJ_DUMPED_BYTES &&
 	    type != IGC_OBJ_DUMPED_CODE_SPACE_MASKS &&
-	    type != IGC_OBJ_DUMPED_BUFFER_TEXT &&
-	    type != IGC_OBJ_DUMPED_BIGNUM_DATA)
+	    type != IGC_OBJ_DUMPED_BUFFER_TEXT)
 	  *out = *h;
 	igc_assert (header_nwords (out) > 0);
 	return base + obj_size (h);
@@ -4780,8 +4779,7 @@ igc_dump_finish_obj (void *client, enum igc_obj_type type,
   type = (is_pure (client)
 	  ? pure_obj_type_and_hash (&hash, type, client)
 	  : builtin_obj_type_and_hash (&hash, type, client));
-  if (type != IGC_OBJ_DUMPED_BIGNUM_DATA)
-    set_header (out, type, nbytes, hash);
+  set_header (out, type, nbytes, hash);
   return base + nbytes;
 }
 
diff --git a/src/pdumper.c b/src/pdumper.c
index 996bdc79106..0f82be46e75 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -913,7 +913,8 @@ dump_igc_start_obj (struct dump_context *ctx, enum igc_obj_type type,
   ctx->igc_base_offset = ctx->offset;
   if (ctx->flags.dump_object_contents &&
       (type == IGC_OBJ_DUMPED_BYTES ||
-       type == IGC_OBJ_DUMPED_CODE_SPACE_MASKS))
+       type == IGC_OBJ_DUMPED_CODE_SPACE_MASKS ||
+       type == IGC_OBJ_DUMPED_BIGNUM_DATA))
     {
       /* This saving of obj_offset is Because of an assertion in
 	 dump_write. */
@@ -3638,8 +3639,9 @@ dump_cold_bignum (struct dump_context *ctx, Lisp_Object object)
   eassert (sz_nlimbs < DUMP_OFF_MAX);
   dump_align_output (ctx, alignof (mp_limb_t));
   dump_off nlimbs = (dump_off) sz_nlimbs;
+  char *dummy = (void *)igc_alloc_bytes (nlimbs * sizeof (mp_limb_t));
 # ifdef HAVE_MPS
-  dump_igc_start_obj (ctx, IGC_OBJ_DUMPED_BIGNUM_DATA, n);
+  dump_igc_start_obj (ctx, IGC_OBJ_DUMPED_BIGNUM_DATA, dummy - sizeof (uint64_t));
 # endif
   Lisp_Object descriptor
     = list2 (dump_off_to_lisp (ctx->offset),
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 19:12                     ` Pip Cet
@ 2024-07-23 19:25                       ` Gerd Möllmann
  2024-07-23 19:30                         ` Gerd Möllmann
  0 siblings, 1 reply; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 19:25 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, eller.helmut, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> On Tuesday, July 23rd, 2024 at 18:48, Gerd Möllmann <gerd.moellmann@gmail.com> wrote:
>> Pip Cet pipcet@protonmail.com writes:
>> 
>> > On Tuesday, July 23rd, 2024 at 18:37, Eli Zaretskii eliz@gnu.org wrote:
>> > 
>> > > So this is IGC_OBJ_STRING_DATA, but why doesn't igc_dump_finish_obj do
>> > > its job in this case? What are we missing?
>> > 
>> > I'm pretty sure it's the bignum thing, currently testing a fix. I
>> > underestimated the complexity of how bignums are dumped, and wrongly
>> > assumed that on 64-bit systems, we would be dumping bignums; it seems
>> > we don't do so anymore. (BIGNUM_DATA, unlike the bignum PVEC, doesn't
>> > have a header; it should, and if I add bignums to the dump manually
>> > things work now...).
>> > 
>> > Pip
>> 
>> 
>> Ok, good to hear :-).
>
> Here's a patch which includes both a reproducer for the bug (defining a global bignum in loadup.el which then gets dumped) and a tentative fix.
>
> If bignum dumping is indeed broken on all architectures, including
> just the change to loadup.el should reproduce the bug even on 64-bit
> archs.

I've added the bignum to loadup.el. Dumping doesn't complain, but after
that

  ANCIENT=yes gmake -C ../lisp compile-first EMACS="../src/bootstrap-emacs"
    ELC      emacs-lisp/macroexp.elc
    ELC      emacs-lisp/cconv.elc
    ELC      emacs-lisp/byte-opt.elc
    ELC      emacs-lisp/bytecomp.elc
    ELC      emacs-lisp/loaddefs-gen.elc
    ELC      emacs-lisp/radix-tree.elc

  /Users/gerd/emacs/github/mps/code/poolamc.c:1288: Emacs fatal error: assertion failed: p < q

  /Users/gerd/emacs/github/mps/code/poolamc.c:1288: Emacs fatal error: assertion failed: p < q

  /Users/gerd/emacs/github/mps/code/poolamc.c:1288: Emacs fatal error: assertion failed: p < q

  /Users/gerd/emacs/github/mps/code/poolamc.c:1288: Emacs fatal error: assertion failed: p < q

  /Users/gerd/emacs/github/mps/code/poolamc.c:1288: Emacs fatal error: assertion failed: p < q

  /Users/gerd/emacs/github/mps/code/poolamc.c:1288: Emacs fatal error: assertion failed: p < q
  gmake[3]: *** [Makefile:335: emacs-lisp/bytecomp.elc] Abort trap: 6

Will apply the whole patch and report back.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 19:25                       ` Gerd Möllmann
@ 2024-07-23 19:30                         ` Gerd Möllmann
  2024-07-23 22:47                           ` Pip Cet
  0 siblings, 1 reply; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-23 19:30 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, eller.helmut, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Will apply the whole patch and report back.

And that works like a charm :-). Thanks, Pip!



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 19:30                         ` Gerd Möllmann
@ 2024-07-23 22:47                           ` Pip Cet
  2024-07-24  3:50                             ` Gerd Möllmann
  2024-07-24 11:27                             ` Eli Zaretskii
  0 siblings, 2 replies; 32+ messages in thread
From: Pip Cet @ 2024-07-23 22:47 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, eller.helmut, emacs-devel

On Tuesday, July 23rd, 2024 at 19:30, Gerd Möllmann <gerd.moellmann@gmail.com> wrote:
> Gerd Möllmann gerd.moellmann@gmail.com writes:
> 
> > Will apply the whole patch and report back.
> 
> And that works like a charm :-). Thanks, Pip!

I finally managed to build a working 32-bit emacs.exe using MSYS (I think that's what Eli is using; the 2017 release, right?) under Wine, and I can confirm it dumps and appears to work now, with the patch. The dumped bignum has the right value. I hope that's true for Eli's machine as well. I'm building without native compilation, though.

While we're on the subject, though, I had to modify mps a little to build with GCC. Are other people using a different compiler for mps? Or are there patches somewhere?

(I also ran into the "Write error to standard output: Success" issue, possibly because I ran wine from within Docker, so I ended up removing the failure exits from 'close_output_streams')

Pip



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 22:47                           ` Pip Cet
@ 2024-07-24  3:50                             ` Gerd Möllmann
  2024-07-24 11:27                             ` Eli Zaretskii
  1 sibling, 0 replies; 32+ messages in thread
From: Gerd Möllmann @ 2024-07-24  3:50 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, eller.helmut, emacs-devel

Pip Cet <pipcet@protonmail.com> writes:

> While we're on the subject, though, I had to modify mps a little to
> build with GCC. Are other people using a different compiler for mps?
> Or are there patches somewhere?

FWIW, I'm using clang, and for the build problems I had on macOS, I've
submitted issues on the github project.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 18:40                 ` Pip Cet
  2024-07-23 18:48                   ` Gerd Möllmann
@ 2024-07-24  8:01                   ` Helmut Eller
  2024-07-24  8:21                     ` Pip Cet
  1 sibling, 1 reply; 32+ messages in thread
From: Helmut Eller @ 2024-07-24  8:01 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

On Tue, Jul 23 2024, Pip Cet wrote:

> I'm pretty sure it's the bignum thing, currently testing a fix.

I think fix_wrapped_vec is also broken for 32-bit hardware:

  static mps_res_t
  fix_wrapped_vec (mps_ss_t ss, mps_addr_t *p)
  {
    MPS_SCAN_BEGIN (ss)
    {
      mps_addr_t client = *p;
      if (client == NULL)
        return MPS_RES_OK;
      if (is_aligned (client))
        {
          client = (char *)client - sizeof (struct Lisp_Vector);
          mps_addr_t base = client;
          if (MPS_FIX1 (ss, base))
            {
              mps_res_t res = MPS_FIX2 (ss, &base);
              if (res != MPS_RES_OK)
                return res;
              if (base == NULL)
                *p = NULL;
              else
                *p = (char *)base + sizeof (struct Lisp_Vector);
            }
        }
    }
    MPS_SCAN_END (ss);
    return MPS_RES_OK;
  }

First, is_aligned doesn't make any sense; client is neither a tagged
pointer nor does it point to a header.

Second, sizeof (struct Lisp_Vector) is 16 because the struct decl has
GCALIGNED_STRUCT.  The code should perhaps use header_size or
offsetof (struct Lisp_Vector, content).



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-24  8:01                   ` Helmut Eller
@ 2024-07-24  8:21                     ` Pip Cet
  2024-07-24  8:33                       ` Helmut Eller
  0 siblings, 1 reply; 32+ messages in thread
From: Pip Cet @ 2024-07-24  8:21 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

On Wednesday, July 24th, 2024 at 08:01, Helmut Eller <eller.helmut@gmail.com> wrote:
> On Tue, Jul 23 2024, Pip Cet wrote:
> > I'm pretty sure it's the bignum thing, currently testing a fix.
> I think fix_wrapped_vec is also broken for 32-bit hardware:

Thanks for checking! I'm not sure I understand the problem, so please excuse the silly questions.

> First, is_aligned doesn't make any sense; client is neither a tagged
> pointer nor does it point to a header.

It points to the contents area of a struct Lisp_Vector. Since the Lisp vector header is aligned to 8 bytes, and 16 bytes in size including the MPS header, the client address also has to be aligned.

I believe the best thing to do here is to calculate the header address and assert, not check, that it is aligned. Unaligned addresses should never get here.

> Second, sizeof (struct Lisp_Vector) is 16 because the struct decl has
> GCALIGNED_STRUCT. The code should perhaps use header_size or
> offsetof (struct Lisp_Vector, content).

I'm not sure I understand. sizeof (struct Lisp_Vector), header_size, and offsetof (struct Lisp_Vector, content) are all 16.

(And, yes, that means wasting 32 bits per vector. Is that the problem?)

Pip



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-24  8:21                     ` Pip Cet
@ 2024-07-24  8:33                       ` Helmut Eller
  2024-07-24  9:20                         ` Pip Cet
  2024-07-24 11:44                         ` Eli Zaretskii
  0 siblings, 2 replies; 32+ messages in thread
From: Helmut Eller @ 2024-07-24  8:33 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

On Wed, Jul 24 2024, Pip Cet wrote:

>> First, is_aligned doesn't make any sense; client is neither a tagged
>> pointer nor does it point to a header.
>
> It points to the contents area of a struct Lisp_Vector. Since the Lisp
> vector header is aligned to 8 bytes, and 16 bytes in size including
> the MPS header, the client address also has to be aligned.

On my machine:

(gdb) p sizeof (struct vectorlike_header)
$56 = 12
(gdb) p sizeof (struct Lisp_Vector)
$57 = 16
(gdb) p &((struct Lisp_Vector*)0)->contents 
$58 = (Lisp_Object (*)[]) 0xc
(gdb)
(gdb) p header_size
$59 = header_size
(gdb) p/x header_size
$60 = 0xc
(gdb)

> I believe the best thing to do here is to calculate the header address
> and assert, not check, that it is aligned. Unaligned addresses should
> never get here.

Right, the is_aligned test doesn't make any sense, even on 64-bit
machines.

>> Second, sizeof (struct Lisp_Vector) is 16 because the struct decl has
>> GCALIGNED_STRUCT. The code should perhaps use header_size or
>> offsetof (struct Lisp_Vector, content).
>
> I'm not sure I understand. sizeof (struct Lisp_Vector), header_size,
> and offsetof (struct Lisp_Vector, content) are all 16.

Not on my machine. See above.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-24  8:33                       ` Helmut Eller
@ 2024-07-24  9:20                         ` Pip Cet
  2024-07-24  9:39                           ` Helmut Eller
  2024-07-24 11:44                         ` Eli Zaretskii
  1 sibling, 1 reply; 32+ messages in thread
From: Pip Cet @ 2024-07-24  9:20 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

On Wednesday, July 24th, 2024 at 08:33, Helmut Eller <eller.helmut@gmail.com> wrote:
> On Wed, Jul 24 2024, Pip Cet wrote:
>
> > > First, is_aligned doesn't make any sense; client is neither a tagged
> > > pointer nor does it point to a header.
> >
> > It points to the contents area of a struct Lisp_Vector. Since the Lisp
> > vector header is aligned to 8 bytes, and 16 bytes in size including
> > the MPS header, the client address also has to be aligned.
>
> On my machine:
>
> (gdb) p sizeof (struct vectorlike_header)
> $56 = 12

Ouch. Sorry. That's definitely not intended.

Is alignof(struct Lisp_Vector) == 4 for you? Which compiler version are you using?

> > I believe the best thing to do here is to calculate the header address
> > and assert, not check, that it is aligned. Unaligned addresses should
> > never get here.
>
> Right, the is_aligned test doesn't make any sense, even on 64-bit
> machines.

Will fix.

> > > Second, sizeof (struct Lisp_Vector) is 16 because the struct decl has
> > > GCALIGNED_STRUCT. The code should perhaps use header_size or
> > > offsetof (struct Lisp_Vector, content).
> >
> > I'm not sure I understand. sizeof (struct Lisp_Vector), header_size,
> > and offsetof (struct Lisp_Vector, content) are all 16.
>
> Not on my machine. See above.

Sorry for the confusion!

Pip



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-24  9:20                         ` Pip Cet
@ 2024-07-24  9:39                           ` Helmut Eller
  2024-07-24  9:54                             ` Pip Cet
  0 siblings, 1 reply; 32+ messages in thread
From: Helmut Eller @ 2024-07-24  9:39 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

On Wed, Jul 24 2024, Pip Cet wrote:

> Is alignof(struct Lisp_Vector) == 4 for you?

No, it's 8.  Because of the GCALIGNED_STRUCT attribute.

>Which compiler version are you using?

i686-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=i686-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc-cross/i686-linux-gnu/12/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --without-target-system-zlib --enable-targets=all --enable-multiarch --disable-werror --with-arch-32=i686 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=i686-linux-gnu --program-prefix=i686-linux-gnu- --includedir=/usr/i686-linux-gnu/include
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-24  9:39                           ` Helmut Eller
@ 2024-07-24  9:54                             ` Pip Cet
  2024-07-24 11:40                               ` Helmut Eller
  0 siblings, 1 reply; 32+ messages in thread
From: Pip Cet @ 2024-07-24  9:54 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 583 bytes --]

On Wednesday, July 24th, 2024 at 09:39, Helmut Eller <eller.helmut@gmail.com> wrote:
> On Wed, Jul 24 2024, Pip Cet wrote:
> 
> > Is alignof(struct Lisp_Vector) == 4 for you?
> 
> No, it's 8. Because of the GCALIGNED_STRUCT attribute.

Thanks for letting me know.  Is this in gdb or lldb?  lldb shows me alignof(struct Lisp_Vector) == 4 on a similar Debian system, which looks wrong to me.  I've fixed things on that system for now (it was building without mps, because that requires -pthread in the CFLAGS to link on that machine...)

Can you try the attached patch?

Pip

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-mps-aligned.patch --]
[-- Type: text/x-patch; name=0001-mps-aligned.patch, Size: 904 bytes --]

diff --git a/src/lisp.h b/src/lisp.h
index 8a1b1acf5f0..d4331b3a4ff 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -45,7 +45,7 @@ #define EMACS_LISP_H
 INLINE_HEADER_BEGIN
 
 #ifdef HAVE_MPS
-union gc_header { uint64_t v; };
+union gc_header;
 extern void gc_maybe_quit (void);
 extern bool gc_signal_handler_can_run (int);
 #else
@@ -335,6 +335,7 @@ #define GCALIGNED_UNION_MEMBER char alignas (GCALIGNMENT) gcaligned;
 #endif
 #define GCALIGNED(type) (alignof (type) % GCALIGNMENT == 0)
 
+union gc_header { uint64_t v; GCALIGNED_UNION_MEMBER };
 /* Lisp_Word is a scalar word suitable for holding a tagged pointer or
    integer.  Usually it is a pointer to a deliberately-incomplete type
    'struct Lisp_X'.  However, it is EMACS_INT when Lisp_Objects and
@@ -6335,3 +6336,4 @@ define_error (Lisp_Object name, const char *message, Lisp_Object parent);
 INLINE_HEADER_END
 
 #endif /* EMACS_LISP_H */
+

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-23 22:47                           ` Pip Cet
  2024-07-24  3:50                             ` Gerd Möllmann
@ 2024-07-24 11:27                             ` Eli Zaretskii
  1 sibling, 0 replies; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-24 11:27 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, eller.helmut, emacs-devel

> Date: Tue, 23 Jul 2024 22:47:42 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, eller.helmut@gmail.com, emacs-devel@gnu.org
> 
> On Tuesday, July 23rd, 2024 at 19:30, Gerd Möllmann <gerd.moellmann@gmail.com> wrote:
> > Gerd Möllmann gerd.moellmann@gmail.com writes:
> > 
> > > Will apply the whole patch and report back.
> > 
> > And that works like a charm :-). Thanks, Pip!
> 
> I finally managed to build a working 32-bit emacs.exe using MSYS (I
> think that's what Eli is using; the 2017 release, right?)

MinGW and MSYS is many packages, so if you ask about the timestamp,
you need to tell which package are you asking about.  E.g., the latest
runtime of the MinGW I use is from 2021, not 2017.

> under Wine, and I can confirm it dumps and appears to work now, with the patch. The dumped bignum has the right value. I hope that's true for Eli's machine as well. I'm building without native compilation, though.

Yes, it builds now and seems to work, thanks.

> While we're on the subject, though, I had to modify mps a little to build with GCC. Are other people using a different compiler for mps? Or are there patches somewhere?

MPS doesn't support MinGW OOTB, so I needed to add that, and fix a few
compilation warnings and errors.  I can share my changes, if you want.

The important part is to make sure the smoke test suite runs for you
and passes all the tests.  Those tests uncovered a subtle error in the
code which I needed to fix.

> (I also ran into the "Write error to standard output: Success" issue, possibly because I ran wine from within Docker, so I ended up removing the failure exits from 'close_output_streams')

Never saw anything like that, but I didn't use Wine or Docker.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-24  9:54                             ` Pip Cet
@ 2024-07-24 11:40                               ` Helmut Eller
  2024-07-24 12:25                                 ` Pip Cet
  0 siblings, 1 reply; 32+ messages in thread
From: Helmut Eller @ 2024-07-24 11:40 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

On Wed, Jul 24 2024, Pip Cet wrote:

>> No, it's 8. Because of the GCALIGNED_STRUCT attribute.
>
> Thanks for letting me know.  Is this in gdb or lldb?

Neither.  It's the output of printf.

> Can you try the attached patch?

With this patch and the bignum patch, make bootstrap works now.

I guess it works because it adds an unused word to vectorlike_header.
Are working on more header related changes?



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-24  8:33                       ` Helmut Eller
  2024-07-24  9:20                         ` Pip Cet
@ 2024-07-24 11:44                         ` Eli Zaretskii
  1 sibling, 0 replies; 32+ messages in thread
From: Eli Zaretskii @ 2024-07-24 11:44 UTC (permalink / raw)
  To: Helmut Eller; +Cc: pipcet, gerd.moellmann, emacs-devel

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  gerd.moellmann@gmail.com,
>  emacs-devel@gnu.org
> Date: Wed, 24 Jul 2024 10:33:11 +0200
> 
> On my machine:
> 
> (gdb) p sizeof (struct vectorlike_header)
> $56 = 12
> (gdb) p sizeof (struct Lisp_Vector)
> $57 = 16
> (gdb) p &((struct Lisp_Vector*)0)->contents 
> $58 = (Lisp_Object (*)[]) 0xc
> (gdb)
> (gdb) p header_size
> $59 = header_size
> (gdb) p/x header_size
> $60 = 0xc
> (gdb)

With my 32-bit build:

  (gdb) p alignof(struct Lisp_Vector)
  $1 = 8
  (gdb) p sizeof (struct vectorlike_header)
  $2 = 16
  (gdb) p sizeof (struct Lisp_Vector)
  $3 = 16
  (gdb) p &((struct Lisp_Vector*)0)->contents
  $4 = (Lisp_Object (*)[]) 0x10
  (gdb) p/x header_size
  $5 = 0x10



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: MPS: unable to build due to assertion violation in igc_dump_check_object_starts
  2024-07-24 11:40                               ` Helmut Eller
@ 2024-07-24 12:25                                 ` Pip Cet
  0 siblings, 0 replies; 32+ messages in thread
From: Pip Cet @ 2024-07-24 12:25 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, gerd.moellmann, emacs-devel

On Wednesday, July 24th, 2024 at 11:40, Helmut Eller <eller.helmut@gmail.com> wrote:
> On Wed, Jul 24 2024, Pip Cet wrote:
> 
> > > No, it's 8. Because of the GCALIGNED_STRUCT attribute.
> > 
> > Thanks for letting me know. Is this in gdb or lldb?
> 
> Neither. It's the output of printf.

Ah, that's trustworthy. Thanks again!

> > Can you try the attached patch?
> With this patch and the bignum patch, make bootstrap works now.

Excellent. I'm currently working on setting up CI for a few Emacs variants, so I was confident enough to push the patch.

> I guess it works because it adds an unused word to vectorlike_header.

Yes.  In theory, we could get away with storing the sizes of ordinary vectors in the MPS header (using an extra bit on 32-bit architectures), but that would complicate things further.

> Are working on more header related changes?

Not right now. Two things I would like to change is to use the extended header for "large" objects, and to reduce the number of size bits in the ordinary header (so a 1 GB vector, for example, would need an extended header).  That would allow us to have more hash bits (and also to have objects >= 32GB, which MPS really is not designed for), reducing the likelihood of collisions after wraparound. However, that's really low priority.

What I'm looking at right now, apart from the CI thing, is to fix the signal issue. My current approach is to "veto" any (non-SIGPROF) signal that arrives while igc_busy_p(), then check for missed signals in maybe_quit and raise() them. So far, it's surviving my destruction testing, though of course it's far from perfect.

Pip



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2024-07-24 12:25 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-23 12:58 MPS: unable to build due to assertion violation in igc_dump_check_object_starts Eli Zaretskii
2024-07-23 14:10 ` Gerd Möllmann
2024-07-23 14:26   ` Eli Zaretskii
2024-07-23 14:40     ` Eli Zaretskii
2024-07-23 14:52       ` Gerd Möllmann
2024-07-23 15:43         ` Eli Zaretskii
2024-07-23 16:29           ` Gerd Möllmann
2024-07-23 18:00             ` Eli Zaretskii
2024-07-23 18:37               ` Eli Zaretskii
2024-07-23 18:40                 ` Pip Cet
2024-07-23 18:48                   ` Gerd Möllmann
2024-07-23 19:12                     ` Pip Cet
2024-07-23 19:25                       ` Gerd Möllmann
2024-07-23 19:30                         ` Gerd Möllmann
2024-07-23 22:47                           ` Pip Cet
2024-07-24  3:50                             ` Gerd Möllmann
2024-07-24 11:27                             ` Eli Zaretskii
2024-07-24  8:01                   ` Helmut Eller
2024-07-24  8:21                     ` Pip Cet
2024-07-24  8:33                       ` Helmut Eller
2024-07-24  9:20                         ` Pip Cet
2024-07-24  9:39                           ` Helmut Eller
2024-07-24  9:54                             ` Pip Cet
2024-07-24 11:40                               ` Helmut Eller
2024-07-24 12:25                                 ` Pip Cet
2024-07-24 11:44                         ` Eli Zaretskii
2024-07-23 18:50                 ` Gerd Möllmann
2024-07-23 18:42               ` Gerd Möllmann
2024-07-23 18:49                 ` Eli Zaretskii
2024-07-23 18:59                   ` Gerd Möllmann
2024-07-23 14:42     ` Pip Cet
2024-07-23 16:28 ` Pip Cet

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).