Some experience with the igc branch

* Some experience with the igc branch
@ 2024-12-22 15:40 Óscar Fuentes
  2024-12-22 17:18 ` Gerd Möllmann
                   ` (2 more replies)
  0 siblings, 3 replies; 203+ messages in thread
From: Óscar Fuentes @ 2024-12-22 15:40 UTC (permalink / raw)
  To: emacs-devel

I've been using the igc branch for the past few weeks. It was mostly Dart/Flutter
development with lsp-dart / lsp-mode enabled, with all its default
features enabled. On top of that, I use the flx completion algorithm.

This setup puts a lot of stress on GC. To illustrate, on master Emacs
after setting garbage-collection-messages to t, one can see that simply
writing a few characters triggers GC several times, each with its
corresponding pause, which may be very noticeable ("uh! that keypress
didn't register... wait, there it is.") The experience is not great.
Quite miserable, I would say. People suggest playing with
gc-cons-threshold (I have mine set to 10'000'000) but those tricks
simply make things a bit less awful.

With igc the pauses are still there, but they are much shorter and
predictable; they no longer distract me from thinking about what I'm
writing, which is a huge improvement. I suspect that some of those
pauses are not related to garbage collection (executing code and moving
data also takes time.)

TL/DR: now I enjoy using Emacs with this setup and I'm no longer tempted
to switch to other editors for this type of work.

A big thank you to all involved in this feature.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-22 15:40 Some experience with the igc branch Óscar Fuentes
@ 2024-12-22 17:18 ` Gerd Möllmann
  2024-12-22 17:29 ` Gerd Möllmann
  2024-12-22 17:41 ` Pip Cet via Emacs development discussions.
  2 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-22 17:18 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

Óscar Fuentes <ofv@wanadoo.es> writes:

> I've been using the igc branch for the past few weeks. It was mostly Dart/Flutter
> development with lsp-dart / lsp-mode enabled, with all its default
> features enabled. On top of that, I use the flx completion algorithm.
>
> This setup puts a lot of stress on GC. To illustrate, on master Emacs
> after setting garbage-collection-messages to t, one can see that simply
> writing a few characters triggers GC several times, each with its
> corresponding pause, which may be very noticeable ("uh! that keypress
> didn't register... wait, there it is.") The experience is not great.
> Quite miserable, I would say. People suggest playing with
> gc-cons-threshold (I have mine set to 10'000'000) but those tricks
> simply make things a bit less awful.
>
> With igc the pauses are still there, but they are much shorter and
> predictable; they no longer distract me from thinking about what I'm
> writing, which is a huge improvement. I suspect that some of those
> pauses are not related to garbage collection (executing code and moving
> data also takes time.)
>
> TL/DR: now I enjoy using Emacs with this setup and I'm no longer tempted
> to switch to other editors for this type of work.
>
> A big thank you to all involved in this feature.

👍 




* Re: Some experience with the igc branch
  2024-12-22 15:40 Some experience with the igc branch Óscar Fuentes
  2024-12-22 17:18 ` Gerd Möllmann
@ 2024-12-22 17:29 ` Gerd Möllmann
  2024-12-22 17:41 ` Pip Cet via Emacs development discussions.
  2 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-22 17:29 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

Óscar Fuentes <ofv@wanadoo.es> writes:

> With igc the pauses are still there, but they are much shorter and
> predictable; they no longer distract me from thinking about what I'm
> writing, which is a huge improvement. I suspect that some of those
> pauses are not related to garbage collection (executing code and moving
> data also takes time.)

In my case, with Eglot, the following settings made a difference:

(use-package eglot
  :custom
  (eglot-sync-connect nil)
  (eglot-events-buffer-config '(:size 0 :format full)))

I don't know if lsp-mode has similar knobs.




* Re: Some experience with the igc branch
  2024-12-22 15:40 Some experience with the igc branch Óscar Fuentes
  2024-12-22 17:18 ` Gerd Möllmann
  2024-12-22 17:29 ` Gerd Möllmann
@ 2024-12-22 17:41 ` Pip Cet via Emacs development discussions.
  2024-12-22 17:56   ` Gerd Möllmann
                     ` (3 more replies)
  2 siblings, 4 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-22 17:41 UTC (permalink / raw)
  To: Óscar Fuentes
  Cc: emacs-devel, Gerd Möllmann, Helmut Eller, Andrea Corallo

Óscar Fuentes <ofv@wanadoo.es> writes:
> With igc the pauses are still there, but they are much shorter and
> predictable; they no longer distract me from thinking about what I'm
> writing, which is a huge improvement. I suspect that some of those
> pauses are not related to garbage collection (executing code and moving
> data also takes time.)

Quite possible.  Even if it is GC, please keep in mind that MPS has many
settings which you can play with, and it can improve things a lot.  It's
not too early to become a fan of the scratch/igc branch, but it is too
early to reject it for performance reasons.  It's a "heads you lose, tails I
win" situation, I guess.

> TL/DR: now I enjoy using Emacs with this setup and I'm no longer tempted
> to switch to other editors for this type of work.

I think this is an important point: ultimately, it's about having daily
drivers.  We need to remove the remaining impediments for that:

1. The signal issue.  I don't have a good way to fix this and make
everyone happy, but I do have a solution which hasn't caused a crash for
me in quite a while.  It may be good enough.

2. no-purespace.  Merging that into scratch/igc would help, well, me.
What do others think?

3. bytecode stack marking.  That comment raises my red-flag alert,
because it sounds like we're just accepting a preventable crash at this
stage rather than wanting to do anything about it.  The reality, of
course, is different, but I'd be happier if we refused to create a byte
code object that intends to use more stack than we can guarantee we
would scan.  Can we do that?
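
A minimal sketch of such a creation-time check, in C.  The limit
MAX_SCANNED_SLOTS and the function name are hypothetical and not taken
from the Emacs sources; the only point is that a declared stack depth
larger than what the scanner is guaranteed to cover would be rejected
before the object exists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit: the number of stack slots the GC scanner is
   assumed to cover beyond the start of the top frame.  This is an
   illustrative value, not an Emacs constant.  */
enum { MAX_SCANNED_SLOTS = 128 };

/* Return true if a bytecode object that declares DECLARED_DEPTH stack
   slots may be created, i.e. its whole frame would be scanned.  */
static bool
bytecode_stack_depth_ok (size_t declared_depth)
{
  return declared_depth <= MAX_SCANNED_SLOTS;
}
```

A constructor could call this check and signal an error instead of
building an object whose frame might be only partially scanned.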

Pip


* Re: Some experience with the igc branch
  2024-12-22 17:41 ` Pip Cet via Emacs development discussions.
@ 2024-12-22 17:56   ` Gerd Möllmann
  2024-12-22 19:11   ` Óscar Fuentes
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-22 17:56 UTC (permalink / raw)
  To: Pip Cet; +Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

Pip Cet <pipcet@protonmail.com> writes:

> 3. bytecode stack marking.  That comment raises my red-flag alert,
> because it sounds like we're just accepting a preventable crash at this
> stage rather than wanting to do anything about it.  The reality, of
> course, is different, but I'd be happier if we refused to create a byte
> code object that intends to use more stack than we can guarantee we
> would scan.  Can we do that?
>
> Pip

You mean my comment here?

static mps_res_t
scan_bc (mps_ss_t ss, void *start, void *end, void *closure)
{
  MPS_SCAN_BEGIN (ss)
  {
    struct igc_thread_list *t = closure;
    struct bc_thread_state *bc = &t->d.ts->bc;
    igc_assert (start == (void *) bc->stack);
    igc_assert (end == (void *) bc->stack_end);
    /* FIXME/igc: AFAIU the current top frame starts at
       bc->fp->next_stack and has a maximum length that is given by the
       bytecode being executed (COMPILED_STACK_DEPTH). So, we need to
       scan up to bc->fp->next_stack + that max depth to be safe.  Since
       I don't have that number ATM, I'm using an arbitrary estimate for
       now.

       This must be changed to something better. Note that Mattias said
       the bc stack marking will be changed in the future.  */
    const size_t HORRIBLE_ESTIMATE = 1024;
    char *scan_end = bc_next_frame (bc->fp);
    scan_end += HORRIBLE_ESTIMATE;
    end = min (end, (void *) scan_end);
    if (end > start)
      IGC_FIX_CALL (ss, scan_ambig (ss, start, end, NULL));
  }
  MPS_SCAN_END (ss);
  return MPS_RES_OK;
}

I never felt like changing the byte code stack, TBH. For reasons :-).
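
For illustration only: if the maximum depth of the bytecode being
executed were available, the scan limit could be derived from it rather
than from a fixed estimate.  The sketch below assumes exactly that;
scan_limit and its parameters are hypothetical names, and it also
assumes frame_start never lies past region_end.

```c
#include <stddef.h>

/* Hypothetical helper.  FRAME_START is where the top frame begins,
   MAX_DEPTH_SLOTS is the maximum number of slots that frame may use,
   SLOT_SIZE is the size of one slot in bytes, and REGION_END is the
   end of the whole bytecode stack.  Assumes FRAME_START <= REGION_END.
   Returns the address up to which the stack should be scanned.  */
static char *
scan_limit (char *frame_start, size_t max_depth_slots, size_t slot_size,
            char *region_end)
{
  size_t room = (size_t) (region_end - frame_start);
  size_t want = max_depth_slots * slot_size;
  /* Clamp by comparing sizes, so that no pointer beyond the end of the
     stack region is ever formed.  */
  return frame_start + (want < room ? want : room);
}
```

The clamp plays the same role as the min () call in scan_bc above.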




* Re: Some experience with the igc branch
  2024-12-22 17:41 ` Pip Cet via Emacs development discussions.
  2024-12-22 17:56   ` Gerd Möllmann
@ 2024-12-22 19:11   ` Óscar Fuentes
  2024-12-23  0:05     ` Pip Cet via Emacs development discussions.
  2024-12-23  6:27     ` Jean Louis
  2024-12-22 20:29   ` Helmut Eller
  2024-12-22 20:50   ` Gerd Möllmann
  3 siblings, 2 replies; 203+ messages in thread
From: Óscar Fuentes @ 2024-12-22 19:11 UTC (permalink / raw)
  To: emacs-devel; +Cc: Pip Cet, Gerd Möllmann, Helmut Eller, Andrea Corallo

Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
writes:

>> I suspect that some of those
>> pauses are not related to garbage collection (executing code and moving
>> data also takes time.)
>
> Quite possible.  Even if it is GC, please keep in mind that MPS has many
> settings which you can play with, and it can improve things a lot.  It's
> not too early to become a fan of the scratch/igc branch, but it is too
> early to reject it for performance reasons.  It's a "heads you lose, tails I
> win" situation, I guess.

IIRC MPS is well documented and I can look up those settings, but does
Emacs collect the required info for taking informed decisions?

Anyway, with the setup I'm using for this job it is totally unrealistic to
expect an instant reaction from Emacs; there is too much heavy stuff
kicking in on every keypress.

> 1. The signal issue.  I don't have a good way to fix this and make
> everyone happy, but I do have a solution which hasn't caused a crash for
> me in quite a while.  It may be good enough.

Inevitably, a few minutes after sending my message Emacs froze after
working flawlessly since you fixed the JSON issue.

Redisplay just stopped while showing the menu: no crash and no infinite
loop, and CPU usage was typical for the repeating timers that my config
creates. Sadly, instead of attaching gdb I tried to wake up Emacs by
sending SIGUSR1 (no effect, as it is the wrong signal; it should be
SIGUSR2) and then sent SIGINT by mistake, which terminated the process.

It's very likely that MPS is innocent in this, but I'm happy to apply
and test any stability improvement patch you have and wish to share.

Thanks.


* Re: Some experience with the igc branch
  2024-12-22 19:11   ` Óscar Fuentes
@ 2024-12-23  0:05     ` Pip Cet via Emacs development discussions.
  2024-12-23  1:00       ` Óscar Fuentes
  2024-12-23  3:42       ` Some experience with the igc branch Gerd Möllmann
  2024-12-23  6:27     ` Jean Louis
  1 sibling, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23  0:05 UTC (permalink / raw)
  To: Óscar Fuentes
  Cc: emacs-devel, Gerd Möllmann, Helmut Eller, Andrea Corallo

Óscar Fuentes <ofv@wanadoo.es> writes:

> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> writes:
>
>>> I suspect that some of those
>>> pauses are not related to garbage collection (executing code and moving
>>> data also takes time.)
>>
>> Quite possible.  Even if it is GC, please keep in mind that MPS has many
>> settings which you can play with, and it can improve things a lot.  It's
>> not too early to become a fan of the scratch/igc branch, but it is too
>> early to reject it for performance reasons.  It's a "heads you lose, tails I
>> win" situation, I guess.
>
> IIRC MPS is well documented and I can look up those settings, but does
> Emacs collect the required info for taking informed decisions?

Not that I'm aware of, at this point.

>> 1. The signal issue.  I don't have a good way to fix this and make
>> everyone happy, but I do have a solution which hasn't caused a crash for
>> me in quite a while.  It may be good enough.
>
> Inevitably, a few minutes after sending my message Emacs froze after
> working flawlessly since you fixed the JSON issue.

Sorry to hear it, and thanks for letting us know!  If it happens again,
any additional information you can provide would be very helpful.

> Redisplay just stopped while showing the menu, no crash nor infinite
> loop, its CPU usage was typical for the repeating timers that my config
> creates.

That's a bit odd.  It might be the signal issue, but that's purely a
guess.  If it happens again, please let us know.

Which windowing system are you using, and how are you displaying menus,
though?

> It's very likely that MPS is innocent on this, but I'm happy to apply
> and test any stability improvement patch you have and wish to share.

I just pushed the temporary fix for the signal issue, which should
improve stability.

Pip





* Re: Some experience with the igc branch
  2024-12-23  0:05     ` Pip Cet via Emacs development discussions.
@ 2024-12-23  1:00       ` Óscar Fuentes
  2024-12-24 22:34         ` Pip Cet via Emacs development discussions.
  2024-12-23  3:42       ` Some experience with the igc branch Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Óscar Fuentes @ 2024-12-23  1:00 UTC (permalink / raw)
  To: Pip Cet; +Cc: emacs-devel, Gerd Möllmann, Helmut Eller, Andrea Corallo

Pip Cet <pipcet@protonmail.com> writes:

>> Redisplay just stopped while showing the menu, no crash nor infinite
>> loop, its CPU usage was typical for the repeating timers that my config
>> creates.
>
> That's a bit odd.  It might be the signal issue, but that's purely a
> guess.  If it happens again, please let us know.

Sure.

> Which windowing system are you using, and how are you displaying menus,
> though?

Configured using:
 'configure CPPFLAGS=-I/home/oscar/dev/include/mps
 LDFLAGS=-L/home/oscar/dev/other/mps/code --with-native-compilation
 --with-tree-sitter --without-toolkit-scroll-bars --with-x-toolkit=lucid
 --with-modules --without-imagemagick --with-mps=yes'

Configured features:
CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG LIBOTF
LIBSELINUX LIBXML2 MODULES MPS NATIVE_COMP NOTIFY INOTIFY PDUMPER PNG
SECCOMP SOUND SQLITE3 THREADS TIFF TREE_SITTER WEBP X11 XAW3D XDBE XIM
XINPUT2 XPM LUCID ZLIB

I have the menubar disabled (menu-bar-mode -1) and use a custom command
to open it:

(defun my-menu-bar-open-after ()
  (remove-hook 'pre-command-hook 'my-menu-bar-open-after)
  (when (eq menu-bar-mode 42)
    (menu-bar-mode -1)))

(defun my-menu-bar-open (&rest args)
  (interactive)
  (let ((open menu-bar-mode))
    (unless open
      (menu-bar-mode 1))
    (apply #'menu-bar-open args)
    (unless open
      (setq menu-bar-mode 42)
      (add-hook 'pre-command-hook 'my-menu-bar-open-after))))

(global-set-key [f10] 'my-menu-bar-open)

On that same session I used the command multiple times.

>> It's very likely that MPS is innocent on this, but I'm happy to apply
>> and test any stability improvement patch you have and wish to share.
>
> I just pushed the temporary fix for the signal issue, which should
> improve stability.

Emacs is already running here with that commit. Thanks!




* Re: Some experience with the igc branch
  2024-12-23  1:00       ` Óscar Fuentes
@ 2024-12-24 22:34         ` Pip Cet via Emacs development discussions.
  2024-12-25  4:25           ` Freezing frame with igc Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-24 22:34 UTC (permalink / raw)
  To: Óscar Fuentes
  Cc: emacs-devel, Gerd Möllmann, Helmut Eller, Andrea Corallo

Óscar Fuentes <ofv@wanadoo.es> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>>> Redisplay just stopped while showing the menu, no crash nor infinite
>>> loop, its CPU usage was typical for the repeating timers that my config
>>> creates.
>>
>> That's a bit odd.  It might be the signal issue, but that's purely a
>> guess.  If it happens again, please let us know.
>
> Sure.

I'm not a hundred percent sure, because I was testing other changes, but
I just observed an Emacs session in a very similar state to what you
describe: very little but nonzero CPU usage, but unresponsive to X
interactions.  I attached gdb, observed it was stuck in read_char, then
I messed up and set Vquit_flag to Qt, at which point the Emacs session
recovered and seems fully usable once more (it did take a while to do
so, though).  So no valuable debug info this time, hope I'll hit it
again.

Again, it's possible this is a similar-looking but different bug,
possibly caused by local changes.

I don't think read_char or its subroutines even use MPS memory, though?
As this is a GTK build, and yours wasn't, we should probably look at X
interaction code shared between the GTK and non-GTK builds.

Pip


* Freezing frame with igc
  2024-12-24 22:34         ` Pip Cet via Emacs development discussions.
@ 2024-12-25  4:25           ` Gerd Möllmann
  2024-12-25 11:19             ` Pip Cet via Emacs development discussions.
  2024-12-25 11:55             ` Óscar Fuentes
  0 siblings, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25  4:25 UTC (permalink / raw)
  To: Pip Cet; +Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

(Subject changed.)

Pip Cet <pipcet@protonmail.com> writes:

> Óscar Fuentes <ofv@wanadoo.es> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>>> Redisplay just stopped while showing the menu, no crash nor infinite
>>>> loop, its CPU usage was typical for the repeating timers that my config
>>>> creates.
>>>
>>> That's a bit odd.  It might be the signal issue, but that's purely a
>>> guess.  If it happens again, please let us know.
>>
>> Sure.
>
> I'm not a hundred percent sure, because I was testing other changes, but
> I just observed an Emacs session in a very similar state to what you
> describe: very little but nonzero CPU usage, but unresponsive to X
> interactions.  I attached gdb, observed it was stuck in read_char, then
> I messed up and set Vquit_flag to Qt, at which point the Emacs session
> recovered and seems fully usable once more (it did take a while to do
> so, though).  So no valuable debug info this time, hope I'll hit it
> again.
>
> Again, it's possible this is a similar-looking but different bug,
> possibly caused by local changes.
>
> I don't think read_char or its subroutines even use MPS memory, though?
> As this is a GTK build, and yours wasn't, we should probably look at X
> interaction code shared between the GTK and non-GTK builds.
>
> Pip

That reminds me of something. Maybe what you've seen is completely
unrelated, it's impossible to tell, but please find below a comment that
I added to do_switch_frame in frame.c.

  /* We want to make sure that the next event generates a frame-switch
     event to the appropriate frame.  This seems kludgy to me, but
     before you take it out, make sure that evaluating something like
     (select-window (frame-root-window (make-frame))) doesn't end up
     with your typing being interpreted in the new frame instead of
     the one you're actually typing in.  */

  /* FIXME/tty: I don't understand this.  (The comment above is from
     Jim Blandy 1993 BTW, and the frame_ancestor_p from 2017.)

     Setting the last event frame to nil leads to switch-frame events
     being generated even if they normally wouldn't be because the frame
     in question equals selected-frame.  See the places in keyboard.c
     where make_lispy_switch_frame is called.

     This leads to problems at least on ttys.

     Imagine that we have functions in post-command-hook that use
     select-frame in some way (e.g., with-selected-window).  Let these
     functions select different frames during the execution of
     post-command-hook in command_loop_1.  Setting
     internal_last_event_frame to nil here makes these select-frame
     calls (potentially and in reality) generate switch-frame events.
     (But only in one direction (frame_ancestor_p), which I also don't
     understand).

     These switch-frame events form an endless loop in
     command_loop_1.  It runs post-command-hook, which generates
     switch-frame events, which command_loop_1 finds (bound to #'ignore)
     and executes, which again runs post-command-hook etc., ad
     infinitum.

     Let's not do that for now on ttys.  */






* Re: Freezing frame with igc
  2024-12-25  4:25           ` Freezing frame with igc Gerd Möllmann
@ 2024-12-25 11:19             ` Pip Cet via Emacs development discussions.
  2024-12-25 11:55             ` Óscar Fuentes
  1 sibling, 0 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-25 11:19 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> That reminds me of something. Maybe what you've seen is completely
> unrelated, it's impossible to tell, but please find below a comment that
> I added to do_switch_frame in frame.c.

Thanks!  I hope I'll be able to reproduce the issue again.

>      These switch-frame events form an endless loop in
>      command_loop_1.  It runs post-command-hook, which generates
>      switch-frame events, which command_loop_1 finds (bound to #'ignore)
>      and executes, which again runs post-command-hook etc., ad
>      infinitum.

The strange thing is that there is apparently no infinite loop: Emacs
doesn't use 100% CPU, and it seems to be calling *select() as usual.
Definitely something to check for, though!

Pip


* Re: Freezing frame with igc
  2024-12-25  4:25           ` Freezing frame with igc Gerd Möllmann
  2024-12-25 11:19             ` Pip Cet via Emacs development discussions.
@ 2024-12-25 11:55             ` Óscar Fuentes
  1 sibling, 0 replies; 203+ messages in thread
From: Óscar Fuentes @ 2024-12-25 11:55 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Pip Cet, emacs-devel

(CC list trimmed)

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

>>>>> Redisplay just stopped while showing the menu, no crash nor infinite
>>>>> loop, its CPU usage was typical for the repeating timers that my config
>>>>> creates.
>>>>
>>>> That's a bit odd.  It might be the signal issue, but that's purely a
>>>> guess.  If it happens again, please let us know.
>>>
>>> Sure.
>>
>> I'm not a hundred percent sure, because I was testing other changes, but
>> I just observed an Emacs session in a very similar state to what you
>> describe: very little but nonzero CPU usage, but unresponsive to X
>> interactions.  I attached gdb, observed it was stuck in read_char, then
>> I messed up and set Vquit_flag to Qt, at which point the Emacs session
>> recovered and seems fully usable once more (it did take a while to do
>> so, though).  So no valuable debug info this time, hope I'll hit it
>> again.
>>
>> Again, it's possible this is a similar-looking but different bug,
>> possibly caused by local changes.
>>
>> I don't think read_char or its subroutines even use MPS memory, though?
>> As this is a GTK build, and yours wasn't, we should probably look at X
>> interaction code shared between the GTK and non-GTK builds.
>>
>> Pip
>
> That reminds me of something. Maybe what you've seen is completely
> unrelated, it's impossible to tell, but please find below a comment that
> I added to do_switch_frame in frame.c.

[snip]

At the time I was working with two frames. The frame where I tried to
show the menu went blank.

I use mini-echo [1], which removes the mode line and uses the echo area
instead, periodically (0.3 seconds) updating the echo area (it has a
cache for not updating when there are no changes, but my experience is
that the updates are very frequent.)

So it could be that mini-echo tried to update the echo area (of both
frames, because it shows the same text on all frames) at an "unfortunate
time" while the menu was being displayed. I don't know if this
hypothesis even makes sense.

BTW, I'm fairly sure that mini-echo was responsible for the small CPU
activity I saw in htop after the Emacs UI froze.

1. https://github.com/liuyinz/mini-echo.el




* Re: Some experience with the igc branch
  2024-12-23  0:05     ` Pip Cet via Emacs development discussions.
  2024-12-23  1:00       ` Óscar Fuentes
@ 2024-12-23  3:42       ` Gerd Möllmann
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23  3:42 UTC (permalink / raw)
  To: Pip Cet; +Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

Pip Cet <pipcet@protonmail.com> writes:

> Óscar Fuentes <ofv@wanadoo.es> writes:
>
>> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>> writes:
>>
>>>> I suspect that some of those
>>>> pauses are not related to garbage collection (executing code and moving
>>>> data also takes time.)
>>>
>>> Quite possible.  Even if it is GC, please keep in mind that MPS has many
>>> settings which you can play with, and it can improve things a lot.  It's
>>> not too early to become a fan of the scratch/igc branch, but it is too
>>> early to reject it for performance reasons.  It's a "heads you lose, tails I
>>> win" situation, I guess.
>>
>> IIRC MPS is well documented and I can look up those settings, but does
>> Emacs collect the required info for taking informed decisions?
>
> Not that I'm aware of, at this point.

Me neither.

(And, at least for me personally, "interactive performance", i.e. the
impression a user gets when he's using Emacs interactively, is the only
interesting part. That's difficult to measure of course. I don't care
much about performance improvements I don't notice :-)).

>
>>> 1. The signal issue.  I don't have a good way to fix this and make
>>> everyone happy, but I do have a solution which hasn't caused a crash for
>>> me in quite a while.  It may be good enough.
>>
>> Inevitably, a few minutes after sending my message Emacs froze after
>> working flawlessly since you fixed the JSON issue.
>
> Sorry to hear it, and thanks for letting us know!  If it happens again,
> any additional information you can provide would be very helpful.
>
>> Redisplay just stopped while showing the menu, no crash nor infinite
>> loop, its CPU usage was typical for the repeating timers that my config
>> creates.
>
> That's a bit odd.  It might be the signal issue, but that's purely a
> guess.  If it happens again, please let us know.
>
> Which windowing system are you using, and how are you displaying menus,
> though?

Yes. I don't think I've ever seen a freeze caused by igc here. It always
was crashes. But one never knows, of course.




* Re: Some experience with the igc branch
  2024-12-22 19:11   ` Óscar Fuentes
  2024-12-23  0:05     ` Pip Cet via Emacs development discussions.
@ 2024-12-23  6:27     ` Jean Louis
  1 sibling, 0 replies; 203+ messages in thread
From: Jean Louis @ 2024-12-23  6:27 UTC (permalink / raw)
  To: Óscar Fuentes
  Cc: emacs-devel, Pip Cet, Gerd Möllmann, Helmut Eller,
	Andrea Corallo

* Óscar Fuentes <ofv@wanadoo.es> [2024-12-22 22:13]:
> > 1. The signal issue.  I don't have a good way to fix this and make
> > everyone happy, but I do have a solution which hasn't caused a crash for
> > me in quite a while.  It may be good enough.
> 
> Inevitably, a few minutes after sending my message Emacs froze after
> working flawlessly since you fixed the JSON issue.
> 
> Redisplay just stopped while showing the menu, no crash nor infinite
> loop, its CPU usage was typical for the repeating timers that my config
> creates. Sadly, instead of attaching gdb I tried to wake up Emacs by
> sending SIGUSR1 (no effect, as it is the wrong signal, should be
> SIGUSR2) and then sent SIGINT by mistake, which terminated the process.
> 
> It's very likely that MPS is innocent on this, but I'm happy to apply
> and test any stability improvement patch you have and wish to share.

I used that branch for a longer time, but as a heavy daily user of
Emacs for serious business, I cannot use it: it is not yet stable.
I have already sent my reasons to this list. I have had no issues
since I switched to standard Emacs.

The worst part was the ghostly appearance of words and characters
which I didn't type, or the total scrambling of characters which I did type.


-- 
Jean Louis




* Re: Some experience with the igc branch
  2024-12-22 17:41 ` Pip Cet via Emacs development discussions.
  2024-12-22 17:56   ` Gerd Möllmann
  2024-12-22 19:11   ` Óscar Fuentes
@ 2024-12-22 20:29   ` Helmut Eller
  2024-12-22 20:50   ` Gerd Möllmann
  3 siblings, 0 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-22 20:29 UTC (permalink / raw)
  To: Pip Cet; +Cc: Óscar Fuentes, emacs-devel, Gerd Möllmann,
	Andrea Corallo

On Sun, Dec 22 2024, Pip Cet wrote:

> 2. no-purespace.  Merging that into scratch/igc would help, well, me.
> What do others think?

No objections from me.

> 3. bytecode stack marking.  That comment raises my red-flag alert,
> because it sounds like we're just accepting a preventable crash at this
> stage rather than wanting to do anything about it.  The reality, of
> course, is different, but I'd be happier if we refused to create a byte
> code object that intends to use more stack than we can guarantee we
> would scan.  Can we do that?

Maybe the bytecode engine could handle large stack frames differently
from small stack frames.

For large stack frames we would:
 1. initialize the stack frame with NULLs
 2. bump the stack pointer
 3. now the stack frame is usable

For small stack frames, we would skip step 1 but the GC would always
scan one extra "small frame with maximal length".
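
One possible reading of this scheme, as a C sketch.  All names
(bc_stack, push_frame, scan_extent, SMALL_FRAME_MAX, STACK_SLOTS) are
made up for illustration and do not correspond to the real bytecode
engine; the threshold values are arbitrary.

```c
#include <stddef.h>

/* Illustrative sizes only.  */
enum { SMALL_FRAME_MAX = 16, STACK_SLOTS = 256 };

struct bc_stack
{
  void *slots[STACK_SLOTS];
  size_t sp;			/* Index of the first unused slot.  */
};

/* Reserve N slots for a new frame.  Returns the index of the frame's
   first slot, or (size_t) -1 if the stack has no room.  Large frames
   are filled with NULL before the stack pointer is bumped; small
   frames skip that step.  */
static size_t
push_frame (struct bc_stack *s, size_t n)
{
  if (n > STACK_SLOTS - s->sp)
    return (size_t) -1;
  size_t base = s->sp;
  if (n > SMALL_FRAME_MAX)
    for (size_t i = 0; i < n; i++)
      s->slots[base + i] = NULL;	/* Step 1: initialize.  */
  s->sp = base + n;			/* Step 2: bump the pointer.  */
  return base;				/* Step 3: frame is usable.  */
}

/* Number of slots the collector would examine: everything below the
   stack pointer plus one maximal small frame, clamped to the stack
   size.  */
static size_t
scan_extent (const struct bc_stack *s)
{
  size_t extra = STACK_SLOTS - s->sp;
  if (extra > SMALL_FRAME_MAX)
    extra = SMALL_FRAME_MAX;
  return s->sp + extra;
}
```

The extra slots past the stack pointer may hold stale values, which is
acceptable only if they are scanned ambiguously, as scan_bc does today.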

Helmut






* Re: Some experience with the igc branch
  2024-12-22 17:41 ` Pip Cet via Emacs development discussions.
                     ` (2 preceding siblings ...)
  2024-12-22 20:29   ` Helmut Eller
@ 2024-12-22 20:50   ` Gerd Möllmann
  2024-12-22 22:26     ` Pip Cet via Emacs development discussions.
  3 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-22 20:50 UTC (permalink / raw)
  To: Pip Cet; +Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

Pip Cet <pipcet@protonmail.com> writes:

> Óscar Fuentes <ofv@wanadoo.es> writes:
>> With igc the pauses are still there, but they are much shorter and
>> predictable; they no longer distract me from thinking about what I'm
>> writing, which is a huge improvement. I suspect that some of those
>> pauses are not related to garbage collection (executing code and moving
>> data also takes time.)
>
> Quite possible.  Even if it is GC, please keep in mind that MPS has many
> settings which you can play with, and it can improve things a lot.  It's
> not too early to become a fan of the scratch/igc branch, but it is too
> early to reject it for performance reasons.  It's a "heads you lose, tails I
> win" situation, I guess.
>
>> TL/DR: now I enjoy using Emacs with this setup and I'm no longer tempted
>> to switch to other editors for this type of work.
>
> I think this is an important point: ultimately, it's about having daily
> drivers.  We need to remove the remaining impediments for that:
>
> 1. The signal issue.  I don't have a good way to fix this and make
> everyone happy, but I do have a solution which hasn't caused a crash for
> me in quite a while.  It may be good enough.

TBH, I'd have put it in already.

> 2. no-purespace.  Merging that into scratch/igc would help, well, me.
> What do others think?

Doesn't affect me much.




^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-22 20:50   ` Gerd Möllmann
@ 2024-12-22 22:26     ` Pip Cet via Emacs development discussions.
  2024-12-23  3:23       ` Gerd Möllmann
  2024-12-23 13:35       ` Some experience with the igc branch Eli Zaretskii
  0 siblings, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-22 22:26 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Óscar Fuentes <ofv@wanadoo.es> writes:
>>> With igc the pauses are still there, but they are much shorter and
>>> predictable, they no longer distract me from thinking on what I'm
>>> writing, which is a huge improvement. I suspect that some of those
>>> pauses are not related to garbage collection (executing code and moving
>>> data also takes time.)
>>
>> Quite possible.  Even if it is GC, please keep in mind that MPS has many
>> settings which you can play with, and it can improve things a lot.  It's
>> not too early to become a fan of the scratch/igc branch, but it is too
>> early to reject it for performance reasons.  It's a "heads you lose, tails I
>> win" situation, I guess.
>>
>>> TL/DR: now I enjoy using Emacs with this setup and I'm no longer tempted
>>> to switch to other editors for this type of work.
>>
>> I think this is an important point: ultimately, it's about having daily
>> drivers.  We need to remove the remaining impediments for that:
>>
>> 1. The signal issue.  I don't have a good way to fix this and make
>> everyone happy, but I do have a solution which hasn't caused a crash for
>> me in quite a while.  It may be good enough.
>
> TBH, I'd have put it in already.

Pushed it now.  It is imperfect, but better than crashing.

>> 2. no-purespace.  Merging that into scratch/igc would help, well, me.
>> What do others think?
>
> Doesn't affect me much.

Well, it does cause some noise, so I thought I'd ask first.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-22 22:26     ` Pip Cet via Emacs development discussions.
@ 2024-12-23  3:23       ` Gerd Möllmann
       [not found]         ` <m234ieddeu.fsf_-_@gmail.com>
  2024-12-23 13:35       ` Some experience with the igc branch Eli Zaretskii
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23  3:23 UTC (permalink / raw)
  To: Pip Cet; +Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> Óscar Fuentes <ofv@wanadoo.es> writes:
>>>> With igc the pauses are still there, but they are much shorter and
>>>> predictable, they no longer distract me from thinking on what I'm
>>>> writing, which is a huge improvement. I suspect that some of those
>>>> pauses are not related to garbage collection (executing code and moving
>>>> data also takes time.)
>>>
>>> Quite possible.  Even if it is GC, please keep in mind that MPS has many
>>> settings which you can play with, and it can improve things a lot.  It's
>>> not too early to become a fan of the scratch/igc branch, but it is too
>>> early to reject it for performance reasons.  It's a "heads you lose, tails I
>>> win" situation, I guess.
>>>
>>>> TL/DR: now I enjoy using Emacs with this setup and I'm no longer tempted
>>>> to switch to other editors for this type of work.
>>>
>>> I think this is an important point: ultimately, it's about having daily
>>> drivers.  We need to remove the remaining impediments for that:
>>>
>>> 1. The signal issue.  I don't have a good way to fix this and make
>>> everyone happy, but I do have a solution which hasn't caused a crash for
>>> me in quite a while.  It may be good enough.
>>
>> TBH, I'd have put it in already.
>
> Pushed it now.  It is imperfect, but better than crashing.

100%. Thanks!

>
>>> 2. no-purespace.  Merging that into scratch/igc would help, well, me.
>>> What do others think?
>>
>> Doesn't affect me much.
>
> Well, it does cause some noise, so I thought I'd ask first.
>
> Pip

That's nice of you.



^ permalink raw reply	[flat|nested] 203+ messages in thread

[parent not found: <m234ieddeu.fsf_-_@gmail.com>]

[parent not found: <87ttaueqp9.fsf@protonmail.com>]

[parent not found: <m2frme921u.fsf@gmail.com>]

[parent not found: <87ldw6ejkv.fsf@protonmail.com>]

[parent not found: <m2bjx2h8dh.fsf@gmail.com>]

* Re: Make Signal handling patch platform-dependent?
       [not found]                 ` <m2bjx2h8dh.fsf@gmail.com>
@ 2024-12-23 14:45                   ` Pip Cet via Emacs development discussions.
  2024-12-23 14:54                     ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23 14:45 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> And, who knows, using a separate thread might help (debugging, not
>> performance).
>
> Yeah, more long-term goals, I'd guess. I'm glad we're moving forward,
> ATM :-).

If we come up with a solution to the signal issue which works but
requires the creation of extra threads, would that prevent the merge?

>> The rest of this email is about a half-baked idea to perform dry-run
>> background GCs to facilitate debugging.  It's tantalizingly close to
>> offering performance benefits, but doesn't quite get there, and it
>> doesn't have to: it'd help us detect leaked references to objects, and
>> that's all it needs to do.
>>
>> I'm still thinking about double-mapping MPS segments so one thread can
>> scan them using a "privileged" mapping while the barrier is in place for
>> the ordinary mapping and prevents access to that segment.
>
> Quick question upfront; I'll have to think longer about the rest, and
> maybe try to find existing examples: the double-mapping. How would that
> be done? I know about page-table manipulation, but I don't think it's
> easily doable, at least not on macOS. What would you use for
> double-mapping?

My understanding is the two options are SysV shm* (clunky) or mmapping a
file handle corresponding to an already-deleted file, twice (some risk
the OS will synchronize the file to disk, maybe even page it out; also
might count towards the disk quota).

I prefer the latter because I wouldn't actually delete the file, which
would give us a snapshot of the MPS heap in the event of a crash.  If
that isn't enough, we could explicitly snapshot the file once in a
while, before moving objects, giving us the ability to detect where an
object moved to.  (If THAT isn't enough, we'd have two additional
options: either hack MPS not to reuse virtual addresses unless it really
has to, or store the file on a fully journaled file system allowing us
to time-travel through the MPS heap.)  Also, I've never used shm*.
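
A minimal sketch of the file-backed variant, assuming a POSIX system with a writable /tmp. The names `double_map`, `double_map_create` and `double_map_destroy` are hypothetical, nothing here is MPS API, and this version unlinks the file right away (the "already-deleted file" flavour); keeping the file around for post-mortem snapshots would simply omit the unlink call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct double_map
{
  void *ordinary;	/* view the mutator would use (barrier applies) */
  void *privileged;	/* second view of the same physical pages */
  size_t size;
};

/* Map SIZE bytes of a fresh temporary file twice.
   Return 0 on success, -1 on failure.  */
static int
double_map_create (struct double_map *dm, size_t size)
{
  char path[] = "/tmp/double-map-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);		/* the file lives on until unmapped */
  if (ftruncate (fd, (off_t) size) != 0)
    {
      close (fd);
      return -1;
    }
  void *a = mmap (NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  void *b = mmap (NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);			/* the mappings keep the file alive */
  if (a == MAP_FAILED || b == MAP_FAILED)
    {
      if (a != MAP_FAILED)
	munmap (a, size);
      if (b != MAP_FAILED)
	munmap (b, size);
      return -1;
    }
  dm->ordinary = a;
  dm->privileged = b;
  dm->size = size;
  return 0;
}

static void
double_map_destroy (struct double_map *dm)
{
  munmap (dm->ordinary, dm->size);
  munmap (dm->privileged, dm->size);
}
```

With both views in place, calling mprotect with PROT_NONE on the ordinary view plays the role of the barrier, while a scanning thread can still read and write the same pages through the privileged view.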

If there's a third option, it'd be great to learn about it.

Needless to say, double-mapping doubles the VM size, which is limited on
32-bit systems.

I don't think virtually-indexed caches are a thing anymore (if the cache
doesn't recognize two VAs correspond to the same PA, well, great fun
ensues).

IIUC, some aarch64 systems, but not those usually running macOS, have
weak cache coherency, and as double-mapping is a valid but rare thing to
do, who knows what would happen.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Make Signal handling patch platform-dependent?
  2024-12-23 14:45                   ` Make Signal handling patch platform-dependent? Pip Cet via Emacs development discussions.
@ 2024-12-23 14:54                     ` Gerd Möllmann
  2024-12-23 15:11                       ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23 14:54 UTC (permalink / raw)
  To: Pip Cet; +Cc: Óscar Fuentes, emacs-devel, Helmut Eller, Andrea Corallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> And, who knows, using a separate thread might help (debugging, not
>>> performance).
>>
>> Yeah, more long-term goals, I'd guess. I'm glad we're moving forward,
>> ATM :-).
>
> If we come up with a solution to the signal issue which works but
> requires the creation of extra threads, would that prevent the merge?

I think that's for Eli to answer.

>>> The rest of this email is about a half-baked idea to perform dry-run
>>> background GCs to facilitate debugging.  It's tantalizingly close to
>>> offering performance benefits, but doesn't quite get there, and it
>>> doesn't have to: it'd help us detect leaked references to objects, and
>>> that's all it needs to do.
>>>
>>> I'm still thinking about double-mapping MPS segments so one thread can
>>> scan them using a "privileged" mapping while the barrier is in place for
>>> the ordinary mapping and prevents access to that segment.
>>
>> Quick question upfront; I'll have to think longer about the rest, and
>> maybe try to find existing examples: the double-mapping. How would that
>> be done? I know about page-table manipulation, but I don't think it's
>> easily doable, at least not on macOS. What would you use for
>> double-mapping?
>
> My understanding is the two options are SysV shm* (clunky) or mmapping a
> file handle corresponding to an already-deleted file, twice (some risk
> the OS will synchronize the file to disk, maybe even page it out; also
> might count towards the disk quota).
>
> I prefer the latter because I wouldn't actually delete the file, which
> would give us a snapshot of the MPS heap in the event of a crash.  If
> that isn't enough, we could explicitly snapshot the file once in a
> while, before moving objects, giving us the ability to detect where an
> object moved to.  (If THAT isn't enough, we'd have two additional
> options: either hack MPS not to reuse virtual addresses unless it really
> has to, or store the file on a fully journaled file system allowing us
> to time-travel through the MPS heap.)  Also, I've never used shm*.
>
> If there's a third option, it'd be great to learn about it.

I don't know of a third option.

>
> Needless to say, double-mapping doubles the VM size, which is limited on
> 32-bit systems.
>
> I don't think virtually-indexed caches are a thing anymore (if the cache
> doesn't recognize two VAs correspond to the same PA, well, great fun
> ensues).
>
> IIUC, some aarch64 systems, but not those usually running macOS, have
> weak cache coherency, and as double-mapping is a valid but rare thing to
> do, who knows what would happen.
>
> Pip

Thanks so far!



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Make Signal handling patch platform-dependent?
  2024-12-23 14:54                     ` Gerd Möllmann
@ 2024-12-23 15:11                       ` Eli Zaretskii
  0 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-23 15:11 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Óscar Fuentes <ofv@wanadoo.es>,  emacs-devel@gnu.org,
>  Helmut Eller <eller.helmut@gmail.com>,  Andrea Corallo <acorallo@gnu.org>
> Date: Mon, 23 Dec 2024 15:54:36 +0100
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> > If we come up with a solution to the signal issue which works but
> > requires the creation of extra threads, would that prevent the merge?
> 
> I think that's for Eli to answer.

I don't see why extra threads would be a problem, as long as they
don't use the Lisp machine or any parts of the global state.  We
already have several threads on MS-Windows, including for emulating
Posix signals, so I don't see why adding C threads on Posix systems
would be "verboten".



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-22 22:26     ` Pip Cet via Emacs development discussions.
  2024-12-23  3:23       ` Gerd Möllmann
@ 2024-12-23 13:35       ` Eli Zaretskii
  2024-12-23 14:03         ` Discussion with MPS people Gerd Möllmann
  2024-12-23 15:07         ` Some experience with the igc branch Pip Cet via Emacs development discussions.
  1 sibling, 2 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-23 13:35 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Sun, 22 Dec 2024 22:26:11 +0000
> Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org,
>  Helmut Eller <eller.helmut@gmail.com>, Andrea Corallo <acorallo@gnu.org>
> From:  Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> 
> >> 1. The signal issue.  I don't have a good way to fix this and make
> >> everyone happy, but I do have a solution which hasn't caused a crash for
> >> me in quite a while.  It may be good enough.
> >
> > TBH, I'd have put it in already.
> 
> Pushed it now.  It is imperfect, but better than crashing.

Why didn't we discuss this with MPS folks?  A program can legitimately
call some code from a signal handler, so the limitations that MPS
seems to impose now are not very reasonable.  Maybe we are missing
some feature, or maybe the MPS folks will agree to extend the library
to provide better support for programs that use signals.  E.g., AFAIU
with this code installed, we are limiting our profiler too much (it
will never report GC, IIRC?).  I think igc_busy_p returns non-zero in
too many situations where delivering signals could not possibly cause
harm, like during object allocation, AFAIR.  According to
documentation, that function is not intended for this kind of purpose.
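
To make the profiler concern concrete, here is a sketch of the general "defer while busy" pattern. It is an assumption about the overall shape only, not the code that was pushed: `gc_busy` is a hypothetical stand-in for whatever igc_busy_p reports, and `take_sample` stands in for the work a handler would like to do.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical flags; sig_atomic_t keeps handler accesses well defined.  */
static volatile sig_atomic_t gc_busy;
static volatile sig_atomic_t sample_pending;
static volatile sig_atomic_t samples_taken;

/* Stand-in for the real work, e.g. recording a profiler sample.  */
static void
take_sample (void)
{
  samples_taken++;
}

static void
on_signal (int sig)
{
  (void) sig;
  if (gc_busy)
    sample_pending = 1;	/* cannot run now: remember and return */
  else
    take_sample ();
}

static void
enter_busy (void)
{
  gc_busy = 1;
}

/* Leave the busy region and run whatever was deferred.  */
static void
leave_busy (void)
{
  gc_busy = 0;
  if (sample_pending)
    {
      sample_pending = 0;
      take_sample ();
    }
}
```

Any number of signals arriving during one busy region collapse into a single deferred sample, and that sample is attributed to the point after the region, which is why a profiler built this way cannot show time spent inside it.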

IOW, we had discussions about this which never concluded anything, and
we should pick up where we left off and solve this problem.

We should definitely try improving this before we land the branch on
master.  We shouldn't consider this solution "good enough", but just a
temporary kludge meant to avoid too frequent crashes.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Discussion with MPS people
  2024-12-23 13:35       ` Some experience with the igc branch Eli Zaretskii
@ 2024-12-23 14:03         ` Gerd Möllmann
  2024-12-23 14:04           ` Gerd Möllmann
  2024-12-23 15:07         ` Some experience with the igc branch Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23 14:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

> Why didn't we discuss this with MPS folks?

That would be a good thing. Maybe we can get at least some initial
contact going.

I've CC'd Richard Brooksby, who is one of the main people behind MPS.
(AFAIU; sorry Richard if I under-represent your role.) I've seen that he
recently answered on the bug list, so maybe he's interested in helping
us.

@Richard:

Eli is an Emacs co-maintainer, and this mail goes in CC to the
emacs-devel mailing list. Pip Cet has taken over further development of
the branch scratch/igc, which contains a GC for Emacs based on MPS.
Helmut Eller is the third person who has done important work on
scratch/igc.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Discussion with MPS people
  2024-12-23 14:03         ` Discussion with MPS people Gerd Möllmann
@ 2024-12-23 14:04           ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23 14:04 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Pip Cet, ofv, emacs-devel, eller.helmut, acorallo,
	Richard Brooksby

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>> Why didn't we discuss this with MPS folks?
>
> That would be a good thing. Maybe we can get at least some initial
> contact going.
>
> I've CC'd Richard Brooksby, who is one of the main people behind MPS.
> (AFAIU; sorry Richard if I under-represent your role.) I've seen that he
> recently answered on the bug list, so maybe he's interested in helping
> us.
>
> @Richard:
>
> Eli is an Emacs co-maintainer, and this mail goes in CC to the
> emacs-devel mailing list. Pip Cet has taken over further development of
> the branch scratch/igc, which contains a GC for Emacs based on MPS.
> Helmut Eller is the third person who has done important work on
> scratch/igc.

And of course I've forgotten to actually add Richard in CC. Now done.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-23 13:35       ` Some experience with the igc branch Eli Zaretskii
  2024-12-23 14:03         ` Discussion with MPS people Gerd Möllmann
@ 2024-12-23 15:07         ` Pip Cet via Emacs development discussions.
  2024-12-23 15:26           ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23 15:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Sun, 22 Dec 2024 22:26:11 +0000
>> Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org,
>>  Helmut Eller <eller.helmut@gmail.com>, Andrea Corallo <acorallo@gnu.org>
>> From:  Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
>>
>> >> 1. The signal issue.  I don't have a good way to fix this and make
>> >> everyone happy, but I do have a solution which hasn't caused a crash for
>> >> me in quite a while.  It may be good enough.
>> >
>> > TBH, I'd have put it in already.
>>
>> Pushed it now.  It is imperfect, but better than crashing.
>
> Why didn't we discuss this with MPS folks?  A program can legitimately

Because...

> call some code from a signal handler, so the limitations that MPS
> seems to impose now are not very reasonable.  Maybe we are missing

...if they were interested, maybe they've read this or some other
blanket accusation of being "unreasonable", and became uninterested
quickly.  I know I would.

> some feature, or maybe the MPS folks will agree to extend the library
> to provide better support for programs that use signals.  E.g., AFAIU
> with this code installed, we are limiting our profiler too much (it
> will never report GC, IIRC?).  I think igc_busy_p returns non-zero in
> too many situations where delivering signals could not possibly cause
> harm, like during object allocation, AFAIR.  According to
> documentation, that function is not intended for this kind of purpose.
>
> IOW, we had discussions about this which never concluded anything, and
> we should pick up where we left off and solve this problem.

I have a different idea using a separate allocation thread (for the slow
path only, of course).  Would that be potentially acceptable?

It would limit MPS to systems providing a working stdatomic.h header, and
in practice also require some sort of working (and reasonably fast)
inter-thread signalling (though I suspect it'd be faster to run both
threads on the same core, since it's a handover rather than a
parallelism situation).  That excludes very few systems these days
(sorry, MS-DOS).

I'll spare you most of the details for now, but having read the mps
header, MPS allocation is not safe to use from separate threads without
locking the AP (or having per-thread APs), which we might end up doing
on Windows, IIRC.  I'd rather give those (potential) issues a wide
berth.  Also, by the campsite rule, merging MPS shouldn't make it harder
to move in the direction of multi-threaded Emacs.

Better debugging (which I agree with you is something we need to
improve), no MPS modification.  Performance implications TBD.

> We should definitely try improving this before we land the branch on
> master.  We shouldn't consider this solution "good enough", but just a
> temporary kludge meant to avoid too frequent crashes.

Agreed.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-23 15:07         ` Some experience with the igc branch Pip Cet via Emacs development discussions.
@ 2024-12-23 15:26           ` Gerd Möllmann
  2024-12-23 16:03             ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23 15:26 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet <pipcet@protonmail.com> writes:

> I'll spare you most of the details for now, but having read the mps
> header, MPS allocation is not safe to use from separate threads without
> locking the AP (or having per-thread APs), which we might end up doing
> on Windows, IIRC.

Now I'm confused. We're using thread allocation points. See
create_thread_aps, thread_ap, and so on. 

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-23 15:26           ` Gerd Möllmann
@ 2024-12-23 16:03             ` Pip Cet via Emacs development discussions.
  2024-12-23 16:44               ` Eli Zaretskii
  2024-12-23 17:44               ` Gerd Möllmann
  0 siblings, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23 16:03 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> I'll spare you most of the details for now, but having read the mps
>> header, MPS allocation is not safe to use from separate threads without
>> locking the AP (or having per-thread APs), which we might end up doing
>> on Windows, IIRC.
>
> Now I'm confused. We're using thread allocation points. See
> create_thread_aps, thread_ap, and so on.

I was confused.  This is only a problem if we allocate memory from a
signal handler, which is effectively sharing the per-thread structure.

(I'm still confused. My patch worked on the first attempt, which my code
never does.  I suspect that while I made a mistake, it caused a subtle
bug rather than an obvious one.)

And we don't want to allocate memory from signal handlers, right?  We
could, now (see warnings below):

diff --git a/src/igc.c b/src/igc.c
index eb72406e529..14ecc30f982 100644
--- a/src/igc.c
+++ b/src/igc.c
@@ -747,19 +747,41 @@ IGC_DEFINE_LIST (igc_root);
 
 /* Registry entry for an MPS thread mps_thr_t.  */
 
+#include <pthread.h>
+#include <stdatomic.h>
+
+struct emacs_ap
+{
+  mps_ap_t mps_ap;
+  struct igc *gc;
+  pthread_t allocation_thread;
+  atomic_uintptr_t usable_memory;
+  atomic_uintptr_t usable_bytes;
+
+  atomic_uintptr_t waiting_threads;
+  atomic_uintptr_t requested_bytes;
+  atomic_intptr_t requested_type;
+};
+
+typedef struct emacs_ap emacs_ap_t;
+
+#ifndef ATOMIC_POINTER_LOCK_FREE
+#error "this probably won't work"
+#endif
+
 struct igc_thread
 {
   struct igc *gc;
   mps_thr_t thr;
 
   /* Allocation points for the thread.  */
-  mps_ap_t dflt_ap;
-  mps_ap_t leaf_ap;
-  mps_ap_t weak_strong_ap;
-  mps_ap_t weak_weak_ap;
-  mps_ap_t weak_hash_strong_ap;
-  mps_ap_t weak_hash_weak_ap;
-  mps_ap_t immovable_ap;
+  emacs_ap_t dflt_ap;
+  emacs_ap_t leaf_ap;
+  emacs_ap_t weak_strong_ap;
+  emacs_ap_t weak_weak_ap;
+  emacs_ap_t weak_hash_strong_ap;
+  emacs_ap_t weak_hash_weak_ap;
+  emacs_ap_t immovable_ap;
 
   /* Quick access to the roots used for specpdl, bytecode stack and
      control stack.  */
@@ -805,6 +827,8 @@ IGC_DEFINE_LIST (igc_thread);
 
   /* Registered threads.  */
   struct igc_thread_list *threads;
+
+  pthread_cond_t cond;
 };
 
 static bool process_one_message (struct igc *gc);
@@ -2904,8 +2928,84 @@ igc_root_destroy_comp_unit_eph (struct Lisp_Native_Comp_Unit *u)
   maybe_destroy_root (&u->data_eph_relocs_root);
 }
 
+static mps_addr_t alloc_impl_raw (size_t size, enum igc_obj_type type, mps_ap_t ap);
+static mps_addr_t alloc_impl (size_t size, enum igc_obj_type type, emacs_ap_t *ap);
+
+static void *igc_allocation_thread (void *ap_v)
+{
+  emacs_ap_t *ap = ap_v;
+  while (true)
+    {
+      if (ap->requested_bytes)
+	{
+	  void *p = alloc_impl_raw (ap->requested_bytes, (enum igc_obj_type) ap->requested_type, ap->mps_ap);
+	  atomic_store (&ap->usable_memory, (uintptr_t) p);
+	  atomic_store (&ap->usable_bytes, ap->requested_bytes);
+	  atomic_store (&ap->requested_type, -1);
+	  atomic_store (&ap->requested_bytes, 0);
+	}
+    }
+
+  return NULL;
+}
+
+static mps_addr_t alloc_impl (size_t size, enum igc_obj_type type, emacs_ap_t *ap)
+{
+  if (size == 0)
+    return 0;
+  while (true)
+    {
+      uintptr_t other_threads = atomic_fetch_add (&ap->waiting_threads, 1);
+      if (other_threads != 0)
+	{
+	  /* we know that the other "thread" is actually on top of us,
+	   * and we're a signal handler. Wait, should we even be
+	   * allocating memory?  We should still eassert that we're the
+	   * right thread. */
+	  emacs_ap_t saved_state;
+	  while (ap->requested_bytes);
+	  memcpy (&saved_state, ap, sizeof saved_state);
+	  atomic_store (&ap->waiting_threads, 0);
+	  mps_addr_t ret = alloc_impl (size, type, ap);
+	  atomic_store (&ap->waiting_threads, saved_state.waiting_threads);
+	  memcpy (ap, &saved_state, sizeof saved_state);
+	  atomic_fetch_add (&ap->waiting_threads, -1);
+	  return ret;
+	}
+
+      atomic_store (&ap->requested_type, (uintptr_t) type);
+      atomic_store (&ap->requested_bytes, (uintptr_t) size);
+
+      while (ap->requested_bytes);
+
+      mps_addr_t ret = (mps_addr_t) ap->usable_memory;
+      atomic_fetch_add (&ap->waiting_threads, -1);
+      return ret;
+    }
+}
+
+static mps_res_t emacs_ap_create_k (emacs_ap_t *ap, mps_pool_t pool,
+				    mps_arg_s *args)
+{
+  atomic_store(&ap->usable_memory, 0);
+  atomic_store(&ap->usable_bytes, 0);
+  atomic_store(&ap->waiting_threads, 0);
+  atomic_store(&ap->requested_bytes, 0);
+
+  pthread_attr_t thread_attr;
+  pthread_attr_init (&thread_attr);
+  pthread_create(&ap->allocation_thread, &thread_attr, igc_allocation_thread, ap);
+
+  return mps_ap_create_k (&ap->mps_ap, pool, args);
+}
+
+static void emacs_ap_destroy (emacs_ap_t *ap)
+{
+  return;
+}
+
 static mps_res_t
-create_weak_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
+create_weak_ap (emacs_ap_t *ap, struct igc_thread *t, bool weak)
 {
   struct igc *gc = t->gc;
   mps_res_t res;
@@ -2914,14 +3014,14 @@ create_weak_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
   {
     MPS_ARGS_ADD (args, MPS_KEY_RANK,
 		  weak ? mps_rank_weak () : mps_rank_exact ());
-    res = mps_ap_create_k (ap, pool, args);
+    res = emacs_ap_create_k (ap, pool, args);
   }
   MPS_ARGS_END (args);
   return res;
 }
 
 static mps_res_t
-create_weak_hash_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
+create_weak_hash_ap (emacs_ap_t *ap, struct igc_thread *t, bool weak)
 {
   struct igc *gc = t->gc;
   mps_res_t res;
@@ -2930,7 +3030,7 @@ create_weak_hash_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
   {
     MPS_ARGS_ADD (args, MPS_KEY_RANK,
 		  weak ? mps_rank_weak () : mps_rank_exact ());
-    res = mps_ap_create_k (ap, pool, args);
+    res = emacs_ap_create_k (ap, pool, args);
   }
   MPS_ARGS_END (args);
   return res;
@@ -2940,12 +3040,15 @@ create_weak_hash_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
 create_thread_aps (struct igc_thread *t)
 {
   struct igc *gc = t->gc;
+  pthread_condattr_t condattr;
+  pthread_condattr_init (&condattr);
+  pthread_cond_init (&gc->cond, &condattr);
   mps_res_t res;
-  res = mps_ap_create_k (&t->dflt_ap, gc->dflt_pool, mps_args_none);
+  res = emacs_ap_create_k (&t->dflt_ap, gc->dflt_pool, mps_args_none);
   IGC_CHECK_RES (res);
-  res = mps_ap_create_k (&t->leaf_ap, gc->leaf_pool, mps_args_none);
+  res = emacs_ap_create_k (&t->leaf_ap, gc->leaf_pool, mps_args_none);
   IGC_CHECK_RES (res);
-  res = mps_ap_create_k (&t->immovable_ap, gc->immovable_pool, mps_args_none);
+  res = emacs_ap_create_k (&t->immovable_ap, gc->immovable_pool, mps_args_none);
   IGC_CHECK_RES (res);
   res = create_weak_ap (&t->weak_strong_ap, t, false);
   res = create_weak_hash_ap (&t->weak_hash_strong_ap, t, false);
@@ -3007,13 +3110,13 @@ igc_thread_remove (void **pinfo)
   destroy_root (&t->d.stack_root);
   destroy_root (&t->d.specpdl_root);
   destroy_root (&t->d.bc_root);
-  mps_ap_destroy (t->d.dflt_ap);
-  mps_ap_destroy (t->d.leaf_ap);
-  mps_ap_destroy (t->d.weak_strong_ap);
-  mps_ap_destroy (t->d.weak_weak_ap);
-  mps_ap_destroy (t->d.weak_hash_strong_ap);
-  mps_ap_destroy (t->d.weak_hash_weak_ap);
-  mps_ap_destroy (t->d.immovable_ap);
+  emacs_ap_destroy (&t->d.dflt_ap);
+  emacs_ap_destroy (&t->d.leaf_ap);
+  emacs_ap_destroy (&t->d.weak_strong_ap);
+  emacs_ap_destroy (&t->d.weak_weak_ap);
+  emacs_ap_destroy (&t->d.weak_hash_strong_ap);
+  emacs_ap_destroy (&t->d.weak_hash_weak_ap);
+  emacs_ap_destroy (&t->d.immovable_ap);
   mps_thread_dereg (deregister_thread (t));
 }
 
@@ -3677,7 +3780,7 @@ igc_on_idle (void)
   }
 }
 
-static mps_ap_t
+static emacs_ap_t *
 thread_ap (enum igc_obj_type type)
 {
   struct igc_thread_list *t = current_thread->gc_info;
@@ -3698,13 +3801,13 @@ thread_ap (enum igc_obj_type type)
       emacs_abort ();
 
     case IGC_OBJ_MARKER_VECTOR:
-      return t->d.weak_weak_ap;
+      return &t->d.weak_weak_ap;
 
     case IGC_OBJ_WEAK_HASH_TABLE_WEAK_PART:
-      return t->d.weak_hash_weak_ap;
+      return &t->d.weak_hash_weak_ap;
 
     case IGC_OBJ_WEAK_HASH_TABLE_STRONG_PART:
-      return t->d.weak_hash_strong_ap;
+      return &t->d.weak_hash_strong_ap;
 
     case IGC_OBJ_VECTOR:
     case IGC_OBJ_CONS:
@@ -3719,12 +3822,12 @@ thread_ap (enum igc_obj_type type)
     case IGC_OBJ_FACE_CACHE:
     case IGC_OBJ_BLV:
     case IGC_OBJ_HANDLER:
-      return t->d.dflt_ap;
+      return &t->d.dflt_ap;
 
     case IGC_OBJ_STRING_DATA:
     case IGC_OBJ_FLOAT:
     case IGC_OBJ_BYTES:
-      return t->d.leaf_ap;
+      return &t->d.leaf_ap;
     }
   emacs_abort ();
 }
@@ -3796,7 +3899,7 @@ igc_hash (Lisp_Object key)
    object.  */
 
 static mps_addr_t
-alloc_impl (size_t size, enum igc_obj_type type, mps_ap_t ap)
+alloc_impl_raw (size_t size, enum igc_obj_type type, mps_ap_t ap)
 {
   mps_addr_t p UNINIT;
   size = alloc_size (size);
@@ -3845,7 +3948,7 @@ alloc (size_t size, enum igc_obj_type type)
 alloc_immovable (size_t size, enum igc_obj_type type)
 {
   struct igc_thread_list *t = current_thread->gc_info;
-  return alloc_impl (size, type, t->d.immovable_ap);
+  return alloc_impl (size, type, &t->d.immovable_ap);
 }
 
 #ifdef HAVE_MODULES
@@ -4883,17 +4986,17 @@ igc_on_pdump_loaded (void *dump_base, void *hot_start, void *hot_end,
 igc_alloc_dump (size_t nbytes)
 {
   igc_assert (global_igc->park_count > 0);
-  mps_ap_t ap = thread_ap (IGC_OBJ_CONS);
+  emacs_ap_t *ap = thread_ap (IGC_OBJ_CONS);
   size_t block_size = igc_header_size () + nbytes;
   mps_addr_t block;
   do
     {
-      mps_res_t res = mps_reserve (&block, ap, block_size);
+      mps_res_t res = mps_reserve (&block, ap->mps_ap, block_size);
       if (res != MPS_RES_OK)
 	memory_full (0);
       set_header (block, IGC_OBJ_INVALID, block_size, 0);
     }
-  while (!mps_commit (ap, block, block_size));
+  while (!mps_commit (ap->mps_ap, block, block_size));
   return (char *) block + igc_header_size ();
 }
 

Warnings:

This is the "slow path" only, used for all allocations. Will cause a
great number of busy-looping threads.  Will be very slow.  Creating
additional emacs threads will result in a proportional number of
additional threads, which will be very, very slow, so don't.  Requires
pthread.h and stdatomic.h, and still does things not covered by those
APIs (memcpying over an atomic_uintptr_t, even if we know that its value
won't change, is probably verboten, and definitely should be).  I
*think* this code might work if we allocate from signal handlers, and I
think this code might work on systems that don't have lock-free atomics
(once the #error is removed), but it definitely won't do both at the
same time.

Pip





* Re: Some experience with the igc branch
  2024-12-23 16:03             ` Pip Cet via Emacs development discussions.
@ 2024-12-23 16:44               ` Eli Zaretskii
  2024-12-23 17:16                 ` Pip Cet via Emacs development discussions.
  2024-12-23 17:44               ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-23 16:44 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Mon, 23 Dec 2024 16:03:53 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> --- a/src/igc.c
> +++ b/src/igc.c
> @@ -747,19 +747,41 @@ IGC_DEFINE_LIST (igc_root);
>  
>  /* Registry entry for an MPS thread mps_thr_t.  */
>  
> +#include <pthread.h>

We cannot use pthread.h in portable code.  If we want to use threads,
we need separate implementations for Posix and Windows, like we did in
systhread.c for Lisp threads.

> +struct emacs_ap
> +{
> +  mps_ap_t mps_ap;
> +  struct igc *gc;
> +  pthread_t allocation_thread;

pthread_t is non-portable, for the same reasons.

> This is the "slow path" only, used for all allocations. Will cause a
> great number of busy-looping threads.

A lot of threads might be problematic.  Each thread reserves memory
for its stack, so you end up with lots of reserved memory, and on
32-bit systems can run out of address space.

Why do we need this, again?




* Re: Some experience with the igc branch
  2024-12-23 16:44               ` Eli Zaretskii
@ 2024-12-23 17:16                 ` Pip Cet via Emacs development discussions.
  2024-12-23 18:35                   ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23 17:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Mon, 23 Dec 2024 16:03:53 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> --- a/src/igc.c
>> +++ b/src/igc.c
>> @@ -747,19 +747,41 @@ IGC_DEFINE_LIST (igc_root);
>>
>>  /* Registry entry for an MPS thread mps_thr_t.  */
>>
>> +#include <pthread.h>
>
> We cannot use pthread.h in portable code.  If we want to use threads,
> we need separate implementations for Posix and Windows, like we did in
> systhread.c for Lisp threads.

Noted.

As an aside, without any relevance to the fact that we should avoid
using them, aren't pthreads available on "mingw"64 systems?

>> +struct emacs_ap
>> +{
>> +  mps_ap_t mps_ap;
>> +  struct igc *gc;
>> +  pthread_t allocation_thread;
>
> pthread_t is non-portable, for the same reasons.
>
>> This is the "slow path" only, used for all allocations. Will cause a
>> great number of busy-looping threads.
>
> A lot of threads might be problematic.  Each thread reserves memory
> for its stack, so you end up with lots of reserved memory, and on
> 32-bit systems can run out of address space.

This is a PoC.  While we shouldn't share structures between Emacs-side
threads, we should of course use (at most) a single allocation thread
rather than one per thread per AP.  Also, yield the CPU once in a while
:-)

> Why do we need this, again?

We can't interrupt allocation, so we move it to a separate thread where
it will complete (unlocking the arena) even if a signal interrupts us.
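
A minimal sketch of that hand-off, for illustration only.  It is not
the code from the patch: struct alloc_mailbox, allocation_thread and
request_alloc are hypothetical names, malloc stands in for the
mps_reserve/mps_commit loop, and it uses pthread.h, so it is
POSIX-only like the PoC.  The requester never enters the allocator
itself; it posts a size and spins (yielding) until the allocation
thread has stored the result.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical mailbox shared by one requester and one allocation
   thread.  None of these names come from the patch.  */
struct alloc_mailbox
{
  atomic_size_t request;     /* nonzero while a request is pending */
  _Atomic (void *) result;   /* filled in by the allocation thread */
  atomic_bool quit;          /* set to ask the allocation thread to exit */
};

/* Body of the allocation thread: the only place that allocates.  */
static void *
allocation_thread (void *arg)
{
  struct alloc_mailbox *box = arg;
  while (!atomic_load (&box->quit))
    {
      size_t nbytes = atomic_load (&box->request);
      if (nbytes == 0)
        {
          sched_yield ();    /* nothing to do; give up the CPU */
          continue;
        }
      /* malloc is a stand-in for the real allocator call.  */
      atomic_store (&box->result, malloc (nbytes));
      /* Clearing the request publishes the result to the requester.  */
      atomic_store (&box->request, 0);
    }
  return NULL;
}

/* Post a request for NBYTES (must be nonzero) and wait for the answer.  */
static void *
request_alloc (struct alloc_mailbox *box, size_t nbytes)
{
  atomic_store (&box->request, nbytes);
  while (atomic_load (&box->request) != 0)
    sched_yield ();
  return atomic_load (&box->result);
}
```

With a single mailbox this is not reentrant: a signal handler that
called request_alloc while the interrupted code was waiting would
overwrite the pending request.  The sketch only shows why the
allocator's lock is never held by the thread that receives signals.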

Pip





* Re: Some experience with the igc branch
  2024-12-23 17:16                 ` Pip Cet via Emacs development discussions.
@ 2024-12-23 18:35                   ` Eli Zaretskii
  2024-12-23 18:48                     ` Gerd Möllmann
  2024-12-23 20:30                     ` Benjamin Riefenstahl
  0 siblings, 2 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-23 18:35 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Mon, 23 Dec 2024 17:16:32 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> +#include <pthread.h>
> >
> > We cannot use pthread.h in portable code.  If we want to use threads,
> > we need separate implementations for Posix and Windows, like we did in
> > systhread.c for Lisp threads.
> 
> Noted.
> 
> As an aside, without any relevance to the fact that we should avoid
> using them, aren't pthreads available on "mingw"64 systems?

pthreads are ported to both 32-bit and 64-bit Windows (more than
once), but the ports are buggy, and pthread.h itself defines all
kinds of stuff that conflicts with various w32 places in Emacs.  The
following lines from nt/mingw-cfg.site are one sign of that:

  # We don't want pthread.h to be picked up just because it defines timespec
  gl_cv_sys_struct_timespec_in_pthread_h=no
  # Or at all...
  ac_cv_header_pthread_h=no

> > Why do we need this, again?
> 
> We can't interrupt allocation, so we move it to a separate thread where
> it will complete (unlocking the arena) even if a signal interrupts us.

How will this allow us to run the Lisp machine from a signal?  Because
this is the goal, right?




* Re: Some experience with the igc branch
  2024-12-23 18:35                   ` Eli Zaretskii
@ 2024-12-23 18:48                     ` Gerd Möllmann
  2024-12-23 19:25                       ` Eli Zaretskii
  2024-12-23 20:30                     ` Benjamin Riefenstahl
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23 18:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Pip Cet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

> How will this allow us to run the Lisp machine from a signal?  Because
> this is the goal, right?

Today I'm confused.

Can I ask what you mean by running the Lisp Machine from a signal
handler? Sounds to me like calling eval, but I'd doubt that works with
the old GC, or does it?




* Re: Some experience with the igc branch
  2024-12-23 18:48                     ` Gerd Möllmann
@ 2024-12-23 19:25                       ` Eli Zaretskii
  0 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-23 19:25 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Pip Cet <pipcet@protonmail.com>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Mon, 23 Dec 2024 19:48:08 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > How will this allow us to run the Lisp machine from a signal?  Because
> > this is the goal, right?
> 
> Today I'm confused.
> 
> Can I ask what you mean by running the Lisp Machine from a signal
> handler? Sounds to me like calling eval, but I'd doubt that works with
> the old GC, or does it?

See what I wrote in response to your other questions, regarding what
SIGPROF and SIGCHLD handlers do.




* Re: Some experience with the igc branch
  2024-12-23 18:35                   ` Eli Zaretskii
  2024-12-23 18:48                     ` Gerd Möllmann
@ 2024-12-23 20:30                     ` Benjamin Riefenstahl
  2024-12-23 23:39                       ` Pip Cet via Emacs development discussions.
  2024-12-24  3:37                       ` Eli Zaretskii
  1 sibling, 2 replies; 203+ messages in thread
From: Benjamin Riefenstahl @ 2024-12-23 20:30 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Pip Cet, gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

>> From: Pip Cet <pipcet@protonmail.com>
>> >> +#include <pthread.h>

Eli Zaretskii writes:
>> > We cannot use pthread.h in portable code.  If we want to use
>> > threads, we need separate implementations for Posix and Windows,
>> > like we did in systhread.c for Lisp threads.

Just a drive-by observation: Signals are a POSIX feature, so we have to
think about the potential conflict between signals and MPS only on
POSIX, not on MS Windows, right?

Regards, benny




* Re: Some experience with the igc branch
  2024-12-23 20:30                     ` Benjamin Riefenstahl
@ 2024-12-23 23:39                       ` Pip Cet via Emacs development discussions.
  2024-12-24 12:14                         ` Eli Zaretskii
  2024-12-24  3:37                       ` Eli Zaretskii
  1 sibling, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23 23:39 UTC (permalink / raw)
  To: Benjamin Riefenstahl
  Cc: Eli Zaretskii, gerd.moellmann, ofv, emacs-devel, eller.helmut,
	acorallo

"Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes:

>>> From: Pip Cet <pipcet@protonmail.com>
>>> >> +#include <pthread.h>
>
> Eli Zaretskii writes:
>>> > We cannot use pthread.h in portable code.  If we want to use
>>> > threads, we need separate implementations for Posix and Windows,
>>> > like we did in systhread.c for Lisp threads.
>
> Just a drive-by observation: Signals are a POSIX feature, so we have to
> think about the potential conflict between signals and MPS only on
> POSIX, not on MS Windows, right?

I believe it affects all operating systems we're playing with (well, I'm
also playing with FreeDOS but I'm not going to port MPS to it.  It's my
New Year's resolution not to).

The allocation thread approach should work for all of them.  If we have
stdatomic.h, performance should be acceptable.

Pip





* Re: Some experience with the igc branch
  2024-12-23 23:39                       ` Pip Cet via Emacs development discussions.
@ 2024-12-24 12:14                         ` Eli Zaretskii
  2024-12-24 13:18                           ` Pip Cet via Emacs development discussions.
  2024-12-24 13:42                           ` Benjamin Riefenstahl
  0 siblings, 2 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 12:14 UTC (permalink / raw)
  To: Pip Cet
  Cc: b.riefenstahl, gerd.moellmann, ofv, emacs-devel, eller.helmut,
	acorallo

> Date: Mon, 23 Dec 2024 23:39:41 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes:
> 
> The allocation thread approach should work for all of them.  If we have
> stdatomic.h, performance should be acceptable.

We should carefully discuss the design and its implications before we
conclude that this is an idea that is good enough to justify such a
significant change.  If nothing else, it throws out the window several
months of everyone's experience with the current implementation.




* Re: Some experience with the igc branch
  2024-12-24 12:14                         ` Eli Zaretskii
@ 2024-12-24 13:18                           ` Pip Cet via Emacs development discussions.
  2024-12-24 13:42                           ` Benjamin Riefenstahl
  1 sibling, 0 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-24 13:18 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: b.riefenstahl, gerd.moellmann, ofv, emacs-devel, eller.helmut,
	acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Mon, 23 Dec 2024 23:39:41 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>, gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> "Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes:
>>
>> The allocation thread approach should work for all of them.  If we have
>> stdatomic.h, performance should be acceptable.
>
> We should carefully discuss the design and its implications before we
> conclude that this is an idea that is good enough to justify such a
> significant change.  If nothing else, it throws out the window several
> months of everyone's experience with the current implementation.

I mostly agree, though I think to say that we throw out months of
experience is to overestimate the magnitude of the change a bit.

I'll push the bugfix I found, but I won't push this until there's some
sort of consensus about whether it's a good idea (we seem to be close to
a consensus that it isn't necessary or desirable; I'm the only one who
disagrees with that, and I can live with the "not desirable" part if
someone can convince me this change isn't necessary).

Pip





* Re: Some experience with the igc branch
  2024-12-24 12:14                         ` Eli Zaretskii
  2024-12-24 13:18                           ` Pip Cet via Emacs development discussions.
@ 2024-12-24 13:42                           ` Benjamin Riefenstahl
  1 sibling, 0 replies; 203+ messages in thread
From: Benjamin Riefenstahl @ 2024-12-24 13:42 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Pip Cet, gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

>> From: Pip Cet <pipcet@protonmail.com>
>> [...]
>> "Benjamin Riefenstahl" <b.riefenstahl@turtle-trading.net> writes:
>> 
>> The allocation thread approach should work for all of them.  If we have
>> stdatomic.h, performance should be acceptable.

JFTR, that quote wasn't from me, I believe it belongs to Pip.  ;-)




* Re: Some experience with the igc branch
  2024-12-23 20:30                     ` Benjamin Riefenstahl
  2024-12-23 23:39                       ` Pip Cet via Emacs development discussions.
@ 2024-12-24  3:37                       ` Eli Zaretskii
  2024-12-24  8:48                         ` Benjamin Riefenstahl
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24  3:37 UTC (permalink / raw)
  To: Benjamin Riefenstahl
  Cc: pipcet, gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> From: Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net>
> Cc: Pip Cet <pipcet@protonmail.com>,  gerd.moellmann@gmail.com,
>   ofv@wanadoo.es,  emacs-devel@gnu.org,  eller.helmut@gmail.com,
>   acorallo@gnu.org
> Date: Mon, 23 Dec 2024 22:30:57 +0200
> 
> >> From: Pip Cet <pipcet@protonmail.com>
> >> >> +#include <pthread.h>
> 
> Eli Zaretskii writes:
> >> > We cannot use pthread.h in portable code.  If we want to use
> >> > threads, we need separate implementations for Posix and Windows,
> >> > like we did in systhread.c for Lisp threads.
> 
> Just a drive-by observation: Signals are a POSIX feature, so we have to
> think about the potential conflict between signals and MPS only on
> POSIX, not on MS Windows, right?

Emacs on Windows emulates some Posix signals (SIGPROF, SIGCHLD,
SIGALRM), so this affects the Windows build as well.




* Re: Some experience with the igc branch
  2024-12-24  3:37                       ` Eli Zaretskii
@ 2024-12-24  8:48                         ` Benjamin Riefenstahl
  2024-12-24 13:52                           ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Benjamin Riefenstahl @ 2024-12-24  8:48 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: pipcet, gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii writes:
> Emacs on Windows emulates some Posix signals (SIGPROF, SIGCHLD,
> SIGALRM), so this affects the Windows build as well.

That's interesting.  But does this emulation have the same constraints
as POSIX signals have?

benny




* Re: Some experience with the igc branch
  2024-12-24  8:48                         ` Benjamin Riefenstahl
@ 2024-12-24 13:52                           ` Eli Zaretskii
  2024-12-24 13:54                             ` Benjamin Riefenstahl
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 13:52 UTC (permalink / raw)
  To: Benjamin Riefenstahl
  Cc: pipcet, gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> From: Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net>
> Cc: pipcet@protonmail.com,  gerd.moellmann@gmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Tue, 24 Dec 2024 10:48:34 +0200
> 
> Eli Zaretskii writes:
> > Emacs on Windows emulates some Posix signals (SIGPROF, SIGCHLD,
> > SIGALRM), so this affects the Windows build as well.
> 
> That's interesting.  But does this emulation have the same constraints
> as POSIX signals have?

If it's a useful emulation, it must somehow generate an asynchronous
event, and then arrange for that event to call the signal handler.
Right?  So the constraints we are talking about, which have to do with
the fact that the signal handlers are invoked asynchronously, are
definitely relevant for this emulation (or any useful emulation),
because the problem we discuss here is the situation where the signal
handler is invoked while MPS holds the arena lock.


* Re: Some experience with the igc branch
  2024-12-24 13:52                           ` Eli Zaretskii
@ 2024-12-24 13:54                             ` Benjamin Riefenstahl
  0 siblings, 0 replies; 203+ messages in thread
From: Benjamin Riefenstahl @ 2024-12-24 13:54 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: pipcet, gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii writes:
> So the constraints we are talking about, which have to do with the
> fact that the signal handlers are invoked asynchronously, are
> definitely relevant for this emulation (or any useful emulation),
> because the problem we discuss here is the situation where the signal
> handler is invoked while MPS holds the arena lock.

Understood.

Regards, benny




* Re: Some experience with the igc branch
  2024-12-23 16:03             ` Pip Cet via Emacs development discussions.
  2024-12-23 16:44               ` Eli Zaretskii
@ 2024-12-23 17:44               ` Gerd Möllmann
  2024-12-23 19:00                 ` Eli Zaretskii
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23 17:44 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> I'll spare you most of the details for now, but having read the mps
>>> header, MPS allocation is not safe to use from separate threads without
>>> locking the AP (or having per-thread APs), which we might end up doing
>>> on Windows, IIRC.
>>
>> Now I'm confused. We're using thread allocation points. See
>> create_thread_aps, thread_ap, and so on.
>
> I was confused.  This is only a problem if we allocate memory from a
> signal handler, which is effectively sharing the per-thread structure.
>
> (I'm still confused. My patch worked on the first attempt, which my code
> never does.  I suspect that while I made a mistake, it caused a subtle
> bug rather than an obvious one.)
>
> And we don't want to allocate memory from signal handlers, right?  We
> could, now (see warnings below):

Can't speak for others, but I wouldn't want it :-).

I can't cite myself, but I'm pretty sure I said some time ago already
that in a portable program one cannot do much in a signal handler in the
first place. So I wouldn't be surprised if MPS didn't support being
called from a signal handler. Not unreasonable for me.

But whatever. Maybe Richard Brooksby answers, and can shed light on that
or has ideas, if we don't overload him :-). And anyway, there is now
something workable in the igc branch. Maybe we could wait a bit, and
just proceed with something else meanwhile.

[... Thanks for the patch ...]

> Warnings:
>
> This is the "slow path" only, used for all allocations. Will cause a
> great number of busy-looping threads.  

Don't know why, but the busy-looping threads make me feel a bit
uncomfortable :-)

> Will be very slow. Creating additional emacs threads will result in a
> proportional number of additional threads, which will be very, very
> slow, so don't. Requires pthread.h and stdatomic.h, and still does
> things not covered by those APIs (memcpying over an atomic_uintptr_t,
> even if we know that its value won't change, is probably verboten, and
> definitely should be). I *think* this code might work if we allocate
> from signal handlers, and I think this code might work on systems that
> don't have lock-free atomics (once the #error is removed), but it
> definitely won't do both at the same time.
>
> Pip

BTW, do you know which signal handlers use Lisp, i.e. allocate Lisp
objects or access some? All? Or, would it be realistic to rewrite signal
handlers to not do that?

One thing I've seen done elsewhere is to publish a message to a message
board so that it can be handled outside of the signal handler. Something
like that, you know what I mean.
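
To make the idea concrete, here is a minimal sketch in portable C.
It is only an illustration, not Emacs code: pending_child,
note_child_signal, process_pending_signals and
install_deferred_handler are hypothetical names.  The handler does
nothing but set a flag; the real work runs later from the main loop,
at a point where touching Lisp data or the allocator is safe.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical "message board": one pending flag for the signal of
   interest.  volatile sig_atomic_t is the object type ISO C lets a
   handler write.  */
static volatile sig_atomic_t pending_child;

/* Counts how often the deferred work ran; stands in for real work.  */
static int handled_count;

/* The handler only posts the message.  It touches no Lisp data and
   cannot allocate.  */
static void
note_child_signal (int sig)
{
  (void) sig;
  pending_child = 1;
}

/* Called from the main loop at a safe point, outside any handler.
   Returns true if there was something to handle.  */
static bool
process_pending_signals (void)
{
  if (!pending_child)
    return false;
  pending_child = 0;
  handled_count++;   /* real code would examine child processes here */
  return true;
}

/* Install the deferring handler for signal SIG.  */
static void
install_deferred_handler (int sig)
{
  signal (sig, note_child_signal);
}
```

The cost is latency: the work happens only when the main loop next
checks the flag, and several deliveries before a check collapse into
one.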


* Re: Some experience with the igc branch
  2024-12-23 17:44               ` Gerd Möllmann
@ 2024-12-23 19:00                 ` Eli Zaretskii
  2024-12-23 19:37                   ` Eli Zaretskii
                                     ` (2 more replies)
  0 siblings, 3 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-23 19:00 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Mon, 23 Dec 2024 18:44:42 +0100
> 
> BTW, do you know which signal handlers use Lisp, i.e. allocate Lisp
> objects or access some? All? Or, would it be realistic to rewrite signal
> handlers to not do that?

SIGPROF does (it's the basis for our Lisp profiler).

SIGCHLD doesn't run Lisp (I think), but it examines objects and data
structures of the Lisp machine (those related to child processes).

> One thing I've seen done elsewhere is to publish a message to a message
> board so that it can be handled outside of the signal handler. Something
> like that, you know what I mean.

This is tricky for the profiler, because you want to sample the
function in which you are right there and then, not some time later.

For SIGCHLD this could work, but it might make Emacs slower in
handling subprocesses (there are some Lisp packages that fire
subprocesses at very high rate).


* Re: Some experience with the igc branch
  2024-12-23 19:00                 ` Eli Zaretskii
@ 2024-12-23 19:37                   ` Eli Zaretskii
  2024-12-23 20:49                   ` Gerd Möllmann
  2024-12-23 23:37                   ` Some experience with the igc branch Pip Cet via Emacs development discussions.
  2 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-23 19:37 UTC (permalink / raw)
  To: gerd.moellmann, pipcet; +Cc: ofv, emacs-devel, eller.helmut, acorallo

> Date: Mon, 23 Dec 2024 21:00:53 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: pipcet@protonmail.com, ofv@wanadoo.es, emacs-devel@gnu.org,
>  eller.helmut@gmail.com, acorallo@gnu.org
> 
> > From: Gerd Möllmann <gerd.moellmann@gmail.com>
> > Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
> >   eller.helmut@gmail.com,  acorallo@gnu.org
> > Date: Mon, 23 Dec 2024 18:44:42 +0100
> > 
> > BTW, do you know which signal handlers use Lisp, i.e. allocate Lisp
> > objects or access some? All? Or, would it be realistic to rewrite signal
> > handlers to not do that?
> 
> SIGPROF does (it's the basis for our Lisp profiler).

Let me clarify to avoid possible confusion: the SIGPROF handler
doesn't run Lisp, but it does access the Lisp machine, via the
backtrace_top_function and get_backtrace functions.




* Re: Some experience with the igc branch
  2024-12-23 19:00                 ` Eli Zaretskii
  2024-12-23 19:37                   ` Eli Zaretskii
@ 2024-12-23 20:49                   ` Gerd Möllmann
  2024-12-23 21:43                     ` Helmut Eller
  2024-12-24  6:03                     ` SIGPROF + SIGCHLD and igc Gerd Möllmann
  2024-12-23 23:37                   ` Some experience with the igc branch Pip Cet via Emacs development discussions.
  2 siblings, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-23 20:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Mon, 23 Dec 2024 18:44:42 +0100
>> 
>> BTW, do you know which signal handlers use Lisp, i.e. allocate Lisp
>> objects or access some? All? Or, would it be realistic to rewrite signal
>> handlers to not do that?
>
> SIGPROF does (it's the basis for our Lisp profiler).
>
> SIGCHLD doesn't run Lisp (I think), but it examines objects and data
> structures of the Lisp machine (those related to child processes).
>
>> One thing I've seen done elsewhere is to publish a message to a message
>> board so that it can be handled outside of the signal handler. Something
>> like that, you know what I mean.
>
> This is tricky for the profiler, because you want to sample the
> function in which you are right there and then, not some time later.
>
> For SIGCHLD this could work, but it might make Emacs slower in
> handling subprocesses (there are some Lisp packages that fire
> subprocesses at very high rate).

Thanks.

I've looked at SIGPROF. From an admittedly brief look at this, I'd
summarize my results as:

- The important part is get_backtrace. The rest could be done elsewhere
  by posting that to a message board, or whatever the mechanism is at
  the end.

- Didn't see get_backtrace or functions called from it allocating Lisp
  objects.

- It reads from a Lisp object because of

    #define specpdl (current_thread->m_specpdl)
    #define specpdl_end (current_thread->m_specpdl_end)
    #define specpdl_ptr (current_thread->m_specpdl_ptr)

  current_thread is a struct thread_state which is a PVEC_THREAD.

- I remember that I wrote a scanner for the specpdl stacks, so that's
  not a Lisp object but a root, so no problem here, I think.

- struct thread_state allocation is done in igc.c via alloc_immovable in
  igc_alloc_pseudovector. That allocates from an AMS pool, which
  doesn't use barriers.

- It doesn't seem to access other Lisp objects except current_thread.

That doesn't look bad, I think. Worth mentioning is perhaps that
directly after get_backtrace here

  static void
  record_backtrace (struct profiler_log *plog, EMACS_INT count)
  {
    log_t *log = plog->log;
    get_backtrace (log->trace, log->depth);
    EMACS_UINT hash = trace_hash (log->trace, log->depth);

we access Lisp objects in trace_hash when computing the hash and in the
other hash table code. IIUC that code counts hits with the same
backtrace. Don't know how long that takes. But if posting the backtrace
would take the same time, we would be on par.

I'll try to also look at SIGCHLD at some later point, but Christmas,
family etc.

Happy holidays!




* Re: Some experience with the igc branch
  2024-12-23 20:49                   ` Gerd Möllmann
@ 2024-12-23 21:43                     ` Helmut Eller
  2024-12-23 21:49                       ` Pip Cet via Emacs development discussions.
  2024-12-24  4:05                       ` Gerd Möllmann
  2024-12-24  6:03                     ` SIGPROF + SIGCHLD and igc Gerd Möllmann
  1 sibling, 2 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-23 21:43 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

On Mon, Dec 23 2024, Gerd Möllmann wrote:

> [...]
> Worth mentioning is perhaps that [...]
> directly after get_backtrace here [...]
> we access Lisp objects in trace_hash when computing the hash and in the
> other hash table code.

Also worth mentioning is that trace_hash uses XHASH, which is probably
problematic in combination with a moving GC.
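
A toy illustration of the concern, assuming XHASH derives its value
from the object's address bits.  Nothing here is the igc.c layout:
struct toy_obj, address_hash, header_hash and toy_move are
hypothetical.  A hash computed from the address changes when a
collector relocates the object, while a hash stored inside the
object travels with it.

```c
#include <stdint.h>
#include <string.h>

/* Toy object with a header word that carries a stable hash.  */
struct toy_obj
{
  uint32_t header_hash;   /* assigned once at allocation time */
  int payload;
};

/* Address-based hash: depends on where the object currently lives.  */
static uintptr_t
address_hash (const struct toy_obj *o)
{
  return (uintptr_t) o;
}

/* Header-based hash: depends only on the object's contents.  */
static uint32_t
header_hash (const struct toy_obj *o)
{
  return o->header_hash;
}

/* Simulate a moving collector: copy the object to a new location
   and return the new address.  */
static struct toy_obj *
toy_move (const struct toy_obj *from, struct toy_obj *to)
{
  memcpy (to, from, sizeof *to);
  return to;
}
```

After toy_move, address_hash gives a different value for what is
logically the same object, so a table keyed on it would look in the
wrong bucket; header_hash is unchanged.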

Helmut




* Re: Some experience with the igc branch
  2024-12-23 21:43                     ` Helmut Eller
@ 2024-12-23 21:49                       ` Pip Cet via Emacs development discussions.
  2024-12-23 21:58                         ` Helmut Eller
  2024-12-24  4:05                       ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23 21:49 UTC (permalink / raw)
  To: Helmut Eller
  Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, acorallo

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Mon, Dec 23 2024, Gerd Möllmann wrote:
>
>> [...]
>> Worth mentioning is perhaps that [...]
>> directly after get_backtrace here [...]
>> we access Lisp objects in trace_hash when computing the hash and in the
>> other hash table code.
>
> Also worth mentioning is that trace_hash uses XHASH, which is probably
> problematic in combination with a moving GC.

Good catch.  s/XHASH/sxhash_eq/ there, I think?  And let's poison XHASH
when MPS is in use?

Pip





* Re: Some experience with the igc branch
  2024-12-23 21:49                       ` Pip Cet via Emacs development discussions.
@ 2024-12-23 21:58                         ` Helmut Eller
  2024-12-23 23:20                           ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-23 21:58 UTC (permalink / raw)
  To: Pip Cet; +Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, acorallo

On Mon, Dec 23 2024, Pip Cet wrote:
[...]
>> Also worth mentioning is that trace_hash uses XHASH, which is probably
>> problematic in combination with a moving GC.
>
> Good catch.  s/XHASH/sxhash_eq/ there, I think?  And let's poison XHASH
> when MPS is in use?

sxhash_eq doesn't fly with headerless objects.  It should be obsoleted,
IMO.

Helmut




* Re: Some experience with the igc branch
  2024-12-23 21:58                         ` Helmut Eller
@ 2024-12-23 23:20                           ` Pip Cet via Emacs development discussions.
  2024-12-24  5:38                             ` Helmut Eller
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23 23:20 UTC (permalink / raw)
  To: Helmut Eller
  Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, acorallo

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Mon, Dec 23 2024, Pip Cet wrote:
> [...]
>>> Also worth mentioning is that trace_hash uses XHASH, which is probably
>>> problematic in combination with a moving GC.
>>
>> Good catch.  s/XHASH/sxhash_eq/ there, I think?  And let's poison XHASH
>> when MPS is in use?
>
> sxhash_eq doesn't fly with headerless objects.

Which objects would that be?

Right now all IGC objects have headers, right?  Did I miss any?

> It should be obsoleted, IMO.

I don't see why.

Is this about cons cells exclusively?  Because 3 words/cons is too
much (possibly 4 words on W64 or 32-bit systems)?

For vectors we can usually derive the length of the vector from the IGC
header (which has plenty of extra bits), which would have the equivalent
effect.  Strings, symbols, floats shouldn't matter.

That leaves conses.  My guess so far was that you wanted to implement a
hack where a headerless cons is a two-word object that would turn into a
tagged pointer to another two-word object with a header as soon as its
hash value is taken.  That requires slowing down either XCAR or XCDR, I
think, and that's sufficient reason for me not to do it, but I guess I
misunderstood your plans.  This would also mean sxhash_eq would allocate
memory, so it couldn't be called from a signal handler without yet
another workaround.

(Note that cons cells used to store long lists are inherently
inefficient: naively storing an n-element list with a header requires
n+1 words, but even headerless cons cells will require 2*n words.  So if
we really decide we need to reduce cons memory usage, I'd look into that
instead.)
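
To put numbers on the parenthetical above, a small sketch of the word
counts for an n-element list under the three layouts mentioned in this
message (the function names are made up for illustration, and the "3
words per cons" figure is the 64-bit case only):

```c
#include <assert.h>
#include <stddef.h>

/* Words needed to store an n-element list under three hypothetical
   layouts.  These only restate the arithmetic from the text.  */

/* One header word followed by n element slots.  */
static size_t
words_vector_like (size_t n)
{
  return n + 1;
}

/* car + cdr per element, no header.  */
static size_t
words_headerless_conses (size_t n)
{
  return 2 * n;
}

/* header + car + cdr per element.  */
static size_t
words_headered_conses (size_t n)
{
  return 3 * n;
}
```

For n = 1000 that is 1001, 2000 and 3000 words respectively, so dropping
the header saves a third, while a packed representation would save
roughly two thirds.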

Pip


* Re: Some experience with the igc branch
  2024-12-23 23:20                           ` Pip Cet via Emacs development discussions.
@ 2024-12-24  5:38                             ` Helmut Eller
  2024-12-24  6:27                               ` Gerd Möllmann
  2024-12-24 10:09                               ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-24  5:38 UTC (permalink / raw)
  To: Pip Cet; +Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, acorallo

On Mon, Dec 23 2024, Pip Cet wrote:

>> sxhash_eq doesn't fly with headerless objects.
>
> Which objects would that be?
>
> Right now all IGC objects have headers, right?  Did I miss any?

Right, but I'd like to keep that option on the table.

>> It should be obsoleted, IMO.

[...]
> That leaves conses.  My guess so far was that you wanted to implement a
> hack where a headerless cons is a two-word object that would turn into a
> tagged pointer to another two-word object with a header as soon as its
> hash value is taken.  That requires slowing down either XCAR or XCDR, I
> think, and that's sufficient reason for me not to do it, but I guess I
> misunderstood your plans.  This would also mean sxhash_eq would allocate
> memory, so it couldn't be called from a signal handler without yet
> another workaround.

I would go the obvious way: use segregated allocation.  Each Lisp_Type
gets its own MPS pool, without igc-headers.  The dflt pool would only
contain non-lisp types, like IGC_OBJ_STRING_DATA, with igc-headers.
That wouldn't slow down XCAR, but it requires that hash tables use MPS's
location dependencies.
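
A rough sketch of what "hash tables use location dependencies" implies,
using a toy model rather than the MPS API.  Everything below is
hypothetical: the toy arena just bumps a counter on any move, which is
much coarser than what MPS tracks, and the table must hold fewer keys
than it has buckets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { NBUCKETS = 8 };

struct toy_arena { unsigned long move_epoch; };  /* bumped on every move */
struct toy_ld { unsigned long epoch_seen; };     /* toy location dependency */

struct toy_table
{
  struct toy_ld ld;
  const void *bucket[NBUCKETS];  /* open addressing, hashed by address */
  unsigned rehashes;
};

static void
toy_ld_reset (struct toy_ld *ld, const struct toy_arena *a)
{
  ld->epoch_seen = a->move_epoch;
}

static bool
toy_ld_isstale (const struct toy_ld *ld, const struct toy_arena *a)
{
  return ld->epoch_seen != a->move_epoch;
}

static size_t
slot_for (const void *key)
{
  return ((uintptr_t) key >> 4) % NBUCKETS;
}

static void
toy_insert_raw (struct toy_table *t, const void *key)
{
  size_t i = slot_for (key);
  while (t->bucket[i] != NULL && t->bucket[i] != key)
    i = (i + 1) % NBUCKETS;
  t->bucket[i] = key;
}

/* Re-insert every key at the slot matching its current address.  */
static void
toy_rehash (struct toy_table *t, const struct toy_arena *a)
{
  const void *keys[NBUCKETS];
  size_t n = 0;
  for (size_t i = 0; i < NBUCKETS; i++)
    {
      if (t->bucket[i] != NULL)
        keys[n++] = t->bucket[i];
      t->bucket[i] = NULL;
    }
  toy_ld_reset (&t->ld, a);
  for (size_t i = 0; i < n; i++)
    toy_insert_raw (t, keys[i]);
  t->rehashes++;
}

/* Lookup first checks staleness and rehashes if anything has moved.  */
static bool
toy_lookup (struct toy_table *t, const struct toy_arena *a, const void *key)
{
  if (toy_ld_isstale (&t->ld, a))
    toy_rehash (t, a);
  size_t i = slot_for (key);
  for (size_t n = 0; n < NBUCKETS; n++, i = (i + 1) % NBUCKETS)
    {
      if (t->bucket[i] == key)
        return true;
      if (t->bucket[i] == NULL)
        return false;
    }
  return false;
}

/* Simulate the collector moving a key: the stored reference is fixed
   up, but it stays in the bucket chosen for the old address.  */
static void
toy_collector_moves (struct toy_table *t, struct toy_arena *a,
                     const void *from, const void *to)
{
  for (size_t i = 0; i < NBUCKETS; i++)
    if (t->bucket[i] == from)
      t->bucket[i] = to;
  a->move_epoch++;
}
```

The cost to watch is toy_rehash: in this model every move anywhere
forces a full rehash on the next lookup.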

Helmut




* Re: Some experience with the igc branch
  2024-12-24  5:38                             ` Helmut Eller
@ 2024-12-24  6:27                               ` Gerd Möllmann
  2024-12-24 10:09                               ` Pip Cet via Emacs development discussions.
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24  6:27 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Pip Cet, Eli Zaretskii, ofv, emacs-devel, acorallo

Helmut Eller <eller.helmut@gmail.com> writes:

> On Mon, Dec 23 2024, Pip Cet wrote:
>
>>> sxhash_eq doesn't fly with headerless objects.
>>
>> Which objects would that be?
>>
>> Right now all IGC objects have headers, right?  Did I miss any?
>
> Right, but I'd like to keep that option on the table.
>
>>> It should be obsoleted, IMO.

Agree. I think sxhash-eq sort of leaks details of the GC implementation.





* Re: Some experience with the igc branch
  2024-12-24  5:38                             ` Helmut Eller
  2024-12-24  6:27                               ` Gerd Möllmann
@ 2024-12-24 10:09                               ` Pip Cet via Emacs development discussions.
  1 sibling, 0 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-24 10:09 UTC (permalink / raw)
  To: Helmut Eller
  Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, acorallo

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Mon, Dec 23 2024, Pip Cet wrote:
>
>>> sxhash_eq doesn't fly with headerless objects.
>>
>> Which objects would that be?
>>
>> Right now all IGC objects have headers, right?  Did I miss any?
>
> Right, but I'd like to keep that option on the table.

I see one specific case where it would be useful: storing 64-bit
integers on 32-bit systems.  We don't need the entire integer range,
since -256M .. +256M - 1 are fixnums (assuming we reduce fixnum range by
1 bit).  So we have 512M unused values, which is precisely the number of
possible forwarding pointers if we maintain 8-byte alignment.  We can
use two "impossible" forwarding pointers for 1-word padding and N-word
padding, so this case works out precisely.  No hash problems, since a
u64 is constant and we can hash the contents instead.
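
The counting argument can be checked mechanically.  This is only a
sketch of the arithmetic, assuming a 32-bit address space, 8-byte
alignment and the reduced fixnum range given above; the helper names
are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_ALIGNMENT = 8 };   /* assumed object alignment in bytes */

/* Number of distinct 8-byte-aligned addresses in a 32-bit space,
   i.e. the number of possible forwarding pointers.  */
static uint64_t
aligned_pointer_count (void)
{
  return (UINT64_C (1) << 32) / TOY_ALIGNMENT;
}

/* Number of integers in [-256M, +256M - 1], i.e. the values a boxed
   64-bit integer never needs to hold because they are fixnums.  */
static uint64_t
reduced_fixnum_count (void)
{
  int64_t lo = -(INT64_C (256) << 20);
  int64_t hi = (INT64_C (256) << 20) - 1;
  return (uint64_t) (hi - lo + 1);
}
```

Both come out to 2^29 = 536870912, which is the "works out precisely"
part of the argument.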

The only relevant 2-word object is conses, and I don't see a way to do
it for them.

Most N-word objects with N>2 are either fairly large to begin with, or
they're vectorlikes and we have a redundant size field, which we can get
rid of.

>>> It should be obsoleted, IMO.
>
> [...]
>> That leaves conses.  My guess so far was that you wanted to implement a
>> hack where a headerless cons is a two-word object that would turn into a
>> tagged pointer to another two-word object with a header as soon as its
>> hash value is taken.  That requires slowing down either XCAR or XCDR, I
>> think, and that's sufficient reason for me not to do it, but I guess I
>> misunderstood your plans.  This would also mean sxhash_eq would allocate
>> memory, so it couldn't be called from a signal handler without yet
>> another workaround.
>
> I would go the obvious way: use segregated allocation.  Each Lisp_Type
> gets its own MPS pool, without igc-headers.  The dflt pool would only

Why bother for non-conses?

> contain non-lisp types, like IGC_OBJ_STRING_DATA, with igc-headers.
> That wouldn't slow down XCAR, but it requires that hash tables use MPS's
> location dependencies.

I don't think we want to use location dependencies: even if we solved
all the other problems (Fsxhash_eq, permanent hashes for those places
where we can't rehash), I'm pretty sure rehashing would kill us.  In
particular, if we somehow managed to make GC more fine-grained and move
fewer objects, we'd end up rehashing more, so suddenly we'd have an
incentive not to use minor GCs.

But I confess that I haven't looked at the location dependency code.
There's no need for us to use it, and from the documentation it seemed
it wouldn't be a good idea to start using it if you don't have to.

(Also, at that point, shouldn't we just use an AMS pool for conses?)

Pip


* Re: Some experience with the igc branch
  2024-12-23 21:43                     ` Helmut Eller
  2024-12-23 21:49                       ` Pip Cet via Emacs development discussions.
@ 2024-12-24  4:05                       ` Gerd Möllmann
  2024-12-24  8:50                         ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24  4:05 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

Helmut Eller <eller.helmut@gmail.com> writes:

> On Mon, Dec 23 2024, Gerd Möllmann wrote:
>
>> [...]
>> Worth mentioning is perhaps that [...]
>> directly after get_backtrace here [...]
>> we access Lisp objects in trace_hash when computing the hash and in the
>> other hash table code.
>
> Also worth mentioning is that trace_hash uses XHASH, which is probably
> problematic in combination with a moving GC.
>
> Helmut

Right, I must have overlooked that back then :-/




* Re: Some experience with the igc branch
  2024-12-24  4:05                       ` Gerd Möllmann
@ 2024-12-24  8:50                         ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24  8:50 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Helmut Eller <eller.helmut@gmail.com> writes:
>
>> On Mon, Dec 23 2024, Gerd Möllmann wrote:
>>
>>> [...]
>>> Worth mentioning is perhaps that [...]
>>> directly after get_backtrace here [...]
>>> we access Lisp objects in trace_hash when computing the hash and in the
>>> other hash table code.
>>
>> Also worth mentioning is that trace_hash uses XHASH, which is probably
>> problematic in combination with a moving GC.
>>
>> Helmut
>
> Right, I must have overlooked that back then :-/

I've pushed two changes to igc: one for the above, and a second that
removes XHASH when HAVE_MPS is defined. I hope the second works on all
platforms; at least I couldn't find uses elsewhere with git-grep.




* SIGPROF + SIGCHLD and igc
  2024-12-23 20:49                   ` Gerd Möllmann
  2024-12-23 21:43                     ` Helmut Eller
@ 2024-12-24  6:03                     ` Gerd Möllmann
  2024-12-24  8:23                       ` Helmut Eller
                                         ` (2 more replies)
  1 sibling, 3 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24  6:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

(I've given this a new subject.)

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>>   eller.helmut@gmail.com,  acorallo@gnu.org
>>> Date: Mon, 23 Dec 2024 18:44:42 +0100
>>> 
>>> BTW, do you know which signal handlers use Lisp, i.e. allocate Lisp
>>> objects or access some? All? Or, would it be realistic to rewrite signal
>>> handlers to not do that?
>>
>> SIGPROF does (it's the basis for our Lisp profiler).
>>
>> SIGCHLD doesn't run Lisp (I think), but it examines objects and data
>> structures of the Lisp machine (those related to child processes).
>>
>>> One thing I've seen done elsewhere is to publish a message to a message
>>> board so that it can be handled outside of the signal handler. Something
>>> like that, you know what I mean.
>>
>> This is tricky for the profiler, because you want to sample the
>> function in which you are right there and then, not some time later.
>>
>> For SIGCHLD this could work, but it might make Emacs slower in
>> handling subprocesses (there are some Lisp packages that fire
>> subprocesses at very high rate).
>
> Thanks.
>
> I've looked at SIGPROF. From an admittedly brief look at this, I'd
> summarize my results as:
>
> - The important part is get_backtrace. The rest could be done elsewhere
>   by posting that to a message board, or whatever the mechanism is at
>   the end.
>
> - Didn't see get_backtrace or functions called from it allocating Lisp
>   objects.
>
> - It reads from a Lisp object because of
>
>     #define specpdl (current_thread->m_specpdl)
>     #define specpdl_end (current_thread->m_specpdl_end)
>     #define specpdl_ptr (current_thread->m_specpdl_ptr)
>
>   current_thread is a struct thread_state which is a PVEC_THREAD.
>
> - I remember that I wrote a scanner for the specpdl stacks, so that's
>   not a Lisp object but a root, so no problem here, I think.
>
> - struct thread_state allocation is done in igc.c via alloc_immovable in
>   igc_alloc_pseudovector. That allocates from an AMS pool, which
>   doesn't use barriers.
>
> - It doesn't seem to access other Lisp objects except current_thread.
>
> That doesn't look bad, I think. Worth mentioning is perhaps that
> directly after get_backtrace here
>
>   static void
>   record_backtrace (struct profiler_log *plog, EMACS_INT count)
>   {
>     log_t *log = plog->log;
>     get_backtrace (log->trace, log->depth);
>     EMACS_UINT hash = trace_hash (log->trace, log->depth);
>
> we access Lisp objects in trace_hash when computing the hash and in the
> other hash table code. IIUC that code counts hits with the same
> backtrace. Don't know how long that takes. But if posting the backtrace
> would take the same time, we would be on par.
>
> I'll try to also look at SIGCHLD at some later point, but Christmas,
> family etc.
>
> Happy holidays!

Been up a bit early, so...

This is about SIGCHLD, and I must say I find it a bit hard to tell if
all other platforms do the same. There are simply too many #if's to
consider in the signal handling code.

Anyway, what I see here: SIGCHLD doesn't do anything dangerous in the
signal handler. Instead, the occurrence of SIGCHLD is added to a queue
with enqueue_async_work and that's basically it.

The work items in the queue are processed by process_pending_signals,
outside of the signal handler. Very nice, that's how it should be :-).
(And maybe, just as an inspiration, one could use that construct for
SIGPROF?)

So, there is actually no problem at all with SIGCHLD that I can see. 

My personal summary for SIGPROF + SIGCHLD at this point:

- I recommend rewriting SIGPROF handling in the way I tried to describe,
  possibly using the existing work queue mechanism. Everything else looks
  too complicated to me.
  
- Lisp allocation in signal handlers cannot exist because alloc.c is not
  reentrant which means we would crash with the old GC. We don't need
  anything extra for that in igc.

- No longer wondering why macOS does not show any problems in that whole
  area. The only problem is SIGPROF accessing Lisp objects, and the
  memory barrier is not a problem on macOS because it doesn't use
  signals.
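
To make the "record it in the handler, process it later" pattern
concrete, here is a minimal self-contained sketch.  It is not the
enqueue_async_work code; the names are hypothetical and the "queue" is
reduced to a single flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, cleared by normal code.  */
static volatile sig_atomic_t toy_pending;

/* Counts how often the deferred work actually ran.  */
static int toy_work_done;

/* The handler touches nothing but the flag, so it cannot trip over
   heap objects or allocator state.  */
static void
toy_deliver_signal (int sig)
{
  (void) sig;
  toy_pending = 1;
}

static int
toy_install_handler (int sig)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = toy_deliver_signal;
  sigemptyset (&sa.sa_mask);
  return sigaction (sig, &sa, NULL);
}

/* Called from the main loop at a safe point.  This is where walking
   process lists, allocating, and so on would be allowed.  */
static void
toy_process_pending (void)
{
  if (toy_pending)
    {
      toy_pending = 0;
      toy_work_done++;
    }
}
```

A single flag coalesces several signals into one round of work; whether
that is acceptable depends on the signal, and it does not address the
nesting concern by itself.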

Please double-check!





* Re: SIGPROF + SIGCHLD and igc
  2024-12-24  6:03                     ` SIGPROF + SIGCHLD and igc Gerd Möllmann
@ 2024-12-24  8:23                       ` Helmut Eller
  2024-12-24  8:39                         ` Gerd Möllmann
  2024-12-24 13:05                         ` Eli Zaretskii
  2024-12-24 12:54                       ` Eli Zaretskii
  2024-12-27  8:08                       ` Helmut Eller
  2 siblings, 2 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-24  8:23 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

On Tue, Dec 24 2024, Gerd Möllmann wrote:

[...]
> Anyway, what I see here: SIGCHLD doesn't do anything dangerous in the
> signal handler. Instead, the occurrence of SIGCHLD is added to a queue
> with enqueue_async_work and that's basically it.

Wrong branch!  enqueue_async_work doesn't exist in master.  ISTR that in
master it iterates through process-list.  Also, Pip said something to
the effect that the queue is not signal-safe, because signals can nest
or something like that.  Also, Eli didn't like enqueue_async_work much.

> My personal summary for SIGPROF + SIGCHLD at this point:
>
> - I recommend rewriting SIGPROF handling in the way I tried to describe,
>   possibly using the existing work queue mechanism. Everything else looks
>   too complicated to me.
>   
> - Lisp allocation in signal handlers cannot exist because alloc.c is not
>   reentrant which means we would crash with the old GC. We don't need
>   anything extra for that in igc.
>
> - No longer wondering why macOS does not show any problems in that whole
>   area. The only problem is SIGPROF accessing Lisp objects, and the
>   memory barrier is not a problem on macOS because it doesn't use
>   signals.
>
> Please double-check!

I think SIGIO might cause trouble.  But that async IO code in process.c
is sooo hard to read.  I wonder if it would be simpler with threads,
e.g. one thread per Lisp_Process.

Helmut




* Re: SIGPROF + SIGCHLD and igc
  2024-12-24  8:23                       ` Helmut Eller
@ 2024-12-24  8:39                         ` Gerd Möllmann
  2024-12-25  9:22                           ` Helmut Eller
  2024-12-24 13:05                         ` Eli Zaretskii
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24  8:39 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

Helmut Eller <eller.helmut@gmail.com> writes:

> On Tue, Dec 24 2024, Gerd Möllmann wrote:
>
> [...]
>> Anyway, what I see here: SIGCHLD doesn't do anything dangerous in the
>> signal handler. Instead, the occurrence of SIGCHLD is added to a queue
>> with enqueue_async_work and that's basically it.
>
> Wrong branch!  enqueue_async_work doesn't exist in master.  ISTR that in
> master it iterates through process-list.  Also, Pip said something that
> the queue is not signal safe, because signals can nest or something like
> that.  Also, Eli didn't like enqueue_async_work much.

Oops, thanks for checking! And 👍 to Pip. Then we have to see what to do
with nested signals if that's a problem.

>> My personal summary for SIGPROF + SIGCHLD at this point:
>>
>> - I recommend rewriting SIGPROF handling in the way I tried to describe,
>>   possibly using the existing work queue mechanism. Everything else looks
>>   too complicated to me.
>>   
>> - Lisp allocation in signal handlers cannot exist because alloc.c is not
>>   reentrant which means we would crash with the old GC. We don't need
>>   anything extra for that in igc.
>>
>> - No longer wondering why macOS does not show any problems in that whole
>>   area. The only problem is SIGPROF accessing Lisp objects, and the
>>   memory barrier is not a problem on macOS because it doesn't use
>>   signals.
>>
>> Please double-check!
>
> I think, SIGIO might cause trouble.  But that async IO code in process.c
> is sooo hard to read.  I wonder if it would be simpler with threads,
> e.g. one thread per Lisp_Process.

It's a maze :-(.

BTW, do you agree with my analysis that Lisp allocations can't possibly
exist in signal handlers today?




* Re: SIGPROF + SIGCHLD and igc
  2024-12-24  8:39                         ` Gerd Möllmann
@ 2024-12-25  9:22                           ` Helmut Eller
  2024-12-25  9:43                             ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-25  9:22 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

On Tue, Dec 24 2024, Gerd Möllmann wrote:

[...]
>>> - Lisp allocation in signal handlers cannot exist because alloc.c is not
>>>   reentrant which means we would crash with the old GC. We don't need
>>>   anything extra for that in igc.
[...]
> BTW, do you agree with my analysis that Lisp allocations can't possibly
> exist in signal handlers today?

I don't know alloc.c well enough to make a judgment.  This comment for
XMALLOC_BLOCK_INPUT_CHECK seems to say that signal handlers used to
allocate but no longer do:

  If compiled with XMALLOC_BLOCK_INPUT_CHECK, define a symbol
  BLOCK_INPUT_IN_MEMORY_ALLOCATORS that is visible to the debugger.
  If that variable is set, block input while in one of Emacs's memory
  allocation functions.  There should be no need for this debugging
  option, since signal handlers do not allocate memory, but Emacs
  formerly allocated memory in signal handlers and this compile-time
  option remains as a way to help debug the issue should it rear its
  ugly head again.  */

Helmut


* Re: SIGPROF + SIGCHLD and igc
  2024-12-25  9:22                           ` Helmut Eller
@ 2024-12-25  9:43                             ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25  9:43 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

Helmut Eller <eller.helmut@gmail.com> writes:

> On Tue, Dec 24 2024, Gerd Möllmann wrote:
>
> [...]
>>>> - Lisp allocation in signal handlers cannot exist because alloc.c is not
>>>>   reentrant which means we would crash with the old GC. We don't need
>>>>   anything extra for that in igc.
> [...]
>> BTW, do you agree with my analysis that Lisp allocations can't possibly
>> exist in signal handlers today?
>
> I don't know alloc.c well enough to make a judgment.  This comment for
> XMALLOC_BLOCK_INPUT_CHECK seems to say that signal handlers used to
> allocate but no longer do:
>
>   If compiled with XMALLOC_BLOCK_INPUT_CHECK, define a symbol
>   BLOCK_INPUT_IN_MEMORY_ALLOCATORS that is visible to the debugger.
>   If that variable is set, block input while in one of Emacs's memory
>   allocation functions.  There should be no need for this debugging
>   option, since signal handlers do not allocate memory, but Emacs
>   formerly allocated memory in signal handlers and this compile-time
>   option remains as a way to help debug the issue should it rear its
>   ugly head again.  */
>
> Helmut

Thanks.

Stefan Monnier seems to have added that in 2012, judging from git grep in
the ChangeLogs, and it reads as if it has something to do with
SYNC_INPUT, which I think means no longer doing X event handling in a
SIGIO handler.

And it seems to be no longer in use. XMALLOC_BLOCK_INPUT_CHECK appears
nowhere, and MALLOC_BLOCK_INPUT is always a no-op. alloc.c could use
some love.

Anyway, just looking at Fcons with respect to async-signal-safety makes
me pretty sure.




* Re: SIGPROF + SIGCHLD and igc
  2024-12-24  8:23                       ` Helmut Eller
  2024-12-24  8:39                         ` Gerd Möllmann
@ 2024-12-24 13:05                         ` Eli Zaretskii
  2024-12-25 10:46                           ` Helmut Eller
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 13:05 UTC (permalink / raw)
  To: Helmut Eller; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Tue, 24 Dec 2024 09:23:11 +0100
> 
> I think, SIGIO might cause trouble.

Why do you think so?  Its handler does almost nothing, just sets a
flag (if you ignore the Android-specific stuff there).

> But that async IO code in process.c is sooo hard to read.  I wonder
> if it would be simpler with threads, e.g. one thread per
> Lisp_Process.

Heh, see w32proc.c.




* Re: SIGPROF + SIGCHLD and igc
  2024-12-24 13:05                         ` Eli Zaretskii
@ 2024-12-25 10:46                           ` Helmut Eller
  2024-12-25 12:45                             ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-25 10:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

On Tue, Dec 24 2024, Eli Zaretskii wrote:

>> From: Helmut Eller <eller.helmut@gmail.com>
>> I think, SIGIO might cause trouble.
>
> Why do you think so?  Its handler does almost nothing, just sets a
> flag (if you ignore the Android-specific stuff there).

Indeed that looks quite tame.  Makes me wonder why
handle_interrupt_signal needs to be so complicated in comparison.  E.g.
the line

      internal_last_event_frame = terminal->display_info.tty->top_frame;

looks problematic for MPS.

Helmut




* Re: SIGPROF + SIGCHLD and igc
  2024-12-25 10:46                           ` Helmut Eller
@ 2024-12-25 12:45                             ` Eli Zaretskii
  0 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 12:45 UTC (permalink / raw)
  To: Helmut Eller; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: gerd.moellmann@gmail.com,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 11:46:38 +0100
> 
> On Tue, Dec 24 2024, Eli Zaretskii wrote:
> 
> >> From: Helmut Eller <eller.helmut@gmail.com>
> >> I think, SIGIO might cause trouble.
> >
> > Why do you think so?  Its handler does almost nothing, just sets a
> > flag (if you ignore the Android-specific stuff there).
> 
> Indeed that looks quite tame.  Makes me wonder why
> handle_interrupt_signal needs to be so complicated in comparison.  E.g.
> the line
> 
>       internal_last_event_frame = terminal->display_info.tty->top_frame;
> 
> looks problematic for MPS.

SIGINT is different because C-g is programmed to trigger SIGINT on TTY
frames.




* Re: SIGPROF + SIGCHLD and igc
  2024-12-24  6:03                     ` SIGPROF + SIGCHLD and igc Gerd Möllmann
  2024-12-24  8:23                       ` Helmut Eller
@ 2024-12-24 12:54                       ` Eli Zaretskii
  2024-12-24 12:59                         ` Gerd Möllmann
  2024-12-27  8:08                       ` Helmut Eller
  2 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 12:54 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Tue, 24 Dec 2024 07:03:53 +0100
> 
> (I've given this a new subject.)

Not a second too soon!

> This is about SIGCHLD, and I must say I find it a bit hard to tell if
> all other platforms do the same. There are simply too many #if's to
> consider in the signal handling code.
> 
> Anyway, what I see here: SIGCHLD doesn't do anything dangerous in the
> signal handler. Instead, the occurrence of SIGCHLD is added to a queue
> with enqueue_async_work and that's basically it.

Are we looking at the same code?  I was talking about
handle_child_signal, which is called thusly:

  static void
  deliver_child_signal (int sig)
  {
    deliver_process_signal (sig, handle_child_signal);
  }

What I see in handle_child_signal is not what you describe above.

> The work items in the queue are processed by process_pending_signals,
> outside of the signal handler. Very nice, that's how it should be :-).

I think you are looking at how SIGIO and SIGALRM are processed.

> (And maybe, just as an inspiration, one could use that construct for
> SIGPROF?)

Could one?  SIGPROF's handler should sample the "program counter", so
delaying the sample will sample it in a wrong place.  Right?




* Re: SIGPROF + SIGCHLD and igc
  2024-12-24 12:54                       ` Eli Zaretskii
@ 2024-12-24 12:59                         ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24 12:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Tue, 24 Dec 2024 07:03:53 +0100
>> 
>> (I've given this a new subject.)
>
> Not a second too soon!
>
>> This is about SIGCHLD, and I must say I find it a bit hard to tell if
>> all other platforms do the same. There are simply too many #if's to
>> consider in the signal handling code.
>> 
>> Anyway, what I see here: SIGCHLD doesn't do anything dangerous in the
>> signal handler. Instead, the occurrence of SIGCHLD is added to a queue
>> with enqueue_async_work and that's basically it.
>
> Are we looking at the same code?  I was talking about
> handle_child_signal, which is called thusly:

No we aren't :-). My mistake. I was looking at the code Pip wrote.
See Helmut's later message and my response.

>
>   static void
>   deliver_child_signal (int sig)
>   {
>     deliver_process_signal (sig, handle_child_signal);
>   }
>
> What I see in handle_child_signal is not what you describe above.
>
>> The work items in the queue are processed by process_pending_signals,
>> outside of the signal handler. Very nice, that's how it should be :-).
>
> I think you are looking at how SIGIO and SIGALRM are processed.
>
>> (And maybe, just as an inspiration, one could use that construct for
>> SIGPROF?)
>
> Could one?  SIGPROF's handler should sample the "program counter", so
> delaying the sample will sample it in a wrong place.  Right?

Taking the backtrace would be done in the signal handler, the rest would
be done elsewhere. So, no.
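
Roughly what I have in mind, as a sketch only: the handler copies the
current backtrace into a preallocated slot, and hashing and counting
happen later in normal code.  The "stack" here is a plain int array
standing in for the real backtrace, and all names are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

enum { MAX_DEPTH = 8, RING_SIZE = 16, NFUNCS = 4 };

struct toy_sample { int depth; int frames[MAX_DEPTH]; };

/* Stand-in for the interpreter's backtrace state (hypothetical).  */
static volatile int toy_stack[MAX_DEPTH];
static volatile sig_atomic_t toy_depth;

/* Preallocated ring: the handler writes at head, normal code reads
   at tail.  Nothing is allocated at sample time.  */
static volatile struct toy_sample ring[RING_SIZE];
static volatile sig_atomic_t ring_head, ring_tail, ring_dropped;

/* Hit counts per leaf function, filled in outside the handler.  */
static int toy_hits[NFUNCS];

/* In the handler: take the snapshot right there and then.  */
static void
toy_profiler_handler (int sig)
{
  (void) sig;
  int head = ring_head;
  int next = (head + 1) % RING_SIZE;
  if (next == ring_tail)
    {
      ring_dropped = ring_dropped + 1;   /* ring full, drop the sample */
      return;
    }
  ring[head].depth = toy_depth;
  for (int i = 0; i < ring[head].depth; i++)
    ring[head].frames[i] = toy_stack[i];
  ring_head = next;
}

static int
toy_install_profiler (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = toy_profiler_handler;
  sigemptyset (&sa.sa_mask);
  return sigaction (SIGPROF, &sa, NULL);
}

/* Outside the handler: the counting part of the work.  */
static int
toy_drain_samples (void)
{
  int drained = 0;
  while (ring_tail != ring_head)
    {
      const volatile struct toy_sample *s = &ring[ring_tail];
      if (s->depth > 0)
        {
          int leaf = s->frames[s->depth - 1];
          if (leaf >= 0 && leaf < NFUNCS)
            toy_hits[leaf]++;
        }
      ring_tail = (ring_tail + 1) % RING_SIZE;
      drained++;
    }
  return drained;
}
```

The sample still reflects where the program was when the signal
arrived; only the bookkeeping is delayed.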




* Re: SIGPROF + SIGCHLD and igc
  2024-12-24  6:03                     ` SIGPROF + SIGCHLD and igc Gerd Möllmann
  2024-12-24  8:23                       ` Helmut Eller
  2024-12-24 12:54                       ` Eli Zaretskii
@ 2024-12-27  8:08                       ` Helmut Eller
  2024-12-27  8:51                         ` Eli Zaretskii
                                           ` (2 more replies)
  2 siblings, 3 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-27  8:08 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

On Tue, Dec 24 2024, Gerd Möllmann wrote:

>> I've looked at SIGPROF. From an admittedly brief look at this, I'd
>> summarize my results as:
>>
>> - The important part is get_backtrace. The rest could be done elsewhere
>>   by posting that to a message board, or whatever the mechanism is at
>>   the end.

I have an idea how to make a safer profiler.  First, remember that MPS
will stop mutator threads to protect its own consistency.  What happens
if we make the profiler its own thread?  MPS will stop the profiler like
normal mutator threads.  This is useful and is as it should be.

The problem now is how the profiler thread can do what get_backtrace
does.  We can use a protocol between the profiler thread and the main
thread that goes like this:

1. The profiler periodically sends a signal, say SIGPROF, to the main
   thread.

2. In the signal handler for SIGPROF, the main thread synchronizes
   itself with the profiler by communicating over a pipe (like the Unix
   fathers did it).  It sends some token to the profiler and waits.

3. The profiler receives the token and can now access the virtual
   machine state because the main thread waits.  The profiler now does
   what get_backtrace does.  After logging the stack snapshot, the
   profiler sends the token back to the main thread

4. The main thread receives the token, returns from the signal handler,
   and continues execution.

Note that the signal handler only communicates over the pipe and doesn't
read or write any memory that could mess up MPS.  The profiler thread
doesn't run any signal handlers (other than those that MPS may use
behind the scenes).
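
Steps 1 to 4 can be sketched with plain POSIX threads and two pipes.
This is a toy, not a patch: toy_vm_state stands in for the virtual
machine state, all names are invented, and error handling is minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

static int to_profiler[2];   /* main thread -> profiler */
static int to_main[2];       /* profiler -> main thread */
static volatile unsigned long toy_vm_state;   /* stand-in for VM state */
static atomic_bool toy_done;

struct toy_profiler
{
  pthread_t target;          /* thread to be sampled */
  int nsamples;
  int taken;
  unsigned long last_seen;
};

/* Steps 2 and 4: runs on the main thread.  It only uses write and
   read, which are async-signal-safe, and then parks until the
   profiler hands the token back.  */
static void
toy_sigprof_handler (int sig)
{
  int saved_errno = errno;
  char token = 'T';
  (void) sig;
  if (write (to_profiler[1], &token, 1) == 1)
    while (read (to_main[0], &token, 1) < 0 && errno == EINTR)
      continue;
  errno = saved_errno;
}

static void *
toy_profiler_thread (void *arg)
{
  struct toy_profiler *p = arg;
  for (int i = 0; i < p->nsamples; i++)
    {
      char token;
      if (pthread_kill (p->target, SIGPROF) != 0)   /* step 1 */
        break;
      if (read (to_profiler[0], &token, 1) != 1)    /* step 3: wait */
        break;
      p->last_seen = toy_vm_state;   /* main thread is parked now */
      p->taken++;
      if (write (to_main[1], &token, 1) != 1)       /* release it */
        break;
    }
  atomic_store (&toy_done, true);
  return NULL;
}

/* Runs the main thread as a busy "mutator" until NSAMPLES samples
   were taken; returns the number of samples, or -1 on setup error.  */
static int
toy_run_profiled (int nsamples)
{
  struct toy_profiler p = { pthread_self (), nsamples, 0, 0 };
  struct sigaction sa;
  pthread_t tid;

  if (pipe (to_profiler) != 0 || pipe (to_main) != 0)
    return -1;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = toy_sigprof_handler;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGPROF, &sa, NULL) != 0)
    return -1;
  atomic_store (&toy_done, false);
  if (pthread_create (&tid, NULL, toy_profiler_thread, &p) != 0)
    return -1;
  while (!atomic_load (&toy_done))
    toy_vm_state++;
  pthread_join (tid, NULL);
  close (to_profiler[0]);
  close (to_profiler[1]);
  close (to_main[0]);
  close (to_main[1]);
  return p.taken;
}
```

Calling toy_run_profiled (5) should return 5.  In the real thing the
profiler thread would be registered with MPS, which is the point of
the exercise; the toy has no collector to stop it.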

Another observation: MPS sends SIGXFSZ and SIGXCPU to stop and resume
threads.  We could intercept those signals and block SIGPROF while the
thread is stopped.  Obviously a hack, but could still be useful.

Helmut

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27  8:08                       ` Helmut Eller
@ 2024-12-27  8:51                         ` Eli Zaretskii
  2024-12-27 14:53                           ` Helmut Eller
  2024-12-27  8:55                         ` Gerd Möllmann
  2024-12-27 11:36                         ` Pip Cet via Emacs development discussions.
  2 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-27  8:51 UTC (permalink / raw)
  To: Helmut Eller; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Fri, 27 Dec 2024 09:08:00 +0100
> 
> On Tue, Dec 24 2024, Gerd Möllmann wrote:
> 
> >> I've looked at SIGPROF. From an admittedly brief look at this, I'd
> >> summarize my results as:
> >>
> >> - The important part is get_backtrace. The rest could be done elsewhere
> >>   by posting that to a message board, or whatever the mechanism is at
> >>   the end.
> 
> I have an idea how to make a safer profiler.  First, remember that MPS
> will stop mutator threads to protect its own consistency.

Can you spell out what "mutator threads" are in this context, so we
are all on the same page?

> What happens if we make the profiler its own thread?  MPS will stop
> the profiler like normal mutator threads.  This is useful and is as
> it should be.

AFAIU, it means the profiler will not be able to account for GC, but
let's put this aside for a moment.

> The problem now is how the profiler thread can do what get_backtrace
> does.  We can use a protocol between the profiler thread and the main
> thread that goes like this:
> 
> 1. The profiler periodically sends a signal, say SIGPROF, to the main
>    thread.
> 
> 2. In the signal handler for SIGPROF, the main thread synchronizes
>    itself with the profiler by communicating over a pipe (like the Unix
>    fathers did it).  It sends some token to the profiler and waits.
> 
> 3. The profiler receives the token and can now access the virtual
>    machine state because the main thread waits.  The profiler now does
>    what get_backtrace does.  After logging the stack snapshot, the
>    profiler sends the token back to the main thread
> 
> 4. The main thread receives the token, returns from the signal handler,
>    and continues execution.

You are basically describing the way SIGPROF emulation is implemented
on Windows (see w32proc.c for the details).  But I don't understand
why you need that pipe: doesn't pthreads allow one thread to stop the
other?  If so, just make the "profiler thread" stop the main thread
instead of your step 2, and resume the main thread instead of your
step 4.  Am I missing something?

> Note that the signal handler only communicates over the pipe and doesn't
> read or write any memory that could mess up MPS.  The profiler thread
> doesn't run any signal handlers (other than those that MPS may use
> behind the scenes).

You basically emulate blocking of SIGPROF by relying on MPS to stop
the profiler thread when it cannot allow access to memory that could
mess up MPS, is that right?

> Another observation: MPS sends SIGXFSZ and SIGXCPU to stop and resume
> threads.  We could intercept those signals and block SIGPROF while the
> thread is stopped.  Obviously a hack, but could still be useful.

Do these signals get delivered to the main thread as well?  Or only to
mutator threads?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27  8:51                         ` Eli Zaretskii
@ 2024-12-27 14:53                           ` Helmut Eller
  2024-12-27 15:09                             ` Pip Cet via Emacs development discussions.
  2024-12-27 15:19                             ` Eli Zaretskii
  0 siblings, 2 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-27 14:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

On Fri, Dec 27 2024, Eli Zaretskii wrote:

>> I have an idea how to make a safer profiler.  First, remember that MPS
>> will stop mutator threads to protect its own consistency.
>
> Can you spell out what "mutator threads" are in this context, so we
> are all on the same page?

The mutator threads are those that we register with
mps_thread_reg.

>> What happens if we make the profiler its own thread?  MPS will stop
>> the profiler like normal mutator threads.  This is useful and is as
>> it should be.
>
> AFAIU, it means the profiler will not be able to account for GC, but
> let's put this aside for a moment.

Yes.  Probably a similar issue as when SIGPROF is blocked while threads
are stopped.

>> The problem now is how the profiler thread can do what get_backtrace
>> does.  We can use a protocol between the profiler thread and the main
>> thread that goes like this:
>> 
>> 1. The profiler periodically sends a signal, say SIGPROF, to the main
>>    thread.
>> 
>> 2. In the signal handler for SIGPROF, the main thread synchronizes
>>    itself with the profiler by communicating over a pipe (like the Unix
>>    fathers did it).  It sends some token to the profiler and waits.
>> 
>> 3. The profiler receives the token and can now access the virtual
>>    machine state because the main thread waits.  The profiler now does
>>    what get_backtrace does.  After logging the stack snapshot, the
>>    profiler sends the token back to the main thread
>> 
>> 4. The main thread receives the token, returns from the signal handler,
>>    and continues execution.
>
> You are basically describing the way SIGPROF emulation is implemented
> on Windows (see w32proc.c for the details).

Good; then I assume that we don't need to discuss that this might
introduce too much overhead.

> But I don't understand
> why you need that pipe: doesn't pthreads allow one thread to stop the
> other?  If so, just make the "profiler thread" stop the main thread
> instead of your step 2, and resume the main thread instead of your
> step 4.  Am I missing something?

You mean there is a pthread_stop function that is similar to
SuspendThread on Windows?  I've not found anything like that; but that
doesn't mean that there isn't one.  The Linux man-pages for pthreads are
notoriously useless.

I like the pipe because it's a signal safe and thread safe communication
mechanism.  It avoids the need for mutexes or stdatomic stuff; that's
best left to wizards.

>> Note that the signal handler only communicates over the pipe and doesn't
>> read or write any memory that could mess up MPS.  The profiler thread
>> doesn't run any signal handlers (other than those that MPS may use
>> behind the scenes).
>
> You basically emulate blocking of SIGPROF by relying on MPS to stop
> the profiler thread when it cannot allow access to memory that could
> mess up MPS, is that right?

Exactly.

>> Another observation: MPS sends SIGXFSZ and SIGXCPU to stop and resume
>> threads.  We could intercept those signals and block SIGPROF while the
>> thread is stopped.  Obviously a hack, but could still be useful.
>
> Do these signals get delivered to the main thread as well?  Or only to
> mutator threads?

I've not looked too closely, but the documentation for ThreadRingSuspend
suggests that they are delivered to all threads, except for the current
thread.  So if a non-main thread triggers a GC cycle, then the main
thread would receive them too.

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27 14:53                           ` Helmut Eller
@ 2024-12-27 15:09                             ` Pip Cet via Emacs development discussions.
  2024-12-27 15:19                             ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 15:09 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, gerd.moellmann, ofv, emacs-devel, acorallo

"Helmut Eller" <eller.helmut@gmail.com> writes:

> I've not looked too closely, but the documentation for ThreadRingSuspend
> suggests that they are delivered to all threads, except for the current
> thread.  So if a non-main thread triggers a GC cycle, then the main
> thread would receive them too.

That is correct.

Pip




^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27 14:53                           ` Helmut Eller
  2024-12-27 15:09                             ` Pip Cet via Emacs development discussions.
@ 2024-12-27 15:19                             ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-27 15:19 UTC (permalink / raw)
  To: Helmut Eller; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: gerd.moellmann@gmail.com,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Fri, 27 Dec 2024 15:53:43 +0100
> 
> On Fri, Dec 27 2024, Eli Zaretskii wrote:
> 
> > But I don't understand
> > why you need that pipe: doesn't pthreads allow one thread to stop the
> > other?  If so, just make the "profiler thread" stop the main thread
> > instead of your step 2, and resume the main thread instead of your
> > step 4.  Am I missing something?
> 
> You mean there is a pthread_stop function that is similar to
> SuspendThread on Windows?  I've not found anything like that; but that
> doesn't mean that there isn't one.  The Linux man-pages for pthreads are
> notoriously useless.

This:

  https://stackoverflow.com/questions/18826853/how-to-stop-a-running-pthread-thread

seems to say you need to call pthread_kill with SIGSTOP/SIGCONT,
he-he.

> I like the pipe because it's a signal safe and thread safe communication
> mechanism.  It avoids the need for mutexes or stdatomic stuff; that's
> best left to wizards.

This could work for Posix systems, but I don't think it works for
Windows to have a pipe between two threads.  And Windows already has
SuspendThread, so maybe a pipe is not needed there.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27  8:08                       ` Helmut Eller
  2024-12-27  8:51                         ` Eli Zaretskii
@ 2024-12-27  8:55                         ` Gerd Möllmann
  2024-12-27 15:40                           ` Helmut Eller
  2024-12-27 11:36                         ` Pip Cet via Emacs development discussions.
  2 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27  8:55 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

Helmut Eller <eller.helmut@gmail.com> writes:

> On Tue, Dec 24 2024, Gerd Möllmann wrote:
>
>>> I've looked at SIGPROF. From an admittedly brief look at this, I'd
>>> summarize my results as:
>>>
>>> - The important part is get_backtrace. The rest could be done elsewhere
>>>   by posting that to a message board, or whatever the mechanism is at
>>>   the end.
>
> I have an idea how to make a safer profiler.  First, remember that MPS
> will stop mutator threads to protect its own consistency.  What happens
> if we make the profiler its own thread?  MPS will stop the profiler like
> normal mutator threads.  This is useful and is as it should be.

That sounds interesting.

> The problem now is how the profiler thread can do what get_backtrace
> does.  We can use a protocol between the profiler thread and the main
> thread that goes like this:
>
> 1. The profiler periodically sends a signal, say SIGPROF, to the main
>    thread.
>
> 2. In the signal handler for SIGPROF, the main thread synchronizes
>    itself with the profiler by communicating over a pipe (like the Unix
>    fathers did it).  It sends some token to the profiler and waits.
>
> 3. The profiler receives the token and can now access the virtual
>    machine state because the main thread waits.  The profiler now does
>    what get_backtrace does.  After logging the stack snapshot, the
>    profiler sends the token back to the main thread
>
> 4. The main thread receives the token, returns from the signal handler,
>    and continues execution.
>
> Note that the signal handler only communicates over the pipe and doesn't
> read or write any memory that could mess up MPS.  The profiler thread
> doesn't run any signal handlers (other than those that MPS may use
> behind the scenes).

I think I've seen a pipe being used in similar circumstances elsewhere,
not in Emacs, in an actor system handling signals. Very interesting. I
like it.

> Another observation: MPS sends SIGXFSZ and SIGXCPU to stop and resume
> threads.  We could intercept those signals and block SIGPROF while the
> thread is stopped.  Obviously a hack, but could still be useful.

Do you know if it does the same on macOS? It sounds like something where
the Mach heritage of macOS might kick in.

>
> Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27  8:55                         ` Gerd Möllmann
@ 2024-12-27 15:40                           ` Helmut Eller
  2024-12-27 15:53                             ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-27 15:40 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

On Fri, Dec 27 2024, Gerd Möllmann wrote:

>> Another observation: MPS sends SIGXFSZ and SIGXCPU to stop and resume
>> threads.  We could intercept those signals and block SIGPROF while the
>> thread is stopped.  Obviously a hack, but could still be useful.
>
> Do you know if it does the same on macOS? It sounds like something where
> the Mach heritage of macOS might kick in.

Sorry, I don't know anything about Mach.  The code in thxc.c seems to
use a thread_suspend function.

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27 15:40                           ` Helmut Eller
@ 2024-12-27 15:53                             ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27 15:53 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

Helmut Eller <eller.helmut@gmail.com> writes:

> On Fri, Dec 27 2024, Gerd Möllmann wrote:
>
>>> Another observation: MPS sends SIGXFSZ and SIGXCPU to stop and resume
>>> threads.  We could intercept those signals and block SIGPROF while the
>>> thread is stopped.  Obviously a hack, but could still be useful.
>>
>> Do you know if it does the same on macOS? It sounds like something where
>> the Mach heritage of macOS might kick in.
>
> Sorry, I don't know anything about Mach.  The code in thxc.c seems to
> use a thread_suspend function.
>
> Helmut

That seems to be a kernel function:

  https://developer.apple.com/documentation/kernel/1418833-thread_suspend
  



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27  8:08                       ` Helmut Eller
  2024-12-27  8:51                         ` Eli Zaretskii
  2024-12-27  8:55                         ` Gerd Möllmann
@ 2024-12-27 11:36                         ` Pip Cet via Emacs development discussions.
  2024-12-27 16:14                           ` Helmut Eller
  2 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 11:36 UTC (permalink / raw)
  To: Helmut Eller
  Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, acorallo

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Tue, Dec 24 2024, Gerd Möllmann wrote:
>
>>> I've looked at SIGPROF. From an admittedly brief look at this, I'd
>>> summarize my results as:
>>>
>>> - The important part is get_backtrace. The rest could be done elsewhere
>>>   by posting that to a message board, or whatever the mechanism is at
>>>   the end.
>
> I have an idea how to make a safer profiler.  First, remember that MPS

"Safer" how?

> will stop mutator threads to protect its own consistency.  What happens
> if we make the profiler its own thread?  MPS will stop the profiler like
> normal mutator threads.  This is useful and is as it should be.

If we tell MPS about it, it'll be stopped, yes.

> The problem now is how the profiler thread can do what get_backtrace
> does.  We can use a protocol between the profiler thread and the main
> thread that goes like this:
>
> 1. The profiler periodically sends a signal, say SIGPROF, to the main
>    thread.
>
> 2. In the signal handler for SIGPROF, the main thread synchronizes
>    itself with the profiler by communicating over a pipe (like the Unix
>    fathers did it).  It sends some token to the profiler and waits.

...forever, if the profiler thread has been suspended at this point.

> 3. The profiler receives the token and can now access the virtual
>    machine state because the main thread waits.  The profiler now does
>    what get_backtrace does.  After logging the stack snapshot, the
>    profiler sends the token back to the main thread
>
> 4. The main thread receives the token, returns from the signal handler,
>    and continues execution.
>
> Note that the signal handler only communicates over the pipe and doesn't
> read or write any memory that could mess up MPS.  The profiler thread
> doesn't run any signal handlers (other than those that MPS may use
> behind the scenes).

Can you explain how this would produce different behavior than what we
have currently, apart from the deadlock?  I'm sorry, but it's not
obvious to me.

If the main thread was performing GC when SIGPROF arrived, the SIGPROF
handler cannot do get_backtrace, and the profiler thread can't, either.

> Another observation: MPS sends SIGXFSZ and SIGXCPU to stop and resume
> threads.  We could intercept those signals and block SIGPROF while the
> thread is stopped.  Obviously a hack, but could still be useful.

MPS blocks all signals (except for the wakeup signal) while threads are
suspended.  This is suboptimal behavior and not something we should
expect and rely on.

Pip




^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27 11:36                         ` Pip Cet via Emacs development discussions.
@ 2024-12-27 16:14                           ` Helmut Eller
  2024-12-28 10:02                             ` Helmut Eller
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-27 16:14 UTC (permalink / raw)
  To: Pip Cet; +Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, acorallo

>> 1. The profiler periodically sends a signal, say SIGPROF, to the main
>>    thread.
>>
>> 2. In the signal handler for SIGPROF, the main thread synchronizes
>>    itself with the profiler by communicating over a pipe (like the Unix
>>    fathers did it).  It sends some token to the profiler and waits.
>
> ...forever, if the profiler thread has been suspended at this point.

Hmm.  Indeed.  So back to the drawing board.

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-27 16:14                           ` Helmut Eller
@ 2024-12-28 10:02                             ` Helmut Eller
  2024-12-28 10:50                               ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-28 10:02 UTC (permalink / raw)
  To: Pip Cet; +Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, acorallo

On Fri, Dec 27 2024, Helmut Eller wrote:

> Pip Cet wrote:
>> ...forever, if the profiler thread has been suspended at this point.
>
> Hmm.  Indeed.  So back to the drawing board.

It seems that the statement

  block SIGPROF while MPS holds the lock

is logically equivalent to

  deliver SIGPROF only while MPS does not hold the lock.

This variant might be a bit easier to implement.  The "while MPS does not
hold the lock" part can be implemented by claiming the lock in the
profiler thread like so:

  mps_arena_t arena = global_igc->arena;
  ArenaEnter (arena);
  ... deliver SIGPROF part goes here ...
  ArenaLeave (arena);

The functions ArenaEnter and ArenaLeave are not part of the public API
but they are external symbols, so the linker can find them.

The "deliver SIGPROF" part goes like this:

1. The profiler thread calls pthread_kill (<main_thread>, SIGPROF) and
   then waits (on a pipe or whatever).

2. The SIGPROF handler gets called and immediately notifies the profiler
   thread (without waiting for a reply).  After that, it continues as
   usual calling get_backtrace etc.

3. The profiler thread awakes and releases the lock.

Regarding deadlocks: the profiler thread holds the lock while it waits.
So MPS should not be able to stop the profiler thread there.
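
A minimal sketch of this second protocol, assuming an ordinary pthread
mutex as a stand-in for the arena lock (ArenaEnter and ArenaLeave are
not called) and a counter in place of get_backtrace.  All names are
hypothetical, and the sketch does not model MPS suspending threads:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical stand-ins: arena_lock plays the role of the MPS arena
   lock, backtraces counts what get_backtrace would log.  */
static pthread_mutex_t arena_lock = PTHREAD_MUTEX_INITIALIZER;
static int ack_pipe[2];
static pthread_t main_thread;
static volatile sig_atomic_t backtraces;

/* Step 2: notify the profiler first, then sample without waiting.  */
static void
handle_sigprof (int sig)
{
  int saved_errno = errno;
  char ack = 'A';
  (void) sig;
  while (write (ack_pipe[1], &ack, 1) < 0 && errno == EINTR)
    continue;
  backtraces++;                 /* stand-in for get_backtrace */
  errno = saved_errno;
}

/* Steps 1 and 3: runs in the profiler thread.  */
static void *
profiler_loop (void *arg)
{
  int rounds = *(int *) arg;
  char ack;
  for (int i = 0; i < rounds; i++)
    {
      pthread_mutex_lock (&arena_lock);         /* "ArenaEnter" */
      pthread_kill (main_thread, SIGPROF);      /* step 1 */
      while (read (ack_pipe[0], &ack, 1) < 0 && errno == EINTR)
        continue;                               /* wait for the handler */
      pthread_mutex_unlock (&arena_lock);       /* step 3, "ArenaLeave" */
    }
  return NULL;
}

/* Deliver SIGPROF ROUNDS times; return the handler run count, or -1.  */
int
run_locked_delivery (int rounds)
{
  struct sigaction sa;
  pthread_t profiler;

  sa.sa_handler = handle_sigprof;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  if (pipe (ack_pipe) != 0 || sigaction (SIGPROF, &sa, NULL) != 0)
    return -1;
  main_thread = pthread_self ();
  backtraces = 0;
  if (pthread_create (&profiler, NULL, profiler_loop, &rounds) != 0)
    return -1;
  int err = pthread_join (profiler, NULL);
  assert (err == 0);
  close (ack_pipe[0]);
  close (ack_pipe[1]);
  return backtraces;
}
```

Because the profiler waits for each acknowledgement before sending the
next signal, at most one SIGPROF is outstanding at any time.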

Helmut

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 10:02                             ` Helmut Eller
@ 2024-12-28 10:50                               ` Eli Zaretskii
  2024-12-28 13:52                                 ` Helmut Eller
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 10:50 UTC (permalink / raw)
  To: Helmut Eller; +Cc: pipcet, gerd.moellmann, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>,  Eli
>  Zaretskii <eliz@gnu.org>,
>   ofv@wanadoo.es,  emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Sat, 28 Dec 2024 11:02:44 +0100
> 
> On Fri, Dec 27 2024, Helmut Eller wrote:
> 
> > Pip Cet wrote:
> >> ...forever, if the profiler thread has been suspended at this point.
> >
> > Hmm.  Indeed.  So back to the drawing board.
> 
> It seems that the statement
> 
>   block SIGPROF while MPS holds the lock
> 
> is logically equivalent to
> 
>   deliver SIGPROF only while MPS does not hold the lock.

Not necessarily.  Blocking a signal delays its delivery until the
time the signal is unblocked.  By contrast, what you propose will be
unable to profile code which caused MPS to take the lock.

> This variant might be a bit easier to implement.  The "while MPS does not
> hold the lock" part can be implemented by claiming the lock in the
> profiler thread like so:
> 
>   mps_arena_t arena = global_igc->arena;
>   ArenaEnter (arena);
>   ... deliver SIGPROF part goes here ...
>   ArenaLeave (arena);

What happens if, when we call ArenaEnter, MPS already holds the arena
lock?

> The functions ArenaEnter and ArenaLeave are not part of the public API
> but they are external symbols, so the linker can find them.

We should ask the MPS folks what they think about using non-public
API.

> The "deliver SIGPROF" part goes like this:
> 
> 1. The profiler thread calls pthread_kill (<main_thread>, SIGPROF) and
>    then waits (on a pipe or whatever).
> 
> 2. The SIGPROF handler gets called and immediately notifies the profiler
>    thread (without waiting for a reply).  After that, it continues as
>    usual calling get_backtrace etc.
> 
> 3. The profiler thread awakes and releases the lock.

This leaves a small window between the time the SIGPROF handler writes
to the pipe and the time the profiler thread calls ArenaLeave.  During
that window, the arena is locked, AFAIU, and we can still have
recursive-lock situations, which cause an abort.  Am I missing
something?

> Regarding deadlocks: the profiler thread holds the lock while it waits.
> So MPS should not be able to stop the profiler thread there.

Which means we don't register the profiler thread with MPS, right?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 10:50                               ` Eli Zaretskii
@ 2024-12-28 13:52                                 ` Helmut Eller
  2024-12-28 14:25                                   ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-28 13:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, gerd.moellmann, ofv, emacs-devel, acorallo

On Sat, Dec 28 2024, Eli Zaretskii wrote:

>> It seems that the statement
>> 
>>   block SIGPROF while MPS holds the lock
>> 
>> is logically equivalent to
>> 
>>   deliver SIGPROF only while MPS does not hold the lock.
>
> Not necessarily.  Blocking a signal delays its delivery until the
> time the signal is unblocked.  By contrast, what you propose will be
> unable to profile code which caused MPS to take the lock.

Hmm, I think I disagree.  But I'm not sure of what sequence of events
you're thinking of.

>> This variant might be a bit easier to implement.  The "while MPS does not
>> hold the lock" part can be implemented by claiming the lock in the
>> profiler thread like so:
>> 
>>   mps_arena_t arena = global_igc->arena;
>>   ArenaEnter (arena);
>>   ... deliver SIGPROF part goes here ...
>>   ArenaLeave (arena);
>
> What happens if, when we call ArenaEnter, MPS already holds the arena
> lock?

Since MPS holds the lock, it would run in a different thread.  So the
profiler thread blocks until MPS releases the lock.

ArenaEnter uses non-recursive locks.
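
As an analogy only, the difference between a second thread meeting a
held non-recursive lock and the holder claiming it again can be shown
with a POSIX error-checking mutex.  This is an assumption-laden
illustration: the MPS lock implementation is not shown here and may
react differently to a repeated claim, and the function names below
are hypothetical:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Create an error-checking (non-recursive) mutex.  This is only a
   stand-in for the MPS arena lock, not the MPS implementation.  */
static void
init_checked_lock (pthread_mutex_t *lock)
{
  pthread_mutexattr_t attr;
  int err = pthread_mutexattr_init (&attr);
  assert (err == 0);
  err = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  assert (err == 0);
  err = pthread_mutex_init (lock, &attr);
  assert (err == 0);
  pthread_mutexattr_destroy (&attr);
}

/* Claim a lock twice from one thread, as code interrupting the lock
   holder would.  Returns the result of the second claim.  */
int
claim_twice_same_thread (void)
{
  pthread_mutex_t lock;
  init_checked_lock (&lock);
  pthread_mutex_lock (&lock);
  int second = pthread_mutex_lock (&lock);      /* reported, not granted */
  pthread_mutex_unlock (&lock);
  pthread_mutex_destroy (&lock);
  return second;
}

struct probe
{
  pthread_mutex_t *lock;
  int result;
};

/* Runs in a second thread: try the lock without blocking.  */
static void *
probe_lock (void *arg)
{
  struct probe *p = arg;
  p->result = pthread_mutex_trylock (p->lock);
  if (p->result == 0)
    pthread_mutex_unlock (p->lock);
  return NULL;
}

/* Probe a held lock from a second thread.  Returns the trylock result;
   a plain pthread_mutex_lock would simply wait instead.  */
int
claim_from_other_thread (void)
{
  pthread_mutex_t lock;
  pthread_t other;
  struct probe p = { &lock, 0 };
  init_checked_lock (&lock);
  pthread_mutex_lock (&lock);
  int err = pthread_create (&other, NULL, probe_lock, &p);
  assert (err == 0);
  pthread_join (other, NULL);
  pthread_mutex_unlock (&lock);
  pthread_mutex_destroy (&lock);
  return p.result;
}
```

With this mutex type, the same-thread claim fails with EDEADLK, while
the other thread merely finds the lock busy (EBUSY from trylock).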

>> The "deliver SIGPROF" part goes like this:
>> 
>> 1. The profiler thread calls pthread_kill (<main_thread>, SIGPROF) and
>>    then waits (on a pipe or whatever).
>> 
>> 2. The SIGPROF handler gets called and immediately notifies the profiler
>>    thread (without waiting for a reply).  After that, it continues as
>>    usual calling get_backtrace etc.
>> 
>> 3. The profiler thread awakes and releases the lock.
>
> This leaves a small window between the time the SIGPROF handler writes
> to the pipe and the time the profiler thread calls ArenaLeave.  During
> that window, the arena is locked, AFAIU, and we can still have
> recursive-lock situations, which cause an abort.  Am I missing
> something?

During that time window, the lock is held by the profiler thread.  The
SIGPROF handler runs in the main thread.  If the main thread tries to
claim the lock, it will block until the profiler thread releases it.

>> Regarding deadlocks: the profiler thread holds the lock while it waits.
>> So MPS should not be able to stop the profiler thread there.
>
> Which means we don't register the profiler thread with MPS, right?

I'm not sure. It may not be safe to call ArenaEnter in non-registered
threads.

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 13:52                                 ` Helmut Eller
@ 2024-12-28 14:25                                   ` Eli Zaretskii
  2024-12-28 16:46                                     ` Helmut Eller
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 14:25 UTC (permalink / raw)
  To: Helmut Eller; +Cc: pipcet, gerd.moellmann, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: pipcet@protonmail.com,  gerd.moellmann@gmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Sat, 28 Dec 2024 14:52:33 +0100
> 
> On Sat, Dec 28 2024, Eli Zaretskii wrote:
> 
> >> It seems that the statement
> >> 
> >>   block SIGPROF while MPS holds the lock
> >> 
> >> is logically equivalent to
> >> 
> >>   deliver SIGPROF only while MPS does not hold the lock.
> >
> > Not necessarily.  Blocking a signal delays its delivery until the
> > time the signal is unblocked.  By contrast, what you propose will be
> > unable to profile code which caused MPS to take the lock.
> 
> Hmm, I think I disagree.  But I'm not sure of what sequence of events
> you're thinking of.

I'm thinking about a situation where SIGPROF was delivered while it
was blocked.  In that case, it will be re-delivered once we unblock
it.

By contrast, if we avoid delivering SIGPROF in the first place, it
will never be delivered until the next time SIGPROF is due.

So imagine a function FUNC that conses some Lisp object.  This calls
into MPS, which blocks SIGPROF, takes the arena lock, then does its
thing, then releases the lock and unblocks SIGPROF.  If SIGPROF
happened while MPS was working with SIGPROF blocked, then the moment
SIGPROF is unblocked, the SIGPROF handler in the main thread will be
called, and will have the opportunity to see that we were executing
FUNC.  By contrast, if the profiler thread avoided delivering SIGPROF
because it saw the arena locked, the next time the profiler thread
decides to deliver SIGPROF, execution could have already left FUNC,
and thus FUNC will not be in the profile.

I hope I made myself more clear this time.

> >> This variant might be a bit easier to implement.  The "while MPS does not
> >> hold the lock" part can be implemented by claiming the lock in the
> >> profiler thread like so:
> >> 
> >>   mps_arena_t arena = global_igc->arena;
> >>   ArenaEnter (arena);
> >>   ... deliver SIGPROF part goes here ...
> >>   ArenaLeave (arena);
> >
> > What happens if, when we call ArenaEnter, MPS already holds the arena
> > lock?
> 
> Since MPS holds the lock, it would run in a different thread.

Yes, of course: we are talking about an implementation where the
profiler thread is separate, so the above code, which AFAIU runs in
the profiler thread, will be in a thread separate from the one where
MPS runs.

> So the profiler thread blocks until MPS releases the lock.
> 
> ArenaEnter uses non-recursive locks.

Hm... if ArenaEnter uses non-recursive locks, how come we get aborts
if some code tries to lock the arena when it is already locked?  IOW,
how is this situation different from what we already saw several times
in the crashes related to having SIGPROF delivered while MPS holds the
arena lock?

> 
> >> The "deliver SIGPROF" part goes like this:
> >> 
> >> 1. The profiler thread calls pthread_kill (<main_thread>, SIGPROF) and
> >>    then waits (on a pipe or whatever).
> >> 
> >> 2. The SIGPROF handler gets called and immediately notifies the profiler
> >>    thread (without waiting for a reply).  After that, it continues as
> >>    usual calling get_backtrace etc.
> >> 
> >> 3. The profiler thread awakes and releases the lock.
> >
> > This leaves a small window between the time the SIGPROF handler writes
> > to the pipe and the time the profiler thread calls ArenaLeave.  During
> > that window, the arena is locked, AFAIU, and we can still have
> > recursive-lock situations, which cause an abort.  Am I missing
> > something?
> 
> During that time window, the lock is held by the profiler thread.  The
> SIGPROF handler runs in the main thread.  If the main thread tries to
> claim the lock, it will block until the profiler thread releases it.

See above: I thought that such a situation triggers crashes.  I'm
probably missing something.

> >> Regarding deadlocks: the profiler thread holds the lock while it waits.
> >> So MPS should not be able to stop the profiler thread there.
> >
> > Which means we don't register the profiler thread with MPS, right?
> 
> I'm not sure. It may not be safe to call ArenaEnter in non-registered
> threads.

But if we do register the thread, then MPS _will_ stop it, no?

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 14:25                                   ` Eli Zaretskii
@ 2024-12-28 16:46                                     ` Helmut Eller
  2024-12-28 17:35                                       ` Eli Zaretskii
  2024-12-28 19:32                                       ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-28 16:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, gerd.moellmann, ofv, emacs-devel, acorallo

On Sat, Dec 28 2024, Eli Zaretskii wrote:

> I'm thinking about a situation where SIGPROF was delivered while it
> was blocked.  In that case, it will be re-delivered once we unblock
> it.
>
> By contrast, if we avoid delivering SIGPROF in the first place, it
> will never be delivered until the next time SIGPROF is due.
>
> So imagine a function FUNC that conses some Lisp object.  This calls
> into MPS, which blocks SIGPROF, takes the arena lock, then does its
> thing, then releases the lock and unblocks SIGPROF.  If SIGPROF
> happened while MPS was working with SIGPROF blocked, then the moment
> SIGPROF is unblocked, the SIGPROF handler in the main thread will be
> called, and will have the opportunity to see that we were executing
> FUNC.  By contrast, if the profiler thread avoided delivering SIGPROF
> because it saw the arena locked, the next time the profiler thread
> decides to deliver SIGPROF, execution could have already left FUNC,
> and thus FUNC will not be in the profile.
>
> I hope I made myself more clear this time.

I think I see what you mean.  I imagine the profiler thread to be a loop
like

  while (true) {
     sleep (<x-seconds>)
     ArenaEnter (<arena>)
       pthread_kill (<main-thread>, SIGPROF)
       wait (<pipe>)
     ArenaLeave (<arena>)
  }

If the profiler thread blocks in ArenaEnter, then we are at the mercy of
the thread scheduler.  The kernel may decide to let the main thread run
for a long time before running the profiler thread again.  While with
sigblock(), we know exactly when the kernel will call the SIGPROF
handler.  So sigblock() would be more predictable and accurate.
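
A minimal standalone sketch of the blocked-signal behavior discussed above,
assuming plain POSIX signals in a single-threaded program.  Nothing here is
Emacs or MPS code; blocked_sigprof_demo and on_sigprof are made-up names, and
blocking the signal merely stands in for "MPS is holding the arena lock":

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t prof_samples;

static void on_sigprof (int sig)
{
  (void) sig;
  prof_samples++;
}

/* Return 0 if SIGPROF stayed pending while blocked and was delivered
   as soon as it was unblocked; nonzero otherwise.  */
int blocked_sigprof_demo (void)
{
  struct sigaction sa;
  sigset_t set, old;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_sigprof;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGPROF, &sa, NULL) != 0)
    return -1;

  prof_samples = 0;
  sigemptyset (&set);
  sigaddset (&set, SIGPROF);

  /* Stand-in for "MPS takes the arena lock": block SIGPROF.  */
  if (sigprocmask (SIG_BLOCK, &set, &old) != 0)
    return -1;

  raise (SIGPROF);              /* the timer fires while we are "inside MPS" */
  if (prof_samples != 0)
    return 1;                   /* the handler must not have run yet */

  /* Stand-in for "MPS releases the lock": restore the old mask.  The
     pending SIGPROF is delivered before sigprocmask returns.  */
  if (sigprocmask (SIG_SETMASK, &old, NULL) != 0)
    return -1;

  return prof_samples == 1 ? 0 : 2;
}
```

The point of the sketch is only the timing: the sample is taken at the moment
of unblocking, not at whatever later time a separate thread gets scheduled.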

>> >> This variant might be a bit easier to implement.  The "while MPS does not
>> >> hold the lock" part can be implemented by claiming the lock in the
>> >> profiler thread like so:
>> >> 
>> >>   mps_arena_t arena = global_igc->arena;
>> >>   ArenaEnter (arena);
>> >>   ... deliver SIGPROF part goes here ...
>> >>   ArenaLeave (arena);
>> >
>> > What happens if, when we call ArenaEnter, MPS already holds the arena
>> > lock?
>> 
>> Since MPS holds the lock, it would run in a different thread.
>
> Yes, of course: we are talking about an implementation where the
> profiler thread is separate, so the above code, which AFAIU runs in
> the profiler thread, will be in a thread separate from the one where
> MPS runs.
>
>> So the profiler thread blocks until MPS releases the lock.
>> 
>> ArenaEnter uses non-recursive locks.
>
> Hm... if ArenaEnter uses non-recursive locks, how come we get aborts
> if some code tries to lock the arena when it is already locked?  IOW,
> how is this situation different from what we already saw several times
> in the crashes related to having SIGPROF delivered while MPS holds the
> arena lock?

I'm not sure what you expect instead.  It's an error to claim a
non-recursive lock twice in the same thread.  The fault handler claims
the lock.  If the SIGPROF handler interrupts MPS while it's holding the
lock and then triggers a fault, then it claims the lock a second time.
It's no surprise to see crashes here.

>> During that time window, the lock is held by the profiler thread.  The
>> SIGPROF handler runs in the main thread.  If the main thread tries to
>> claim the lock, it will block until the profiler thread releases it.
>
> See above: I thought that such a situation triggers crashes.  I'm
> probably missing something.

If two threads are claiming the same non-recursive lock concurrently,
then it's not an error.
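
The distinction can be checked with a small standalone sketch, assuming a
POSIX error-checking mutex as a stand-in for the arena lock.  This is not the
MPS code; errorcheck_lock_demo and claim_from_other_thread are hypothetical
names, and the second thread uses trylock only so that the demo cannot hang:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock;
static int other_thread_result;

static void *claim_from_other_thread (void *arg)
{
  (void) arg;
  /* A plain pthread_mutex_lock would simply block here until the owner
     unlocks.  trylock reports that as EBUSY instead of waiting.  */
  other_thread_result = pthread_mutex_trylock (&lock);
  if (other_thread_result == 0)
    pthread_mutex_unlock (&lock);
  return NULL;
}

/* Store the result of a second claim from the owning thread in
   *SAME_THREAD and the result of a claim from another thread in
   *OTHER_THREAD.  Return 0 on success, -1 on setup failure.  */
int errorcheck_lock_demo (int *same_thread, int *other_thread)
{
  pthread_mutexattr_t attr;
  pthread_t t;

  if (pthread_mutexattr_init (&attr) != 0)
    return -1;
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (pthread_mutex_init (&lock, &attr) != 0)
    return -1;
  pthread_mutexattr_destroy (&attr);

  pthread_mutex_lock (&lock);                   /* first claim */
  *same_thread = pthread_mutex_lock (&lock);    /* second claim, same thread */

  if (pthread_create (&t, NULL, claim_from_other_thread, NULL) != 0)
    return -1;
  pthread_join (t, NULL);
  *other_thread = other_thread_result;

  pthread_mutex_unlock (&lock);
  pthread_mutex_destroy (&lock);
  return 0;
}
```

With an error-checking mutex, the same-thread relock is reported as EDEADLK
(which an assertion can turn into an abort), while contention from a second
thread is ordinary waiting.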

>> >> Regarding deadlocks: the profiler thread holds the lock while it waits.
>> >> So MPS should not be able to stop the profiler thread there.
>> >
>> > Which means we don't register the profiler thread with MPS, right?
>> 
>> I'm not sure. It may not be safe to call ArenaEnter in non-registered
>> threads.
>
> But if we do register the thread, then MPS _will_ stop it, no?

Good point.  But I think we are safe: to access the list of threads to
stop, MPS must first hold the arena lock.

Helmut





^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 16:46                                     ` Helmut Eller
@ 2024-12-28 17:35                                       ` Eli Zaretskii
  2024-12-28 18:08                                         ` Helmut Eller
  2024-12-28 19:32                                       ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 17:35 UTC (permalink / raw)
  To: Helmut Eller; +Cc: pipcet, gerd.moellmann, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: pipcet@protonmail.com,  gerd.moellmann@gmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Sat, 28 Dec 2024 17:46:42 +0100
> 
> On Sat, Dec 28 2024, Eli Zaretskii wrote:
> 
> > Hm... if ArenaEnter uses non-recursive locks, how come we get aborts
> > if some code tries to lock the arena when it is already locked?  IOW,
> > how is this situation different from what we already saw several times
> > in the crashes related to having SIGPROF delivered while MPS holds the
> > arena lock?
> 
> I'm not sure what you expect instead.  It's an error to claim a
> non-recursive lock twice in the same thread.  The fault handler claims
> the lock.  If the SIGPROF handler interrupts MPS while it's holding the
> lock and then triggers a fault, then it claims the lock a second time.
> It's no surprise to see crashes here.
> 
> >> During that time window, the lock is held by the profiler thread.  The
> >> SIGPROF handler runs in the main thread.  If the main thread tries to
> >> claim the lock, it will block until the profiler thread releases it.
> >
> > See above: I thought that such a situation triggers crashes.  I'm
> > probably missing something.
> 
> If two threads are claiming the same non-recursive lock concurrently,
> then it's not an error.

Oh, so you are saying that taking the lock twice is a fatal error only
if that is done from the same thread?  Is that known for certain?

> >> >> Regarding deadlocks: the profiler thread holds the lock while it waits.
> >> >> So MPS should not be able to stop the profiler thread there.
> >> >
> >> > Which means we don't register the profiler thread with MPS, right?
> >> 
> >> I'm not sure. It may not be safe to call ArenaEnter in non-registered
> >> threads.
> >
> > But if we do register the thread, then MPS _will_ stop it, no?
> 
> Good point.  But I think we are safe: to access the list of threads to
> stop, MPS must first hold the arena lock.

Are we sure MPS must take the lock before it can stop registered
threads?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 17:35                                       ` Eli Zaretskii
@ 2024-12-28 18:08                                         ` Helmut Eller
  2024-12-28 19:00                                           ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-28 18:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, gerd.moellmann, ofv, emacs-devel, acorallo

On Sat, Dec 28 2024, Eli Zaretskii wrote:

>> If two threads are claiming the same non-recursive lock concurrently,
>> then it's not an error.
>
> Oh, so you are saying that taking the lock twice is a fatal error only
> if that is done from the same thread?  Is that known for certain?

Ahem, now you're making me nervous.  I'll show the implementation
before I say something stupid:

/* LockClaim -- claim a lock (non-recursive) */

void (LockClaim)(Lock lock)
{
  int res;

  AVERT(Lock, lock);

  res = pthread_mutex_lock(&lock->mut);
  /* pthread_mutex_lock will error if we own the lock already. */
  AVER(res == 0); /* <design/check/#.common> */

  /* This should be the first claim.  Now we own the mutex */
  /* it is ok to check this. */
  AVER(lock->claims == 0);
  lock->claims = 1;
}

/* LockClaimRecursive -- claim a lock (recursive) */

void (LockClaimRecursive)(Lock lock)
{
  int res;

  AVERT(Lock, lock);

  res = pthread_mutex_lock(&lock->mut);
  /* pthread_mutex_lock will return: */
  /*     0 if we have just claimed the lock */
  /*     EDEADLK if we own the lock already. */
  AVER((res == 0) == (lock->claims == 0));
  AVER((res == EDEADLK) == (lock->claims > 0));

  ++lock->claims;
  AVER(lock->claims > 0);
}

> Are we sure MPS must take the lock before it can stop registered
> threads?

Now that's a question for the MPS mailing list.

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 18:08                                         ` Helmut Eller
@ 2024-12-28 19:00                                           ` Eli Zaretskii
  2024-12-28 19:28                                             ` Helmut Eller
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 19:00 UTC (permalink / raw)
  To: Helmut Eller; +Cc: pipcet, gerd.moellmann, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: pipcet@protonmail.com,  gerd.moellmann@gmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Sat, 28 Dec 2024 19:08:18 +0100
> 
> On Sat, Dec 28 2024, Eli Zaretskii wrote:
> 
> >> If two threads are claiming the same non-recursive lock concurrently,
> >> then it's not an error.
> >
> > Oh, so you are saying that taking the lock twice is a fatal error only
> > if that is done from the same thread?  Is that known for certain?
> 
> Ahem, now you're making me nervous.  I'll show the implementation
> before I say something stupid:
> 
> /* LockClaim -- claim a lock (non-recursive) */
> 
> void (LockClaim)(Lock lock)
> {
>   int res;
> 
>   AVERT(Lock, lock);
> 
>   res = pthread_mutex_lock(&lock->mut);
>   /* pthread_mutex_lock will error if we own the lock already. */
>   AVER(res == 0); /* <design/check/#.common> */
> 
>   /* This should be the first claim.  Now we own the mutex */
>   /* it is ok to check this. */
>   AVER(lock->claims == 0);
>   lock->claims = 1;
> }

I guess they use PTHREAD_MUTEX_ERRORCHECK to get the error?

Anyway, the MS-Windows implementation uses EnterCriticalSection, which
allows a thread to call it any number of times after the first call
succeeded, so they must be using something else on Windows to detect
multiple locks by the same thread, maybe the count of claims or
something?

> 
> > Are we sure MPS must take the lock before it can stop registered
> > threads?
> 
> Now that's a question for the MPS mailing list.

Right.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 19:00                                           ` Eli Zaretskii
@ 2024-12-28 19:28                                             ` Helmut Eller
  0 siblings, 0 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-28 19:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, gerd.moellmann, ofv, emacs-devel, acorallo

On Sat, Dec 28 2024, Eli Zaretskii wrote:

> I guess they use PTHREAD_MUTEX_ERRORCHECK to get the error?

Yes, they do.

> Anyway, the MS-Windows implementation uses EnterCriticalSection, which
> allows a thread to call it any number of times after the first call
> succeeded, so they must be using something else on Windows to detect
> multiple locks by the same thread, maybe the count of claims or
> something?

The comment at the beginning of lockw3.c suggests so.  I think the AVER
assertions are only checked in the debug build.

 * .design: These are implemented using critical sections.
 *  See the section titled "Synchronization functions" in the Groups
 *  chapter of the Microsoft Win32 API Programmer's Reference.
 *  The "Synchronization" section of the Overview is also relevant.
 *
 *  Critical sections support recursive locking, so the implementation
 *  could be trivial.  This implementation counts the claims to provide
 *  extra checking.
 *
 *  The limit on the number of recursive claims is the max of
 *  ULONG_MAX and the limit imposed by critical sections, which
 *  is believed to be about UCHAR_MAX.
 *
 *  During use the claims field is updated to remember the number of
 *  claims acquired on a lock.  This field must only be modified
 *  while we are inside the critical section.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 16:46                                     ` Helmut Eller
  2024-12-28 17:35                                       ` Eli Zaretskii
@ 2024-12-28 19:32                                       ` Pip Cet via Emacs development discussions.
  2024-12-28 19:51                                         ` Helmut Eller
  1 sibling, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-28 19:32 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, gerd.moellmann, ofv, emacs-devel, acorallo

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Sat, Dec 28 2024, Eli Zaretskii wrote:
>
>> I'm thinking about a situation where SIGPROF was delivered while it
>> was blocked.  In that case, it will be re-delivered once we unblock
>> it.
>>
>> By contrast, if we avoid delivering SIGPROF in the first place, it
>> will never be delivered until the next time SIGPROF is due.
>>
>> So imagine a function FUNC that conses some Lisp object.  This calls
>> into MPS, which blocks SIGPROF, takes the arena lock, then does its
>> thing, then releases the lock and unblocks SIGPROF.  If SIGPROF
>> happened while MPS was working with SIGPROF blocked, then the moment
>> SIGPROF is unblocked, the SIGPROF handler in the main thread will be
>> called, and will have the opportunity to see that we were executing
>> FUNC.  By contrast, if the profiler thread avoided delivering SIGPROF
>> because it saw the arena locked, the next time the profiler thread
>> decides to deliver SIGPROF, execution could have already left FUNC,
>> and thus FUNC will not be in the profile.
>>
>> I hope I made myself more clear this time.
>
> I think I see what you mean.  I imagine the profiler thread to be a loop
> like
>
>   while (true) {
>      sleep (<x-seconds>)
>      ArenaEnter (<arena>)
>        pthread_kill (<main-thread>, SIGPROF)
>        wait (<pipe>)
>      ArenaLeave (<arena>)
>   }

I'm not really following.  Did you mean to include a call to a "clear
all memory barriers" function after the ArenaEnter call?  If not, the
SIGPROF handler (and all handlers interrupting the SIGPROF handler which
aren't being delayed) would not be able to access MPS memory, which I
thought was the goal.

Pip




^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 19:32                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-28 19:51                                         ` Helmut Eller
  2024-12-28 20:43                                           ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-28 19:51 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, gerd.moellmann, ofv, emacs-devel, acorallo

On Sat, Dec 28 2024, Pip Cet wrote:

>> I think I see what you mean.  I imagine the profiler thread to be a loop
>> like
>>
>>   while (true) {
>>      sleep (<x-seconds>)
>>      ArenaEnter (<arena>)
>>        pthread_kill (<main-thread>, SIGPROF)
>>        wait (<pipe>)
>>      ArenaLeave (<arena>)
>>   }
>
> I'm not really following.  Did you mean to include a call to a "clear
> all memory barriers" function after the ArenaEnter call?  If not, the
> SIGPROF handler (and all handlers interrupting the SIGPROF handler which
> aren't being delayed) would not be able to access MPS memory, which I
> thought was the goal.

In my mind it works like this: when the SIGPROF handler tries to access
MPS memory, the SIGSEGV handler kicks in as it usually would in a
non-signal handler context.  This should work because at the beginning
of the SIGPROF handler we guarantee that MPS doesn't hold the arena
lock.

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 19:51                                         ` Helmut Eller
@ 2024-12-28 20:43                                           ` Pip Cet via Emacs development discussions.
  2024-12-29  5:44                                             ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-28 20:43 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, gerd.moellmann, ofv, emacs-devel, acorallo

"Helmut Eller" <eller.helmut@gmail.com> writes:

> On Sat, Dec 28 2024, Pip Cet wrote:
>
>>> I think I see what you mean.  I imagine the profiler thread to be a loop
>>> like
>>>
>>>   while (true) {
>>>      sleep (<x-seconds>)
>>>      ArenaEnter (<arena>)
>>>        pthread_kill (<main-thread>, SIGPROF)
>>>        wait (<pipe>)
>>>      ArenaLeave (<arena>)
>>>   }
>>
>> I'm not really following.  Did you mean to include a call to a "clear
>> all memory barriers" function after the ArenaEnter call?  If not, the
>> SIGPROF handler (and all handlers interrupting the SIGPROF handler which
>> aren't being delayed) would not be able to access MPS memory, which I
>> thought was the goal.
>
> In my mind it works like this: when the SIGPROF handler tries to access
> MPS memory, the SIGSEGV handler kicks in as it usually would in a
> non-signal handler context.  This should work because at the beginning
> of the SIGPROF handler we guarantee that MPS doesn't hold the arena
> lock.

Sorry, still not following.  The SIGPROF-sending thread holds the arena
lock.  So we can't take it in the SIGSEGV handler.  It's still a
deadlock, right?

Pip




^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: SIGPROF + SIGCHLD and igc
  2024-12-28 20:43                                           ` Pip Cet via Emacs development discussions.
@ 2024-12-29  5:44                                             ` Eli Zaretskii
  0 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-29  5:44 UTC (permalink / raw)
  To: Pip Cet; +Cc: eller.helmut, gerd.moellmann, ofv, emacs-devel, acorallo

> Date: Sat, 28 Dec 2024 20:43:18 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, acorallo@gnu.org
> 
> "Helmut Eller" <eller.helmut@gmail.com> writes:
> 
> > On Sat, Dec 28 2024, Pip Cet wrote:
> >
> >>> I think I see what you mean.  I imagine the profiler thread to be a loop
> >>> like
> >>>
> >>>   while (true) {
> >>>      sleep (<x-seconds>)
> >>>      ArenaEnter (<arena>)
> >>>        pthread_kill (<main-thread>, SIGPROF)
> >>>        wait (<pipe>)
> >>>      ArenaLeave (<arena>)
> >>>   }
> >>
> >> I'm not really following.  Did you mean to include a call to a "clear
> >> all memory barriers" function after the ArenaEnter call?  If not, the
> >> SIGPROF handler (and all handlers interrupting the SIGPROF handler which
> >> aren't being delayed) would not be able to access MPS memory, which I
> >> thought was the goal.
> >
> > In my mind it works like this: when the SIGPROF handler tries to access
> > MPS memory, the SIGSEGV handler kicks in as it usually would in a
> > non-signal handler context.  This should work because at the beginning
> > of the SIGPROF handler we guarantee that MPS doesn't hold the arena
> > lock.
> 
> Sorry, still not following.  The SIGPROF-sending thread holds the arena
> lock.  So we can't take it in the SIGSEGV handler.  It's still a
> deadlock, right?

But the SIGPROF handler doesn't take the arena lock, it starts by
writing to the pipe, which then causes the SIGPROF-sending thread to
release the lock, thus letting the SIGPROF handler touch memory which
could trigger MPS into taking the lock.
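
The sequence described above can be sketched as a standalone program fragment,
under loud assumptions: a plain pthread mutex stands in for the arena lock
(ArenaEnter/ArenaLeave), a pipe carries the handler's notification, and the
sleep between samples is omitted.  All names (arena_lock, ack_pipe,
run_profiler_handshake, ...) are hypothetical; this is not the igc or MPS
implementation:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t arena_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in */
static int ack_pipe[2];
static pthread_t main_thread;
static volatile sig_atomic_t samples;

static void sigprof_handler (int sig)
{
  int saved_errno = errno;
  char byte = 1;
  (void) sig;
  /* First thing: notify the profiler thread, without waiting for a
     reply.  write is async-signal-safe.  */
  while (write (ack_pipe[1], &byte, 1) < 0 && errno == EINTR)
    ;
  samples++;                    /* stand-in for get_backtrace etc. */
  errno = saved_errno;
}

static void *profiler_thread (void *arg)
{
  int rounds = *(int *) arg;
  for (int i = 0; i < rounds; i++)
    {
      char byte;
      pthread_mutex_lock (&arena_lock);         /* "ArenaEnter" */
      pthread_kill (main_thread, SIGPROF);
      /* Hold the lock until the handler has announced itself.  */
      while (read (ack_pipe[0], &byte, 1) < 0 && errno == EINTR)
        ;
      pthread_mutex_unlock (&arena_lock);       /* "ArenaLeave" */
    }
  return NULL;
}

/* Run ROUNDS sampling rounds; return the number of samples taken,
   or -1 on setup failure.  */
int run_profiler_handshake (int rounds)
{
  struct sigaction sa;
  pthread_t t;

  if (pipe (ack_pipe) != 0)
    return -1;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = sigprof_handler;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGPROF, &sa, NULL) != 0)
    return -1;

  samples = 0;
  main_thread = pthread_self ();
  if (pthread_create (&t, NULL, profiler_thread, &rounds) != 0)
    return -1;
  pthread_join (t, NULL);       /* the handler runs in this thread meanwhile */
  close (ack_pipe[0]);
  close (ack_pipe[1]);
  return samples;
}
```

The sketch shows only the ordering of signal, notification and unlock.  It
does not model what happens when the handler later touches protected memory.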



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-23 19:00                 ` Eli Zaretskii
  2024-12-23 19:37                   ` Eli Zaretskii
  2024-12-23 20:49                   ` Gerd Möllmann
@ 2024-12-23 23:37                   ` Pip Cet via Emacs development discussions.
  2024-12-24  4:03                     ` Gerd Möllmann
  2024-12-24 12:11                     ` Eli Zaretskii
  2 siblings, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-23 23:37 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Gerd Möllmann, ofv, emacs-devel, eller.helmut, acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Mon, 23 Dec 2024 18:44:42 +0100
>>
>> BTW, do you know which signal handlers use Lisp, i.e. allocate Lisp
>> objects or access some? All? Or, would it be realistic to rewrite signal
>> handlers to not do that?

I think there are several questions hiding behind the first question
mark:

1. which signal handlers want to read Lisp data
2. which signal handlers want to write Lisp data
3. which signal handlers want to allocate Lisp objects temporarily,
while guaranteeing no references to those objects survive when the
signal handler returns.
4. which signal handlers want to allocate Lisp objects permanently,
storing references to the new objects in "old" data
4a. ... and are willing to call a special transformation function to do
so
4b. ... and want to do so implicitly, expecting memory manipulation to
"just work".

1: definitely works
2: should work, but may hit a write barrier
3: could be made to work if there's interest
4a: if we must
4b: see the other thread.  If we have both make_object_writable
(formerly CHECK_IMPURE) and commit_object_changes functions and call
them consistently, it might be possible to find a way.

> SIGPROF does (it's the basis for our Lisp profiler).

That's 1, 2, but not 3 or 4, right?

> SIGCHLD doesn't run Lisp (I think), but it examines objects and data
> structures of the Lisp machine (those related to child processes).

Just 1, then?

>> One thing I've seen done elsewhere is to publish a message to a message
>> board so that it can be handled outside of the signal handler. Something
>> like that, you know what I mean.
>
> This is tricky for the profiler, because you want to sample the
> function in which you are right there and then, not some time later.

But would it be so bad to use a copy of the specpdl stack, placed in a
prepared area which is a GC root so we'd guarantee survival (but not
immutability; I don't think that matters in practice) of entries?
memcpy is safe to call from a signal handler, and then we could do all
of the processing safely.
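
A rough sketch of that idea, with everything invented for illustration:
struct frame is a hypothetical stand-in for a specpdl entry, the static
snapshot array stands in for the prepared area (registering it as a GC root
is not shown), and snapshot_demo is just a driver:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

enum { STACK_MAX = 64 };

/* Hypothetical stand-in for a specpdl entry.  */
struct frame { const char *function; };

static struct frame stack[STACK_MAX];           /* the "live" stack */
static volatile sig_atomic_t stack_depth;

static struct frame snapshot[STACK_MAX];        /* prepared copy area */
static volatile sig_atomic_t snapshot_depth;

static void push_frame (const char *function)
{
  int d = stack_depth;
  stack[d].function = function;
  /* Make sure the entry is written before the depth is published to
     a handler that may interrupt this thread.  */
  atomic_signal_fence (memory_order_seq_cst);
  stack_depth = d + 1;
}

static void pop_frame (void)
{
  stack_depth = stack_depth - 1;
}

/* The handler only copies; all processing happens later.  */
static void sigprof_snapshot (int sig)
{
  int d = stack_depth;
  (void) sig;
  memcpy (snapshot, stack, d * sizeof stack[0]);
  snapshot_depth = d;
}

/* Return the innermost function recorded by the sample, or NULL.  */
const char *snapshot_demo (void)
{
  struct sigaction sa;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = sigprof_snapshot;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGPROF, &sa, NULL) != 0)
    return NULL;

  push_frame ("outer");
  push_frame ("inner");
  raise (SIGPROF);              /* sample taken while "inner" is on top */
  pop_frame ();
  pop_frame ();

  /* Well after the handler returned, look at the copy at leisure.  */
  return snapshot_depth > 0 ? snapshot[snapshot_depth - 1].function : NULL;
}
```

The copy still reflects the moment of the signal even though the live stack
has since been unwound, which is the property the profiler cares about.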

(My preference would be to make the specpdl stack an ambiguous root
while the profiler is in use: that way, we'd get usable backtraces even
if the SIGPROF happened during GC, which is probably more useful than
merely saying that we were in GC).

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-23 23:37                   ` Some experience with the igc branch Pip Cet via Emacs development discussions.
@ 2024-12-24  4:03                     ` Gerd Möllmann
  2024-12-24 10:25                       ` Pip Cet via Emacs development discussions.
  2024-12-24 12:26                       ` Eli Zaretskii
  2024-12-24 12:11                     ` Eli Zaretskii
  1 sibling, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24  4:03 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet <pipcet@protonmail.com> writes:

> "Eli Zaretskii" <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>>   eller.helmut@gmail.com,  acorallo@gnu.org
>>> Date: Mon, 23 Dec 2024 18:44:42 +0100
>>>
>>> BTW, do you know which signal handlers use Lisp, i.e. allocate Lisp
>>> objects or access some? All? Or, would it be realistic to rewrite signal
>>> handlers to not do that?
>
> I think there are several questions hiding behind the first question
> mark:
>
> 1. which signal handlers want to read Lisp data
> 2. which signal handlers want to write Lisp data
> 3. which signal handlers want to allocate Lisp objects temporarily,
> while guaranteeing no references to those objects survive when the
> signal handler returns.
> 4. which signal handlers want to allocate Lisp objects permanently,
> storing references to the new objects in "old" data
> 4a. ... and are willing to call a special transformation function to do
> so
> 4b. ... and want to do so implicitly, expecting memory manipulation to
> "just work".

New day, new beliefs :-). Today, when I read my question again, I'd
actually be surprised if a signal handler could allocate Lisp objects
because I wouldn't be able to explain how that works with alloc.c which
isn't reentrant. Not even Fcons is reentrant when I look at it now.

Correct, or am I overlooking something? Could others please check? If
it's right, things get a lot easier.

Maybe allocation of Lisp objects on the stack remains as some sort of
problem (AUTO_CONS etc)? I don't see how though, ATM.

> 1: definitely works
> 2: should work, but may hit a write barrier
> 3: could be made to work if there's interest
> 4a: if we must
> 4b: see the other thread.  If we have both make_object_writable
> (formerly CHECK_IMPURE) and commit_object_changes functions and call
> them consistently, it might be possible to find a way.
>
>> SIGPROF does (it's the basis for our Lisp profiler).
>
> That's 1, 2, but not 3 or 4, right?
>
>> SIGCHLD doesn't run Lisp (I think), but it examines objects and data
>> structures of the Lisp machine (those related to child processes).
>
> Just 1, then?
>
>>> One thing I've seen done elsewhere is to publish a message to a message
>>> board so that it can be handled outside of the signal handler. Something
>>> like that, you know what I mean.
>>
>> This is tricky for the profiler, because you want to sample the
>> function in which you are right there and then, not some time later.
>
> But would it be so bad to use a copy of the specpdl stack, placed in a
> prepared area which is a GC root so we'd guarantee survival (but not
> immutability; I don't think that matters in practice) of entries?
> memcpy is safe to call from a signal handler, and then we could do all
> of the processing safely.
>
> (My preference would be to make the specpdl stack an ambiguous root
> while the profiler is in use: that way, we'd get usable backtraces even
> if the SIGPROF happened during GC, which is probably more useful than
> merely saying that we were in GC).
>
> Pip

I'd prefer to send messages from handle_profiler_signal. Or something
equivalent to sending messages. Please see my other mail where I looked
at that function. Implementing such a message board is of course not
easy. But I think it would be easy to understand how things work once one
has something like that.
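
One possible shape for such a message board, as a sketch only: a fixed-size
ring that the signal handler posts to without allocating or locking, and that
ordinary code drains later.  The names (board_post, board_take, board_demo)
and the message layout are made up, and the ring assumes that producer and
consumer run on the same thread (handler versus interrupted code):

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

enum { BOARD_SIZE = 8 };        /* holds at most BOARD_SIZE - 1 messages */

struct message { int kind; long payload; };

static struct message board[BOARD_SIZE];
static volatile sig_atomic_t head, tail, dropped;
static volatile sig_atomic_t sample_no;

/* Called from the signal handler: no allocation, no locks.  */
static void board_post (int kind, long payload)
{
  int h = head, next = (h + 1) % BOARD_SIZE;
  if (next == tail)
    {
      dropped++;                /* board full: drop the message */
      return;
    }
  board[h].kind = kind;
  board[h].payload = payload;
  atomic_signal_fence (memory_order_seq_cst);
  head = next;                  /* publish only after the slot is filled */
}

/* Called outside the handler.  Return 1 and fill *OUT if a message
   was pending, 0 otherwise.  */
static int board_take (struct message *out)
{
  int t = tail;
  if (t == head)
    return 0;
  atomic_signal_fence (memory_order_seq_cst);
  *out = board[t];
  atomic_signal_fence (memory_order_seq_cst);
  tail = (t + 1) % BOARD_SIZE;  /* free the slot only after copying */
  return 1;
}

static void sigprof_post (int sig)
{
  sample_no = sample_no + 1;
  board_post (sig, sample_no);
}

/* Raise SIGPROF N times, then drain the board.  Return the number of
   messages that were actually delivered, or -1 on setup failure.  */
int board_demo (int n)
{
  struct sigaction sa;
  struct message m;
  int taken = 0;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = sigprof_post;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGPROF, &sa, NULL) != 0)
    return -1;

  for (int i = 0; i < n; i++)
    raise (SIGPROF);
  while (board_take (&m))
    taken++;
  return taken;
}
```

A board like this trades completeness for safety: when it fills up, samples
are dropped and counted rather than blocking or allocating in the handler.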

And if I'm right with what I wrote above about allocation (and I think I
am), we also don't need provisions for allocating Lisp objects from
signal handlers, which would be a great simplification.




^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24  4:03                     ` Gerd Möllmann
@ 2024-12-24 10:25                       ` Pip Cet via Emacs development discussions.
  2024-12-24 10:50                         ` Gerd Möllmann
  2024-12-24 13:15                         ` Eli Zaretskii
  2024-12-24 12:26                       ` Eli Zaretskii
  1 sibling, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-24 10:25 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> New day, new beliefs :-). Today, when I read my question again, I'd
> actually be surprised if a signal handler could allocate Lisp objects
> because I wouldn't be able to explain how that works with alloc.c which
> isn't reentrant. Not even Fcons is reentrant when I look at it now.
>
> Correct, or am I overlooking something? Could others please check? If
> it's right, things get a lot easier.

I agree.  But Eli said something about wanting to run Lisp from a signal
handler, which would change that.  I was trying to explain why we don't
want to do that.

> Maybe allocation of Lisp objects on the stack remains as some sort of
> problem (AUTO_CONS etc)? I don't see how though, ATM.

Stack objects are always optional, so if there is code that attempts to
avoid alloc.c by using those, it's broken.

My current patch makes it so the main thread never takes the arena lock,
ever.  Performance isn't quite the same as scratch/igc: for some reason
I don't understand, it's slightly better.  Still needs cleanup,
de-pthreading, and we probably don't need to use atomic types
everywhere.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 10:25                       ` Pip Cet via Emacs development discussions.
@ 2024-12-24 10:50                         ` Gerd Möllmann
  2024-12-24 13:15                         ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24 10:50 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> New day, new beliefs :-). Today, when I read my question again, I'd
>> actually be surprised if a signal handler could allocate Lisp objects
>> because I wouldn't be able to explain how that works with alloc.c which
>> isn't reentrant. Not even Fcons is reentrant when I look at it now.
>>
>> Correct, or am I overlooking something? Could others please check? If
>> it's right, things get a lot easier.
>
> I agree.  But Eli said something about wanting to run Lisp from a signal
> handler, which would change that.  I was trying to explain why we don't
> want to do that.

Thanks for checking! Must be kind of a misunderstanding going on. And
anyway, it would be a feature we don't have with the old GC, so I'd
declare it out of scope :-).

>> Maybe allocation of Lisp objects on the stack remains as some sort of
>> problem (AUTO_CONS etc)? I don't see how though, ATM.
>
> Stack objects are always optional, so if there is code that attempts to
> avoid alloc.c by using those, it's broken.

Yes!

> My current patch makes it so the main thread never takes the arena lock,
> ever.  

Hm, how and why does that work?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 10:25                       ` Pip Cet via Emacs development discussions.
  2024-12-24 10:50                         ` Gerd Möllmann
@ 2024-12-24 13:15                         ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 13:15 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Tue, 24 Dec 2024 10:25:38 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> > New day, new beliefs :-). Today, when I read my question again, I'd
> > actually be surprised if a signal handler could allocate Lisp objects
> > because I wouldn't be able to explain how that works with alloc.c which
> > isn't reentrant. Not even Fcons is reentrant when I look at it now.
> >
> > Correct, or am I overlooking something? Could others please check? If
> > it's right, things get a lot easier.
> 
> I agree.  But Eli said something about wanting to run Lisp from a signal
> handler, which would change that.

Not Lisp, but the Lisp machine in general.  Which includes access to
Lisp data to read and write it.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24  4:03                     ` Gerd Möllmann
  2024-12-24 10:25                       ` Pip Cet via Emacs development discussions.
@ 2024-12-24 12:26                       ` Eli Zaretskii
  2024-12-24 12:56                         ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 12:26 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Tue, 24 Dec 2024 05:03:36 +0100
> 
> I'd prefer to send messages from handle_profiler_signal. Or something
> equivalent to sending messages.

How would that be different?  If the messages arrive asynchronously
and are handled asynchronously, that's the moral equivalent of
signals, no?  If the messages are not handled asynchronously, how do
we make sure the obtained profile is accurate?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 12:26                       ` Eli Zaretskii
@ 2024-12-24 12:56                         ` Gerd Möllmann
  2024-12-24 13:19                           ` Pip Cet via Emacs development discussions.
  2024-12-24 13:46                           ` Eli Zaretskii
  0 siblings, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24 12:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Tue, 24 Dec 2024 05:03:36 +0100
>> 
>> I'd prefer to send messages from handle_profiler_signal. Or something
>> equivalent to sending messages.
>
> How would that be different?  If the messages arrive asynchronously
> and are handled asynchronously, that's the moral equivalent of
> signals, no? 

I'm using SIGPROF below to make it more concrete. Similar for other
signals.

The idea is to get the backtrace in the SIGPROF handler, without
accessing Lisp data. That can be done, as I've tried to show.
Then place that backtrace somewhere.

In an actor model architecture, one would use a message that contains
the backtrace and post it to a message board. I used that architecture
just as an example, because I like it a lot. In the same architecture,
typically a scheduler thread would then assign a thread to handle the
message. The handler handling the profiler message would then do what
record_backtrace today does after get_backtrace, i.e. count same
backtraces. 

That's only one example architecture, of course. One can use something
else, like queues that are handled by another thread, one doesn't need a
scheduler thread, and so on, and so on. Pip's work queue is an
example.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 12:56                         ` Gerd Möllmann
@ 2024-12-24 13:19                           ` Pip Cet via Emacs development discussions.
  2024-12-24 13:38                             ` Gerd Möllmann
  2024-12-24 13:46                           ` Eli Zaretskii
  1 sibling, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-24 13:19 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> I'm using SIGPROF below to make it more concrete. Similar for other
> signals.
>
> The idea is to get the backtrace in the SIGPROF handler, without
> accessing Lisp data. That can be done, as I've tried to show.

I don't understand.  We need to access the specpdl, which I consider
Lisp data, and certainly the backtrace includes data which can only be
generated using MPS-managed memory.

> Then place that backtrace somewhere.

I still think it's better to copy the specpdl, since that allows us to
generate the "backtrace" (whatever we choose to use for that) in Lisp.
If we spend too much time allocating short-lived data which triggers too
many GCs, we want to know what to fix in the Lisp code.

Honestly, though, it doesn't matter much, does it?

> That's only one example architecture, of course. One can use something
> else, like queues that are handled by another thread, one doesn't need a
> scheduler thread, and so on, and so on. Pip's work queue is an
> example.

That's Helmut's code, not mine.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 13:19                           ` Pip Cet via Emacs development discussions.
@ 2024-12-24 13:38                             ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24 13:38 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> I'm using SIGPROF below to make it more concrete. Similar for other
>> signals.
>>
>> The idea is to get the backtrace in the SIGPROF handler, without
>> accessing Lisp data. That can be done, as I've tried to show.
>
> I don't understand.  We need to access the specpdl, which I consider
> Lisp data, and certainly the backtrace includes data which can only be
> generated using MPS-managed memory.

What I meant by it not being Lisp data is union specbinding. The stack
of bindings is a root, and doesn't have a barrier. And accessing the
stack is not a problem because PVEC_THREAD is allocated from AMS which
doesn't have barriers. What we collect in get_backtrace is an array of
Lisp_Objects for the functions, and that's okay.

>
>> Then place that backtrace somewhere.
>
> I still think it's better to copy the specpdl, since that allows us to
> generate the "backtrace" (whatever we choose to use for that) in Lisp.
> If we spend too much time allocating short-lived data which triggers too
> many GCs, we want to know what to fix in the Lisp code.

In a way, what get_backtrace does is copy part of the bindings stack,
only the functions. The resulting backtrace that the user sees could
be done in Lisp, maybe, don't know. Important part for me is that we get
out of the signal handler to do stuff.
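
To make "copy only the functions" concrete, here is a minimal sketch.
It does not use the real union specbinding: obj_t, struct
binding_record, the record kinds and copy_backtrace are hypothetical
stand-ins chosen for illustration. The point it shows is that the loop
reads only the stack entries and never dereferences the objects those
words refer to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a Lisp object is treated as an opaque word.  */
typedef uintptr_t obj_t;

enum record_kind { RECORD_LET, RECORD_UNWIND, RECORD_BACKTRACE };

/* Simplified binding-stack entry.  The real union specbinding has
   more variants and fields; this is only an illustration.  */
struct binding_record
{
  enum record_kind kind;
  obj_t function;		/* meaningful only for RECORD_BACKTRACE */
};

/* Copy the functions of the innermost backtrace records into TRACE,
   innermost first, writing at most DEPTH entries.  Only the stack
   itself is read; the words are copied without being dereferenced.
   Return the number of entries written.  */
static size_t
copy_backtrace (const struct binding_record *stack, size_t nrecords,
		obj_t *trace, size_t depth)
{
  assert (stack != NULL && trace != NULL);
  size_t n = 0;
  for (size_t i = nrecords; i > 0 && n < depth; i--)
    if (stack[i - 1].kind == RECORD_BACKTRACE)
      trace[n++] = stack[i - 1].function;
  return n;
}
```

Whether the result is later turned into something user-visible in C or
in Lisp is independent of this step.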

> Honestly, though, it doesn't matter much, does it?

Right, it's all details.

>
>> That's only one example architecture, of course. One can use something
>> else, like queues that are handled by another thread, one doesn't need a
>> scheduler thread, and so on, and so on. Pip's work queue is an
>> example.
>
> That's Helmut's code, not mine.

+2👍 to Helmut, -👍 to Pip, -👍 to me :-)



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 12:56                         ` Gerd Möllmann
  2024-12-24 13:19                           ` Pip Cet via Emacs development discussions.
@ 2024-12-24 13:46                           ` Eli Zaretskii
  2024-12-24 14:12                             ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 13:46 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Tue, 24 Dec 2024 13:56:18 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> >> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
> >>   eller.helmut@gmail.com,  acorallo@gnu.org
> >> Date: Tue, 24 Dec 2024 05:03:36 +0100
> >> 
> >> I'd prefer to send messages from handle_profiler_signal. Or something
> >> equivalent to sending messages.
> >
> > How would that be different?  If the messages arrive asynchronously
> > and are handled asynchronously, that's the moral equivalent of
> > signals, no? 
> 
> I'm using SIGPROF below to make it more concrete. Similar for other
> signals.
> 
> The idea is to get the backtrace in the SIGPROF handler, without
> accessing Lisp data. That can be done, as I've tried to show.
> Then place that backtrace somewhere.

Let's be more accurate: when I said "Lisp data", I actually meant any
data that is part of the Lisp machine's global state.  That's because
you cannot safely access that state while the Lisp machine runs (and
modifies the state).  You need the Lisp machine stopped in its tracks.
Agreed?

Now, with that definition, isn't specpdl stack part of "Lisp data"?
If so, and if we can safely access it from a signal handler, why do we
need to move it aside at all?  And how would the "message handler" be
different in that aspect from a signal handler?

> In an actor model architecture, one would use a message that contains
> the backtrace and post it to a message board. I used that architecture
> just as an example, because I like it a lot. In the same architecture,
> typically a scheduler thread would then assign a thread to handle the
> message. The handler handling the profiler message would then do what
> record_backtrace today does after get_backtrace, i.e. count same
> backtraces. 

What is the purpose of delaying the part of record_backtrace after
get_backtrace to later?  Is the counting it does dangerous when done
from a signal handler?

> That's only one example architecture, of course. One can use something
> else, like queues that are handled by another thread, one doesn't need a
> scheduler thread, and so on, and so on. Pip's work queue is an
> example.

Doing this from another thread raises the problem I describe above: we
need the Lisp thread(s) stopped, because you cannot examine the data
of the Lisp machine while the machine is running.  And if we stop the
Lisp threads, why do we need the other thread at all?

I guess we are tossing ideas without sufficient detail, so each one
understands something different from each idea (since we have
different backgrounds and experiences).  My suggestion is to
describe each idea in enough detail to make the design and its
implications clear to all.  A kind of DR, if you want.  Then we will
be on the same page, and can have an effective discussion of the
various ideas.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 13:46                           ` Eli Zaretskii
@ 2024-12-24 14:12                             ` Gerd Möllmann
  2024-12-24 14:40                               ` Eli Zaretskii
  2024-12-24 21:18                               ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-24 14:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> 
>> I'm using SIGPROF below to make it more concrete. Similar for other
>> signals.
>> 
>> The idea is to get the backtrace in the SIGPROF handler, without
>> accessing Lisp data. That can be done, as I've tried to show.
>> Then place that backtrace somewhere.
>
> Let's be more accurate: when I said "Lisp data", I actually meant any
> data that is part of the Lisp machine's global state.  That's because
> you cannot safely access that state while the Lisp machine runs (and
> modifies the state).  You need the Lisp machine stopped in its tracks.
> Agreed?

Ok, let's use that definition.

> Now, with that definition, isn't specpdl stack part of "Lisp data"?
> If so, and if we can safely access it from a signal handler, why do we
> need to move it aside at all?  And how would the "message handler" be
> different in that aspect from a signal handler?

We're coming from the problem that MPS uses signals for memory barriers.
On platforms != macOS. And I am proposing a solution for that.

The SIGPROF handler does two things: (1) get the current backtrace,
which does not trip on memory barriers, and (2) build a summary, i.e.
count same backtraces using a hash table. (2) trips on memory barriers.

So my proposal is to do (1) in the signal handler and do (2)
elsewhere, not in the signal handler. Where (2) is done is a matter of
design. If we use Helmut's work queue, it would be the main thread, I
suppose.

In any case we're in "normal" multi-threading territory, with the usual
restrictions and so on, but these are restrictions Emacs has. And we
don't need anything from MPS, which might or might not be possible to
get.

>
>> In an actor model architecture, one would use a message that contains
>> the backtrace and post it to a message board. I used that architecture
>> just as an example, because I like it a lot. In the same architecture,
>> typically a scheduler thread would then assign a thread to handle the
>> message. The handler handling the profiler message would then do what
>> record_backtrace today does after get_backtrace, i.e. count same
>> backtraces. 
>
> What is the purpose of delaying the part of record_backtrace after
> get_backtrace to later?  Is the counting it does dangerous when done
> from a signal handler?

That's part (2), which can trip on memory barriers because it accesses
MPS-managed memory like vectors and so on.

>
>> That's only one example architecture, of course. One can use something
>> else, like queues that are handled by another thread, one doesn't need a
>> scheduler thread, and so on, and so on. Pip's work queue is an
>> example.
>
> Doing this from another thread raises the problem I describe above: we
> need the Lisp thread(s) stopped, because you cannot examine the data
> of the Lisp machine while the machine is running.  And if we stop the
> Lisp threads, why do we need the other thread at all?
>
> I guess we are tossing ideas without sufficient detail, so each one
> understands something different from each idea (since we have
> different backgrounds and experiences).  My suggestion is to
> describe each idea in enough detail to make the design and its
> implications clear to all.  A kind of DR, if you want.  Then we will
> be on the same page, and can have an effective discussion of the
> various ideas.

I hope the above helps. Please understand that I'm not proposing a
ready-made design, but mainly recommend moving (2) out of the signal
handler. Sorry if that was too abstract so far, I guess that's just the
way I'm thinking.

If it helps, maybe we should concentrate on solving this with Helmut's
work queue. Put the backtrace from (1) in the work queue, then do (2)
where the work queue is processed. Something like that. 
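
A minimal sketch of that split, under stated assumptions: this is not
Helmut's work queue, and obj_t, struct sample, queue_push, queue_drain
and count_sample are hypothetical names. queue_push is the part meant
to run in the SIGPROF handler, after (1) has produced the backtrace. It
only copies words and touches atomics, which are async-signal-safe only
if they are lock-free on the platform. queue_drain and count_sample are
part (2) and run outside the handler, where touching managed memory is
fine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t obj_t;	/* stand-in for Lisp_Object */
enum { MAX_DEPTH = 8, QUEUE_SIZE = 64, MAX_COUNTS = 16 };

struct sample { size_t depth; obj_t trace[MAX_DEPTH]; };

/* Single-producer (handler), single-consumer ring buffer.  */
static struct sample queue[QUEUE_SIZE];
static atomic_size_t q_head;	/* next slot to write, producer only */
static atomic_size_t q_tail;	/* next slot to read, consumer only */

/* Handler side: copy the backtrace words into the queue.  No
   allocation, no locks, no dereferencing of the objects.  Return
   false if the queue is full and the sample was dropped.  */
static bool
queue_push (const obj_t *trace, size_t depth)
{
  size_t head = atomic_load_explicit (&q_head, memory_order_relaxed);
  size_t tail = atomic_load_explicit (&q_tail, memory_order_acquire);
  if (head - tail == QUEUE_SIZE)
    return false;
  struct sample *s = &queue[head % QUEUE_SIZE];
  s->depth = depth < MAX_DEPTH ? depth : MAX_DEPTH;
  for (size_t i = 0; i < s->depth; i++)
    s->trace[i] = trace[i];
  atomic_store_explicit (&q_head, head + 1, memory_order_release);
  return true;
}

/* Consumer side: hand every queued sample to CONSUME and return how
   many were processed.  Runs outside the signal handler.  */
static size_t
queue_drain (void (*consume) (const struct sample *))
{
  size_t n = 0;
  size_t tail = atomic_load_explicit (&q_tail, memory_order_relaxed);
  while (tail != atomic_load_explicit (&q_head, memory_order_acquire))
    {
      consume (&queue[tail % QUEUE_SIZE]);
      atomic_store_explicit (&q_tail, ++tail, memory_order_release);
      n++;
    }
  return n;
}

/* Part (2): count identical backtraces.  A linear table keeps the
   sketch short; the real code uses a hash table.  */
struct count_entry { struct sample key; unsigned count; };
static struct count_entry counts[MAX_COUNTS];
static size_t ncounts;

static void
count_sample (const struct sample *s)
{
  for (size_t i = 0; i < ncounts; i++)
    if (counts[i].key.depth == s->depth
	&& memcmp (counts[i].key.trace, s->trace,
		   s->depth * sizeof (obj_t)) == 0)
      {
	counts[i].count++;
	return;
      }
  if (ncounts < MAX_COUNTS)
    {
      counts[ncounts].key = *s;
      counts[ncounts].count = 1;
      ncounts++;
    }
}
```

Dropping a sample when the queue is full is a design choice of the
sketch, made because the handler must not block. Note also that the
queued words would have to be visible to the GC somehow (for example by
making the queue a root); the sketch ignores that.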

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 14:12                             ` Gerd Möllmann
@ 2024-12-24 14:40                               ` Eli Zaretskii
  2024-12-25  4:56                                 ` Gerd Möllmann
  2024-12-24 21:18                               ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 14:40 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Tue, 24 Dec 2024 15:12:40 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Now, with that definition, isn't specpdl stack part of "Lisp data"?
> > If so, and if we can safely access it from a signal handler, why do we
> > need to move it aside at all?  And how would the "message handler" be
> > different in that aspect from a signal handler?
> 
> We're coming from the problem that MPS uses signals for memory barriers.
> On platforms != macOS. And I am proposing a solution for that.
> 
> The SIGPROF handler does two things: (1) get the current backtrace,
> which does not trip on memory barriers, and (2) build a summary, i.e.
> count same backtraces using a hash table. (2) trips on memory barriers.

Can you elaborate on (2) and why it trips?  I guess I'm missing
something because I don't understand which code in record_backtrace
does trip on memory barriers and why.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 14:40                               ` Eli Zaretskii
@ 2024-12-25  4:56                                 ` Gerd Möllmann
  2024-12-25 12:19                                   ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25  4:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Tue, 24 Dec 2024 15:12:40 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Now, with that definition, isn't specpdl stack part of "Lisp data"?
>> > If so, and if we can safely access it from a signal handler, why do we
>> > need to move it aside at all?  And how would the "message handler" be
>> > different in that aspect from a signal handler?
>> 
>> We're coming from the problem that MPS uses signals for memory barriers.
>> On platforms != macOS. And I am proposing a solution for that.
>> 
>> The SIGPROF handler does two things: (1) get the current backtrace,
>> which does not trip on memory barriers, and (2) build a summary, i.e.
>> count same backtraces using a hash table. (2) trips on memory barriers.
>
> Can you elaborate on (2) and why it trips?  I guess I'm missing
> something because I don't understand which code in record_backtrace
> does trip on memory barriers and why.

Ok, (2) begins as shown below.

  static void
  record_backtrace (struct profiler_log *plog, EMACS_INT count)
  {
    log_t *log = plog->log;
    get_backtrace (log->trace, log->depth);
  --- (2) begins after this line -------------------------------
    EMACS_UINT hash = trace_hash (log->trace, log->depth);

The SIGPROF can have interrupted Emacs at any point, both the MPS thread
and all others. MPS may have been doing arbitrary stuff when
interrupted, and Emacs threads too. Memory barriers may be on
unpredictable segments of memory, as they usually are, as part of MPS'
GC implementation. Do you agree with this picture?

Elsewhere I tried to explain why I think this works up to the line
marked (2) above. Now enter trace_hash. Current implementation:

  static EMACS_UINT
  trace_hash (Lisp_Object *trace, int depth)
  {
    EMACS_UINT hash = 0;
    for (int i = 0; i < depth; i++)
      {
        Lisp_Object f = trace[i];
        EMACS_UINT hash1;
  #ifdef HAVE_MPS
        hash1 = (CLOSUREP (f) ? igc_hash (AREF (f, CLOSURE_CODE)) : igc_hash (f));
                 ^^^^^^^^       ^^^^^^^^  ^^^^

The constructs I marked with ^^^ all access the memory of F. F is a
vectorlike, its memory is managed by MPS in an MPS pool that uses
memory barriers, so the memory of F can currently be behind a barrier.
It doesn't have to, but it can.

When we access F's memory and it is behind a barrier, the result is a
nested SIGSEGV while handling SIGPROF.

More code accessing memory that is potentially behind a barrier follows
in record_backtrace.
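
For readers who haven't seen the mechanism, the general technique can
be shown outside of MPS. The following is a sketch, not MPS code: it
assumes a POSIX system where touching a protected page raises SIGSEGV
(SIGBUS is handled too, since some systems deliver that instead), and
barrier_page, on_fault and read_behind_barrier are made-up names. A
page is protected with mprotect, a plain read of it faults, the handler
lifts the protection, and the read is retried. Calling mprotect from a
handler is not guaranteed async-signal-safe by POSIX; the sketch relies
on it working in practice.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static char *barrier_page;	/* the page "behind the barrier" */
static size_t barrier_size;
static volatile sig_atomic_t barrier_hits;

/* Fault handler: if the faulting address lies in our page, count the
   hit and lift the protection so the interrupted access can be
   retried.  Anything else is a genuine crash.  */
static void
on_fault (int sig, siginfo_t *info, void *ctx)
{
  (void) sig;
  (void) ctx;
  char *addr = info->si_addr;
  if (addr < barrier_page || addr >= barrier_page + barrier_size)
    _exit (1);
  barrier_hits++;
  mprotect (barrier_page, barrier_size, PROT_READ | PROT_WRITE);
}

/* Store 42 in a fresh page, protect the page, then read it back.
   The read looks like an ordinary memory access but takes a trip
   through on_fault first.  Return the value read, or -1 on error.  */
static int
read_behind_barrier (void)
{
  barrier_size = (size_t) sysconf (_SC_PAGESIZE);
  barrier_page = mmap (NULL, barrier_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (barrier_page == MAP_FAILED)
    return -1;
  barrier_page[0] = 42;

  struct sigaction sa = { 0 };
  sa.sa_sigaction = on_fault;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGSEGV, &sa, NULL) != 0
      || sigaction (SIGBUS, &sa, NULL) != 0)
    return -1;

  /* Raise the barrier.  */
  if (mprotect (barrier_page, barrier_size, PROT_NONE) != 0)
    return -1;

  /* Faults once, is retried after the handler returns.  */
  return *(volatile char *) barrier_page;
}
```

If a read like the last one happens inside the SIGPROF handler, the
fault is delivered while that handler is still running, which is the
nested case described above.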

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25  4:56                                 ` Gerd Möllmann
@ 2024-12-25 12:19                                   ` Eli Zaretskii
  2024-12-25 12:50                                     ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 12:19 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 05:56:26 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> The SIGPROF handler does two things: (1) get the current backtrace,
> >> which does not trip on memory barriers, and (2) build a summary, i.e.
> >> count same backtraces using a hash table. (2) trips on memory barriers.
> >
> > Can you elaborate on (2) and why it trips?  I guess I'm missing
> > something because I don't understand which code in record_backtrace
> > does trip on memory barriers and why.
> 
> Ok, (2) begins as shown below.
> 
>   static void
>   record_backtrace (struct profiler_log *plog, EMACS_INT count)
>   {
>     log_t *log = plog->log;
>     get_backtrace (log->trace, log->depth);
>   --- (2) begins after this line -------------------------------
>     EMACS_UINT hash = trace_hash (log->trace, log->depth);
> 
> The SIGPROF can have interrupted Emacs at any point, both the MPS thread
> and all others. MPS may have been doing arbitrary stuff when
> interrupted, and Emacs threads too. Memory barriers may be on
> unpredictable segments of memory, as they usually are, as part of MPS'
> GC implementation. Do you agree with this picture?
> 
> Elsewhere I tried to explain why I think this works up to the line
> marked (2) above. Now enter trace_hash. Current implementation:
> 
>   static EMACS_UINT
>   trace_hash (Lisp_Object *trace, int depth)
>   {
>     EMACS_UINT hash = 0;
>     for (int i = 0; i < depth; i++)
>       {
>         Lisp_Object f = trace[i];
>         EMACS_UINT hash1;
>   #ifdef HAVE_MPS
>         hash1 = (CLOSUREP (f) ? igc_hash (AREF (f, CLOSURE_CODE)) : igc_hash (f));
>                  ^^^^^^^^       ^^^^^^^^  ^^^^
> 
> The constructs I marked with ^^^ all access the memory of F. F is a
> vectorlike, its memory is managed by MPS in an MPS pool that uses
> memory barriers, so the memory of F can currently be behind a barrier.
> It doesn't have to, but it can.
> 
> When we access F's memory and it is behind a barrier, the result is a
> nested SIGSEGV while handling SIGPROF.

Two followup questions:

  . how is accessing F different from accessing the specpdl stack?
  . how does this work with the current GC, where F could have been
    collected and its memory freed?

The first question is more important, from where I stand.  Looking
forward beyond the point where we land igc on master, I wonder how
we will be able to tell, for a random non-trivial change on the C level,
whether what it does can cause trouble with MPS?  That is, how can a
mere mortal determine whether a given data structure in igc Emacs can
or cannot be safely touched when MPS happens to do its thing, whether
synchronously or asynchronously?  We must have some reasonably
practical way of telling this, or else we will be breaking Emacs high
and low.

> More code accessing memory that is potentially behind a barrier follows
> in record_backtrace.

Which code is that?  (It's a serious question: I tried to identify
that code, but couldn't.  I'm probably missing something.)



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 12:19                                   ` Eli Zaretskii
@ 2024-12-25 12:50                                     ` Gerd Möllmann
  2024-12-25 13:00                                       ` Eli Zaretskii
                                                         ` (2 more replies)
  0 siblings, 3 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 12:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Wed, 25 Dec 2024 05:56:26 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> The SIGPROF handler does two things: (1) get the current backtrace,
>> >> which does not trip on memory barriers, and (2) build a summary, i.e.
>> >> count same backtraces using a hash table. (2) trips on memory barriers.
>> >
>> > Can you elaborate on (2) and why it trips?  I guess I'm missing
>> > something because I don't understand which code in record_backtrace
>> > does trip on memory barriers and why.
>> 
>> Ok, (2) begins as shown below.
>> 
>>   static void
>>   record_backtrace (struct profiler_log *plog, EMACS_INT count)
>>   {
>>     log_t *log = plog->log;
>>     get_backtrace (log->trace, log->depth);
>>   --- (2) begins after this line -------------------------------
>>     EMACS_UINT hash = trace_hash (log->trace, log->depth);
>> 
>> The SIGPROF can have interrupted Emacs at any point, both the MPS thread
>> and all others. MPS may have been doing arbitrary stuff when
>> interrupted, and Emacs threads too. Memory barriers may be on
>> unpredictable segments of memory, as they usually are, as part of MPS'
>> GC implementation. Do you agree with this picture?
>> 
>> Elsewhere I tried to explain why I think this works up to the line
>> marked (2) above. Now enter trace_hash. Current implementation:
>> 
>>   static EMACS_UINT
>>   trace_hash (Lisp_Object *trace, int depth)
>>   {
>>     EMACS_UINT hash = 0;
>>     for (int i = 0; i < depth; i++)
>>       {
>>         Lisp_Object f = trace[i];
>>         EMACS_UINT hash1;
>>   #ifdef HAVE_MPS
>>         hash1 = (CLOSUREP (f) ? igc_hash (AREF (f, CLOSURE_CODE)) : igc_hash (f));
>>                  ^^^^^^^^       ^^^^^^^^  ^^^^
>> 
>> The constructs I marked with ^^^ all access the memory of F. F is a
>> vectorlike, its memory is managed by MPS in an MPS pool that uses
>> memory barriers, so the memory of F can currently be behind a barrier.
>> It doesn't have to, but it can.
>> 
>> When we access F's memory and it is behind a barrier, the result is a
>> nested SIGSEGV while handling SIGPROF.
>
> Two followup questions:
>
>   . how is accessing F different from accessing the specpdl stack?

F's memory is allocated from an MPS pool via alloc_impl in igc.c. Most
objects are allocated from a pool that uses barriers (I think except
PVEC_THREAD). The specpdl stacks are malloc'd (see
grow_specpdl_allocation) and used as roots. There are currently no
barriers on roots.

>   . how does this work with the current GC, where F could have been
>     collected and its memory freed?

I think when we find F in a specpdl stack, GC should have seen it and
marked it too in mark_specpdl. So it wouldn't be freed.

(Same for igc, where the stacks are roots, and should have seen F in
that way in scan_specpdl.)

> The first question is more important, from where I stand.  Looking
> forward beyond the point where we land igc on master, I wonder how
> we will be able to tell, for a random non-trivial change on the C level,
> whether what it does can cause trouble with MPS?  That is, how can a
> mere mortal determine whether a given data structure in igc Emacs can
> or cannot be safely touched when MPS happens to do its thing, whether
> synchronously or asynchronously?  We must have some reasonably
> practical way of telling this, or else we will be breaking Emacs high
> and low.
>
>> More code accessing memory that is potentially behind a barrier follows
>> in record_backtrace.
>
> Which code is that?  (It's a serious question: I tried to identify
> that code, but couldn't.  I'm probably missing something.)

The example I saw, with ^^^^ marking the call sites:

static void
record_backtrace (struct profiler_log *plog, EMACS_INT count)
{
  log_t *log = plog->log;
  get_backtrace (log->trace, log->depth);
  EMACS_UINT hash = trace_hash (log->trace, log->depth);
  int hidx = log_hash_index (log, hash);
  int idx = log->index[hidx];
  while (idx >= 0)
    {
      if (log->hash[idx] == hash
	  && trace_equal (log->trace, get_key_vector (log, idx), log->depth))
             ^^^^^^^^^^^

static bool
trace_equal (Lisp_Object *bt1, Lisp_Object *bt2, int depth)
{
  for (int i = 0; i < depth; i++)
    if (!BASE_EQ (bt1[i], bt2[i]) && NILP (Ffunction_equal (bt1[i], bt2[i])))
                                           ^^^^^^^^^^^^^^^

DEFUN ("function-equal", Ffunction_equal, Sfunction_equal, 2, 2, 0,
       doc: /* Return non-nil if F1 and F2 come from the same source.
Used to determine if different closures are just different instances of
the same lambda expression, or are really unrelated function.  */)
     (Lisp_Object f1, Lisp_Object f2)
{
  bool res;
  if (EQ (f1, f2))
    res = true;
  else if (CLOSUREP (f1) && CLOSUREP (f2))
           ^^^^^^^^         ^^^^^^^^
    res = EQ (AREF (f1, CLOSURE_CODE), AREF (f2, CLOSURE_CODE));
              ^^^^                     ^^^^

Didn't look further than that, though.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 12:50                                     ` Gerd Möllmann
@ 2024-12-25 13:00                                       ` Eli Zaretskii
  2024-12-25 13:08                                         ` Gerd Möllmann
  2024-12-25 13:09                                       ` Eli Zaretskii
  2024-12-25 17:40                                       ` Pip Cet via Emacs development discussions.
  2 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 13:00 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 13:50:37 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> More code accessing memory that is potentially behind a barrier follows
> >> in record_backtrace.
> >
> > Which code is that?  (It's a serious question: I tried to identify
> > that code, but couldn't.  I'm probably missing something.)
> 
> The example I saw, with ^^^^ marking the call sites:
> 
> static void
> record_backtrace (struct profiler_log *plog, EMACS_INT count)
> {
>   log_t *log = plog->log;
>   get_backtrace (log->trace, log->depth);
>   EMACS_UINT hash = trace_hash (log->trace, log->depth);
>   int hidx = log_hash_index (log, hash);
>   int idx = log->index[hidx];
>   while (idx >= 0)
>     {
>       if (log->hash[idx] == hash
> 	  && trace_equal (log->trace, get_key_vector (log, idx), log->depth))
>              ^^^^^^^^^^^
> 
> static bool
> trace_equal (Lisp_Object *bt1, Lisp_Object *bt2, int depth)
> {
>   for (int i = 0; i < depth; i++)
>     if (!BASE_EQ (bt1[i], bt2[i]) && NILP (Ffunction_equal (bt1[i], bt2[i])))
>                                            ^^^^^^^^^^^^^^^
> 
> DEFUN ("function-equal", Ffunction_equal, Sfunction_equal, 2, 2, 0,
>        doc: /* Return non-nil if F1 and F2 come from the same source.
> Used to determine if different closures are just different instances of
> the same lambda expression, or are really unrelated function.  */)
>      (Lisp_Object f1, Lisp_Object f2)
> {
>   bool res;
>   if (EQ (f1, f2))
>     res = true;
>   else if (CLOSUREP (f1) && CLOSUREP (f2))
>            ^^^^^^^^         ^^^^^^^^
>     res = EQ (AREF (f1, CLOSURE_CODE), AREF (f2, CLOSURE_CODE));
>               ^^^^                     ^^^^
> 
> Didn't look further than that, though.

But CLOSUREP is just

  INLINE bool
  CLOSUREP (Lisp_Object a)
  {
    return PSEUDOVECTORP (a, PVEC_CLOSURE);
  }

And AREF is even simpler:

  INLINE Lisp_Object
  AREF (Lisp_Object array, ptrdiff_t idx)
  {
    eassert (0 <= idx && idx < gc_asize (array));
    return XVECTOR (array)->contents[idx];
  }

So why are those unsafe?  Because they access Lisp objects, or for
some other reason?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 13:00                                       ` Eli Zaretskii
@ 2024-12-25 13:08                                         ` Gerd Möllmann
  2024-12-25 13:26                                           ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 13:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Wed, 25 Dec 2024 13:50:37 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> More code accessing memory that is potentially behind a barrier follows
>> >> in record_backtrace.
>> >
>> > Which code is that?  (It's a serious question: I tried to identify
>> > that code, but couldn't.  I'm probably missing something.)
>> 
>> The example I saw, with ^^^^ marking the call sites:
>> 
>> static void
>> record_backtrace (struct profiler_log *plog, EMACS_INT count)
>> {
>>   log_t *log = plog->log;
>>   get_backtrace (log->trace, log->depth);
>>   EMACS_UINT hash = trace_hash (log->trace, log->depth);
>>   int hidx = log_hash_index (log, hash);
>>   int idx = log->index[hidx];
>>   while (idx >= 0)
>>     {
>>       if (log->hash[idx] == hash
>> 	  && trace_equal (log->trace, get_key_vector (log, idx), log->depth))
>>              ^^^^^^^^^^^
>> 
>> static bool
>> trace_equal (Lisp_Object *bt1, Lisp_Object *bt2, int depth)
>> {
>>   for (int i = 0; i < depth; i++)
>>     if (!BASE_EQ (bt1[i], bt2[i]) && NILP (Ffunction_equal (bt1[i], bt2[i])))
>>                                            ^^^^^^^^^^^^^^^
>> 
>> DEFUN ("function-equal", Ffunction_equal, Sfunction_equal, 2, 2, 0,
>>        doc: /* Return non-nil if F1 and F2 come from the same source.
>> Used to determine if different closures are just different instances of
>> the same lambda expression, or are really unrelated function.  */)
>>      (Lisp_Object f1, Lisp_Object f2)
>> {
>>   bool res;
>>   if (EQ (f1, f2))
>>     res = true;
>>   else if (CLOSUREP (f1) && CLOSUREP (f2))
>>            ^^^^^^^^         ^^^^^^^^
>>     res = EQ (AREF (f1, CLOSURE_CODE), AREF (f2, CLOSURE_CODE));
>>               ^^^^                     ^^^^
>> 
>> Didn't look further than that, though.
>
> But CLOSUREP is just
>
>   INLINE bool
>   CLOSUREP (Lisp_Object a)
>   {
>     return PSEUDOVECTORP (a, PVEC_CLOSURE);
>   }

PSEUDOVECTORP reads the vectorlike_header header from A's memory.

> And AREF is even simpler:
>
>   INLINE Lisp_Object
>   AREF (Lisp_Object array, ptrdiff_t idx)
>   {
>     eassert (0 <= idx && idx < gc_asize (array));
>     return XVECTOR (array)->contents[idx];
>   }

And AREF accesses ARRAY's memory via ->contents.

> So why are those unsafe?  Because they access Lisp objects, or for
> some other reason?

What do you mean with unsafe? We are accessing an object's memory. That
memory may potentially be protected by a barrier. I thought we agreed on
that.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 13:08                                         ` Gerd Möllmann
@ 2024-12-25 13:26                                           ` Eli Zaretskii
  2024-12-25 14:07                                             ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 13:26 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 14:08:34 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > So why are those unsafe?  Because they access Lisp objects, or for
> > some other reason?
> 
> What do you mean with unsafe? We are accessing an object's memory. That
> memory may potentially be protected by a barrier.

That's what I meant by "unsafe".  I'm still wrapping my head around
this stuff, so apologies if I ask stupid questions.  Here's another
one: why is accessing the same object's memory that may be protected by a
barrier OK from the main (a.k.a. "Lisp") thread, when MPS could have
meanwhile started GC asynchronously?  IOW, how is this "normal" access
to Lisp objects different from access from a signal handler?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 13:26                                           ` Eli Zaretskii
@ 2024-12-25 14:07                                             ` Gerd Möllmann
  2024-12-25 14:43                                               ` Helmut Eller
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 14:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Wed, 25 Dec 2024 14:08:34 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > So why are those unsafe?  Because they access Lisp objects, or for
>> > some other reason?
>> 
>> What do you mean with unsafe? We are accessing an object's memory. That
>> memory may potentially be protected by a barrier.
>
> That's what I meant by "unsafe".  I'm still wrapping my head around
> this stuff, so apologies if I ask stupid questions.  

No reason to apologize. We're just working on getting onto common
ground, if that's an expression.

> Here's another one: why is accessing the same object's memory that may be
> protected by a barrier OK from the main (a.k.a. "Lisp") thread, when
> MPS could have meanwhile started GC asynchronously? IOW, how is this
> "normal" access to Lisp objects different from access from a signal
> handler?

Under "normal" circumstances, in the main thread say, when we access an
object that is behind a barrier, MPS gets invoked (signal, Mach
exception), does its thing, removes the barrier, and lets the
application continue. So the application doesn't notice anything.

A problem occurs only, apparently (I've not read the MPS code), when the
barrier handling code in MPS is called while being in another signal
handler like Emacs' SIGPROF handler.

I don't know what exactly the problem is in the end, in MPS. That would
be a good question for Richard Brooksby, I think.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 14:07                                             ` Gerd Möllmann
@ 2024-12-25 14:43                                               ` Helmut Eller
  2024-12-25 14:59                                                 ` Eli Zaretskii
  2024-12-25 15:02                                                 ` Gerd Möllmann
  0 siblings, 2 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-25 14:43 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

On Wed, Dec 25 2024, Gerd Möllmann wrote:

> A problem occurs only, apparently (I've not read the MPS code), when the
> barrier handling code in MPS is called while being in another signal
> handler like Emacs' SIGPROF handler.
>
>
> I don't know what exactly the problem is in the end, in MPS. That would
> be a good question for Richard Brooksby, I think.

The problem is probably simply that MPS uses a non-recursive lock.  The
SIGSEGV signal handler can't claim the lock when it's already claimed.

See:
https://memory-pool-system.readthedocs.io/en/latest/design/arena.html#locks

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 14:43                                               ` Helmut Eller
@ 2024-12-25 14:59                                                 ` Eli Zaretskii
  2024-12-25 20:44                                                   ` Helmut Eller
  2024-12-25 15:02                                                 ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 14:59 UTC (permalink / raw)
  To: Helmut Eller; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,  ofv@wanadoo.es,
>  emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 15:43:21 +0100
> 
> On Wed, Dec 25 2024, Gerd Möllmann wrote:
> 
> > A problem occurs only, apparently (I've not read the MPS code), when the
> > barrier handling code in MPS is called while being in another signal
> > handler like Emacs' SIGPROF handler.
> >
> >
> > I don't know what exactly the problem is in the end, in MPS. That would
> > be a good question for Richard Brooksby, I think.
> 
> The problem is probably simply that MPS uses a non-recursive lock.  The
> SIGSEGV signal handler can't claim the lock when it's already claimed.
> 
> See:
> https://memory-pool-system.readthedocs.io/en/latest/design/arena.html#locks

Which means we should try talking to the MPS folks so that they
provide us with the means of blocking some signals while the lock is
held.  Like some callback MPS would call when it takes the lock and
another one when it releases the lock.  I cannot imagine that Emacs is
the first program that uses signals which hits this issue.  E.g., can
MPS-based programs be built with -pg and then profiled?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 14:59                                                 ` Eli Zaretskii
@ 2024-12-25 20:44                                                   ` Helmut Eller
  2024-12-26  6:29                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-25 20:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

On Wed, Dec 25 2024, Eli Zaretskii wrote:

> Which means we should try talking to the MPS folks so that they
> provide us with the means of blocking some signals while the lock is
> held.  Like some callback MPS would call when it takes the lock and
> another one when it releases the lock.

Your position is that blocking SIGPROF while MPS holds the lock is
the correct thing to do, right?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 20:44                                                   ` Helmut Eller
@ 2024-12-26  6:29                                                     ` Eli Zaretskii
  2024-12-26  8:02                                                       ` Helmut Eller
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26  6:29 UTC (permalink / raw)
  To: Helmut Eller; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: gerd.moellmann@gmail.com,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 21:44:21 +0100
> 
> On Wed, Dec 25 2024, Eli Zaretskii wrote:
> 
> > Which means we should try talking to the MPS folks so that they
> > provide us with the means of blocking some signals while the lock is
> > held.  Like some callback MPS would call when it takes the lock and
> > another one when it releases the lock.
> 
> Your position is that blocking SIGPROF while MPS holds the lock is
> the correct thing to do, right?

Yes, I think so.  (If you disagree, please tell why, and let's discuss
that.)  It is certainly a relatively simple thing to do.

It is also possible for MPS to somehow manage an attempt to take the
lock which is already held in a smarter way, by stopping the code
which does that until MPS releases the lock.  For example, MPS could
define a protocol for such situations not unlike the GIL protocol we
use for Lisp threads.  But that's much more complex, and I don't
necessarily expect the MPS folks to go to such lengths.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26  6:29                                                     ` Eli Zaretskii
@ 2024-12-26  8:02                                                       ` Helmut Eller
  2024-12-26  9:32                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-26  8:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

On Thu, Dec 26 2024, Eli Zaretskii wrote:

>> > Which means we should try talking to the MPS folks so that they
>> > provide us with the means of blocking some signals while the lock is
>> > held.  Like some callback MPS would call when it takes the lock and
>> > another one when it releases the lock.
>>
>> Your position is that blocking SIGPROF while MPS holds the lock is
>> the correct thing to do, right?
>
> Yes, I think so.  (If you disagree, please tell why, and let's discuss
> that.)  It is certainly a relatively simple thing to do.

I quite like Pip's proposal of re-installing the SIGSEGV handler with an
additional sa_mask argument to block other signals.  That would be nice
because a) we can do that without changing MPS and b) it's likely more
efficient than callbacks.

It would still be nice to simplify some signal handlers, like
handle_interrupt_signal, but with other signals blocked for SIGSEGV, it
would all be quite independent of MPS.

> It is also possible for MPS to somehow manage an attempt to take the
> lock which is already held in a smarter way, by stopping the code
> which does that until MPS releases the lock.  For example, MPS could
> define a protocol for such situations not unlike the GIL protocol we
> use for Lisp threads.  But that's much more complex, and I don't
> necessarily expect the MPS folks to go to such lengths.

I think that mps_arena_busy, which tests whether MPS holds the lock, is
quite adequate for the SIGPROF handler.  It lets us detect the situation
and increment some counter.  An advantage of blocking SIGPROF would be
that no #ifdefs HAVE_MPS would be needed.

Helmut

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26  8:02                                                       ` Helmut Eller
@ 2024-12-26  9:32                                                         ` Eli Zaretskii
  2024-12-26 12:24                                                           ` Helmut Eller
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26  9:32 UTC (permalink / raw)
  To: Helmut Eller; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: gerd.moellmann@gmail.com,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Thu, 26 Dec 2024 09:02:11 +0100
> 
> On Thu, Dec 26 2024, Eli Zaretskii wrote:
> 
> >> Your position is that blocking SIGPROF while MPS holds the lock is
> >> the correct thing to do, right?
> >
> > Yes, I think so.  (If you disagree, please tell why, and let's discuss
> > that.)  It is certainly a relatively simple thing to do.
> 
> I quite like Pip's proposal of re-installing the SIGSEGV handler with an
> additional sa_mask argument to block other signals.  That would be nice
> because a) we can do that without changing MPS and b) it's likely more
> efficient than callbacks.

Are we sure doing so will solve the problem?  AFAIU, MPS can take the
lock before SIGSEGV is delivered, or without its being delivered at
all, isn't that so?

(Besides, the sa_mask way won't work on Windows, which doesn't have
sigaction and its emulation currently ignores sa_mask; we'd need to
extend the emulation, but that will still leave a small window where
the other signals are not immediately blocked after SIGSEGV.)

> It would still be nice to simplify some signal handlers, like
> handle_interrupt_signal, but with other signals blocked for SIGSEGV, it
> would all be quite independent of MPS.

Maybe.  What bothers me more is whether the signals are delivered only
to the main thread or to other threads.  AFAIU, this behavior is
system-dependent, and currently we seem to rely on the fact that the
signals are delivered to the main thread.  Given that we have other
threads, including the MPS thread, I'm not sure we have this figured
out.

> > It is also possible for MPS to somehow manage an attempt to take the
> > lock which is already held in a smarter way, by stopping the code
> > which does that until MPS releases the lock.  For example, MPS could
> > define a protocol for such situations not unlike the GIL protocol we
> > use for Lisp threads.  But that's much more complex, and I don't
> > necessarily expect the MPS folks to go to such lengths.
> 
> I think that mps_arena_busy, which tests whether MPS holds the lock, is
> quite adequate for the SIGPROF handler. It lets us detect the situation
> and increment some counter.

I'd like to have MPS folks confirm that.  The documentation of this
function seems to say we shouldn't use it "normally", only for
debugging (so perhaps in some commands in .gdbinit).  And see also
this warning in their docs:

          Warning: This function only gives a reliable result in
          single-threaded programs, and in multi-threaded programs where
          all threads but one are known to be stopped (as they are when
          the debugger is decoding the call stack in the use case
          described above).

AFAIU, Emacs doesn't fulfill the condition they define.

Also, the code we currently have on the branch doesn't just "increment
some counter", it flatly blocks the signal.

For these reasons, I'm not happy with our current usage of that
function for this purpose.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26  9:32                                                         ` Eli Zaretskii
@ 2024-12-26 12:24                                                           ` Helmut Eller
  2024-12-26 15:23                                                             ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Helmut Eller @ 2024-12-26 12:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

On Thu, Dec 26 2024, Eli Zaretskii wrote:

>> > Yes, I think so.  (If you disagree, please tell why, and let's discuss
>> > that.)  It is certainly a relatively simple thing to do.
>> 
>> I quite like Pip's proposal of re-installing the SIGSEGV handler with an
>> additional sa_mask argument to block other signals.  That would be nice
>> because a) we can do that without changing MPS and b) it's likely more
>> efficient than callbacks.
>
> Are we sure doing so will solve the problem?  AFAIU, MPS can take the
> lock before SIGSEGV is delivered, or without its being delivered at
> all, isn't that so?

Ahem.  I completely forgot that.

An alternative to callbacks would be to implement our own lock module as
described here:

 https://memory-pool-system.readthedocs.io/en/latest/topic/porting.html

It would probably be a clean and efficient solution; but it would
basically be our own fork of MPS.

>> It would still be nice to simplify some signal handlers, like
>> handle_interrupt_signal, but with other signals blocked for SIGSEGV, it
>> would all be quite independent of MPS.
>
> Maybe.  What bothers me more is whether the signals are delivered only
> to the main thread or to other threads.  AFAIU, this behavior is
> system-dependent, and currently we seem to rely on the fact that the
> signals are delivered to the main thread.  Given that we have other
> threads, including the MPS thread, I'm not sure we have this figured
> out.

I thought deliver_process_signal was there to forward signals to the
main thread but you certainly know better what it does and doesn't.

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 12:24                                                           ` Helmut Eller
@ 2024-12-26 15:23                                                             ` Eli Zaretskii
  2024-12-26 23:29                                                               ` Paul Eggert
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26 15:23 UTC (permalink / raw)
  To: Helmut Eller, Paul Eggert; +Cc: gerd.moellmann, pipcet, emacs-devel

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: gerd.moellmann@gmail.com,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Thu, 26 Dec 2024 13:24:13 +0100
> 
> On Thu, Dec 26 2024, Eli Zaretskii wrote:
> 
> >> I quite like Pip's proposal of re-installing the SIGSEGV handler with an
> >> additional sa_mask argument to block other signals.  That would be nice
> >> because a) we can do that without changing MPS and b) it's likely more
> >> efficient than callbacks.
> >
> > Are we sure doing so will solve the problem?  AFAIU, MPS can take the
> > lock before SIGSEGV is delivered, or without its being delivered at
> > all, isn't that so?
> 
> Ahem.  I completely forgot that.
> 
> An alternative to callbacks would be to implement our own lock module as
> described here:
> 
>  https://memory-pool-system.readthedocs.io/en/latest/topic/porting.html
> 
> It would probably be a clean and efficient solution; but it would
> basically be our own fork of MPS.

Probably.  I still think we should talk to the MPS folks and hear what
they suggest.

> >> It would still be nice to simplify some signal handlers, like
> >> handle_interrupt_signal, but with other signals blocked for SIGSEGV, it
> >> would all be quite independent of MPS.
> >
> > Maybe.  What bothers me more is whether the signals are delivered only
> > to the main thread or to other threads.  AFAIU, this behavior is
> > system-dependent, and currently we seem to rely on the fact that the
> > signals are delivered to the main thread.  Given that we have other
> > threads, including the MPS thread, I'm not sure we have this figured
> > out.
> 
> I thought deliver_process_signal was there to forward signals to the
> main thread but you certainly know better what it does and doesn't.

Actually, we need Paul Eggert to chime in, because he knows much more
about this than I do.  We have arrangements for when a signal is
delivered to a thread, but I think Paul said this should rarely if
ever happen.  My bother is what if the signal is delivered when the
MPS thread runs.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 15:23                                                             ` Eli Zaretskii
@ 2024-12-26 23:29                                                               ` Paul Eggert
  2024-12-27  7:57                                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Paul Eggert @ 2024-12-26 23:29 UTC (permalink / raw)
  To: Eli Zaretskii, Helmut Eller; +Cc: gerd.moellmann, pipcet, emacs-devel

On 2024-12-26 07:23, Eli Zaretskii wrote:
>> From: Helmut Eller <eller.helmut@gmail.com>
>> I thought deliver_process_signal was there to forward signals to the
>> main thread but you certainly know better what it does and doesn't.
> 
> Actually, we need Paul Eggert to chime in, because he knows much more
> about this than I do.  We have arrangements for when a signal is
> delivered to a thread, but I think Paul said this should rarely if
> ever happen.

Helmut's right: deliver_process_signal arranges for the handler to be 
called in the main thread even if the signal was delivered to some other 
thread.

And to some extent you're right, too, on GNU/Linux, where historically 
this rarely happened unless the signal was blocked in the main thread. 
That part of the Linux kernel has evolved, though, and I don't know 
whether this is still true. However, whether it's true shouldn't affect 
the correctness of deliver_process_signal.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 23:29                                                               ` Paul Eggert
@ 2024-12-27  7:57                                                                 ` Eli Zaretskii
  2024-12-27 19:34                                                                   ` Paul Eggert
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-27  7:57 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eller.helmut, gerd.moellmann, pipcet, emacs-devel

> Date: Thu, 26 Dec 2024 15:29:29 -0800
> Cc: gerd.moellmann@gmail.com, pipcet@protonmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> On 2024-12-26 07:23, Eli Zaretskii wrote:
> >> From: Helmut Eller <eller.helmut@gmail.com>
> >> I thought deliver_process_signal was there to forward signals to the
> >> main thread but you certainly know better what it does and doesn't.
> > 
> > Actually, we need Paul Eggert to chime in, because he knows much more
> > about this than I do.  We have arrangements for when a signal is
> > delivered to a thread, but I think Paul said this should rarely if
> > ever happen.
> 
> Helmut's right: deliver_process_signal arranges for the handler to be 
> called in the main thread even if the signal was delivered to some other 
> thread.

Is this true also on Posix systems other than GNU/Linux, though?

> And to some extent you're right, too, on GNU/Linux, where historically 
> this rarely happened unless the signal was blocked in the main thread. 
> That part of the Linux kernel has evolved, though, and I don't know 
> whether this is still true. However, whether it's true shouldn't affect 
> the correctness of deliver_process_signal.

Can you tell what happens during the short window between the signal
being delivered to a thread and until it is redirected to the main
thread?  Specifically, are the threads other than the one which got
hit by the signal still running until pthread_kill stops the main
thread?  This could be important for us to understand what could
happen if the signal hits the MPS thread or another non-Lisp thread in
Emacs.

Btw, init_signals initializes main_thread_id to the main thread of the
Emacs process, and AFAICT this is never changed during the session.
So what happens if, when the signal arrives, the main thread is stuck
in acquire_global_lock trying to take the global lock while some other
Lisp thread runs?  AFAIU, this means, among other things, that our
profiler cannot profile non-main Lisp threads, because SIGPROF will
always sample the state of the main thread?

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-27  7:57                                                                 ` Eli Zaretskii
@ 2024-12-27 19:34                                                                   ` Paul Eggert
  2024-12-28  8:06                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Paul Eggert @ 2024-12-27 19:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eller.helmut, gerd.moellmann, pipcet, emacs-devel

On 12/26/24 23:57, Eli Zaretskii wrote:
>> Date: Thu, 26 Dec 2024 15:29:29 -0800
>> Cc: gerd.moellmann@gmail.com, pipcet@protonmail.com, emacs-devel@gnu.org
>> From: Paul Eggert <eggert@cs.ucla.edu>
>>
>> On 2024-12-26 07:23, Eli Zaretskii wrote:
>>>> From: Helmut Eller <eller.helmut@gmail.com>
>>>> I thought deliver_process_signal was there to forward signals to the
>>>> main thread but you certainly know better what it does and doesn't.
>>>
>>> Actually, we need Paul Eggert to chime in, because he knows much more
>>> about this than I do.  We have arrangements for when a signal is
>>> delivered to a thread, but I think Paul said this should rarely if
>>> ever happen.
>>
>> Helmut's right: deliver_process_signal arranges for the handler to be
>> called in the main thread even if the signal was delivered to some other
>> thread.
> 
> Is this true also on Posix systems other than GNU/Linux, though?

Yes, if they follow this part of the POSIX spec. I don't know of any 
that don't.

> Can you tell what happens during the short window between the signal
> being delivered to a thread and until it is redirected to the main
> thread?  Specifically, are the threads other than the one which got
> hit by the signal still running until pthread_kill stops the main
> thread?

Yes, the signal handler runs in whatever thread the OS decides to 
deliver it to. While the handler runs, all other threads proceed 
normally. If the handler happens to run in a non-main thread, all it 
does is block the signal (so that its thread isn't bothered again by the 
same signal) and then use pthread_kill to resend the signal to the main 
thread. The handler then returns, which allows the non-main thread to 
proceed; if the non-main thread was in the middle of a syscall, that 
syscall may fail with errno==EINTR.

> So what happens if, when the signal arrives, the main thread is stuck
> in acquire_global_lock trying to take the global lock while some other
> Lisp thread runs?

I don't know how Lisp threads work. But if they are OS threads, then if 
the other thread has the lock, the main thread will remain stuck until 
the other thread releases the lock.

> our profiler cannot profile non-main Lisp threads, because SIGPROF will
> always sample the state of the main thread?

Yes, but that's not the only reason. A quick look at the profiling code 
suggests that it is not thread-safe, so chaos would ensue if SIGPROF 
were not forwarded to the main thread.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-27 19:34                                                                   ` Paul Eggert
@ 2024-12-28  8:06                                                                     ` Eli Zaretskii
  2024-12-28 20:44                                                                       ` Paul Eggert
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28  8:06 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eller.helmut, gerd.moellmann, pipcet, emacs-devel

> Date: Fri, 27 Dec 2024 11:34:12 -0800
> Cc: eller.helmut@gmail.com, gerd.moellmann@gmail.com, pipcet@protonmail.com,
>  emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> > Can you tell what happens during the short window between the signal
> > being delivered to a thread and until it is redirected to the main
> > thread?  Specifically, are the threads other than the one which got
> > hit by the signal still running until pthread_kill stops the main
> > thread?
> 
> Yes, the signal handler runs in whatever thread the OS decides to 
> deliver it to. While the handler runs, all other threads proceed 
> normally. If the handler happens to run in a non-main thread, all it 
> does is block the signal (so that its thread isn't bothered again by the 
> same signal) and then use pthread_kill to resend the signal to the main 
> thread. The handler then returns, which allows the non-main thread to 
> proceed; if the non-main thread was in the middle of a syscall, that 
> syscall may fail with errno==EINTR.

And when the thread to which the OS delivered the signal calls
pthread_kill to deliver the signal to the main thread, does that stop
only the main thread?  That is, do other threads keep running?

> > So what happens if, when the signal arrives, the main thread is stuck
> > in acquire_global_lock trying to take the global lock while some other
> > Lisp thread runs?
> 
> I don't know how Lisp threads work. But if they are OS threads, then if 
> the other thread has the lock, the main thread will remain stuck until 
> the other thread releases the lock.

Really?  Why?  The main thread is stuck in taking a mutex, which AFAIU
is a system call?  Then delivering a signal to the main thread should
terminate the syscall, and the main thread should execute the handler
code for the signal it got delivered, no?  (And yes, Lisp threads are
OS threads; in particular, on Posix systems they are pthreads
threads.)

> > our profiler cannot profile non-main Lisp threads, because SIGPROF will
> > always sample the state of the main thread?
> 
> Yes, but that's not the only reason. A quick look at the profiling code 
> suggests that it is not thread-safe, so chaos would ensue if SIGPROF 
> were not forwarded to the main thread.

"Not thread-safe" in what way?  Only one Lisp thread can run at a
given time, so some thread-safety issues should not exist in that case.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-28  8:06                                                                     ` Eli Zaretskii
@ 2024-12-28 20:44                                                                       ` Paul Eggert
  2024-12-29  5:47                                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Paul Eggert @ 2024-12-28 20:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: eller.helmut, gerd.moellmann, pipcet, emacs-devel

On 12/28/24 00:06, Eli Zaretskii wrote:

> And when the thread to which the OS delivered the signal calls
> pthread_kill to deliver the signal to the main thread, does that stop
> only the main thread?  That is, do other threads keep running?

Yes.

The main thread doesn't stop; it merely gets a signal queued for 
delivery. Eventually the main thread's signal handler will be invoked.
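
To illustrate the forwarding scheme discussed above, here is a minimal, self-contained sketch. It is not the Emacs code: the names main_thread, forwarding_handler and forward_demo are made up for this example, SIGUSR1 stands in for SIGPROF, and the step that blocks the signal in the non-main thread is left out for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static pthread_t main_thread;        /* hypothetical name; set in forward_demo */
static atomic_int forwarded;         /* times a non-main thread forwarded */
static atomic_int handled_on_main;   /* times the main thread handled it */
static atomic_int stop_worker;

static void short_sleep(void)
{
  struct timespec ts = { 0, 1000000 };   /* 1 ms */
  nanosleep(&ts, NULL);
}

/* Runs in whatever thread received the signal.  A non-main thread only
   resends the signal to the main thread and returns.  */
static void forwarding_handler(int sig)
{
  int saved_errno = errno;
  if (pthread_equal(pthread_self(), main_thread))
    atomic_fetch_add(&handled_on_main, 1);
  else
    {
      atomic_fetch_add(&forwarded, 1);
      pthread_kill(main_thread, sig);
    }
  errno = saved_errno;
}

static void *worker(void *arg)
{
  (void) arg;
  while (!atomic_load(&stop_worker))
    short_sleep();                   /* keeps running after the handler returns */
  return NULL;
}

/* Returns 1 if the signal sent to the worker was forwarded exactly once
   and then handled exactly once on the calling ("main") thread.  */
static int forward_demo(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = forwarding_handler;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGUSR1, &sa, NULL);

  main_thread = pthread_self();
  pthread_t t;
  pthread_create(&t, NULL, worker, NULL);
  pthread_kill(t, SIGUSR1);          /* deliver to the non-main thread */
  while (atomic_load(&handled_on_main) == 0)
    short_sleep();                   /* wait for the forwarded signal */
  atomic_store(&stop_worker, 1);
  pthread_join(t, NULL);
  return atomic_load(&forwarded) == 1 && atomic_load(&handled_on_main) == 1;
}
```

Calling forward_demo() once from the program's main thread exercises both paths: the worker's handler only forwards, and the main thread's handler does the actual work.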

>> I don't know how Lisp threads work. But if they are OS threads, then if
>> the other thread has the lock, the main thread will remain stuck until
>> the other thread releases the lock.
> 
> Really?  Why?  The main thread is stuck in taking a mutex, which AFAIU
> is a system call?  Then delivering a signal to the main thread should
> terminate the syscall, and the main thread should execute the handler
> code for the signal it got delivered, no?

By "remain stuck" I meant that the main thread will remain waiting for 
the lock if the signal handler returns normally.

pthread_mutex_lock is not an EINTRish syscall: it does not fail with 
errno==EINTR when interrupted. Instead, if the signal handler returns 
normally pthread_mutex_lock resumes waiting for the mutex. In GNU/Linux, 
pthread_mutex_lock typically operates entirely in user space: no syscall 
is involved.
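
The following sketch shows this behaviour under stated assumptions: a thread blocked in pthread_mutex_lock receives SIGUSR1 (standing in for SIGPROF), its handler runs and returns normally, and the thread goes back to waiting until the lock is released. All names (waiter, on_signal, mutex_wait_survives_signal) are invented for the example, and the sleeps are only a crude way to let the waiter reach the lock call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int handler_ran;
static atomic_int waiter_started;
static atomic_int waiter_acquired;

static void sleep_ms(long ms)
{
  struct timespec ts = { 0, ms * 1000000L };
  nanosleep(&ts, NULL);
}

static void on_signal(int sig)
{
  (void) sig;
  atomic_store(&handler_ran, 1);     /* handler returns normally */
}

static void *waiter(void *arg)
{
  (void) arg;
  atomic_store(&waiter_started, 1);
  /* Blocks here because the creating thread holds the lock.  The call
     does not fail with EINTR when the signal handler runs.  */
  if (pthread_mutex_lock(&lock) == 0)
    {
      atomic_store(&waiter_acquired, 1);
      pthread_mutex_unlock(&lock);
    }
  return NULL;
}

/* Returns 1 if the waiter was still waiting after its signal handler
   ran, and acquired the lock only once it was released.  */
static int mutex_wait_survives_signal(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGUSR1, &sa, NULL);

  pthread_mutex_lock(&lock);
  pthread_t t;
  pthread_create(&t, NULL, waiter, NULL);
  while (!atomic_load(&waiter_started))
    sleep_ms(1);
  sleep_ms(50);                      /* give the waiter time to block */
  pthread_kill(t, SIGUSR1);
  while (!atomic_load(&handler_ran))
    sleep_ms(1);
  sleep_ms(50);                      /* handler has returned by now */
  int still_waiting = !atomic_load(&waiter_acquired);
  pthread_mutex_unlock(&lock);
  pthread_join(t, NULL);
  return still_waiting && atomic_load(&waiter_acquired);
}
```

The function returns 1 when the waiter stayed blocked across the signal, which is the "remain stuck" case: the handler runs, but the wait for the mutex continues afterwards.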

>> Yes, but that's not the only reason. A quick look at the profiling code
>> suggests that it is not thread-safe, so chaos would ensue if SIGPROF
>> were not forwarded to the main thread.
> 
> "Not thread-safe" in what way?  Only one Lisp thread can run at a
> given time, so some thread-safety issues should not exist in that case.

I didn't know that only one Lisp thread can run at a time. If in 
addition Lisp threads can't be preempted by other Lisp threads being 
profiled, the profiling code is quite possibly safe to run in non-main 
Lisp threads, though this should be checked by an expert in that part of 
the code.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-28 20:44                                                                       ` Paul Eggert
@ 2024-12-29  5:47                                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-29  5:47 UTC (permalink / raw)
  To: Paul Eggert; +Cc: eller.helmut, gerd.moellmann, pipcet, emacs-devel

> Date: Sat, 28 Dec 2024 12:44:29 -0800
> Cc: eller.helmut@gmail.com, gerd.moellmann@gmail.com, pipcet@protonmail.com,
>  emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> On 12/28/24 00:06, Eli Zaretskii wrote:
> 
> > And when the thread to which the OS delivered the signal calls
> > pthread_kill to deliver the signal to the main thread, does that stop
> > only the main thread?  That is, do other threads keep running?
> 
> Yes.
> 
> The main thread doesn't stop; it merely gets a signal queued for 
> delivery. Eventually the main thread's signal handler will be invoked.

OK.

> >> I don't know how Lisp threads work. But if they are OS threads, then if
> >> the other thread has the lock, the main thread will remain stuck until
> >> the other thread releases the lock.
> > 
> > Really?  Why?  The main thread is stuck in taking a mutex, which AFAIU
> > is a system call?  Then delivering a signal to the main thread should
> > terminate the syscall, and the main thread should execute the handler
> > code for the signal it got delivered, no?
> 
> By "remain stuck" I meant that the main thread will remain waiting for 
> the lock if the signal handler returns normally.
> 
> pthread_mutex_lock is not an EINTRish syscall: it does not fail with 
> errno==EINTR when interrupted. Instead, if the signal handler returns 
> normally pthread_mutex_lock resumes waiting for the mutex. In GNU/Linux, 
> pthread_mutex_lock typically operates entirely in user space: no syscall 
> is involved.

OK, then this is just terminology differences.  What you describe is
the expected result.

> >> Yes, but that's not the only reason. A quick look at the profiling code
> >> suggests that it is not thread-safe, so chaos would ensue if SIGPROF
> >> were not forwarded to the main thread.
> > 
> > "Not thread-safe" in what way?  Only one Lisp thread can run at a
> > given time, so some thread-safety issues should not exist in that case.
> 
> I didn't know that only one Lisp thread can run at a time.

Yes, that's the purpose of the global lock: a Lisp thread can only run
if it acquires the lock.

Thanks.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 14:43                                               ` Helmut Eller
  2024-12-25 14:59                                                 ` Eli Zaretskii
@ 2024-12-25 15:02                                                 ` Gerd Möllmann
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 15:02 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, acorallo

Helmut Eller <eller.helmut@gmail.com> writes:

> On Wed, Dec 25 2024, Gerd Möllmann wrote:
>
>> A problem occurs only, apparently (I've not read the MPS code), when the
>> barrier handling code in MPS is called while being in another signal
>> handler like Emacs' SIGPROF handler.
>>
>>
>> I don't know what exactly the problem is in the end, in MPS. That would
>> be a good question for Richard Brooksby, I think.
>
> The problem is probably simply that MPS uses a non-recursive lock.  The
> SIGSEGV signal handler can't claim the lock when it's already claimed.
>
> See:
> https://memory-pool-system.readthedocs.io/en/latest/design/arena.html#locks
>
> Helmut

Could be, but OTOH then it's strange that things work fine when the
barriers use Mach exceptions.
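
For reference, the general property of a non-recursive lock can be shown with plain pthreads. This sketch says nothing about how MPS implements its own lock; it only assumes POSIX mutex types, and relock_result is a made-up helper. An error-checking mutex is used so that the second claim by the same thread is reported as EDEADLK instead of hanging; a recursive mutex is shown for contrast.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock a mutex of the given TYPE twice from the same thread and return
   the result of the second pthread_mutex_lock call.  */
static int relock_result(int type)
{
  pthread_mutexattr_t attr;
  pthread_mutex_t m;

  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, type);
  pthread_mutex_init(&m, &attr);
  pthread_mutexattr_destroy(&attr);

  pthread_mutex_lock(&m);            /* first claim succeeds */
  int rc = pthread_mutex_lock(&m);   /* second claim by the same thread */
  if (rc == 0)
    pthread_mutex_unlock(&m);        /* undo the nested claim */
  pthread_mutex_unlock(&m);
  pthread_mutex_destroy(&m);
  return rc;
}
```

With PTHREAD_MUTEX_ERRORCHECK the second claim fails with EDEADLK; with PTHREAD_MUTEX_RECURSIVE it succeeds. A handler that interrupts code already holding a non-recursive lock is in the first situation.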



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 12:50                                     ` Gerd Möllmann
  2024-12-25 13:00                                       ` Eli Zaretskii
@ 2024-12-25 13:09                                       ` Eli Zaretskii
  2024-12-25 13:46                                         ` Gerd Möllmann
  2024-12-25 17:40                                       ` Pip Cet via Emacs development discussions.
  2 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 13:09 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>  eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 13:50:37 +0100
> 
> >   . how is accessing F different from accessing the specpdl stack?
> 
> F's memory is allocated from an MPS pool via alloc_impl in igc.c. Most
> objects are allocated from a pool that uses barriers (I think except
> PVEC_THREAD). The specpdl stacks are malloc'ed (see
> grow_specpdl_allocation), and are used as roots. There are currently no
> barriers on roots.

So you are saying that the answer to this:

> > The first question is more important, from where I stand.  Looking
> > forward beyond the point where we land igc on master, I wonder how
> > we will be able to tell, for a random non-trivial change on the C level,
> > whether what it does can cause trouble with MPS?  That is, how can a
> > mere mortal determine whether a given data structure in igc Emacs can
> > or cannot be safely touched when MPS happens to do its thing, whether
> > synchronously or asynchronously?  We must have some reasonably
> > practical way of telling this, or else we will be breaking Emacs high
> > and low.

is that we need to trace each datum to see whether it is "used as
roots" (what does that mean in practice, btw?) or is "allocated via
alloc_impl in igc.c"?  Does the latter include all the Lisp objects
(except fixnums)?  Do we allocate non-Lisp data via alloc_impl, and if
so, which data?

Once again, I think this is very important for future maintenance.  I
feel that this barrier thing in MPS introduces significant
complications into reasoning about safety of C-level changes.
Previously, we only had the mark bit to worry about if we wanted to
access Lisp objects during GC (see gc_asize, for example), but now we
have a much larger problem, AFAIU.  How do we manage that for the next
40 years?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 13:09                                       ` Eli Zaretskii
@ 2024-12-25 13:46                                         ` Gerd Möllmann
  2024-12-25 14:37                                           ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 13:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>  eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Wed, 25 Dec 2024 13:50:37 +0100
>> 
>> >   . how is accessing F different from accessing the specpdl stack?
>> 
>> F's memory is allocated from an MPS pool via alloc_impl in igc.c. Most
>> objects are allocated from a pool that uses barriers (I think except
>> PVEC_THREAD). The specpdl stacks are malloc'ed (see
>> grow_specpdl_allocation), and are used as roots. There are currently no
>> barriers on roots.
>
> So you are saying that the answer to this:
>
>> > The first question is more important, from where I stand.  Looking
>> > forward beyond the point where we land igc on master, I wonder how
>> > we will be able to tell, for a random non-trivial change on the C level,
>> > whether what it does can cause trouble with MPS?  That is, how can a
>> > mere mortal determine whether a given data structure in igc Emacs can
>> > or cannot be safely touched when MPS happens to do its thing, whether
>> > synchronously or asynchronously?  We must have some reasonably
>> > practical way of telling this, or else we will be breaking Emacs high
>> > and low.
>
> is that we need to trace each datum to see whether it is "used as
> roots" (what does that mean in practice, btw?) or is "allocated via
> alloc_impl in igc.c"?  Does the latter include all the Lisp objects
> (except fixnums)?  Do we allocate non-Lisp data via alloc_impl, and if
> so, which data?

No, I'm not saying anything like that. Apparently I've used some
hints/terms that you are not familiar with. Sorry for that.

Roots: There are GC roots in the old GC. Specpdl stacks are roots,
DEFVARs, control stacks, DEFSYM symbols, the bytecode stacks and so on.
They are marked first at the beginning of GC. The roots and everything
recursively reachable from them are all live objects.
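
A toy sketch of "the roots and everything recursively reachable from them are live", with invented names (struct obj, mark, count_live, roots_demo). It is not how alloc.c or igc.c represent objects; it only shows the reachability idea.

```c
#include <assert.h>
#include <stddef.h>

struct obj
{
  struct obj *ref[2];   /* outgoing references, NULL if unused */
  int marked;
};

/* Mark O and everything reachable from it.  */
static void mark(struct obj *o)
{
  if (o == NULL || o->marked)
    return;
  o->marked = 1;
  for (size_t i = 0; i < 2; i++)
    mark(o->ref[i]);
}

/* Mark from the roots first, then count the objects that got marked.  */
static size_t count_live(struct obj **roots, size_t nroots,
                         struct obj *heap, size_t nheap)
{
  size_t live = 0;
  for (size_t i = 0; i < nheap; i++)
    heap[i].marked = 0;
  for (size_t i = 0; i < nroots; i++)
    mark(roots[i]);
  for (size_t i = 0; i < nheap; i++)
    live += heap[i].marked;
  return live;
}

static size_t roots_demo(void)
{
  static struct obj heap[5];         /* zero-initialized */
  heap[0].ref[0] = &heap[1];
  heap[1].ref[0] = &heap[2];
  heap[2].ref[0] = &heap[0];         /* cycle back to the first object */
  heap[3].ref[0] = &heap[4];         /* 3 and 4 only reference each other */
  heap[4].ref[0] = &heap[3];
  struct obj *roots[1] = { &heap[0] };
  return count_live(roots, 1, heap, 5);
}
```

Objects 0, 1 and 2 are reachable from the single root and count as live; objects 3 and 4 reference each other but no root reaches them, so they are garbage.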

Roots in igc are basically the same thing. There are MPS functions with
which one can define roots, if that is what you mean by "what it means in
practice". These MPS functions are called in igc.c. Please don't ask
which functions :-).

alloc_impl: This function is used for the allocation of all Lisp
objects, vectors, strings, frames, everything that can end up being
referenced as a Lisp_Object. So everything except fixnums. And it is not
used to allocate other stuff, the xmalloc family of functions is used
for that.

WRT the mere mortal and so on: I was talking specifically about
get_backtrace, and why it can be used as-is, in the way I described, to
get a backtrace in the SIGPROF handler. What's the problem with that? I
think what you write grossly exaggerates the situation with "trace
each datum", and somewhat implies "all the time" (re your paragraph below).

Does that help?

>
> Once again, I think this is very important for future maintenance.  I
> feel that this barrier thing in MPS introduces significant
> complications into reasoning about safety of C-level changes.
> Previously, we only had the mark bit to worry about if we wanted to
> access Lisp objects during GC (see gc_asize, for example), but now we
> have a much larger problem, AFAIU.  How do we manage that for the next
> 40 years?

These problems do not exist. The barriers are transparent for the
application, except in very special circumstances, namely this shit
signal handler.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 13:46                                         ` Gerd Möllmann
@ 2024-12-25 14:37                                           ` Eli Zaretskii
  2024-12-25 14:57                                             ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 14:37 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 14:46:44 +0100
> 
> Roots: There are GC roots in the old GC. Specpdl stacks are roots,
> DEFVARs, control stacks, DEFSYM symbols, the bytecode stacks and so on.
> They are marked first at the beginning of GC. The roots and everything
> recursively reachable from them are all live objects.
> 
> Roots in igc are basically the same thing. There are MPS functions with
> which one can define roots, if that is what you mean by "what it means in
> practice". These MPS functions are called in igc.c. Please don't ask
> which functions :-).
> 
> alloc_impl: This function is used for the allocation of all Lisp
> objects, vectors, strings, frames, everything that can end up being
> referenced as a Lisp_Object. So everything except fixnums. And it is not
> used to allocate other stuff, the xmalloc family of functions is used
> for that.

OK, thanks.

So there are Lisp objects allocated by alloc_impl, roots allocated
via MPS, and data allocated by xmalloc that MPS doesn't know about, is
that correct?

> > Once again, I think this is very important for future maintenance.  I
> > feel that this barrier thing in MPS introduces significant
> > complications into reasoning about safety of C-level changes.
> > Previously, we only had the mark bit to worry about if we wanted to
> > access Lisp objects during GC (see gc_asize, for example), but now we
> > have a much larger problem, AFAIU.  How do we manage that for the next
> > 40 years?
> 
> These problems do not exist. The barriers are transparent for the
> application, except in very special circumstances, namely this shit
> signal handler.

But I _am_ talking about this "shit signal handler".  I'm trying to
understand how would I go about reasoning whether accessing specpdl
from the signal handler is okay.  Is that because I'm supposed to know
that the specpdl stack is a root?  If so, I'd need to figure out that
for every datum the handler accesses, no?

I guess I'm yearning for some commentary in igc.c, not unlike what you
wrote in xdisp.c at the time, which would explain the basics, like
what are roots, what's the purpose of all those root_create_SOMETHING
functions, what's the difference between exact and ambiguous roots,
etc.  Because currently we are not too spoiled by comments in igc.c.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 14:37                                           ` Eli Zaretskii
@ 2024-12-25 14:57                                             ` Gerd Möllmann
  2024-12-25 15:28                                               ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 14:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

> So there are Lisp objects allocated by alloc_impl, roots allocated
> via MPS, and data allocated by xmalloc that MPS doesn't know about, is
> that correct?

Correct.

I'd perhaps change "roots allocated" to "roots declared". One simply
tells MPS a range of memory [start, end] is a root of type so-and-so
(ambiguous, exact). The memory itself can come from xmalloc for example.
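
As a rough illustration of the two root types: in an exact root every slot is known to hold a reference (or nothing), while in an ambiguous root any word might or might not be a reference, so the collector has to treat everything that looks like one conservatively. The toy below uses invented names (looks_like_object, scan_ambiguous, scan_exact, scan_demo) and half-open ranges [start, end); it is not how MPS or igc.c scan roots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HEAP_SIZE = 4 };
struct obj { int marked; };
static struct obj heap[HEAP_SIZE];

/* Return the heap object whose address equals WORD, or NULL.  */
static struct obj *looks_like_object(uintptr_t word)
{
  for (size_t i = 0; i < HEAP_SIZE; i++)
    if (word == (uintptr_t) &heap[i])
      return &heap[i];
  return NULL;
}

/* Ambiguous root: each word may or may not be a reference, so anything
   that looks like an object address keeps that object alive.  */
static void scan_ambiguous(const uintptr_t *start, const uintptr_t *end)
{
  for (const uintptr_t *p = start; p < end; p++)
    {
      struct obj *o = looks_like_object(*p);
      if (o != NULL)
        o->marked = 1;
    }
}

/* Exact root: every slot is known to be a reference or NULL.  */
static void scan_exact(struct obj **start, struct obj **end)
{
  for (struct obj **p = start; p < end; p++)
    if (*p != NULL)
      (*p)->marked = 1;
}

/* Returns a bitmask with bit I set if heap[I] was marked.  */
static int scan_demo(void)
{
  uintptr_t stack_like[3] = { 42, (uintptr_t) &heap[1], 0 };
  struct obj *exact_slots[2] = { &heap[3], NULL };
  int mask = 0;

  scan_ambiguous(stack_like, stack_like + 3);
  scan_exact(exact_slots, exact_slots + 2);
  for (int i = 0; i < HEAP_SIZE; i++)
    if (heap[i].marked)
      mask |= 1 << i;
  return mask;
}
```

Here heap[1] survives only because a word in the ambiguous range happens to equal its address, and heap[3] survives because an exact slot refers to it. A moving collector can update an exact slot when it relocates the object, but it cannot safely rewrite a word that may just be an integer, which is why ambiguously referenced objects have to stay in place.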

>> > Once again, I think this is very important for future maintenance.  I
>> > feel that this barrier thing in MPS introduces significant
>> > complications into reasoning about safety of C-level changes.
>> > Previously, we only had the mark bit to worry about if we wanted to
>> > access Lisp objects during GC (see gc_asize, for example), but now we
>> > have a much larger problem, AFAIU.  How do we manage that for the next
>> > 40 years?
>> 
>> These problems do not exist. The barriers are transparent for the
>> application, except in very special circumstances, namely this shit
>> signal handler.
>
> But I _am_ talking about this "shit signal handler".  I'm trying to
> understand how would I go about reasoning whether accessing specpdl
> from the signal handler is okay.  Is that because I'm supposed to know
> that the specpdl stack is a root?  If so, I'd need to figure out that
> for every datum the handler accesses, no?

That's right, but perhaps it helps that anything that can end up being
a Lisp_Object is not a root. And only those can be affected by barriers.

> I guess I'm yearning for some commentary in igc.c, not unlike what you
> wrote in xdisp.c at the time, which would explain the basics, like
> what are roots, what's the purpose of all those root_create_SOMETHING
> functions, what's the difference between exact and ambiguous roots,
> etc.  Because currently we are not too spoiled by comments in igc.c.

Would a reference to the MPS Guide

  https://memory-pool-system.readthedocs.io/en/latest/guide/index.html#

help?




^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 14:57                                             ` Gerd Möllmann
@ 2024-12-25 15:28                                               ` Eli Zaretskii
  2024-12-25 15:49                                                 ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 15:28 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 15:57:32 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I guess I'm yearning for some commentary in igc.c, not unlike what you
> > wrote in xdisp.c at the time, which would explain the basics, like
> > what are roots, what's the purpose of all those root_create_SOMETHING
> > functions, what's the difference between exact and ambiguous roots,
> > etc.  Because currently we are not too spoiled by comments in igc.c.
> 
> Would a reference to the MPS Guide
> 
>   https://memory-pool-system.readthedocs.io/en/latest/guide/index.html#
> 
> help?

Not really.  Their documentation is (a) too vague/abstract/circular
(i.e., stuff is not explained for itself, but instead they
cross-reference to some other stuff -- e.g., try to understand what is
a "root"), and (b) not specific to Emacs, so the relation between MPS
terminology and Emacs objects is missing.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 15:28                                               ` Eli Zaretskii
@ 2024-12-25 15:49                                                 ` Gerd Möllmann
  2024-12-25 17:26                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 15:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Wed, 25 Dec 2024 15:57:32 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > I guess I'm yearning for some commentary in igc.c, not unlike what you
>> > wrote in xdisp.c at the time, which would explain the basics, like
>> > what are roots, what's the purpose of all those root_create_SOMETHING
>> > functions, what's the difference between exact and ambiguous roots,
>> > etc.  Because currently we are not too spoiled by comments in igc.c.
>> 
>> Would a reference to the MPS Guide
>> 
>>   https://memory-pool-system.readthedocs.io/en/latest/guide/index.html#
>> 
>> help?
>
> Not really.  Their documentation is (a) too vague/abstract/circular
> (i.e., stuff is not explained for itself, but instead they
> cross-reference to some other stuff -- e.g., try to understand what is
> a "root"), and (b) not specific to Emacs, so the relation between MPS
> terminology and Emacs objects is missing.

Too bad. In that case, I guess I should consider to begin to think about
planning to start to try to write something up. That's a bit like pulling
teeth, TBH. Don't know how long that will take.








^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 15:49                                                 ` Gerd Möllmann
@ 2024-12-25 17:26                                                   ` Eli Zaretskii
  2024-12-26  5:25                                                     ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 17:26 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 16:49:30 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> >> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
> >>   eller.helmut@gmail.com,  acorallo@gnu.org
> >> Date: Wed, 25 Dec 2024 15:57:32 +0100
> >> 
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> 
> >> > I guess I'm yearning for some commentary in igc.c, not unlike what you
> >> > wrote in xdisp.c at the time, which would explain the basics, like
> >> > what are roots, what's the purpose of all those root_create_SOMETHING
> >> > functions, what's the difference between exact and ambiguous roots,
> >> > etc.  Because currently we are not too spoiled by comments in igc.c.
> >> 
> >> Would a reference to the MPS Guide
> >> 
> >>   https://memory-pool-system.readthedocs.io/en/latest/guide/index.html#
> >> 
> >> help?
> >
> > Not really.  Their documentation is (a) too vague/abstract/circular
> > (i.e., stuff is not explained for itself, but instead they
> > cross-reference to some other stuff -- e.g., try to understand what is
> > a "root"), and (b) not specific to Emacs, so the relation between MPS
> > terminology and Emacs objects is missing.
> 
> Too bad. In that case, I guess I should consider to begin to think about
> planning to start to try write something up. That's a bit like pulling
> teeth, TBH. Don't know how long that will take.

Thanks.  No matter how long it will take, it will be shorter than
eternity.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 17:26                                                   ` Eli Zaretskii
@ 2024-12-26  5:25                                                     ` Gerd Möllmann
  2024-12-26  7:43                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26  5:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Wed, 25 Dec 2024 16:49:30 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> >> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>> >>   eller.helmut@gmail.com,  acorallo@gnu.org
>> >> Date: Wed, 25 Dec 2024 15:57:32 +0100
>> >> 
>> >> Eli Zaretskii <eliz@gnu.org> writes:
>> >> 
>> >> > I guess I'm yearning for some commentary in igc.c, not unlike what you
>> >> > wrote in xdisp.c at the time, which would explain the basics, like
>> >> > what are roots, what's the purpose of all those root_create_SOMETHING
>> >> > functions, what's the difference between exact and ambiguous roots,
>> >> > etc.  Because currently we are not too spoiled by comments in igc.c.
>> >> 
>> >> Would a reference to the MPS Guide
>> >> 
>> >>   https://memory-pool-system.readthedocs.io/en/latest/guide/index.html#
>> >> 
>> >> help?
>> >
>> > Not really.  Their documentation is (a) too vague/abstract/circular
>> > (i.e., stuff is not explained for itself, but instead they
>> > cross-reference to some other stuff -- e.g., try to understand what is
>> > a "root"), and (b) not specific to Emacs, so the relation between MPS
>> > terminology and Emacs objects is missing.
>> 
>> Too bad. In that case, I guess I should consider to begin to think about
>> planning to start to try write something up. That's a bit like pulling
>> teeth, TBH. Don't know how long that will take.
>
> Thanks.  No matter how long it will take, it will be shorter than
> eternity.

Did what I wrote in this thread help you understand/judge what I
proposed better? Please ask further questions if you want.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26  5:25                                                     ` Gerd Möllmann
@ 2024-12-26  7:43                                                       ` Eli Zaretskii
  2024-12-26  7:57                                                         ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26  7:43 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Thu, 26 Dec 2024 06:25:33 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> >>   https://memory-pool-system.readthedocs.io/en/latest/guide/index.html#
> >> >> 
> >> >> help?
> >> >
> >> > Not really.  Their documentation is (a) too vague/abstract/circular
> >> > (i.e., stuff is not explained for itself, but instead they
> >> > cross-reference to some other stuff -- e.g., try to understand what is
> >> > a "root"), and (b) not specific to Emacs, so the relation between MPS
> >> > terminology and Emacs objects is missing.
> >> 
> >> Too bad. In that case, I guess I should consider to begin to think about
> >> planning to start to try write something up. That's a bit like pulling
> >> teeth, TBH. Don't know how long that will take.
> >
> > Thanks.  No matter how long it will take, it will be shorter than
> > eternity.
> 
> Did what I wrote in this thread help you understand/judge what I
> proposed better? Please ask further questions if you want.

It does help, to some extent.  But I'm still in the dark regarding
some aspects of this.  I'll keep asking questions, but there's a limit
to which I feel myself entitled to ask questions without annoying too
much.  Which is why I suggested to write commentary instead, to get
some of the basics out of the way.  It doesn't have to be you
personally who writes the commentary, btw: anyone who knows the
answers to the questions I posted could do that.  With that in mind,
let me again post the questions which I'm currently struggling with:

  . what are "roots"?
  . what is the purpose of each root_create_SOMETHING function in igc.c?
  . what is the difference between "exact" and "ambiguous" roots, and
    when should we use each one in Emacs?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26  7:43                                                       ` Eli Zaretskii
@ 2024-12-26  7:57                                                         ` Gerd Möllmann
  2024-12-26 11:56                                                           ` Eli Zaretskii
  2024-12-26 15:27                                                           ` Stefan Kangas
  0 siblings, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26  7:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Thu, 26 Dec 2024 06:25:33 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> >>   https://memory-pool-system.readthedocs.io/en/latest/guide/index.html#
>> >> >> 
>> >> >> help?
>> >> >
>> >> > Not really.  Their documentation is (a) too vague/abstract/circular
>> >> > (i.e., stuff is not explained for itself, but instead they
>> >> > cross-reference to some other stuff -- e.g., try to understand what is
>> >> > a "root"), and (b) not specific to Emacs, so the relation between MPS
>> >> > terminology and Emacs objects is missing.
>> >> 
>> >> Too bad. In that case, I guess I should consider to begin to think about
>> >> planning to start to try write something up. That's a bit like pulling
>> >> teeth, TBH. Don't know how long that will take.
>> >
>> > Thanks.  No matter how long it will take, it will be shorter than
>> > eternity.
>> 
>> Did what I wrote in this thread help you understand/judge what I
>> proposed better? Please ask further questions if you want.
>
> It does help, to some extent.  But I'm still in the dark regarding
> some aspects of this.  I'll keep asking questions, but there's a limit
> to which I feel myself entitled to ask questions without annoying too
> much.  Which is why I suggested to write commentary instead, to get
> some of the basics out of the way.  It doesn't have to be you
> personally who writes the commentary, btw: anyone who knows the
> answers to the questions I posted could do that.  With that in mind,
> let me again post the questions which I'm currently struggling with:
>
>   . what are "roots"?
>   . what is the purpose of each root_create_SOMETHING function in igc.c?
>   . what is the difference between "exact" and "ambiguous" roots, and
>     when should we use each one in Emacs?

Coincidentally, I wrote this today, which might help, and would also be
interesting to get some feedback for. 

* Introduction / Garbage collection

Implementing a programming language like Lisp requires automatic
memory management which frees the memory of Lisp objects like conses,
strings etc. when they are no longer in use. The automatic memory
management used for Lisp is called garbage collection (GC), which is
performed by a garbage collector. Objects that are no longer used are
called garbage or dead; objects that are in use are called live
objects. The process of getting rid of garbage is called the GC.

A large number of GC algorithms and implementations exist which differ
in various dimensions. Emacs has two GC implementations which can be
chosen at compile-time. The traditional (old) GC, which was the only
one until recently, is a so-called mark-sweep, non-copying collector.
The new GC implementation in this file is an incremental,
generational, concurrent (igc) collector based on the MPS library from
Ravenbrook. It is a so-called copying collector. The terms used here
will become clearer in the following.

Emacs' traditional mark-sweep GC works in two phases:

1. The mark phase

   Start with a set of so-called (GC) roots that are
   known to contain live objects. Examples of roots in Emacs are

   - the bytecode stacks 
   - the specpdl (binding) stacks
   - Lisp variables defined in C with DEFVAR
   - the C stacks (aka control stacks)
   - ...

   Roots can be either exact or ambiguous.

   If we know that a root always contains a valid Lisp object
   reference, it is called exact. An example of an exact root is
   the root for a DEFVAR. The DEFVAR variable always contains a valid
   Lisp object.

   If we only know that a root might potentially reference a Lisp
   object, that root is called ambiguous. An example of an ambiguous
   root is the C stack. The C stack can contain integers that look
   like references to Lisp objects, but look like that only by
   happenstance. Ambiguous roots are said to be scanned
   conservatively. We make the conservative assumption that we
   actually have seen a valid reference, even if we have not.

   GC scans all roots, conservatively or exactly, and marks the Lisp
   objects found in them live, plus all Lisp objects reachable from
   them, recursively. In the end, all live objects are marked, and all
   objects not marked are dead, i.e. garbage.

2. The sweep phase.

   Sweeping means to free all objects that are not marked live. The
   collector iterates over all allocated objects, live or dead, and
   frees the ones that are dead so that their memory can be reused.
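
The two phases can be sketched in a few lines of C. This is only a
toy under stated assumptions: the heap is a fixed array of cons-like
cells, references are array indices, and all roots are exact. Every
name in it (toy_cons, toy_mark, toy_sweep, toy_gc) is hypothetical
and has nothing to do with the actual code in alloc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy heap: a fixed array of cons-like cells.  Hypothetical names,
   not Emacs code.  */
enum { HEAP_SIZE = 8, NIL = -1 };

struct cell
{
  int car, cdr;    /* indices of other cells, or NIL */
  bool allocated;  /* cell is in use (live or garbage) */
  bool marked;     /* set by the mark phase */
};

static struct cell heap[HEAP_SIZE];

/* Make every cell available again.  */
static void
toy_reset (void)
{
  for (int i = 0; i < HEAP_SIZE; i++)
    heap[i] = (struct cell) { NIL, NIL, false, false };
}

/* Allocate a cell; return its index, or NIL if the heap is full.  */
static int
toy_cons (int car, int cdr)
{
  for (int i = 0; i < HEAP_SIZE; i++)
    if (!heap[i].allocated)
      {
        heap[i] = (struct cell) { car, cdr, true, false };
        return i;
      }
  return NIL;
}

/* Mark phase: mark cell I and everything reachable from it.  */
static void
toy_mark (int i)
{
  while (i != NIL && !heap[i].marked)
    {
      heap[i].marked = true;
      toy_mark (heap[i].car);  /* recurse on car */
      i = heap[i].cdr;         /* iterate on cdr */
    }
}

/* Sweep phase: visit all cells, free the unmarked ones, and clear
   the marks of the survivors.  Return the number of cells freed.  */
static int
toy_sweep (void)
{
  int freed = 0;
  for (int i = 0; i < HEAP_SIZE; i++)
    if (heap[i].allocated)
      {
        if (heap[i].marked)
          heap[i].marked = false;
        else
          {
            heap[i].allocated = false;
            freed++;
          }
      }
  return freed;
}

/* One full collection, starting from an array of exact roots.  */
static int
toy_gc (const int *roots, size_t nroots)
{
  for (size_t i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  return toy_sweep ();
}

/* Build b -> a plus one unreachable cell, then collect.  */
int
toy_mark_sweep_demo (void)
{
  toy_reset ();
  int a = toy_cons (NIL, NIL);
  int b = toy_cons (a, NIL);
  int c = toy_cons (NIL, NIL);  /* never referenced from a root */
  (void) c;
  int roots[] = { b };
  return toy_gc (roots, 1);
}
```

In toy_mark_sweep_demo only the cell that no root reaches is freed,
so the function returns 1. Note that the whole program waits while
toy_gc runs, which is the pause the rest of this text is about.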

The traditional mark-sweep GC implementation is

- Not concurrent. Emacs calls GC explicitly in various places, and
  proceeds only when the GC is done.
- Not incremental. The GC is not done in steps.
- Not generational. The GC doesn't take advantage of the so-called
  generational hypothesis, which says that most objects used by a
  program die young, i.e. are only used for a short period of time.
  Other GCs take this hypothesis into account to reduce pause times.
- Not copying. Lisp objects are never copied by the GC. They stay at
  the addresses where their initial allocation puts them. Special
  facilities like allocation in blocks are implemented to avoid memory
  fragmentation.

In contrast, the new igc collector, using MPS, is

- Concurrent. The GC runs in its own thread. There are no explicit
  calls to start GC, and Emacs doesn't have to wait for the GC to end.
- Incremental. The GC is done in steps.
- Generational. The GC takes advantage of the so-called
  generational hypothesis.
- Copying. Lisp objects are copied by the GC, avoiding memory
  fragmentation. Special care must be taken to take into account that
  the same Lisp object can have different addresses at different
  times.

The goal of igc is to reduce pause times when using Emacs
interactively. The properties listed above which come from leveraging
MPS make that possible.

* MPS

A guide and a reference manual for MPS can be found at
https://memory-pool-system.readthedocs.io.

It is recommended to read at least the Guide. And what else to write I
don't know.

* Registry

MPS provides an API of C functions that an application can use to
interface with the MPS. For example, MPS cannot know what the GC roots
of an application are, or what threads the application considers
relevant for GC. So, MPS provides functions that an application can
use to define its roots and threads.

The MPS functions typically return a handle to an MPS object created
for what is defined by them. For example, when we define a root with
mps_root_create_area, MPS gives us a mps_root_t handle back. The
handles returned can later be used to change or delete what we
created. For example, we use the handle we got from MPS for a root to
destroy the root with mps_root_destroy.

All MPS handles we create, we collect in a global registry of type
struct igc which is stored in the global variable global_igc and can
be accessed from anywhere.

* Initialization
 - Roots
 - Threads
 - Pools, Object formats
* Lisp Object Allocation
Object Header?
* Malloc with roots
* Memory barriers


* Re: Some experience with the igc branch
  2024-12-26  7:57                                                         ` Gerd Möllmann
@ 2024-12-26 11:56                                                           ` Eli Zaretskii
  2024-12-26 15:27                                                           ` Stefan Kangas
  1 sibling, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26 11:56 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: pipcet@protonmail.com,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Thu, 26 Dec 2024 08:57:17 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >   . what are "roots"?
> >   . what is the purpose of each root_create_SOMETHING function in igc.c?
> >   . what is the difference between "exact" and "ambiguous" roots, and
> >     when should we use each one in Emacs?
> 
> Coincidentally, I wrote this today, which might help, and would also be
> interesting to get some feedback for. 

Thanks, this is helpful.  Especially if you extend the MPS section to
cover the above questions.




* Re: Some experience with the igc branch
  2024-12-26  7:57                                                         ` Gerd Möllmann
  2024-12-26 11:56                                                           ` Eli Zaretskii
@ 2024-12-26 15:27                                                           ` Stefan Kangas
  2024-12-26 19:51                                                             ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Stefan Kangas @ 2024-12-26 15:27 UTC (permalink / raw)
  To: Gerd Möllmann, Eli Zaretskii
  Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Coincidentally, I wrote this today, which might help, and would also be
> interesting to get some feedback for.

This text looks good to me as a general introduction, especially if you
extend it to include these points also:

> * Initialization
>  - Roots
>  - Threads
>  - Pools, Object formats
> * Lisp Object Allocation
> Object Header?
> * Malloc with roots
> * Memory barriers

Do you intend to install this on the scratch/igc branch?




* Re: Some experience with the igc branch
  2024-12-26 15:27                                                           ` Stefan Kangas
@ 2024-12-26 19:51                                                             ` Gerd Möllmann
  2024-12-27  9:45                                                               ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26 19:51 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, eller.helmut, acorallo

Stefan Kangas <stefankangas@gmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Coincidentally, I wrote this today, which might help, and would also be
>> interesting to get some feedback for.
>
> This text looks good to me as a general introduction, especially if you
> extend it to include these points also:

Thanks.

>> * Initialization
>>  - Roots
>>  - Threads
>>  - Pools, Object formats
>> * Lisp Object Allocation
>> Object Header?
>> * Malloc with roots
>> * Memory barriers
>
> Do you intend to install this on the scratch/igc branch?

Yes, when I'm a bit further. I think Eli wants a comment in igc.c like
the one xdisp.c starts with.




* Re: Some experience with the igc branch
  2024-12-26 19:51                                                             ` Gerd Möllmann
@ 2024-12-27  9:45                                                               ` Gerd Möllmann
  2024-12-27 13:56                                                                 ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27  9:45 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, eller.helmut, acorallo

[-- Attachment #1: Type: text/plain, Size: 950 bytes --]

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Stefan Kangas <stefankangas@gmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> Coincidentally, I wrote this today, which might help, and would also be
>>> interesting to get some feedback for.
>>
>> This text looks good to me as a general introduction, especially if you
>> extend it to include these points also:
>
> Thanks.
>
>>> * Initialization
>>>  - Roots
>>>  - Threads
>>>  - Pools, Object formats
>>> * Lisp Object Allocation
>>> Object Header?
>>> * Malloc with roots
>>> * Memory barriers
>>
>> Do you intend to install this on the scratch/igc branch?
>
> Yes, when I'm a bit further. I think Eli wants a comment in igc.c like
> the one at xdisp.c starts with.

I've reached the point where I don't know what else to explain. Could
always be improved, of course, and so on. Please find attached, with
request for feedback.

[-- Attachment #2: Doc --]
[-- Type: text/x-org, Size: 11088 bytes --]

:PROPERTIES:
:ID:       0B687D90-0DE7-4BC3-B92E-1D582C175737
:END:
#+title: igc doc comment
* Introduction / Garbage collection

Implementing a programming language like Lisp requires automatic
memory management which frees the memory of Lisp objects like conses,
strings etc. when they are no longer in use. The automatic memory
management used for Lisp is called garbage collection (GC), which is
performed by a garbage collector. Objects that are no longer used are
called garbage or dead; objects that are in use are called live
objects. The process of getting rid of garbage is called the GC.

A large number of GC algorithms and implementations exist which differ
in various dimensions. Emacs has two GC implementations which can be
chosen at compile-time. The traditional (old) GC, which was the only
one until recently, is a so-called mark-sweep, non-copying collector.
The new GC implementation in this file is an incremental,
generational, concurrent (igc) collector based on the MPS library from
Ravenbrook. It is a so-called copying collector. The terms used here
will become clearer in the following.

Emacs' traditional mark-sweep GC works in two phases:

1. The mark phase

   Start with a set of so-called (GC) roots that are known to contain
   live objects. Examples of roots in Emacs are

   - the bytecode stacks 
   - the specpdl (binding) stacks
   - Lisp variables defined in C with DEFVAR
   - the C stacks (aka control stacks)
   - ...

   Roots in a general sense are contiguous areas of memory, and they
   are either "exact" or "ambiguous".

   If we know that a root always contains only valid Lisp object
   references, the root is called exact. An example of an exact root
   is the root for a DEFVAR. The DEFVAR variable always contains a
   valid reference to a Lisp object. The memory range of the root is
   the Lisp_Object C variable.

   If we only know that a root contains potential references to Lisp
   objects, the root is called ambiguous.  An example of an ambiguous
   root is the C stack.  The C stack can contain integers that look
   like a Lisp_Object or pointer to a Lisp object, but actually are
   just random integers.  Ambiguous roots are said to be scanned
   conservatively: we make the conservative assumption that we really
   found a reference, so that we don't discard objects that are only
   referenced from the C stack.

   The mark phase of the traditional GC scans all roots,
   conservatively or exactly, and marks the Lisp objects found
   referenced in the roots live, plus all Lisp objects reachable from
   them, recursively.  In the end, all live objects are marked, and all
   objects not marked are dead, i.e.  garbage.

2.  The sweep phase

   Sweeping means to free all objects that are not marked live.  The
   collector iterates over all allocated objects, live or dead, and
   frees the dead ones so that their memory can be reused.
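
The difference between scanning an exact and an ambiguous root can be
shown with a small C sketch. It assumes a toy heap of fixed-size
objects and treats a word as a reference only if it is the start
address of one of them; real conservative scanners apply their own,
more elaborate tests. All names (looks_like_object, scan_exact,
scan_ambiguous) are hypothetical and not part of Emacs or MPS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { NOBJ = 4 };

struct obj
{
  bool marked;
  int payload;
};

static struct obj heap[NOBJ];

/* Does WORD look like the start address of an object in the toy
   heap?  This is the "conservative" test: it can be fooled by an
   integer that happens to have the right value.  */
static bool
looks_like_object (uintptr_t word)
{
  uintptr_t start = (uintptr_t) &heap[0];
  uintptr_t end = (uintptr_t) &heap[NOBJ];
  return (word >= start && word < end
          && (word - start) % sizeof (struct obj) == 0);
}

/* Exact root: every slot is known to hold a valid reference, so
   each one can be followed without any check.  */
static void
scan_exact (struct obj **slots, size_t n)
{
  for (size_t i = 0; i < n; i++)
    slots[i]->marked = true;
}

/* Ambiguous root: the words may or may not be references.  Anything
   that passes the test is assumed to be one.  */
static void
scan_ambiguous (const uintptr_t *words, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (looks_like_object (words[i]))
      ((struct obj *) words[i])->marked = true;
}

/* Scan one exact root and one stack-like ambiguous root, then
   count the objects that ended up marked.  */
int
count_marked_demo (void)
{
  for (int i = 0; i < NOBJ; i++)
    heap[i].marked = false;

  struct obj *exact_root[] = { &heap[0] };
  uintptr_t stack_like[] = {
    42,                         /* plain integer, not in the heap */
    (uintptr_t) &heap[2],       /* looks like (and is) a reference */
    (uintptr_t) &heap[1] + 1    /* inside the heap, but misaligned */
  };

  scan_exact (exact_root, 1);
  scan_ambiguous (stack_like, 3);

  int marked = 0;
  for (int i = 0; i < NOBJ; i++)
    marked += heap[i].marked;
  return marked;
}
```

count_marked_demo returns 2: one object found through the exact
root and one through the ambiguous root. If an unrelated integer on
the "stack" had the value of &heap[3], that object would be kept
alive too, which is the price of being conservative.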

The traditional mark-sweep GC implementation is

- Not concurrent.  Emacs calls GC explicitly in various places, and
  proceeds only when the GC is done.
- Not incremental.  The GC is not done in steps.
- Not generational.  The GC doesn't take advantage of the so-called
  generational hypothesis, which says that most objects used by a
  program die young, i.e.  are only used for a short period of time.
  Other GCs take this hypothesis into account to reduce pause times.
- Not copying.  Lisp objects are never copied by the GC.  They stay at
  the addresses where their initial allocation puts them.  Special
  facilities like allocation in blocks are implemented to avoid memory
  fragmentation.

In contrast, the new igc collector, using MPS, is

- Concurrent.  The GC runs in its own thread.  There are no explicit
  calls to start GC, and Emacs doesn't have to wait for the GC to
  complete.
- Incremental.  The GC is done in steps.
- Generational.  The GC takes advantage of the so-called
  generational hypothesis.
- Copying.  Lisp objects are copied by the GC, avoiding memory
  fragmentation.  Special care must be taken to take into account that
  the same Lisp object can have different addresses at different
  times.

The goal of igc is to reduce pause times when using Emacs
interactively.  The properties listed above which come from leveraging
MPS make that possible.

* MPS (Memory Pool System)

The MPS (Memory Pool System) is a C library developed by Ravenbrook
Inc. over several decades. It is used, for example, by the OpenDylan
project.

A guide and a reference manual for MPS can be found at
https://memory-pool-system.readthedocs.io. The following can only be a
rough and incomplete overview. It is recommended to read at least the
MPS Guide.

MPS uses "arenas" which consist of "pools". The pools represent memory
to allocate objects from that are subject to GC. Different types of
pools implement different GC strategies.

Pools have "object formats" that an application must supply when it
creates a pool. The most important part of an object format is a
handful of functions that an application must implement to perform
lowest-level operations on behalf of the MPS. These functions make it
possible to implement MPS without it having to know low-level details
about what application objects look like.

The most important MPS pool type is named AMC (Automatic Mostly
Copying). AMC implements a variant of a copying collector. Objects
allocated from AMC pools can therefore change their memory addresses.

When copying objects, a marker is left in the original object pointing
to its copy. This marker is also called a "tombstone". A "memory
barrier" is placed on the original object. Memory barrier means the
memory is read and/or write protected (e.g. with mprotect). The
barrier leads to MPS being invoked when an old object is accessed.
The whole process is called "object forwarding".

MPS makes sure that references to old objects are updated to refer to
their new addresses. Functions defined in the object format are used
by MPS to perform the lowest-level tasks of object forwarding, so that
MPS doesn't have to know application-specific details of what objects
look like. In the end, copying/forwarding is transparent to the
application.
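
The idea of forwarding can be sketched in plain C, without MPS and
without memory barriers. The sketch assumes two toy spaces and an
explicit forwarding pointer in every object, and it assumes that
every slot passed to fix_reference still refers to an object in
from_space. All names here are hypothetical; MPS does this through
the object format functions and its own fix protocol.

```c
#include <assert.h>
#include <stddef.h>

struct obj
{
  struct obj *forward;  /* non-NULL once moved: the "tombstone" */
  int value;
};

enum { SPACE_SIZE = 4 };

static struct obj from_space[SPACE_SIZE];
static struct obj to_space[SPACE_SIZE];
static size_t to_used;

/* Copy OLD to to_space unless that already happened, leaving a
   forwarding pointer behind.  Return the new address.  */
static struct obj *
forward_object (struct obj *old)
{
  if (old->forward == NULL)
    {
      assert (to_used < SPACE_SIZE);
      struct obj *copy = &to_space[to_used++];
      copy->forward = NULL;
      copy->value = old->value;
      old->forward = copy;
    }
  return old->forward;
}

/* Update one reference slot so that it points to the new copy.  */
static void
fix_reference (struct obj **slot)
{
  if (*slot != NULL)
    *slot = forward_object (*slot);
}

/* Two references to the same old object must end up pointing to
   the same single copy.  Return 1 if that is the case.  */
int
forwarding_demo (void)
{
  to_used = 0;
  from_space[0] = (struct obj) { NULL, 7 };

  struct obj *ref_a = &from_space[0];
  struct obj *ref_b = &from_space[0];

  fix_reference (&ref_a);   /* copies the object */
  fix_reference (&ref_b);   /* finds the tombstone, copies nothing */

  return (ref_a == ref_b
          && ref_a == &to_space[0]
          && ref_a->value == 7
          && to_used == 1);
}
```

The second call to fix_reference is the interesting one: it finds
the tombstone and reuses the existing copy, so the object is moved
exactly once and forwarding_demo returns 1.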

AMC implements a "mostly-copying" collector, where "mostly" refers to
the fact that it supports ambiguous references. Ambiguous references
are those from ambiguous roots, where we can't tell if a reference is
real or not. If we copied such an object, we wouldn't be able to
update its address in all references because we can't tell if the
ambiguous reference is real or just some random integer, and changing
it would have unforeseeable consequences. Ambiguously referenced
objects are therefore never copied, and their address does not change.

* The registry

The MPS shields itself from knowing application details, for example
which GC roots the application has, which threads, what objects look
like and so on. MPS has an internal model instead which describes
these details.

Emacs creates this model using the MPS API. For example, MPS cannot
know what Emacs' GC roots are. We tell MPS about Emacs' roots by
calling an MPS API function that creates an MPS-internal model for
the root, and get back a handle that stands for the root. This handle
is later used to refer to the model object. For example, if we want
to delete a root later, because we don't need it anymore, we call an
MPS function giving it the handle for the root we no longer need.

All other model objects are handled in the same way, threads, arenas,
pools, object formats and so on.

Igc collects all these MPS handles in a 'struct igc'. This "registry"
of MPS handles is found in the global variable 'global_igc' and thus
can be accessed from anywhere.

* Root scan functions

MPS allows us to specify roots having tailor-made scan functions that
Emacs implements. Scanning here refers to the process of finding
references in the memory area of the root, and telling MPS about the
references.

The function scan_specpdl is an example. We know the structure of a
bindings stack, so we can tell where references to Lisp objects can
be. This is generally better than letting MPS do the scanning itself,
because MPS can only scan the whole block word for word, ambiguously
or exactly.

All such scan functions in igc have the prefix scan_.
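
What a tailor-made scan function buys us can be illustrated with a
sketch. It assumes a toy binding stack whose entries are tagged, and
a callback that stands in for "telling MPS about a reference". The
types and names (toy_binding, toy_scan_bindings, toy_fix_fn) are made
up for this illustration and are not the real specpdl layout or the
MPS scanning protocol.

```c
#include <assert.h>
#include <stddef.h>

typedef void *toy_ref;  /* stand-in for a Lisp object reference */

enum toy_kind { TOY_LET, TOY_UNWIND_INT };

/* A tagged stack entry: only some kinds contain references.  */
struct toy_binding
{
  enum toy_kind kind;
  union
  {
    struct { toy_ref symbol, old_value; } let;            /* 2 refs */
    struct { void (*func) (int); int arg; } unwind_int;   /* none */
  } u;
};

/* Called for every slot that holds a reference.  */
typedef void (*toy_fix_fn) (toy_ref *slot, void *closure);

/* Scan the entries in [START, END).  Because we know the layout,
   we report exactly the slots that hold references and skip
   function pointers and plain integers.  */
static void
toy_scan_bindings (struct toy_binding *start, struct toy_binding *end,
                   toy_fix_fn fix, void *closure)
{
  for (struct toy_binding *b = start; b < end; b++)
    switch (b->kind)
      {
      case TOY_LET:
        fix (&b->u.let.symbol, closure);
        fix (&b->u.let.old_value, closure);
        break;
      case TOY_UNWIND_INT:
        break;  /* nothing to report */
      }
}

/* A trivial callback that just counts reported slots.  */
static void
count_fix (toy_ref *slot, void *closure)
{
  (void) slot;
  ++*(int *) closure;
}

int
scan_bindings_demo (void)
{
  int x, y;  /* only their addresses are used, as fake objects */
  struct toy_binding stack[2];

  stack[0].kind = TOY_LET;
  stack[0].u.let.symbol = &x;
  stack[0].u.let.old_value = &y;

  stack[1].kind = TOY_UNWIND_INT;
  stack[1].u.unwind_int.func = NULL;
  stack[1].u.unwind_int.arg = 3;

  int reported = 0;
  toy_scan_bindings (stack, stack + 2, count_fix, &reported);
  return reported;
}
```

scan_bindings_demo returns 2: the two slots of the let-style entry
are reported, and the integer argument of the other entry is never
mistaken for a reference, which a word-by-word scan could not
guarantee.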

* Lisp object scan functions

Igc tells MPS how to scan Lisp objects allocated via MPS by specifying
a scan function for that purpose in an object format. This function is
'dflt_scan' in igc.c, which dispatches to various subroutines for
different Lisp object types.

The lower-level functions and macros igc defines in the call tree of
dflt_scan have names starting with 'fix_' or 'FIX_', because they use
the MPS_FIX1 and MPS_FIX2 API to do their job. Please refer to the MPS
reference for details of MPS_FIX1/2.

* Initialization

Before we can use MPS, we must define and create various things that
MPS needs to know, i.e. we create MPS's internal model of Emacs. This
is done at application startup, and all objects are added to the
registry.

- Pools. We tell MPS which pools we want to use, and what the object
  formats are, i.e. which callbacks in Emacs MPS can use.
- Threads. We define which threads Emacs has, and add their C stacks
  as roots. This includes Emacs' main thread.
- Roots. We tell MPS about the various roots in Emacs, the DEFVARs,
  the byte code stack, staticpro, etc.
- ...

When we have done all that, we tell MPS it can start doing its job.

* Lisp Object Allocation

All of Emacs' Lisp object allocation ultimately ends up done in igc's
'alloc_impl' function.

MPS allocation from pools is thread-specific, using so-called
"allocation points". These allocation points optimize allocation by
reducing thread-contention. Allocation points are associated with
pools, and there is one allocation point per thread.

The function 'thread_ap' in igc determines which allocation point to
use for the current thread and depending on the type of Lisp object
to allocate.
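
Why per-thread allocation points reduce contention can be seen in a
small sketch of a bump allocator. It assumes one shared pool that
hands out fixed-size chunks and one allocation point per thread that
allocates from its current chunk; alignment and locking are left out.
The names (toy_ap, toy_ap_alloc, toy_ap_refill) are hypothetical. MPS
itself uses a reserve/commit protocol on its allocation points, which
this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

enum { POOL_SIZE = 1024, CHUNK_SIZE = 128 };

static char pool[POOL_SIZE];
static size_t pool_used;  /* shared state; real code needs a lock */

/* One of these would exist per thread (and per pool).  */
struct toy_ap
{
  char *next;   /* next free byte in the current chunk */
  char *limit;  /* end of the current chunk */
};

/* Slow path: get a fresh chunk from the shared pool.  This is the
   only place where threads would compete with each other.  */
static int
toy_ap_refill (struct toy_ap *ap)
{
  if (pool_used + CHUNK_SIZE > POOL_SIZE)
    return 0;
  ap->next = pool + pool_used;
  ap->limit = ap->next + CHUNK_SIZE;
  pool_used += CHUNK_SIZE;
  return 1;
}

/* Fast path: bump a pointer that belongs to this thread only.  */
void *
toy_ap_alloc (struct toy_ap *ap, size_t size)
{
  if (size > CHUNK_SIZE)
    return NULL;
  if (ap->next == NULL || (size_t) (ap->limit - ap->next) < size)
    {
      if (!toy_ap_refill (ap))
        return NULL;
    }
  void *p = ap->next;
  ap->next += size;
  return p;
}

/* Two allocation points, as if owned by two threads.  */
int
ap_demo (void)
{
  pool_used = 0;
  struct toy_ap ap1 = { NULL, NULL };
  struct toy_ap ap2 = { NULL, NULL };

  char *a = toy_ap_alloc (&ap1, 16);  /* refill, then bump */
  char *b = toy_ap_alloc (&ap2, 16);  /* gets its own chunk */
  char *c = toy_ap_alloc (&ap1, 16);  /* bump only, no shared state */

  return a == pool && b == pool + CHUNK_SIZE && c == a + 16;
}
```

In ap_demo the third allocation touches only ap1, so it would need
no synchronization with the thread owning ap2; the function returns
1 when the three addresses are laid out as described.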

* Malloc with roots

In a number of places, Emacs allocates memory with its 'xmalloc'
function family and then stores references to Lisp objects there,
pointers or Lisp_Objects.

With the traditional GC, inadvertently collecting such objects is
frequently prevented by inhibiting GC.

With igc, we do things differently. We don't want to temporarily stop
the GC thread to inhibit GC, as a design decision. Instead, we make
the malloc'd memory a root. The root is destroyed when the memory is
freed.

igc provides a number of functions for doing such allocations. For
example 'igc_xzalloc_ambig', 'igc_xpalloc_exact' and so on. Freeing
the memory must be done with 'igc_xfree'.

These functions are used throughout Emacs in #ifdef HAVE_MPS. In
general, it's an error to put references to Lisp objects in malloc'd
memory and not use the igc functions.
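
The pairing of allocation with root creation, and of freeing with
root destruction, can be sketched as follows. This is a toy that
assumes a small fixed table of roots instead of MPS; toy_xzalloc_root
and toy_xfree are hypothetical names that only mimic the roles of the
igc functions mentioned above, not their signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { MAX_ROOTS = 16 };

/* One entry per malloc'd block that may contain references.
   start == NULL means the entry is unused.  */
struct toy_root
{
  void *start;
  size_t size;
};

static struct toy_root roots[MAX_ROOTS];

/* Allocate zeroed memory and register it as a root in one step, so
   that there is no window in which the block exists but is unknown
   to the collector.  */
void *
toy_xzalloc_root (size_t size)
{
  for (int i = 0; i < MAX_ROOTS; i++)
    if (roots[i].start == NULL)
      {
        void *p = calloc (1, size);
        if (p == NULL)
          abort ();
        roots[i] = (struct toy_root) { p, size };
        return p;
      }
  abort ();  /* root table full */
}

/* Destroy the root first, then free the memory.  */
void
toy_xfree (void *p)
{
  if (p == NULL)
    return;
  for (int i = 0; i < MAX_ROOTS; i++)
    if (roots[i].start == p)
      {
        roots[i] = (struct toy_root) { NULL, 0 };
        break;
      }
  free (p);
}

/* Number of blocks currently registered as roots.  */
int
toy_root_count (void)
{
  int n = 0;
  for (int i = 0; i < MAX_ROOTS; i++)
    n += roots[i].start != NULL;
  return n;
}

/* Return 1 if the root exists exactly as long as the block does.  */
int
malloc_root_demo (void)
{
  void **refs = toy_xzalloc_root (4 * sizeof *refs);
  int during = toy_root_count ();
  toy_xfree (refs);
  return during == 1 && toy_root_count () == 0;
}
```

The point of the design is that a caller cannot forget one half:
memory obtained this way is a root for exactly as long as it is
allocated. Freeing such a block with plain free would leave a stale
root behind, which is why the text insists on the matching free
function.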


* Re: Some experience with the igc branch
  2024-12-27  9:45                                                               ` Gerd Möllmann
@ 2024-12-27 13:56                                                                 ` Gerd Möllmann
  2024-12-27 15:01                                                                   ` Pip Cet via Emacs development discussions.
  2024-12-27 16:37                                                                   ` Eli Zaretskii
  0 siblings, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27 13:56 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Eli Zaretskii, pipcet, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> I've reached the point where I don't know what else to explain. Could
> always be improved, of course, and so on. Please find attached, with
> request for feedback.

Ahem, just remembered that I already had an admin/igc.org in the branch,
so I've now replaced that with what I've written.

Happy hacking :-)




* Re: Some experience with the igc branch
  2024-12-27 13:56                                                                 ` Gerd Möllmann
@ 2024-12-27 15:01                                                                   ` Pip Cet via Emacs development discussions.
  2024-12-27 15:28                                                                     ` Eli Zaretskii
  2024-12-27 16:05                                                                     ` Gerd Möllmann
  2024-12-27 16:37                                                                   ` Eli Zaretskii
  1 sibling, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 15:01 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Stefan Kangas, Eli Zaretskii, ofv, emacs-devel, eller.helmut,
	acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> I've reached the point where I don't know what else to explain. Could
>> always be improved, of course, and so on. Please find attached, with
>> request for feedback.
>
> Ahem, just remembered that I had already an admin/igc.org in the branch,
> so I've now replaced that with what I've written.

Thank you very much!

> - Concurrent.  The GC runs in its own thread.  There are no explicit
>  calls to start GC, and Emacs doesn't have to wait for the GC to
>  complete.

I don't think that's true right now (it is what I want for Christmas,
though).  On GNU/Linux, the GC usually runs on the main thread.  On
macOS, the GC can run on the main thread (allocation) or on the SIGSEGV
handler thread (memory barriers); in both cases, the main thread has to
wait for it to complete.

I'm not sure it's ever useful to make the assumption that GC isn't
concurrent:  it is very hard to do so, but it is possible.

Maybe Eli knows more; I posted a patch to force concurrent GC for
debugging a while ago, and Eli told me not to because it would produce
false positives.  I'm not so sure about the "false" part now.

Pip


* Re: Some experience with the igc branch
  2024-12-27 15:01                                                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-27 15:28                                                                     ` Eli Zaretskii
  2024-12-27 15:47                                                                       ` Pip Cet via Emacs development discussions.
  2024-12-27 16:18                                                                       ` Gerd Möllmann
  2024-12-27 16:05                                                                     ` Gerd Möllmann
  1 sibling, 2 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-27 15:28 UTC (permalink / raw)
  To: Pip Cet
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

> Date: Fri, 27 Dec 2024 15:01:11 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Stefan Kangas <stefankangas@gmail.com>, Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> > - Concurrent.  The GC runs in its own thread.  There are no explicit
> >  calls to start GC, and Emacs doesn't have to wait for the GC to
> >  complete.
> 
> I don't think that's true right now (it is what I want for Christmas,
> though).  On GNU/Linux, the GC usually runs on the main thread.

Isn't it both, actually?  That is, MPS could be triggered both
synchronously and on a separate thread?  That's what I thought.

At least on Windows, I clearly see new threads starting when MPS
starts GC.

> On
> macOS, the GC can run on the main thread (allocation) or on the SIGSEGV
> handler thread (memory barriers); in both cases, the main thread has to
> wait for it to complete.
> 
> I'm not sure it's ever useful to make the assumption that GC isn't
> concurrent:  it is very hard to do so, but it is possible.
> 
> Maybe Eli knows more; I posted a patch to force concurrent GC for
> debugging a while ago, and Eli told me not to because it would produce
> false positives.  I'm not so sure about the "false" part now.

I just conveyed what a comment in igc.c says (or used to say back in
May).




* Re: Some experience with the igc branch
  2024-12-27 15:28                                                                     ` Eli Zaretskii
@ 2024-12-27 15:47                                                                       ` Pip Cet via Emacs development discussions.
  2024-12-27 16:18                                                                       ` Gerd Möllmann
  1 sibling, 0 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 15:47 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Fri, 27 Dec 2024 15:01:11 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Stefan Kangas <stefankangas@gmail.com>, Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> > - Concurrent.  The GC runs in its own thread.  There are no explicit
>> >  calls to start GC, and Emacs doesn't have to wait for the GC to
>> >  complete.
>>
>> I don't think that's true right now (it is what I want for Christmas,
>> though).  On GNU/Linux, the GC usually runs on the main thread.
>
> Isn't it both, actually?  That is, MPS could be triggered both
> synchronously and on a separate thread?  That's what I thought.

It's what the rest of Emacs should assume, IMHO, but it's certainly not
true that GC "runs in its own thread": it uses whichever thread is
current at the time we enter MPS.

> At least on Windows, I clearly see new threads starting when MPS
> starts GC.

You mean MPS calls CreateThread?  GC creates no threads on GNU/Linux.

>> On
>> macOS, the GC can run on the main thread (allocation) or on the SIGSEGV
>> handler thread (memory barriers); in both cases, the main thread has to
>> wait for it to complete.
>>
>> I'm not sure it's ever useful to make the assumption that GC isn't
>> concurrent:  it is very hard to do so, but it is possible.
>>
>> Maybe Eli knows more; I posted a patch to force concurrent GC for
>> debugging a while ago, and Eli told me not to because it would produce
>> false positives.  I'm not so sure about the "false" part now.
>
> I just conveyed what a comment in igc.c says (or used to say back in
> May).

Sounds like I misunderstood, then.

If we all agree that Emacs shouldn't assume GC non-concurrency,
triggering GC eagerly from another thread is a useful thing we should
test (after making dflt_pad poison memory, ideally).

Pip





* Re: Some experience with the igc branch
  2024-12-27 15:28                                                                     ` Eli Zaretskii
  2024-12-27 15:47                                                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-27 16:18                                                                       ` Gerd Möllmann
  2024-12-28  9:10                                                                         ` Stefan Kangas
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27 16:18 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Pip Cet, stefankangas, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Fri, 27 Dec 2024 15:01:11 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Stefan Kangas <stefankangas@gmail.com>, Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>> 
>> > - Concurrent.  The GC runs in its own thread.  There are no explicit
>> >  calls to start GC, and Emacs doesn't have to wait for the GC to
>> >  complete.
>> 
>> I don't think that's true right now (it is what I want for Christmas,
>> though).  On GNU/Linux, the GC usually runs on the main thread.
>
> Isn't it both, actually?  That is, MPS could be triggered both
> synchronously and on a separate thread?  That's what I thought.
>
> At least on Windows, I clearly see new threads starting when MPS
> starts GC.
>
>> On
>> macOS, the GC can run on the main thread (allocation) or on the SIGSEGV
>> handler thread (memory barriers); in both cases, the main thread has to
>> wait for it to complete.
>> 
>> I'm not sure it's ever useful to make the assumption that GC isn't
>> concurrent:  it is very hard to do so, but it is possible.
>> 
>> Maybe Eli knows more; I posted a patch to force concurrent GC for
>> debugging a while ago, and Eli told me not to because it would produce
>> false positives.  I'm not so sure about the "false" part now.
>
> I just conveyed what a comment in igc.c says (or used to say back in
> May).

MPS is concurrent.

One can tell MPS that one has a certain amount of time to spare, so it
can do work then, which I think runs in the main thread. I do that when
Emacs thinks it's idle.

There is also the case when an allocation point runs out of memory, but
I don't want to open that box at the moment. I personally believe that
this is too special for igc.org. But others may think differently of
course.




* Re: Some experience with the igc branch
  2024-12-27 16:18                                                                       ` Gerd Möllmann
@ 2024-12-28  9:10                                                                         ` Stefan Kangas
  2024-12-28  9:20                                                                           ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Stefan Kangas @ 2024-12-28  9:10 UTC (permalink / raw)
  To: Gerd Möllmann, Eli Zaretskii
  Cc: Pip Cet, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> MPS is concurrent.
>
> One can tell MPS that one has a certain amount of time to spare, so it
> can do work then, which I think runs in the main thread. I do that when
> Emacs thinks it's idle.

Apologies if this is a naive question, but:

What do we gain from using the main thread for this, instead of always
letting it run in a separate thread?




* Re: Some experience with the igc branch
  2024-12-28  9:10                                                                         ` Stefan Kangas
@ 2024-12-28  9:20                                                                           ` Gerd Möllmann
  2024-12-28  9:24                                                                             ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-28  9:20 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Eli Zaretskii, Pip Cet, ofv, emacs-devel, eller.helmut, acorallo

Stefan Kangas <stefankangas@gmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> MPS is concurrent.
>>
>> One can tell MPS that one has a certain amount of time to spare, so it
>> can do work then, which I think runs in the main thread. I do that when
>> Emacs thinks it's idle.
>
> Apologies if this is a naive question, but:
>
> What do we gain from using the main thread for this, instead of always
> letting it run in a separate thread?

It's just an assumption I made:

I thought if Emacs is idle anyway, why not let MPS do some work _in
addition_ to its normal work, in Emacs' thread. And my hope would be to
further reduce pause times, for example for the collection of older
generations.

Whether it does that, I don't know. It's a knob one could play with.




* Re: Some experience with the igc branch
  2024-12-28  9:20                                                                           ` Gerd Möllmann
@ 2024-12-28  9:24                                                                             ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-28  9:24 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Eli Zaretskii, Pip Cet, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Stefan Kangas <stefankangas@gmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> MPS is concurrent.
>>>
>>> One can tell MPS that one has a certain amount of time to spare, so it
>>> can do work then, which I think runs in the main thread. I do that when
>>> Emacs thinks it's idle.
>>
>> Apologies if this is a naive question, but:
>>
>> What do we gain from using the main thread for this, instead of always
>> letting it run in a separate thread?
>
> It's just an assumption I made:
>
> I thought if Emacs is idle anyway, why not let MPS do some work _in
> addition_ to its normal work, in Emacs' thread. And my hope would be to
> further reduce pause times, for example for the collection of older
> generations.
>
> Whether it does that, I don't know. It's a knob one could play with.

Forgot to name the kid: it's igc-step-interval.
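
To illustrate the idea of handing the collector a bounded amount of
work while Emacs is idle, here is a minimal toy sketch in C.  It is
not the igc.c code and does not call MPS.  The names toy_gc and
toy_gc_step are hypothetical, and the budget is counted in abstract
work units instead of seconds so that the behavior is deterministic.
(MPS itself documents mps_arena_step for giving the arena spare time;
how igc.c wires igc-step-interval to it is not shown in this thread.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy collector state: a count of outstanding work units.  */
struct toy_gc
{
  size_t pending;   /* work units still to do */
  size_t done;      /* work units completed so far */
};

/* Do collection work for at most BUDGET units, one unit at a time.
   Return nonzero if work remains, so the caller can schedule another
   step the next time it is idle.  */
static int
toy_gc_step (struct toy_gc *gc, size_t budget)
{
  assert (gc != NULL);
  while (budget > 0 && gc->pending > 0)
    {
      gc->pending--;
      gc->done++;
      budget--;
    }
  return gc->pending > 0;
}
```

An idle hook would call toy_gc_step with a small budget each time the
editor has nothing else to do.  A smaller budget means shorter
individual pauses but more steps to finish the same amount of work.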




* Re: Some experience with the igc branch
  2024-12-27 15:01                                                                   ` Pip Cet via Emacs development discussions.
  2024-12-27 15:28                                                                     ` Eli Zaretskii
@ 2024-12-27 16:05                                                                     ` Gerd Möllmann
  2024-12-27 17:00                                                                       ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27 16:05 UTC (permalink / raw)
  To: Pip Cet
  Cc: Stefan Kangas, Eli Zaretskii, ofv, emacs-devel, eller.helmut,
	acorallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> I've reached the point where I don't know what else to explain. Could
>>> always be improved, of course, and so on. Please find attached, with
>>> request for feedback.
>>
>> Ahem, just remembered that I had already an admin/igc.org in the branch,
>> so I've now replaced that with what I've written.
>
> Thank you very much!
>
>> - Concurrent.  The GC runs in its own thread.  There are no explicit
>>  calls to start GC, and Emacs doesn't have to wait for the GC to
>>  complete.
>
> I don't think that's true right now (it is what I want for Christmas,
> though).  On GNU/Linux, the GC usually runs on the main thread.  On
> macOS, the GC can run on the main thread (allocation) or on the SIGSEGV
> handler thread (memory barriers); in both cases, the main thread has to
> wait for it to complete.

You mean what happens when an allocation point has used up its memory,
and needs to get some more? I haven't looked at what percentage of GC time
that takes. Also, the GC steps I didn't mention. One has to make a cut
somewhere.

Anyway. If people agree that something should be there, please add it.

That's the advantage of it being in git.
For me :-).

> I'm not sure it's ever useful to make the assumption that GC isn't
> concurrent:  it is very hard to do so, but it is possible.
>
> Maybe Eli knows more; I posted a patch to force concurrent GC for
> debugging a while ago, and Eli told me not to because it would produce
> false positives.  I'm not so sure about the "false" part now.
>
> Pip




* Re: Some experience with the igc branch
  2024-12-27 16:05                                                                     ` Gerd Möllmann
@ 2024-12-27 17:00                                                                       ` Pip Cet via Emacs development discussions.
  0 siblings, 0 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 17:00 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Stefan Kangas, Eli Zaretskii, ofv, emacs-devel, eller.helmut,
	acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>
>>>> I've reached the point where I don't know what else to explain. Could
>>>> always be improved, of course, and so on. Please find attached, with
>>>> request for feedback.
>>>
>>> Ahem, just remembered that I had already an admin/igc.org in the branch,
>>> so I've now replaced that with what I've written.
>>
>> Thank you very much!
>>
>>> - Concurrent.  The GC runs in its own thread.  There are no explicit
>>>  calls to start GC, and Emacs doesn't have to wait for the GC to
>>>  complete.
>>
>> I don't think that's true right now (it is what I want for Christmas,
>> though).  On GNU/Linux, the GC usually runs on the main thread.  On
>> macOS, the GC can run on the main thread (allocation) or on the SIGSEGV
>> handler thread (memory barriers); in both cases, the main thread has to
>> wait for it to complete.
>
> You mean what happens when an allocation point has used up its memory,
> and needs to get some more?

That's usually what triggers GC here, IME (but then, my experience
represents an atypical situation, where we never become idle.)

> I haven't looked at what percentage of GC time
> that takes. Also, the GC steps I didn't mention.
> One has to make a cut somewhere.

Fair enough.  We should never assume that GC runs on the same thread, so
writing that it never does may be a justifiable lie here.  We also
shouldn't assume that it's on a separate thread, but that's only
relevant to signal handlers.

Pip





* Re: Some experience with the igc branch
  2024-12-27 13:56                                                                 ` Gerd Möllmann
  2024-12-27 15:01                                                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-27 16:37                                                                   ` Eli Zaretskii
  2024-12-27 17:26                                                                     ` Pip Cet via Emacs development discussions.
                                                                                       ` (2 more replies)
  1 sibling, 3 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-27 16:37 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: stefankangas, pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Fri, 27 Dec 2024 14:56:06 +0100
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> > I've reached the point where I don't know what else to explain. Could
> > always be improved, of course, and so on. Please find attached, with
> > request for feedback.
> 
> Ahem, just remembered that I had already an admin/igc.org in the branch,
> so I've now replaced that with what I've written.

Thanks.  Some questions about that file:

  In contrast, the new igc collector, using MPS, is

  - Concurrent.  The GC runs in its own thread.  There are no explicit
    calls to start GC, and Emacs doesn't have to wait for the GC to
    complete.

Pip says this is not true?  I also thought MPS GC runs concurrently in
its own thread.

  When copying objects, a marker is left in the original object pointing
  to its copy. This marker is also called a "tombstone". A "memory
  barrier" is placed on the original object. Memory barrier means the
  memory is read and/or write protected (e.g. with mprotect). The
  barrier leads to MPS being invoked when an old object is accessed.
  The whole process is called "object forwarding".

This doesn't tell how object forwarding works once triggered by access
to protected memory.  Can you say something about that?  This:

  MPS makes sure that references to old objects are updated to refer to
  their new addresses. Functions defined in the object format are used
  by MPS to perform the lowest-level tasks of object forwarding, so that
  MPS doesn't have to know application-specific details of what objects
  look like. In the end, copying/forwarding is transparent to the
  application.

seems to try to explain that, but AFAIU stops short of telling it.
IOW, how are "functions defined in the object format used by MPS to
perform the lowest-level tasks of object forwarding"?
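
To make the question concrete, here is a toy sketch in C of what
"forward" and "is-forwarded" format methods might do.  This is not
the igc.c code and does not use MPS.  The object layout and the names
toy_obj, toy_fwd, toy_isfwd and toy_fix are hypothetical.  The MPS
object format does have forward and is-forwarded methods, but in real
MPS the collector does the copying itself and the client's scan
method only reports references to it.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_DATA, TOY_FWD };

/* Hypothetical object layout; real Emacs objects look different.  */
struct toy_obj
{
  enum toy_type type;
  union
  {
    int payload;                /* valid when type == TOY_DATA */
    struct toy_obj *new_addr;   /* valid when type == TOY_FWD */
  } u;
};

/* "Forward" method: overwrite OLD with a tombstone pointing at
   NEW_ADDR.  Only the application knows how to encode this.  */
static void
toy_fwd (struct toy_obj *old, struct toy_obj *new_addr)
{
  assert (old != NULL && new_addr != NULL);
  old->type = TOY_FWD;
  old->u.new_addr = new_addr;
}

/* "Is-forwarded" method: return the new address if OBJ is a
   tombstone, otherwise NULL.  */
static struct toy_obj *
toy_isfwd (const struct toy_obj *obj)
{
  assert (obj != NULL);
  return obj->type == TOY_FWD ? obj->u.new_addr : NULL;
}

/* What a copying collector does with one reference it was told
   about: copy the target on first sight, leave a tombstone, and
   update the reference to the new address.  */
static void
toy_fix (struct toy_obj **ref, struct toy_obj *to_space, size_t *used)
{
  struct toy_obj *old = *ref;
  struct toy_obj *moved = toy_isfwd (old);
  if (moved == NULL)
    {
      moved = &to_space[(*used)++];
      *moved = *old;            /* copy the object first */
      toy_fwd (old, moved);     /* then leave the tombstone */
    }
  *ref = moved;
}
```

In this sketch the collector never inspects the object layout
directly.  It only calls toy_fwd and toy_isfwd, which is the sense in
which the format functions hide application-specific details.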

  AMC implements a "mostly-copying" collector, where "mostly" refers to
  the fact that it supports ambiguous references. Ambiguous references
  are those from ambiguous roots, where we can't tell if a reference is
  real or not. If we copied such an object, we wouldn't be able to
  update its address in all references because we can't tell if the
  ambiguous reference is real or just some random integer, and changing
  it would have unforeseeable consequences. Ambiguously referenced
  objects are therefore never copied, and their address does not change.

This should be important to understand why some roots are submitted to
MPS as ambiguous -- because we want to prevent them from moving, right?
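
A toy sketch in C of why an ambiguous reference pins its target.
This is not the igc.c or MPS code, and the names toy_cell and
toy_scan_ambiguous are hypothetical.  Every word in the root that
could be an address inside the heap is treated as a possible
reference, and the cell it lands in is marked as not movable,
because the word cannot safely be rewritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap cell.  */
struct toy_cell
{
  int payload;
  bool pinned;   /* true if the cell must keep its address */
};

/* Conservatively scan WORDS: pin every cell of HEAP that some word
   might point into.  Return the number of newly pinned cells.  */
static size_t
toy_scan_ambiguous (const uintptr_t *words, size_t nwords,
                    struct toy_cell *heap, size_t ncells)
{
  assert (words != NULL && heap != NULL);
  size_t npinned = 0;
  uintptr_t lo = (uintptr_t) heap;
  uintptr_t hi = (uintptr_t) (heap + ncells);
  for (size_t i = 0; i < nwords; i++)
    {
      if (words[i] < lo || words[i] >= hi)
        continue;   /* cannot be a reference into this heap */
      /* Might be a pointer, might be an integer that happens to
         look like one.  Either way, the cell must not move.  */
      size_t idx = (words[i] - lo) / sizeof (struct toy_cell);
      if (!heap[idx].pinned)
        {
          heap[idx].pinned = true;
          npinned++;
        }
    }
  return npinned;
}
```

Cells that end up unpinned are the only ones a copying collector
would be free to move in this sketch.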

  MPS allows us to specify roots having tailor-made scan functions that
  Emacs implements. Scanning here refers to the process of finding
  references in the memory area of the root, and telling MPS about the
  references.

What is the purpose of "telling MPS about the references"?

  igc provides a number of functions for doing such allocations. For
  example 'igc_xzalloc_ambig', 'igc_xpalloc_exact' and so on. Freeing
  the memory must be done with 'igc_xfree'.

An example of using these to reference Lisp objects in malloc'ed
memory would be great.

Stuff that I'd like added:

 . a few words about the root_create_* functions
 . same for create_*_ap functions
 . why do we need the finalize_* functions?
 . some explanation why pdumper needs special support from igc

Thanks again for writing this.


* Re: Some experience with the igc branch
  2024-12-27 16:37                                                                   ` Eli Zaretskii
@ 2024-12-27 17:26                                                                     ` Pip Cet via Emacs development discussions.
  2024-12-27 19:12                                                                       ` Gerd Möllmann
                                                                                         ` (2 more replies)
  2024-12-27 18:21                                                                     ` Gerd Möllmann
  2024-12-28  6:08                                                                     ` Gerd Möllmann
  2 siblings, 3 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 17:26 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Gerd Möllmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,  ofv@wanadoo.es,
>>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Fri, 27 Dec 2024 14:56:06 +0100
>>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>> > I've reached the point where I don't know what else to explain. Could
>> > always be improved, of course, and so on. Please find attached, with
>> > request for feedback.
>>
>> Ahem, just remembered that I had already an admin/igc.org in the branch,
>> so I've now replaced that with what I've written.
>
> Thanks.  Some questions about that file:
>
>   In contrast, the new igc collector, using MPS, is
>
>   - Concurrent.  The GC runs in its own thread.  There are no explicit
>     calls to start GC, and Emacs doesn't have to wait for the GC to
>     complete.
>
> Pip says this is not true?

I'm a bit confused.  Right now, on scratch/igc, on GNU/Linux, for Emacs
in batch mode, it isn't technically true.  This causes the signal
handler issue, which we're trying to solve.  One of my approaches is to
MAKE this statement true, by using a separate thread for allocation
(which triggers GC) and access faults (which call into MPS, which counts
as triggering GC in my book).

> I also thought MPS GC runs concurrently in its own thread.

That's what you should think: GC can strike at any time.  If your code
assumes it can't, it's broken.

As far as everybody but igc.c is concerned, it's safer to assume that GC
runs on a separate thread.  igc.c currently detects (with false
positives) the one very special situation in which it is UNSAFE to
assume that, and acts accordingly.  The allocation thread patches would
make it so the assumption is actually true.

Pip





* Re: Some experience with the igc branch
  2024-12-27 17:26                                                                     ` Pip Cet via Emacs development discussions.
@ 2024-12-27 19:12                                                                       ` Gerd Möllmann
  2024-12-28  7:36                                                                       ` Eli Zaretskii
  2024-12-28  9:29                                                                       ` Eli Zaretskii
  2 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27 19:12 UTC (permalink / raw)
  To: Pip Cet
  Cc: Eli Zaretskii, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

Pip Cet <pipcet@protonmail.com> writes:

> "Eli Zaretskii" <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,  ofv@wanadoo.es,
>>>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
>>> Date: Fri, 27 Dec 2024 14:56:06 +0100
>>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>
>>> > I've reached the point where I don't know what else to explain. Could
>>> > always be improved, of course, and so on. Please find attached, with
>>> > request for feedback.
>>>
>>> Ahem, just remembered that I had already an admin/igc.org in the branch,
>>> so I've now replaced that with what I've written.
>>
>> Thanks.  Some questions about that file:
>>
>>   In contrast, the new igc collector, using MPS, is
>>
>>   - Concurrent.  The GC runs in its own thread.  There are no explicit
>>     calls to start GC, and Emacs doesn't have to wait for the GC to
>>     complete.
>>
>> Pip says this is not true?
>
> I'm a bit confused.  Right now, on scratch/igc, on GNU/Linux, for Emacs
> in batch mode, it isn't technically true.

Someone has a bug then, MPS or igc.c. It's definitely using its own
thread on macOS.




* Re: Some experience with the igc branch
  2024-12-27 17:26                                                                     ` Pip Cet via Emacs development discussions.
  2024-12-27 19:12                                                                       ` Gerd Möllmann
@ 2024-12-28  7:36                                                                       ` Eli Zaretskii
  2024-12-28 12:35                                                                         ` Pip Cet via Emacs development discussions.
  2024-12-28  9:29                                                                       ` Eli Zaretskii
  2 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28  7:36 UTC (permalink / raw)
  To: Pip Cet
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

> Date: Fri, 27 Dec 2024 17:26:04 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, stefankangas@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >   - Concurrent.  The GC runs in its own thread.  There are no explicit
> >     calls to start GC, and Emacs doesn't have to wait for the GC to
> >     complete.
> >
> > Pip says this is not true?
> 
> I'm a bit confused.  Right now, on scratch/igc, on GNU/Linux, for Emacs
> in batch mode, it isn't technically true.  This causes the signal
> handler issue, which we're trying to solve.

The signal handler issue is because igc can happen on our main thread
as well.  IOW, there are two possible triggers for igc, and one of
them is concurrent.

> > I also thought MPS GC runs concurrently in its own thread.
> 
> That's what you should think: GC can strike at any time.

The same is true with the old GC.

> If your code assumes it can't, it's broken.

I disagree.  Sometimes you need to do stuff that cannot allow GC, and
that's okay if we have means to prevent GC when we need that.

> As far as everybody but igc.c is concerned, it's safer to assume that GC
> runs on a separate thread.

We are not talking about assumptions here, we are talking about facts.
If igc is concurrent, it means it runs on a separate thread.  If it
doesn't run on a separate thread, it's not concurrent.  We need to
establish which is the truth, so we understand what we are dealing
with.  Assumptions about our application code come later.




* Re: Some experience with the igc branch
  2024-12-28  7:36                                                                       ` Eli Zaretskii
@ 2024-12-28 12:35                                                                         ` Pip Cet via Emacs development discussions.
  2024-12-28 12:51                                                                           ` Gerd Möllmann
  2024-12-28 13:13                                                                           ` Eli Zaretskii
  0 siblings, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-28 12:35 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Fri, 27 Dec 2024 17:26:04 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, stefankangas@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> >   - Concurrent.  The GC runs in its own thread.  There are no explicit
>> >     calls to start GC, and Emacs doesn't have to wait for the GC to
>> >     complete.
>> >
>> > Pip says this is not true?
>>
>> I'm a bit confused.  Right now, on scratch/igc, on GNU/Linux, for Emacs
>> in batch mode, it isn't technically true.  This causes the signal
>> handler issue, which we're trying to solve.
>
> The signal handler issue is because igc can happen on our main thread

Yes.

> as well.

It always happens on the main thread if that's all we've got.  Even
macOS emulated SIGSEGV suspends the main thread while the message is
being handled, then resumes it afterwards.

> IOW, there are two possible triggers for igc,

Three, if you count the idle work.

> and one of them is concurrent.

I'd prefer to avoid that word.  There are facts we need to establish,
and "concurrent" isn't well-defined enough to be helpful here.

On single-threaded batch-mode GNU/Linux Emacs, on scratch/igc, no second
thread is created for MPS.  My understanding is this is a useful,
common, and intentional scenario for running MPS, not a bug or an
accident.

Of course we're free to change that, and run MPS from another thread,
but that's not a no-brainer.

>> > I also thought MPS GC runs concurrently in its own thread.
>>
>> That's what you should think: GC can strike at any time.
>
> The same is true with the old GC.

The old GC emphatically could not "strike at any time".  There is plenty
of code that assumes it doesn't strike.  Some of it might even be
correct.

>> If your code assumes it can't, it's broken.
>
> I disagree.  Sometimes you need to do stuff that cannot allow GC, and
> that's okay if we have means to prevent GC when we need that.

So now you're saying it's okay for code to assume GC can't strike, after
agreeing a few lines up that it's not okay for code to do so.  Which one
is it?

>> As far as everybody but igc.c is concerned, it's safer to assume that GC
>> runs on a separate thread.
>
> We are not talking about assumptions here, we are talking about facts.

The fact is we don't even know whether GC is usually on a "separate"
thread on macOS and Windows.  On GNU/Linux, assuming it does leads to
bugs.

> If igc is concurrent, it means it runs on a separate thread.  If it
> doesn't run on a separate thread, it's not concurrent.

Those two statements are equivalent.  They're not sufficient to define
"concurrent"-as-Eli-understands-it, just for establishing a necessary
condition for that.

If we agree on that condition, then no, MPS is not always
concurrent-as-Eli-understands-it.

> We need to establish which is the truth, so we understand what we are
> dealing with.

Why?  Whatever the truth is, we can safely assume it's an implementation
detail and not rely on it.

We don't need to agree on a definition of concurrency.

We don't need to agree on what's likely, just that single-thread
operation of MPS and parallel MPS threads are both possible and not
bugs.

Pip


* Re: Some experience with the igc branch
  2024-12-28 12:35                                                                         ` Pip Cet via Emacs development discussions.
@ 2024-12-28 12:51                                                                           ` Gerd Möllmann
  2024-12-28 13:13                                                                           ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-28 12:51 UTC (permalink / raw)
  To: Pip Cet
  Cc: Eli Zaretskii, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

Pip Cet <pipcet@protonmail.com> writes:

> On single-threaded batch-mode GNU/Linux Emacs, on scratch/igc, no second
> thread is created for MPS.

Could you please tell why that is?

Since I haven't read anything about that in the MPS docs, and don't
remember wanting that, I'd assume it's a bug somewhere in igc.




* Re: Some experience with the igc branch
  2024-12-28 12:35                                                                         ` Pip Cet via Emacs development discussions.
  2024-12-28 12:51                                                                           ` Gerd Möllmann
@ 2024-12-28 13:13                                                                           ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 13:13 UTC (permalink / raw)
  To: Pip Cet
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

> Date: Sat, 28 Dec 2024 12:35:09 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, stefankangas@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> I'm a bit confused.  Right now, on scratch/igc, on GNU/Linux, for Emacs
> >> in batch mode, it isn't technically true.  This causes the signal
> >> handler issue, which we're trying to solve.
> >
> > The signal handler issue is because igc can happen on our main thread
> 
> Yes.
> 
> > as well.
> 
> It always happens on the main thread if that's all we've got.  Even
> macOS emulated SIGSEGV suspends the main thread while the message is
> being handled, then resumes it afterwards.
> 
> > IOW, there are two possible triggers for igc,
> 
> Three, if you count the idle work.
> 
> > and one of them is concurrent.
> 
> I'd prefer to avoid that word.  There are facts we need to establish,
> and "concurrent" isn't well-defined enough to be helpful here.
> 
> On single-threaded batch-mode GNU/Linux Emacs, on scratch/igc, no second
> thread is created for MPS.  My understanding is this is a useful,
> common, and intentional scenario for running MPS, not a bug or an
> accident.
> 
> Of course we're free to change that, and run MPS from another thread,
> but that's not a no-brainer.
> 
> >> > I also thought MPS GC runs concurrently in its own thread.
> >>
> >> That's what you should think: GC can strike at any time.
> >
> > The same is true with the old GC.
> 
> The old GC emphatically could not "strike at any time".  There is plenty
> of code that assumes it doesn't strike.  Some of it might even be
> correct.
> 
> >> If your code assumes it can't, it's broken.
> >
> > I disagree.  Sometimes you need to do stuff that cannot allow GC, and
> > that's okay if we have means to prevent GC when we need that.
> 
> So now you're saying it's okay for code to assume GC can't strike, after
> agreeing a few lines up that it's not okay for code to do so.  Which one
> is it?
> 
> >> As far as everybody but igc.c is concerned, it's safer to assume that GC
> >> runs on a separate thread.
> >
> > We are not talking about assumptions here, we are talking about facts.
> 
> The fact is we don't even know whether GC is usually on a "separate"
> thread on macOS and Windows.  On GNU/Linux, assuming it does leads to
> bugs.
> 
> > If igc is concurrent, it means it runs on a separate thread.  If it
> > doesn't run on a separate thread, it's not concurrent.
> 
> Those two statements are equivalent.  They're not sufficient to define
> "concurrent"-as-Eli-understands-it, just for establishing a necessary
> condition for that.
> 
> If we agree on that condition, then no, MPS is not always
> concurrent-as-Eli-understands-it.
> 
> > We need to establish which is the truth, so we understand what we are
> > dealing with.
> 
> Why?  Whatever the truth is, we can safely assume it's an implementation
> detail and not rely on it.
> 
> We don't need to agree on a definition of concurrency.
> 
> We don't need to agree on what's likely, just that single-thread
> operation of MPS and parallel MPS threads are both possible and not
> bugs.

You respond as if what I wrote had as its purpose to attack or offend
you.  It wasn't; all I want is to establish the truth.  Please read
what I write with that in mind, and drop the attitude, because it
doesn't help.




* Re: Some experience with the igc branch
  2024-12-27 17:26                                                                     ` Pip Cet via Emacs development discussions.
  2024-12-27 19:12                                                                       ` Gerd Möllmann
  2024-12-28  7:36                                                                       ` Eli Zaretskii
@ 2024-12-28  9:29                                                                       ` Eli Zaretskii
  2024-12-28 13:12                                                                         ` Pip Cet via Emacs development discussions.
  2 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28  9:29 UTC (permalink / raw)
  To: Pip Cet
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

> Date: Fri, 27 Dec 2024 17:26:04 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, stefankangas@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >   - Concurrent.  The GC runs in its own thread.  There are no explicit
> >     calls to start GC, and Emacs doesn't have to wait for the GC to
> >     complete.
> >
> > Pip says this is not true?
> 
> I'm a bit confused.  Right now, on scratch/igc, on GNU/Linux, for Emacs
> in batch mode, it isn't technically true.

Then how do you explain the fact that, when igc does GC (as evidenced
by the echo-area messages if you enable garbage-collection-messages),
Emacs is not stopped, as it happens with the old GC?  If GC is done on
the main thread, it means the main thread should stop while GC is in
progress, and yet I don't see it stopping.  What did I miss?




* Re: Some experience with the igc branch
  2024-12-28  9:29                                                                       ` Eli Zaretskii
@ 2024-12-28 13:12                                                                         ` Pip Cet via Emacs development discussions.
  2024-12-28 14:08                                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-28 13:12 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Fri, 27 Dec 2024 17:26:04 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, stefankangas@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> >   - Concurrent.  The GC runs in its own thread.  There are no explicit
>> >     calls to start GC, and Emacs doesn't have to wait for the GC to
>> >     complete.
>> >
>> > Pip says this is not true?
>>
>> I'm a bit confused.  Right now, on scratch/igc, on GNU/Linux, for Emacs
>> in batch mode, it isn't technically true.
>
> Then how do you explain the fact that, when igc does GC (as evidenced
> by the echo-area messages if you enable garbage-collection-messages),
> Emacs is not stopped, as it happens with the old GC?  If GC is done on
> the main thread, it means the main thread should stop while GC is in
> progress, and yet I don't see it stopping.  What did I miss?

I have no idea how you "see it stopping".  Incremental GC happens in
increments, which take less time individually than a full GC cycle
would, so interactions are smoother.  Separate threads are certainly not
required for that (neither is incremental GC, in all cases;
mark-and-sweep collectors can be interrupted, discarding the mark bits).
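
To make the point about increments concrete, here is a minimal
single-threaded sketch in C of a mark phase that does a bounded amount
of work per call.  It is a toy and not MPS or igc code: `struct node`,
`mark_start` and `mark_step` are hypothetical names, and a real
incremental collector would also need some kind of barrier to cope with
the program changing the object graph between steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object graph: each node has at most two outgoing references.  */
struct node
{
  struct node *ref[2];
  bool marked;
};

/* Explicit mark stack, so marking can be suspended between increments
   instead of living on the C call stack.  */
#define STACK_MAX 64
static struct node *mark_stack[STACK_MAX];
static size_t mark_top;

/* Begin a mark phase by marking and pushing ROOT.  */
static void
mark_start (struct node *root)
{
  mark_top = 0;
  if (root && !root->marked)
    {
      root->marked = true;
      mark_stack[mark_top++] = root;
    }
}

/* Process at most BUDGET nodes, then return to the caller.
   Return true once the mark phase is complete.  */
static bool
mark_step (size_t budget)
{
  while (mark_top > 0 && budget-- > 0)
    {
      struct node *n = mark_stack[--mark_top];
      for (int i = 0; i < 2; i++)
        {
          struct node *child = n->ref[i];
          if (child && !child->marked)
            {
              /* The toy stack is fixed-size; fail loudly if it fills.  */
              assert (mark_top < STACK_MAX);
              child->marked = true;
              mark_stack[mark_top++] = child;
            }
        }
    }
  return mark_top == 0;
}
```

A caller could invoke `mark_step` with a small budget from an idle hook
until it returns true; each call then pauses the program only for as
long as that many nodes take to process, all on one thread.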

Pip





* Re: Some experience with the igc branch
  2024-12-28 13:12                                                                         ` Pip Cet via Emacs development discussions.
@ 2024-12-28 14:08                                                                           ` Eli Zaretskii
  0 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 14:08 UTC (permalink / raw)
  To: Pip Cet
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

> Date: Sat, 28 Dec 2024 13:12:18 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, stefankangas@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> Date: Fri, 27 Dec 2024 17:26:04 +0000
> >> From: Pip Cet <pipcet@protonmail.com>
> >> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, stefankangas@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> >>
> >> "Eli Zaretskii" <eliz@gnu.org> writes:
> >>
> >> >   - Concurrent.  The GC runs in its own thread.  There are no explicit
> >> >     calls to start GC, and Emacs doesn't have to wait for the GC to
> >> >     complete.
> >> >
> >> > Pip says this is not true?
> >>
> >> I'm a bit confused.  Right now, on scratch/igc, on GNU/Linux, for Emacs
> >> in batch mode, it isn't technically true.
> >
> > Then how do you explain the fact that, when igc does GC (as evidenced
> > by the echo-area messages if you enable garbage-collection-messages),
> > Emacs is not stopped, as it happens with the old GC?  If GC is done on
> > the main thread, it means the main thread should stop while GC is in
> > progress, and yet I don't see it stopping.  What did I miss?
> 
> I have no idea how you "see it stopping".

Like we always do: try scrolling through xdisp.c, and you will see
Emacs stop from time to time for a split-second, then resume
scrolling.  If you set garbage-collection-messages non-nil, you will
see a GC message when it stops for that time.

With igc, the scrolling is continuous, at least in my perception.

Similar "stuttering" happens in other repeated operations that have
clear visible effects.

> Incremental GC happens in increments, which take less time
> individually than a full GC cycle would, so interactions are
> smoother.  Separate threads are certainly not required for that
> (neither is incremental GC, in all cases; mark-and-sweep collectors
> can be interrupted, discarding the mark bits).

Maybe you are right.  But the difference should be quite significant
to explain what I see.




* Re: Some experience with the igc branch
  2024-12-27 16:37                                                                   ` Eli Zaretskii
  2024-12-27 17:26                                                                     ` Pip Cet via Emacs development discussions.
@ 2024-12-27 18:21                                                                     ` Gerd Möllmann
  2024-12-27 19:23                                                                       ` Pip Cet via Emacs development discussions.
  2024-12-28 10:39                                                                       ` Eli Zaretskii
  2024-12-28  6:08                                                                     ` Gerd Möllmann
  2 siblings, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27 18:21 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: stefankangas, pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,  ofv@wanadoo.es,
>>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Fri, 27 Dec 2024 14:56:06 +0100
>>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>> > I've reached the point where I don't know what else to explain. Could
>> > always be improved, of course, and so on. Please find attached, with
>> > request for feedback.
>>
>> Ahem, just remembered that I had already an admin/igc.org in the branch,
>> so I've now replaced that with what I've written.
>
> Thanks.  Some questions about that file:
>
>   In contrast, the new igc collector, using MPS, is
>
>   - Concurrent.  The GC runs in its own thread.  There are no explicit
>     calls to start GC, and Emacs doesn't have to wait for the GC to
>     complete.
>
> Pip says this is not true?  I also thought MPS GC runs concurrently in
> its own thread.

What Pip said was very easy to misunderstand, to say the least :-). No,
MPS is concurrent, period. There are situations in which MPS can, in
addition, use the main thread. And it's still concurrent, period.

>   When copying objects, a marker is left in the original object pointing
>   to its copy. This marker is also called a "tombstone". A "memory
>   barrier" is placed on the original object. Memory barrier means the
>   memory is read and/or write protected (e.g. with mprotect). The
>   barrier leads to MPS being invoked when an old object is accessed.
>   The whole process is called "object forwarding".
>
> This doesn't tell how object forwarding works once triggered by access
> to protected memory.  Can you say something about that?
> This:
>
>   MPS makes sure that references to old objects are updated to refer to
>   their new addresses. Functions defined in the object format are used
>   by MPS to perform the lowest-level tasks of object forwarding, so that
>   MPS doesn't have to know application-specific details of how objects
>   look like. In the end, copying/forwarding is transparent to the
>   application.
>
> seems to try to explain that, but AFAIU stops short of telling it.
> IOW, how are "functions defined in the object format used by MPS to
> perform the lowest-level tasks of object forwarding"?
>

I'm afraid I can't describe that in detail because it's an
implementation detail of MPS itself. AFAIK, it's not documented, and I
don't read the sources of MPS.

Object forwarding in general is not specific to MPS. Many copying
collectors use it. I tried to explain this as far as I could without
knowing the MPS implementation.

Whether one should describe this in detail here is a valid question.
Maybe someone knowing the MPS implementation in detail could add that.

>   AMC implements a "mostly-copying" collector, where "mostly" refers to
>   the fact that it supports ambiguous references. Ambiguous references
>   are those from ambiguous roots, where we can't tell if a reference is
>   real or not. If we would copy such an object, we wouldn't be able to
>   update their address in all references because we can't tell if the
>   ambiguous reference is real or just some random integer, and changing
>   it would have unforeseeable consequences. Ambiguously referenced
>   objects are therefore never copied, and their address does not change.
>
> This should be important to understand why some roots are submitted to
> MPS as ambiguous -- because we want to prevent them from moving, right?

I don't remember ATM if we did make something an ambiguous root solely
for the purpose of preventing it from moving. We have a lot of places
though, where we have to protect objects from GC, with the additional
consequence that objects don't move.

But you are right, I think.
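
As a rough illustration of why ambiguously referenced objects stay put,
here is a small self-contained C sketch.  It is not how MPS or AMC is
implemented; `scan_ambiguous`, `relocate` and the two fixed-size
"spaces" are made up for the example, and it only recognizes words that
equal an object's exact start address, whereas a real conservative
scanner would also have to deal with interior pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct obj
{
  int payload;
  bool pinned;   /* set when some ambiguous word might refer to us */
};

#define HEAP_OBJS 4
static struct obj from_space[HEAP_OBJS];
static struct obj to_space[HEAP_OBJS];

/* Conservative scan: every word in WORDS might be a reference or just
   an integer.  We cannot tell, so we must not rewrite it; instead we
   pin any object whose address it happens to match.  */
static void
scan_ambiguous (const uintptr_t *words, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < HEAP_OBJS; j++)
      if (words[i] == (uintptr_t) &from_space[j])
        from_space[j].pinned = true;
}

/* Return the address O has after the toy "collection": pinned objects
   keep their address, all others are copied to to_space.  O must
   point into from_space.  */
static struct obj *
relocate (struct obj *o)
{
  if (o->pinned)
    return o;
  size_t idx = (size_t) (o - from_space);
  to_space[idx] = *o;
  return &to_space[idx];
}
```

The design point is that the collector never writes to the ambiguous
words themselves; pinning the object is the only safe reaction, which
is what makes the collector "mostly" rather than fully copying.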

>
>   MPS allows us to specify roots having tailor-made scan functions that
>   Emacs implements. Scanning here refers to the process of finding
>   references in the memory area of the root, and telling MPS about the
>   references.
>
> What is the purpose of "telling MPS about the references"?

If MPS were the old GC, it's like letting MPS do its mark_object.

What about if it said "... telling MPS about the reference so that it
knows the referenced object is live", only expressed better?

>   igc provides a number of functions for doing such allocations. For
>   example 'igc_xzalloc_ambig', 'igc_xpalloc_exact' and so on. Freeing
>   the memory must be done with 'igc_xfree'.
>
> An example of using these to reference Lisp objects in malloc'ed
> memory would be great.

OK.

>
> Stuff that I'd like added:
>
>  . a few words about the root_create_* functions
>  . same for create_*_ap functions
>  . why do we need the finalize_* functions?
>  . some explanation why pdumper needs special support from igc
>
> Thanks again for writing this.

I'll try to add something for that. Please watch out for commits.




* Re: Some experience with the igc branch
  2024-12-27 18:21                                                                     ` Gerd Möllmann
@ 2024-12-27 19:23                                                                       ` Pip Cet via Emacs development discussions.
  2024-12-27 20:28                                                                         ` Gerd Möllmann
  2024-12-28 10:39                                                                       ` Eli Zaretskii
  1 sibling, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 19:23 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: Eli Zaretskii <eliz@gnu.org>,  pipcet@protonmail.com,  ofv@wanadoo.es,
>>>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
>>> Date: Fri, 27 Dec 2024 14:56:06 +0100
>>>
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>>
>>> > I've reached the point where I don't know what else to explain. Could
>>> > always be improved, of course, and so on. Please find attached, with
>>> > request for feedback.
>>>
>>> Ahem, just remembered that I had already an admin/igc.org in the branch,
>>> so I've now replaced that with what I've written.
>>
>> Thanks.  Some questions about that file:
>>
>>   In contrast, the new igc collector, using MPS, is
>>
>>   - Concurrent.  The GC runs in its own thread.  There are no explicit
>>     calls to start GC, and Emacs doesn't have to wait for the GC to
>>     complete.
>>
>> Pip says this is not true?  I also thought MPS GC runs concurrently in
>> its own thread.
>
> What Pip said was very easy to misunderstand, to say the least :-).

FWIW, I was mostly objecting to "the GC runs in its own thread":
Currently, that's not always true, and not usually true, and assuming
it is true when it wasn't has resulted in bugs we still have to fix.

> No, MPS is concurrent, period.

I plead No Contest.

Pip





* Re: Some experience with the igc branch
  2024-12-27 19:23                                                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-27 20:28                                                                         ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-27 20:28 UTC (permalink / raw)
  To: Pip Cet
  Cc: Eli Zaretskii, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

Pip Cet <pipcet@protonmail.com> writes:

>> No, MPS is concurrent, period.
>
> I plead No Contest.
>
> Pip

:-)




* Re: Some experience with the igc branch
  2024-12-27 18:21                                                                     ` Gerd Möllmann
  2024-12-27 19:23                                                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-28 10:39                                                                       ` Eli Zaretskii
  2024-12-28 11:07                                                                         ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 10:39 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: stefankangas, pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: stefankangas@gmail.com,  pipcet@protonmail.com,  ofv@wanadoo.es,
>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Fri, 27 Dec 2024 19:21:30 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >   - Concurrent.  The GC runs in its own thread.  There are no explicit
> >     calls to start GC, and Emacs doesn't have to wait for the GC to
> >     complete.
> >
> > Pip says this is not true?  I also thought MPS GC runs concurrently in
> > its own thread.
> 
> What Pip said was very easy to misunderstand, to say the least :-). No,
> MPS is concurrent, period. There are situations in which MPS can, in
> addition, use the main thread. And it's still concurrent, period.

How can you see which thread runs MPS?  Where should I put a
breakpoint to see that (IOW, what are the entry points into MPS GC
code)?  If I run Emacs with a breakpoint in process_one_message (after
enabling garbage-collection-messages), all I ever see is GC triggered
by igc_on_idle, which AFAIU is only one of the ways GC can be
triggered.  Where are the entry points for the other GC triggers?  I'm
asking because I'd like to run Emacs like that and see which thread(s)
run GC.

> >   When copying objects, a marker is left in the original object pointing
> >   to its copy. This marker is also called a "tombstone". A "memory
> >   barrier" is placed on the original object. Memory barrier means the
> >   memory is read and/or write protected (e.g. with mprotect). The
> >   barrier leads to MPS being invoked when an old object is accessed.
> >   The whole process is called "object forwarding".
> >
> > This doesn't tell how object forwarding works once triggered by access
> > to protected memory.  Can you say something about that?
> > This:
> >
> >   MPS makes sure that references to old objects are updated to refer to
> >   their new addresses. Functions defined in the object format are used
> >   by MPS to perform the lowest-level tasks of object forwarding, so that
> >   MPS doesn't have to know application-specific details of how objects
> >   look like. In the end, copying/forwarding is transparent to the
> >   application.
> >
> > seems to try to explain that, but AFAIU stops short of telling it.
> > IOW, how are "functions defined in the object format used by MPS to
> > perform the lowest-level tasks of object forwarding"?
> >
> 
> I'm afraid I can't describe that in detail because it's an
> implementation detail of MPS itself. AFAIK, it's not documented, and I
> don't read the sources of MPS.
> 
> Object forwarding in general is not specific to MPS. Many copying
> collectors use it. I tried to explain this as far as I could without
> knowing the MPS implementation.
> 
> Whether one should describe this in detail here is a valid question.
> Maybe someone knowing the MPS implementation in detail could add that.

I'm only interested in this insofar as it's relevant to the functions
defined in igc.c.  E.g., do any of the "scan" or "fix" functions have
anything to do with object forwarding?  If so, I thought it would be
useful to describe that, to help people understand the role of those
functions and what their code needs to do.

> >   MPS allows us to specify roots having tailor-made scan functions that
> >   Emacs implements. Scanning here refers to the process of finding
> >   references in the memory area of the root, and telling MPS about the
> >   references.
> >
> > What is the purpose of "telling MPS about the references"?
> 
> If MPS were the old GC, it's like letting MPS do its mark_object.
> 
> What about if it said "... telling MPS about the reference so that it
> knows the referenced object is live", only expressed better?

Sure, that's much better, because it makes this much more concrete.

> > Stuff that I'd like added:
> >
> >  . a few words about the root_create_* functions
> >  . same for create_*_ap functions
> >  . why do we need the finalize_* functions?
> >  . some explanation why pdumper needs special support from igc
> >
> > Thanks again for writing this.
> 
> I'll try to add something for that. Please watch out for commits.

Thanks.




* Re: Some experience with the igc branch
  2024-12-28 10:39                                                                       ` Eli Zaretskii
@ 2024-12-28 11:07                                                                         ` Gerd Möllmann
  2024-12-28 11:23                                                                           ` Gerd Möllmann
  2024-12-28 14:04                                                                           ` Pip Cet via Emacs development discussions.
  0 siblings, 2 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-28 11:07 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: stefankangas, pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: stefankangas@gmail.com,  pipcet@protonmail.com,  ofv@wanadoo.es,
>>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Fri, 27 Dec 2024 19:21:30 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >   - Concurrent.  The GC runs in its own thread.  There are no explicit
>> >     calls to start GC, and Emacs doesn't have to wait for the GC to
>> >     complete.
>> >
>> > Pip says this is not true?  I also thought MPS GC runs concurrently in
>> > its own thread.
>> 
>> What Pip said was very easy to misunderstand, to say the least :-). No,
>> MPS is concurrent, period. There are situations in which MPS can, in
>> addition, use the main thread. And it's still concurrent, period.
>
> How can you see which thread runs MPS?  Where should I put a
> breakpoint to see that (IOW, what are the entry points into MPS GC
> code)?  If I run Emacs with a breakpoint in process_one_message (after
> enabling garbage-collection-messages), all I ever see is GC triggered
> by igc_on_idle, which AFAIU is only one of the ways GC can be
> triggered.  Where are the entry points for the other GC triggers?  I'm
> asking because I'd like to run Emacs like that and see which thread(s)
> run GC.

I wonder if your interpretation is right here. AFAIR,
process_one_message is always called from igc_on_idle. IOW, we handle
messages from MPS only when Emacs thinks it's idle, and that is always
in the main thread.

The messages are produced and put into the MPS message queue in the MPS
thread, usually. Or maybe, I don't know that for a fact, also in the
main thread, when allocation points run out of memory, or when we do an
mps_arena_step. The arena step thing is only done if igc-step-interval
is non-zero, which is not the default. (I'm personally using 0.05 = 50
ms, BTW.)

How to get hold of the MPS thread I don't know. I just see one thread
more when using igc than with the old GC. Maybe one could set a
breakpoint on that ArenaLock others mentioned, with the warning that I
don't know what I'm talking about when it comes to MPS internals.

>
>> >   When copying objects, a marker is left in the original object pointing
>> >   to its copy. This marker is also called a "tombstone". A "memory
>> >   barrier" is placed on the original object. Memory barrier means the
>> >   memory is read and/or write protected (e.g. with mprotect). The
>> >   barrier leads to MPS being invoked when an old object is accessed.
>> >   The whole process is called "object forwarding".
>> >
>> > This doesn't tell how object forwarding works once triggered by access
>> > to protected memory.  Can you say something about that?
>> > This:
>> >
>> >   MPS makes sure that references to old objects are updated to refer to
>> >   their new addresses. Functions defined in the object format are used
>> >   by MPS to perform the lowest-level tasks of object forwarding, so that
>> >   MPS doesn't have to know application-specific details of how objects
>> >   look like. In the end, copying/forwarding is transparent to the
>> >   application.
>> >
>> > seems to try to explain that, but AFAIU stops short of telling it.
>> > IOW, how are "functions defined in the object format used by MPS to
>> > perform the lowest-level tasks of object forwarding"?
>> >
>> 
>> I'm afraid I can't describe that in detail because it's an
>> implementation detail of MPS itself. AFAIK, it's not documented, and I
>> don't read the sources of MPS.
>> 
>> Object forwarding in general is not specific to MPS. Many copying
>> collectors use it. I tried to explain this as far as I could without
>> knowing the MPS implementation.
>> 
>> Whether one should describe this in detail here is a valid question.
>> Maybe someone knowing the MPS implementation in detail could add that.
>
> I'm only interested in this insofar as it's relevant to the functions
> defined in igc.c.  E.g., do any of the "scan" or "fix" functions have
> anything to do with object forwarding?  If so, I thought it would be
> useful to describe that, to help people understand the role of those
> functions and what their code needs to do.

MPS_FIX2 has to do with it because it returns an object's new address which
we then can store where we got the old reference from. See
fix_lisp_object. Do you think mentioning that would suffice?
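
For readers unfamiliar with the idea, the following toy C sketch shows
the general shape of "fixing" a reference in a copying collector: the
object is copied once, a forwarding pointer (the "tombstone") is left
in the old copy, and the slot the reference came from is overwritten
with the new address.  It is only an analogy; `forward_object` and
`toy_fix` are hypothetical names, and nothing here is the actual MPS
API or the code of fix_lisp_object.

```c
#include <assert.h>
#include <stddef.h>

struct obj
{
  struct obj *forward;   /* non-NULL once moved: the "tombstone" */
  int payload;
};

#define SPACE_OBJS 8
static struct obj to_space[SPACE_OBJS];
static size_t to_used;

/* Copy OLD into to_space unless that already happened, leaving a
   forwarding pointer behind.  Return the object's new address.  */
static struct obj *
forward_object (struct obj *old)
{
  if (old->forward == NULL)
    {
      assert (to_used < SPACE_OBJS);
      struct obj *copy = &to_space[to_used++];
      copy->forward = NULL;
      copy->payload = old->payload;
      old->forward = copy;
    }
  return old->forward;
}

/* "Fix" one reference: store the referent's new address back into
   the slot the old reference was read from.  This toy assumes *SLOT
   is NULL or points to an object that has not been fixed yet via
   this slot, i.e. an object in the old space.  */
static void
toy_fix (struct obj **slot)
{
  if (*slot != NULL)
    *slot = forward_object (*slot);
}
```

Because the forwarding pointer is consulted first, two references to
the same old object end up pointing at the same copy, no matter in
which order they are fixed.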

>> >   MPS allows us to specify roots having tailor-made scan functions that
>> >   Emacs implements. Scanning here refers to the process of finding
>> >   references in the memory area of the root, and telling MPS about the
>> >   references.
>> >
>> > What is the purpose of "telling MPS about the references"?
>> 
>> If MPS were the old GC, it's like letting MPS do its mark_object.
>> 
>> What about if it said "... telling MPS about the reference so that it
>> knows the referenced object is live", only expressed better?
>
> Sure, that's much better, because it makes this much more concrete.

👍




* Re: Some experience with the igc branch
  2024-12-28 11:07                                                                         ` Gerd Möllmann
@ 2024-12-28 11:23                                                                           ` Gerd Möllmann
  2024-12-28 14:04                                                                           ` Pip Cet via Emacs development discussions.
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-28 11:23 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: stefankangas, pipcet, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>> I'm only interested in this insofar as it's relevant to the functions
>> defined in igc.c.  E.g., do any of the "scan" or "fix" functions have
>> anything to do with object forwarding?  If so, I thought it would be
>> useful to describe that, to help people understand the role of those
>> functions and what their code needs to do.
>
> MPS_FIX2 has to do with it because it returns an object's new address which
> we then can store where we got the old reference from. See
> fix_lisp_object. Do you think mentioning that would suffice?

Please forget my question, I've just pushed something.




* Re: Some experience with the igc branch
  2024-12-28 11:07                                                                         ` Gerd Möllmann
  2024-12-28 11:23                                                                           ` Gerd Möllmann
@ 2024-12-28 14:04                                                                           ` Pip Cet via Emacs development discussions.
  2024-12-28 14:25                                                                             ` Gerd Möllmann
  2024-12-28 16:27                                                                             ` Eli Zaretskii
  1 sibling, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-28 14:04 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>>> Cc: stefankangas@gmail.com,  pipcet@protonmail.com,  ofv@wanadoo.es,
>>>   emacs-devel@gnu.org,  eller.helmut@gmail.com,  acorallo@gnu.org
>>> Date: Fri, 27 Dec 2024 19:21:30 +0100
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>> >   - Concurrent.  The GC runs in its own thread.  There are no explicit
>>> >     calls to start GC, and Emacs doesn't have to wait for the GC to
>>> >     complete.
>>> >
>>> > Pip says this is not true?  I also thought MPS GC runs concurrently in
>>> > its own thread.
>>>
>>> What Pip said was very easy to misunderstand, to say the least :-). No,
>>> MPS is concurrent, period. There are situations in which MPS can, in
>>> addition, use the main thread. And it's still concurrent, period.
>>
>> How can you see which thread runs MPS?  Where should I put a
>> breakpoint to see that (IOW, what are the entry points into MPS GC
>> code)?

I'd suggest ArenaEnter or MessagePost.

>> If I run Emacs with a breakpoint in process_one_message (after
>> enabling garbage-collection-messages), all I ever see is GC triggered
>> by igc_on_idle, which AFAIU is only one of the ways GC can be
>> triggered.  Where are the entry points for the other GC triggers?  I'm
>> asking because I'd like to run Emacs like that and see which thread(s)
>> run GC.
>
> I wonder if your interpretation is right here. AFAIR,
> process_one_message is always called from igc_on_idle. IOW, we handle

There's a second call path when we create finalizable objects
(maybe_process_messages).

> messages from MPS only when Emacs thinks it's idle, and that is always
> in the main thread.

My understanding is, also, that process_one_message doesn't trigger GC,
it handles messages produced by GCs triggered in other places.

> The messages are produced and put into the MPS message queue in the MPS
> thread, usually. Or maybe, I don't know that for a fact, also in the
> main thread, when allocation points run out of memory, or when we do an
> mps_arena_step. The arena step thing is only done if igc-step-interval
> is non-zero, which is not the default. (I'm personally using 0.05 = 50
> ms, BTW.)

It's usually the main thread here.

> How to get hold of the MPS thread I don't know. I just see one thread
> more when using igc than with the old GC. Maybe one could set a

My understanding is that's the exception handling thread, which only
ever runs when another thread hits a memory barrier and is suspended
waiting for its resolution; as with my patch, this is about separate
stacks (and signal handling contexts), not about parallelism.

So it seems we're miscommunicating about these "MPS threads".  What are
they?  Where are they created?  What do they do?

If we can't answer that, it'll be harder to decide what to do about
signal handlers calling into MPS.

Pip





* Re: Some experience with the igc branch
  2024-12-28 14:04                                                                           ` Pip Cet via Emacs development discussions.
@ 2024-12-28 14:25                                                                             ` Gerd Möllmann
  2024-12-28 16:27                                                                             ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-28 14:25 UTC (permalink / raw)
  To: Pip Cet
  Cc: Eli Zaretskii, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

Pip Cet <pipcet@protonmail.com> writes:

> So it seems we're miscommunicating about these "MPS threads".  What are
> they?  Where are they created?  What do they do?
>
> If we can't answer that, it'll be harder to decide what to do about
> signal handlers calling into MPS.

True, and I can't answer your questions. The only thing I can tell from
the docs, in this case the guide, is

1. Overview of the Memory Pool System

The Memory Pool System is a very general, adaptable, flexible, reliable,
and efficient memory management system. It permits the flexible
combination of memory management techniques, supporting manual and
automatic memory management, inline allocation, finalization, weakness,
and multiple concurrent co-operating incremental generational garbage
             ^^^^^^^^^^
collections.

Maybe that means something different than I think.




* Re: Some experience with the igc branch
  2024-12-28 14:04                                                                           ` Pip Cet via Emacs development discussions.
  2024-12-28 14:25                                                                             ` Gerd Möllmann
@ 2024-12-28 16:27                                                                             ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 16:27 UTC (permalink / raw)
  To: Pip Cet
  Cc: gerd.moellmann, stefankangas, ofv, emacs-devel, eller.helmut,
	acorallo

> Date: Sat, 28 Dec 2024 14:04:31 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, stefankangas@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> >> If I run Emacs with a breakpoint in process_one_message (after
> >> enabling garbage-collection-messages), all I ever see is GC triggered
> >> by igc_on_idle, which AFAIU is only one of the ways GC can be
> >> triggered.  Where are the entry points for the other GC triggers?  I'm
> >> asking because I'd like to run Emacs like that and see which thread(s)
> >> run GC.
> >
> > I wonder if your interpretation is right here. AFAIR,
> > process_one_message is always called from igc_on_idle. IOW, we handle
> 
> There's a second call path when we create finalizable objects
> (maybe_process_messages).
> 
> > messages from MPS only when Emacs thinks it's idle, and that is always
> > in the main thread.
> 
> My understanding is, also, that process_one_message doesn't trigger GC,
> it handles messages produced by GCs triggered in other places.

But igc_on_idle does call GC, right after it calls
process_one_message, so a breakpoint there will show when GC is
entered due to Emacs being idle.




* Re: Some experience with the igc branch
  2024-12-27 16:37                                                                   ` Eli Zaretskii
  2024-12-27 17:26                                                                     ` Pip Cet via Emacs development discussions.
  2024-12-27 18:21                                                                     ` Gerd Möllmann
@ 2024-12-28  6:08                                                                     ` Gerd Möllmann
  2 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-28  6:08 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: stefankangas, pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

> An example of using these to reference Lisp objects in malloc'ed
> memory would be great.
>
> Stuff that I'd like added:
>
>  . a few words about the root_create_* functions
>  . same for create_*_ap functions
>  . why do we need the finalize_* functions?
>  . some explanation why pdumper needs special support from igc

I think I have them all now. There are several commits, one for each point
I tried to address.





* Re: Some experience with the igc branch
  2024-12-25 12:50                                     ` Gerd Möllmann
  2024-12-25 13:00                                       ` Eli Zaretskii
  2024-12-25 13:09                                       ` Eli Zaretskii
@ 2024-12-25 17:40                                       ` Pip Cet via Emacs development discussions.
  2024-12-25 17:51                                         ` Eli Zaretskii
                                                           ` (2 more replies)
  2 siblings, 3 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-25 17:40 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> DEFUN ("function-equal", Ffunction_equal, Sfunction_equal, 2, 2, 0,
>        doc: /* Return non-nil if F1 and F2 come from the same source.
> Used to determine if different closures are just different instances of
> the same lambda expression, or are really unrelated function.  */)
>      (Lisp_Object f1, Lisp_Object f2)
> {
>   bool res;
>   if (EQ (f1, f2))

This EQ can also trip.  Sorry to insist on that, but I think it's an
important point: if we change Lisp internals (such as the slow EQ
thing), the "we're not dereferencing it, just looking at the bit
representation of the pointer" approach will fail again, in unexpected
places.

I haven't seen a technical argument against using separate stacks for
MPS and signals (I don't consider "it's a change and we'd need to test
it" to be any more true for this change than for any other proposed
change, or for what's in scratch/igc now).  It would get us on par with
master.  (Both versions need to add the best memory barrier we have to
the specpdl_ptr++ code.)

(I don't think MPS works on multi-threaded systems if word stores aren't
atomic.  If thread A is in the middle of updating an mps_word
referencing another object, and thread B triggers a GC, thread A is
stopped and thread B might scan the segment in the inconsistent state.)
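
To make the word-store assumption concrete, here is a minimal sketch in C11 (not code from the branch).  The names shared_word, publish_word, read_word and roundtrip are made up for illustration.  It checks at compile time that pointer-sized atomics are always lock-free and then stores and loads a word with relaxed atomics, so another thread can never observe a half-written value.  Note that ATOMIC_POINTER_LOCK_FREE is defined whenever <stdatomic.h> is included, with value 0, 1 or 2, so the sketch tests its value rather than whether it is defined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* 2 means "always lock-free"; 0 and 1 give no such guarantee.  */
#if ATOMIC_POINTER_LOCK_FREE != 2
# error "pointer-sized atomics are not always lock-free on this target"
#endif

/* Hypothetical word shared between a mutator and a scanning thread.  */
static _Atomic uintptr_t shared_word;

/* Store the whole word in one indivisible step.  */
static void
publish_word (uintptr_t v)
{
  atomic_store_explicit (&shared_word, v, memory_order_relaxed);
}

/* Load the whole word in one indivisible step.  */
static uintptr_t
read_word (void)
{
  return atomic_load_explicit (&shared_word, memory_order_relaxed);
}

/* Single-threaded smoke test: what was stored is what is read back.  */
static uintptr_t
roundtrip (uintptr_t v)
{
  publish_word (v);
  uintptr_t seen = read_word ();
  assert (seen == v);
  return seen;
}
```

Relaxed ordering only rules out torn values; it says nothing about ordering against other stores, which is a separate question.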

Miraculously, everything can be made to work out in the WIDE_EMACS_INT
case, even though 64-bit words are stored in two insns: we only look at
the LSB 32-bit word when fixing (because we USE_LSB_TAG), so that'll
just work.  Late exthdr creation needed to be changed a little, and now
assumes changing a 64-bit value to another 64-bit value which differs
only in one 32-bit half is atomic.
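
A rough sketch of the "change only one 32-bit half" idea, assuming a 64-bit header word whose low half holds the tagged pointer.  The helper name replace_low_half is hypothetical; the patch itself does the same thing with arithmetic on uintptr_t, which is 32 bits wide on the builds in question.

```c
#include <assert.h>
#include <stdint.h>

/* Return WORD with its low 32 bits replaced by NEW_LOW and its high
   32 bits left untouched.  Because the old and new values differ in
   one half only, a store that is split into two 32-bit stores can
   expose either the old or the new word, never a mix of both.  */
static uint64_t
replace_low_half (uint64_t word, uint32_t new_low)
{
  uint64_t high = word & ~(uint64_t) UINT32_MAX;
  uint64_t result = high | new_low;
  assert ((result >> 32) == (word >> 32));
  return result;
}
```

For example, replace_low_half (0x1122334455667788, 0xdeadbeef) keeps 0x11223344 in the upper half and yields 0x11223344deadbeef.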

Here's a snapshot of the current code.  It still assumes strong memory
ordering between threads because I'm pretty sure MPS needs that, too, so
it's just asm volatile ("" ::: "memory") for now.

diff --git a/src/igc.c b/src/igc.c
index 136667aefea..4342964382a 100644
--- a/src/igc.c
+++ b/src/igc.c
@@ -747,19 +747,42 @@ IGC_DEFINE_LIST (igc_root);
 
 /* Registry entry for an MPS thread mps_thr_t.  */
 
+/* FIXME */
+#include <stdatomic.h>
+
+struct emacs_ap
+{
+  mps_ap_t mps_ap;
+  struct igc *gc;
+  size_t requested_bytes;
+  void *usable_memory;
+  void *usable_memory_end;
+
+#ifdef ENABLE_CHECKING
+  atomic_uintptr_t waiting_threads; /* debug only */
+  sys_thread_t emacs_thread;
+#endif
+};
+
+typedef struct emacs_ap emacs_ap_t;
+
+#ifndef ATOMIC_POINTER_LOCK_FREE
+#error "this probably won't work"
+#endif
+
 struct igc_thread
 {
   struct igc *gc;
   mps_thr_t thr;
 
   /* Allocation points for the thread.  */
-  mps_ap_t dflt_ap;
-  mps_ap_t leaf_ap;
-  mps_ap_t weak_strong_ap;
-  mps_ap_t weak_weak_ap;
-  mps_ap_t weak_hash_strong_ap;
-  mps_ap_t weak_hash_weak_ap;
-  mps_ap_t immovable_ap;
+  emacs_ap_t dflt_ap;
+  emacs_ap_t leaf_ap;
+  emacs_ap_t weak_strong_ap;
+  emacs_ap_t weak_weak_ap;
+  emacs_ap_t weak_hash_strong_ap;
+  emacs_ap_t weak_hash_weak_ap;
+  emacs_ap_t immovable_ap;
 
   /* Quick access to the roots used for specpdl, bytecode stack and
      control stack.  */
@@ -814,6 +837,15 @@ IGC_DEFINE_LIST (igc_thread);
   /* The real signal mask we want to restore after handling pending
    * signals.  */
   sigset_t signal_mask;
+
+  sys_thread_t allocation_thread;
+  sys_mutex_t mutex;
+  sys_cond_t cond;
+  atomic_uintptr_t which_ap;
+  atomic_uintptr_t fault_address;
+  atomic_uintptr_t manual_collections;
+  atomic_uintptr_t idle_time;
+  atomic_uintptr_t idle_work;
 };
 
 static bool process_one_message (struct igc *gc);
@@ -2913,8 +2945,179 @@ igc_root_destroy_comp_unit_eph (struct Lisp_Native_Comp_Unit *u)
   maybe_destroy_root (&u->data_eph_relocs_root);
 }
 
+static mps_addr_t alloc_impl_raw (size_t size, mps_ap_t ap);
+static mps_addr_t alloc_impl (size_t size, enum igc_obj_type type, emacs_ap_t *ap);
+static void igc_collect_raw (void);
+
+static void *igc_allocation_thread (void *igc_v)
+{
+  uint64_t n_alloc = 0;
+  uint64_t bytes_alloc = 0;
+  struct igc *gc = igc_v;
+  sys_mutex_lock (&gc->mutex);
+  while (true)
+    {
+      uintptr_t fault_address = gc->fault_address;
+
+      if (fault_address)
+	{
+	  volatile char *cp = (void *)fault_address;
+	  volatile char c = *cp;
+	  (void) c;
+	  atomic_store (&gc->fault_address, 0);
+	  continue;
+	}
+
+      uintptr_t which_ap = gc->which_ap;
+      if (which_ap)
+	{
+	  atomic_store (&gc->which_ap, 0);
+	  emacs_ap_t *ap = (void *)which_ap;
+	  if (gc->allocation_thread != sys_thread_self ())
+	    emacs_abort();
+	  if (!ap->usable_memory)
+	    {
+	      igc_root_create_ambig ((char *)&ap->usable_memory,
+				     (char *)(&ap->usable_memory+1),
+				     "thread ap memory root");
+	    }
+	  while (ap->requested_bytes)
+	    {
+	      size_t size = ap->requested_bytes;
+	      if (size < 1024 * 1024)
+		size = 1024 * 1024;
+	      void *p = alloc_impl_raw (size, ap->mps_ap);
+	      n_alloc++;
+	      bytes_alloc += size;
+#if 0
+	      if (0 == ((n_alloc-1)&n_alloc))
+		fprintf (stderr, "%ld %ld\n", n_alloc, bytes_alloc);
+#endif
+	      ap->usable_memory = p;
+	      ap->usable_memory_end = (char *)p + size;
+	      ap->requested_bytes = 0;
+	      sched_yield ();
+	    }
+	  continue;
+	}
+
+      uintptr_t manual_collections = gc->manual_collections;
+      if (manual_collections)
+	{
+	  igc_collect_raw ();
+	  atomic_store (&gc->manual_collections, 0);
+	  continue;
+	}
+
+      uintptr_t idle_time = gc->idle_time;
+      if (idle_time)
+	{
+	  double interval = idle_time * 1e-9;
+	  atomic_store (&gc->idle_time, 0);
+	  if (mps_arena_step (global_igc->arena, interval, 0))
+	    atomic_store (&gc->idle_work, 1);
+	  else
+	    atomic_store (&gc->idle_work, 0);
+
+	  continue;
+	}
+
+      sys_cond_wait (&gc->cond, &gc->mutex);
+    }
+  sys_mutex_unlock (&gc->mutex); /* in case the infloop returns */
+
+  return NULL;
+}
+
+static mps_addr_t alloc_impl (size_t size, enum igc_obj_type type, emacs_ap_t *ap)
+{
+  if (size == 0)
+    return 0;
+
+  while (size & 7)
+    size++;
+
+  mps_addr_t ret = 0;
+  while (!ret)
+    {
+#ifdef ENABLE_CHECKING
+      if (ap->emacs_thread != sys_thread_self ())
+	emacs_abort ();
+      uintptr_t other_threads = atomic_fetch_add (&ap->waiting_threads, 1);
+      if (other_threads != 0)
+	{
+	  /* we know that the other "thread" is actually on top of us,
+	   * and we're a signal handler, so we shouldn't be allocating
+	   * memory. */
+	  emacs_abort ();
+	}
+#endif
+
+      void *candidate = ap->usable_memory;
+      void *end = ap->usable_memory_end;
+      if ((char *)candidate + size <= (char *)end)
+	{
+	  void *actual = ap->usable_memory;
+	  void *actual_end = (char *)actual + size;
+	  ap->usable_memory = actual_end;
+	  if (actual_end < end)
+	    {
+	      set_header ((struct igc_header *)actual_end, IGC_OBJ_PAD, (char *)end - (char *)actual_end, 0);
+	    }
+	  asm volatile ("" : : : "memory");
+	  set_header ((struct igc_header *)actual, type, size, alloc_hash ());
+	  ret = actual;
+	}
+      else
+	{
+	  ap->requested_bytes = (size_t) size;
+
+	  while (ap->requested_bytes)
+	    {
+	      /* No MPS data can be accessed by the main thread while it holds the mutex! */
+	      sys_mutex_lock (&ap->gc->mutex);
+	      atomic_store (&ap->gc->which_ap, (uintptr_t) ap);
+	      sys_cond_broadcast (&ap->gc->cond);
+	      sys_mutex_unlock (&ap->gc->mutex);
+	      sched_yield ();
+	    }
+	}
+#ifdef ENABLE_CHECKING
+      atomic_fetch_add (&ap->waiting_threads, -1);
+#endif
+    }
+  return (mps_addr_t) ret;
+}
+
+static mps_res_t emacs_ap_create_k (emacs_ap_t *ap, mps_pool_t pool,
+				    mps_arg_s *args)
+{
+  ap->gc = global_igc;
+  ap->usable_memory = NULL;
+  ap->usable_memory_end = NULL;
+#ifdef ENABLE_CHECKING
+  atomic_store(&ap->waiting_threads, 0);
+#endif
+  ap->requested_bytes = 0;
+
+  static int just_one;
+  if (!(just_one++))
+    if (!sys_thread_create(&ap->gc->allocation_thread, igc_allocation_thread, ap->gc))
+      emacs_abort ();
+
+#ifdef ENABLE_CHECKING
+  ap->emacs_thread = sys_thread_self ();
+#endif
+  return mps_ap_create_k (&ap->mps_ap, pool, args);
+}
+
+static void emacs_ap_destroy (emacs_ap_t *ap)
+{
+  return;
+}
+
 static mps_res_t
-create_weak_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
+create_weak_ap (emacs_ap_t *ap, struct igc_thread *t, bool weak)
 {
   struct igc *gc = t->gc;
   mps_res_t res;
@@ -2923,14 +3126,14 @@ create_weak_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
   {
     MPS_ARGS_ADD (args, MPS_KEY_RANK,
 		  weak ? mps_rank_weak () : mps_rank_exact ());
-    res = mps_ap_create_k (ap, pool, args);
+    res = emacs_ap_create_k (ap, pool, args);
   }
   MPS_ARGS_END (args);
   return res;
 }
 
 static mps_res_t
-create_weak_hash_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
+create_weak_hash_ap (emacs_ap_t *ap, struct igc_thread *t, bool weak)
 {
   struct igc *gc = t->gc;
   mps_res_t res;
@@ -2939,7 +3142,7 @@ create_weak_hash_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
   {
     MPS_ARGS_ADD (args, MPS_KEY_RANK,
 		  weak ? mps_rank_weak () : mps_rank_exact ());
-    res = mps_ap_create_k (ap, pool, args);
+    res = emacs_ap_create_k (ap, pool, args);
   }
   MPS_ARGS_END (args);
   return res;
@@ -2949,12 +3152,14 @@ create_weak_hash_ap (mps_ap_t *ap, struct igc_thread *t, bool weak)
 create_thread_aps (struct igc_thread *t)
 {
   struct igc *gc = t->gc;
+  sys_cond_init (&gc->cond);
+  sys_mutex_init (&gc->mutex);
   mps_res_t res;
-  res = mps_ap_create_k (&t->dflt_ap, gc->dflt_pool, mps_args_none);
+  res = emacs_ap_create_k (&t->dflt_ap, gc->dflt_pool, mps_args_none);
   IGC_CHECK_RES (res);
-  res = mps_ap_create_k (&t->leaf_ap, gc->leaf_pool, mps_args_none);
+  res = emacs_ap_create_k (&t->leaf_ap, gc->leaf_pool, mps_args_none);
   IGC_CHECK_RES (res);
-  res = mps_ap_create_k (&t->immovable_ap, gc->immovable_pool, mps_args_none);
+  res = emacs_ap_create_k (&t->immovable_ap, gc->immovable_pool, mps_args_none);
   IGC_CHECK_RES (res);
   res = create_weak_ap (&t->weak_strong_ap, t, false);
   res = create_weak_hash_ap (&t->weak_hash_strong_ap, t, false);
@@ -3016,13 +3221,13 @@ igc_thread_remove (void **pinfo)
   destroy_root (&t->d.stack_root);
   destroy_root (&t->d.specpdl_root);
   destroy_root (&t->d.bc_root);
-  mps_ap_destroy (t->d.dflt_ap);
-  mps_ap_destroy (t->d.leaf_ap);
-  mps_ap_destroy (t->d.weak_strong_ap);
-  mps_ap_destroy (t->d.weak_weak_ap);
-  mps_ap_destroy (t->d.weak_hash_strong_ap);
-  mps_ap_destroy (t->d.weak_hash_weak_ap);
-  mps_ap_destroy (t->d.immovable_ap);
+  emacs_ap_destroy (&t->d.dflt_ap);
+  emacs_ap_destroy (&t->d.leaf_ap);
+  emacs_ap_destroy (&t->d.weak_strong_ap);
+  emacs_ap_destroy (&t->d.weak_weak_ap);
+  emacs_ap_destroy (&t->d.weak_hash_strong_ap);
+  emacs_ap_destroy (&t->d.weak_hash_weak_ap);
+  emacs_ap_destroy (&t->d.immovable_ap);
   mps_thread_dereg (deregister_thread (t));
 }
 
@@ -3635,7 +3840,16 @@ arena_step (void)
 	    interval = 0.05;
 	}
 
-      if (mps_arena_step (global_igc->arena, interval, 0))
+      atomic_store (&global_igc->idle_time, interval * 1e9);
+      sys_cond_broadcast (&global_igc->cond);
+      while (global_igc->idle_time)
+	{
+	  sched_yield ();
+	  sys_mutex_lock (&global_igc->mutex);
+	  sched_yield ();
+	  sys_mutex_unlock (&global_igc->mutex);
+	}
+      if (global_igc->idle_work)
 	return true;
     }
 
@@ -3686,7 +3900,7 @@ igc_on_idle (void)
   }
 }
 
-static mps_ap_t
+static emacs_ap_t *
 thread_ap (enum igc_obj_type type)
 {
   struct igc_thread_list *t = current_thread->gc_info;
@@ -3707,13 +3921,13 @@ thread_ap (enum igc_obj_type type)
       emacs_abort ();
 
     case IGC_OBJ_MARKER_VECTOR:
-      return t->d.weak_weak_ap;
+      return &t->d.weak_weak_ap;
 
     case IGC_OBJ_WEAK_HASH_TABLE_WEAK_PART:
-      return t->d.weak_hash_weak_ap;
+      return &t->d.weak_hash_weak_ap;
 
     case IGC_OBJ_WEAK_HASH_TABLE_STRONG_PART:
-      return t->d.weak_hash_strong_ap;
+      return &t->d.weak_hash_strong_ap;
 
     case IGC_OBJ_VECTOR:
     case IGC_OBJ_CONS:
@@ -3728,12 +3942,12 @@ thread_ap (enum igc_obj_type type)
     case IGC_OBJ_FACE_CACHE:
     case IGC_OBJ_BLV:
     case IGC_OBJ_HANDLER:
-      return t->d.dflt_ap;
+      return &t->d.dflt_ap;
 
     case IGC_OBJ_STRING_DATA:
     case IGC_OBJ_FLOAT:
     case IGC_OBJ_BYTES:
-      return t->d.leaf_ap;
+      return &t->d.leaf_ap;
     }
   emacs_abort ();
 }
@@ -3746,8 +3960,8 @@ igc_break (void)
 {
 }
 
-void
-igc_collect (void)
+static void
+igc_collect_raw (void)
 {
   struct igc *gc = global_igc;
   if (gc->park_count == 0)
@@ -3758,6 +3972,26 @@ igc_collect (void)
     }
 }
 
+void
+igc_collect (void)
+{
+  struct igc *gc = global_igc;
+  if (gc->park_count == 0)
+    {
+      atomic_store (&gc->manual_collections, 1);
+      while (gc->manual_collections)
+	{
+	  sys_cond_broadcast (&gc->cond);
+	  sched_yield ();
+	  sys_mutex_lock (&gc->mutex);
+	  /* pthread_mutex_lock () directly followed by
+	   * pthread_mutex_unlock () doesn't work, IIRC */
+	  sched_yield ();
+	  sys_mutex_unlock (&gc->mutex);
+	}
+    }
+}
+
 DEFUN ("igc--collect", Figc__collect, Sigc__collect, 0, 0, 0,
        doc: /* Force an immediate arena garbage collection.  */)
   (void)
@@ -3805,7 +4039,7 @@ igc_hash (Lisp_Object key)
    object.  */
 
 static mps_addr_t
-alloc_impl (size_t size, enum igc_obj_type type, mps_ap_t ap)
+alloc_impl_raw (size_t size, mps_ap_t ap)
 {
   mps_addr_t p UNINIT;
   size = alloc_size (size);
@@ -3820,14 +4054,14 @@ alloc_impl (size_t size, enum igc_obj_type type, mps_ap_t ap)
 	    memory_full (0);
 	  /* Object _must_ have valid contents before commit.  */
 	  memclear (p, size);
-	  set_header (p, type, size, alloc_hash ());
+	  set_header (p, IGC_OBJ_PAD, size, 0);
 	}
       while (!mps_commit (ap, p, size));
       break;
 
     case IGC_STATE_DEAD:
       p = xzalloc (size);
-      set_header (p, type, size, alloc_hash ());
+      set_header (p, IGC_OBJ_PAD, size, alloc_hash ());
       break;
 
     case IGC_STATE_INITIAL:
@@ -3854,7 +4088,7 @@ alloc (size_t size, enum igc_obj_type type)
 alloc_immovable (size_t size, enum igc_obj_type type)
 {
   struct igc_thread_list *t = current_thread->gc_info;
-  return alloc_impl (size, type, t->d.immovable_ap);
+  return alloc_impl (size, type, &t->d.immovable_ap);
 }
 
 #ifdef HAVE_MODULES
@@ -4087,6 +4321,8 @@ weak_hash_find_dependent (mps_addr_t base)
 	struct Lisp_Weak_Hash_Table_Strong_Part *w = client;
 	return w->weak;
       }
+    case IGC_OBJ_PAD:
+      return 0;
     default:
       emacs_abort ();
     }
@@ -4448,8 +4684,12 @@ igc_external_header (struct igc_header *h)
       exthdr->hash = header_hash (h);
       exthdr->obj_type = header_type (h);
       exthdr->extra_dependency = Qnil;
-      /* On IA-32, the upper 32-bit word is 0 after this, which is okay.  */
-      h->v = (intptr_t)exthdr + IGC_TAG_EXTHDR;
+      asm volatile ("" : : : "memory");
+      uint64_t v = h->v;
+      /* maintain the upper 32-bit word for WIDE_EMACS_INT builds. */
+      v -= (uintptr_t) v;
+      v += (intptr_t)exthdr + IGC_TAG_EXTHDR;
+      h->v = v;
       mps_addr_t ref = (mps_addr_t) h;
       mps_res_t res = mps_finalize (global_igc->arena, &ref);
       IGC_CHECK_RES (res);
@@ -4893,17 +5133,17 @@ igc_on_pdump_loaded (void *dump_base, void *hot_start, void *hot_end,
 igc_alloc_dump (size_t nbytes)
 {
   igc_assert (global_igc->park_count > 0);
-  mps_ap_t ap = thread_ap (IGC_OBJ_CONS);
+  emacs_ap_t *ap = thread_ap (IGC_OBJ_CONS);
   size_t block_size = igc_header_size () + nbytes;
   mps_addr_t block;
   do
     {
-      mps_res_t res = mps_reserve (&block, ap, block_size);
+      mps_res_t res = mps_reserve (&block, ap->mps_ap, block_size);
       if (res != MPS_RES_OK)
 	memory_full (0);
       set_header (block, IGC_OBJ_INVALID, block_size, 0);
     }
-  while (!mps_commit (ap, block, block_size));
+  while (!mps_commit (ap->mps_ap, block, block_size));
   return (char *) block + igc_header_size ();
 }
 
@@ -5050,6 +5290,40 @@ DEFUN ("igc--remove-extra-dependency", Figc__remove_extra_dependency,
   return Qt;
 }
 
+static struct sigaction mps_sigaction;
+
+static void sigsegv_sigaction (int sig, siginfo_t *info, void *uap)
+{
+  if (sys_thread_self () == main_thread.s.thread_id)
+    {
+      atomic_store (&global_igc->fault_address, (uintptr_t) info->si_addr);
+      sys_cond_broadcast (&global_igc->cond);
+      sched_yield ();
+      sys_mutex_lock (&global_igc->mutex);
+      /* IIRC, pthread_mutex_lock directly followed by
+	 pthread_mutex_unlock causes problems somehow... */
+      sched_yield ();
+      sys_mutex_unlock (&global_igc->mutex);
+    }
+  else
+    {
+      /* Recipe for disaster here, I guess.  */
+      mps_sigaction.sa_sigaction (sig, info, uap);
+    }
+}
+
+#ifdef SIGSEGV
+static void steal_sigsegv (void)
+{
+  struct sigaction emacs_sigaction;
+  emacs_sigaction.sa_sigaction = sigsegv_sigaction;
+  sigemptyset (&emacs_sigaction.sa_mask);
+  emacs_sigaction.sa_flags = SA_SIGINFO | SA_RESTART;
+
+  sigaction (SIGSEGV, &emacs_sigaction, &mps_sigaction);
+}
+#endif
+
 /***********************************************************************
 				  Init
  ***********************************************************************/
@@ -5061,6 +5335,9 @@ init_igc (void)
   (void) mps_lib_assert_fail_install (igc_assert_fail);
   global_igc = make_igc ();
   add_main_thread ();
+#ifdef SIGSEGV
+  steal_sigsegv ();
+#endif
   set_state (IGC_STATE_USABLE_PARKED);
 }
 
diff --git a/src/lisp.h b/src/lisp.h
index 48585c2d8a1..37cc62052c1 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -4102,7 +4102,14 @@ backtrace_debug_on_exit (union specbinding *pdl)
 INLINE void
 grow_specpdl (void)
 {
+  /* we don't need a memory bus barrier here, but a logical memory
+     barrier so GCC doesn't reorder stores. */
+  asm volatile ("" : : : "memory");
+  /* This increment doesn't need to be atomic in its entirety, but it
+     can't expose an intermediate state of specpdl_ptr; IOW, the store
+     needs to be a single CPU instruction. */
   specpdl_ptr++;
+  asm volatile ("" : : : "memory");
   if (specpdl_ptr == specpdl_end)
     grow_specpdl_allocation ();
 }





* Re: Some experience with the igc branch
  2024-12-25 17:40                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-25 17:51                                         ` Eli Zaretskii
  2024-12-26 15:24                                           ` Pip Cet via Emacs development discussions.
  2024-12-26  5:27                                         ` Gerd Möllmann
  2024-12-26  5:29                                         ` Gerd Möllmann
  2 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 17:51 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Wed, 25 Dec 2024 17:40:42 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> I haven't seen a technical argument against using separate stacks for
> MPS and signals

You haven't actually presented it.  I wrote that we should carefully
discuss this and other ideas, and asked to present each such idea's
design in enough detail to have a useful and concrete discussion of
each idea.  (And no, showing the code doesn't cut it, because the
design is not always easy to grasp quickly, and because the code might
not implement the design you have in mind, for various reasons.)




* Re: Some experience with the igc branch
  2024-12-25 17:51                                         ` Eli Zaretskii
@ 2024-12-26 15:24                                           ` Pip Cet via Emacs development discussions.
  2024-12-26 15:57                                             ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-26 15:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Wed, 25 Dec 2024 17:40:42 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> I haven't seen a technical argument against using separate stacks for
>> MPS and signals
>
> You haven't actually presented it.

That's correct: we have an idea and a PoC, no design to discuss or
anything close to a proposal, at this point.

My idea was to ask for obvious problems precluding or complicating this
approach.

I've found a few minor things; so far, nothing unfixable, and no
significant effects on performance, but the fixes will have to become
part of the design and discussion.  I don't think anyone is actually
running the code (and that's perfectly okay), but if that is incorrect,
please let me know so we don't rediscover bugs that I've already fixed.

I think rr (time-travel/reverse debugging with acceptable performance)
support is important, but I think I'm the only one? It seems to be
really slow on this branch, though I don't know how fast it is on
scratch/igc.

Pip


* Re: Some experience with the igc branch
  2024-12-26 15:24                                           ` Pip Cet via Emacs development discussions.
@ 2024-12-26 15:57                                             ` Eli Zaretskii
  2024-12-27 14:34                                               ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26 15:57 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Thu, 26 Dec 2024 15:24:14 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> Date: Wed, 25 Dec 2024 17:40:42 +0000
> >> From: Pip Cet <pipcet@protonmail.com>
> >> Cc: Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> >>
> >> I haven't seen a technical argument against using separate stacks for
> >> MPS and signals
> >
> > You haven't actually presented it.
> 
> That's correct: we have an idea and a PoC, no design to discuss or
> anything close to a proposal, at this point.
> 
> My idea was to ask for obvious problems precluding or complicating this
> approach.

OK, but still, since you wrote the code to implement it, I guess you
have at least some initial design ideas?  I hoped you could describe
those ideas, so we could better understand what you have in mind, and
provide more useful feedback about possible problems, if any, with
those ideas.

In general, as I wrote earlier, there's nothing problematic with
adding a C thread to Emacs.  But since (AFAIU) the suggestion is to
run MPS from that thread, I think we should understand in more detail
how GC can be run from a separate thread.  I expect that to have at
least some impact on the Emacs code elsewhere, since the original
Emacs design assumed that GC runs synchronously, and the rest of the
Lisp machine is stopped while it does.

> I've found a few minor things; so far, nothing unfixable, and no
> significant effects on performance, but the fixes will have to become
> part of the design and discussion.

Right, so I'm asking to describe these aspects, so that others could
consider them and possibly additional issues, and provide feedback or
raise concerns about that.

> I think rr (time-travel/reverse debugging with acceptable performance)
> support is important, but I think I'm the only one? It seems to be
> really slow on this branch, though I don't know how fast it is on
> scratch/igc.

Well, reverse debugging currently doesn't work on Windows, so at least
for that platform we cannot rely on that.


* Re: Some experience with the igc branch
  2024-12-26 15:57                                             ` Eli Zaretskii
@ 2024-12-27 14:34                                               ` Pip Cet via Emacs development discussions.
  2024-12-27 15:58                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 14:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Thu, 26 Dec 2024 15:24:14 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> >> Date: Wed, 25 Dec 2024 17:40:42 +0000
>> >> From: Pip Cet <pipcet@protonmail.com>
>> >> Cc: Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>> >>
>> >> I haven't seen a technical argument against using separate stacks for
>> >> MPS and signals
>> >
>> > You haven't actually presented it.
>>
>> That's correct: we have an idea and a PoC, no design to discuss or
>> anything close to a proposal, at this point.
>>
>> My idea was to ask for obvious problems precluding or complicating this
>> approach.
>
> OK, but still, since you wrote the code to implement it, I guess you
> have at least some initial design ideas?  I hoped you could describe
> those ideas, so we could better understand what you have in mind, and
> provide more useful feedback about possible problems, if any, with
> those ideas.

The idea is that the main thread, after initialization, never calls into
MPS itself.

Instead, we create an allocation thread, reacting to messages from the
main thread.

The allocation thread never actually does anything in parallel with the
main thread: its purpose is to provide a separate stack, not
parallelization.

All redirected MPS calls wait synchronously for the allocation thread to
respond.

This includes the MPS SIGSEGV handler, which calls into MPS, so it must
be directed to another thread.
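
The synchronous hand-off described above can be sketched roughly as follows.  This is a minimal illustration with plain pthreads, not the code from the patch: struct handoff, worker, call_on_worker and demo_handoff are made-up names, the doubling stands in for a real MPS call, and both sides wait on a condition variable where the PoC polls with sched_yield.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct handoff
{
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  bool pending;			/* a request is posted and not yet served */
  bool quit;			/* the worker should exit */
  size_t request;
  size_t reply;
};

/* Runs on its own stack; only acts when the caller has posted a request.  */
static void *
worker (void *arg)
{
  struct handoff *h = arg;
  pthread_mutex_lock (&h->mutex);
  while (!h->quit)
    {
      if (h->pending)
	{
	  h->reply = h->request * 2;	/* stand-in for the redirected call */
	  h->pending = false;
	  pthread_cond_broadcast (&h->cond);
	}
      else
	pthread_cond_wait (&h->cond, &h->mutex);
    }
  pthread_mutex_unlock (&h->mutex);
  return NULL;
}

/* Post a request and block until the worker has answered it, so the
   two threads never run the interesting code at the same time.  */
static size_t
call_on_worker (struct handoff *h, size_t request)
{
  pthread_mutex_lock (&h->mutex);
  h->request = request;
  h->pending = true;
  pthread_cond_broadcast (&h->cond);
  while (h->pending)
    pthread_cond_wait (&h->cond, &h->mutex);
  size_t reply = h->reply;
  pthread_mutex_unlock (&h->mutex);
  return reply;
}

/* Start a worker, make one call, shut the worker down again.  */
static size_t
demo_handoff (size_t x)
{
  struct handoff h = { .pending = false, .quit = false };
  pthread_t tid;
  pthread_mutex_init (&h.mutex, NULL);
  pthread_cond_init (&h.cond, NULL);
  int err = pthread_create (&tid, NULL, worker, &h);
  assert (err == 0);
  (void) err;
  size_t reply = call_on_worker (&h, x);
  pthread_mutex_lock (&h.mutex);
  h.quit = true;
  pthread_cond_broadcast (&h.cond);
  pthread_mutex_unlock (&h.mutex);
  pthread_join (tid, NULL);
  pthread_cond_destroy (&h.cond);
  pthread_mutex_destroy (&h.mutex);
  return reply;
}
```

The caller holds no work of its own while it waits, which is the point: the second thread supplies a separate stack, not parallelism.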

All this makes the previously fast allocation path very slow, and we
need a workaround for that:

We ensure that we allocate at least 1MB (magic number here) at a time,
then split the area into MPS objects when we need to.  The assumption
that we can split MPS allocations is significant but justifiable,
because MPS will be in the same state after two successful back-to-back
allocations and a single allocation combining the two.

dflt_skip must never lie to MPS about the size of an object, though.
Once dflt_skip told MPS how to skip it (i.e. how large the object is),
we can no longer split that object.  It is another significant but
justifiable assumption that this happens rarely enough.
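
The splitting scheme can be illustrated with a small stand-alone sketch.  It is a simplification under stated assumptions, not the branch's code: struct header, struct chunk, chunk_alloc, skip and demo_walk are hypothetical names, malloc stands in for a large MPS allocation, sizes include the header, and atomic_signal_fence is used as a portable compiler barrier in place of the asm statement.  The unused tail of the chunk is always described by a padding header, so a skip function walking the chunk sees a consistent size at every step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

enum { ALIGN = 16 };
enum kind { KIND_PAD, KIND_OBJ };

struct header { enum kind kind; size_t size; };	/* size includes header */
struct chunk { char *cur; char *end; };

_Static_assert (sizeof (struct header) <= ALIGN, "header must fit one unit");

static void
set_header (char *p, enum kind kind, size_t size)
{
  struct header *h = (struct header *) p;
  h->kind = kind;
  h->size = size;
}

/* What a skip method does: step over one block using its header.  */
static char *
skip (char *p)
{
  return p + ((struct header *) p)->size;
}

/* Carve SIZE bytes (rounded up to ALIGN) off the front of the chunk.
   The tail is re-tagged as padding before the object header is
   written, so the chunk can be walked at any point in between.  */
static void *
chunk_alloc (struct chunk *c, size_t size)
{
  size = (size + ALIGN - 1) & ~(size_t) (ALIGN - 1);
  if (size == 0 || size > (size_t) (c->end - c->cur))
    return NULL;		/* caller would ask for a fresh chunk */
  char *obj = c->cur;
  char *rest = obj + size;
  if (rest < c->end)
    set_header (rest, KIND_PAD, (size_t) (c->end - rest));
  atomic_signal_fence (memory_order_seq_cst);	/* compiler barrier only */
  set_header (obj, KIND_OBJ, size);
  c->cur = rest;
  return obj;
}

/* Allocate two objects from a 256-byte chunk and count the blocks a
   scanner would see: two objects plus one trailing pad.  */
static int
demo_walk (void)
{
  enum { CHUNK_SIZE = 256 };
  char *buf = malloc (CHUNK_SIZE);
  assert (buf != NULL);
  set_header (buf, KIND_PAD, CHUNK_SIZE);	/* whole chunk starts as pad */
  struct chunk c = { buf, buf + CHUNK_SIZE };
  void *a = chunk_alloc (&c, 24);
  void *b = chunk_alloc (&c, 40);
  assert (a == (void *) buf && b == (void *) (buf + 32));
  int n = 0;
  for (char *p = buf; p < buf + CHUNK_SIZE; p = skip (p))
    n++;
  free (buf);
  return n;
}
```

The sketch does not model the harder part discussed here, namely what happens once the scanner has already been told the size of the trailing pad; it only shows that every intermediate state is walkable.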

> In general, as I wrote earlier, there's nothing problematic with
> adding a C thread to Emacs.  But since (AFAIU) the suggestion is to
> run MPS from that thread, I think we should understand in more detail
> how GC can be run from a separate thread.  I expect that to have at
> least some impact on the Emacs code elsewhere, since the original
> Emacs design assumed that GC runs synchronously, and the rest of the
> Lisp machine is stopped while it does.

Thanks for explaining.

I don't think that's a new problem (when comparing the allocation thread
code to scratch/igc), as the allocation thread does not trigger GC any
more spontaneously than the main thread would.  The spontaneous garbage
collection you're worried about can be triggered by another thread
allocating memory while the main thread is busy inspecting it, but the
allocation thread only allocates memory while the main thread is
waiting, so this cannot happen.

It's safe to assume no MPS collection happens when:

1. there is no other thread which might trigger a memory barrier (the
allocation thread doesn't)
2. there is no other thread which might allocate memory (the allocation
thread cannot do so while the main thread is in a critical section)
3. we don't allocate memory
4. we don't trigger memory barriers

In practice, (4) is very hard to guarantee, so it might be easier to
decide now that code should always be written to assume spontaneous GC
is possible no matter where we are, which is the third step to actually
enabling fully concurrent GC.  Once we have made that decision,
we can actually test whether it breaks things to trigger spontaneous GCs
from another thread (I've experimented with this, and IIRC I fixed a bug
this uncovered, but only because that bug could also have occurred
without spontaneous GC).  Once we've done that, we can seriously
consider whether spontaneous GCs might be good for performance or
usability rather than debugging.

>> I've found a few minor things; so far, nothing unfixable, and no
>> significant effects on performance, but the fixes will have to become
>> part of the design and discussion.
>
> Right, so I'm asking to describe these aspects, so that others could
> consider them and possibly additional issues, and provide feedback or
> raise concerns about that.

The main aspect is "dflt_skip must never lie, but it can delay deciding
what the truth is until it's called".  We keep eating ice cream until
we're asked how much we've had, at which point we answer truthfully and
stop eating.
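
A toy model of that rule, as a sketch only: the struct and both
function names are hypothetical and this is not the code on the branch.
An object at the end of a chunk may keep growing until the skip
function is first asked where it ends; from then on the answer is fixed
and further growth is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one "open" object at the end of a chunk.  */
struct open_object
{
  char *base;    /* start of the object */
  char *fill;    /* end of the bytes claimed so far */
  char *limit;   /* end of the memory the object may grow into */
  bool frozen;   /* set once skip has reported a size */
};

/* Grow OBJ by NBYTES.  Fails once skip has answered, or when there
   is not enough room left.  */
static bool
open_object_grow (struct open_object *obj, size_t nbytes)
{
  if (obj->frozen || (size_t) (obj->limit - obj->fill) < nbytes)
    return false;
  obj->fill += nbytes;
  return true;
}

/* Toy skip function: return the address just past OBJ.  The first
   call decides the size; later calls repeat the same answer.  */
static char *
open_object_skip (struct open_object *obj)
{
  obj->frozen = true;
  return obj->fill;
}
```

The point of the model is only the ordering constraint: deciding late
is fine, changing an answer already given is not.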

>> I think rr (time-travel/reverse debugging with acceptable performance)
>> support is important, but I think I'm the only one? It seems to be
>> really slow on this branch, though I don't know how fast it is on
>> scratch/igc.
>
> Well, reverse debugging currently doesn't work on Windows, so at least
> for that platform we cannot rely on that.

Considering that rr is in a kind of arms race anyway (rseqs, hardware
lock elision, the E-core/P-core split, and spectre workarounds all broke
rr when introduced, and require workarounds, and CPU manufacturers
appear to be fundamentally unable to agree on when an instruction should
be counted as "retired"), relying on it may be a bad idea.

Unfortunately, qemu doesn't seem to be seeing very active development in
this area, so the qemu instruction counting mechanism is unlikely to
provide usable reverse debugging in many situations.  And plain old GDB
reverse debugging is unbearably slow (and has been for decades), AFAIK.

So it's not a safe bet that we will continue to have usable reverse
debugging.  It may become even more necessary to have the compiler or
the CPU assist in providing it.

BTW, speaking of debugging, there's always a nuclear option for signals:
Use ptrace from another process to step through MPS and resend the
signal at the first possible moment.  Breaks GDB, hard to fix.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-27 14:34                                               ` Pip Cet via Emacs development discussions.
@ 2024-12-27 15:58                                                 ` Eli Zaretskii
  2024-12-27 16:42                                                   ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-27 15:58 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Fri, 27 Dec 2024 14:34:22 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > OK, but still, since you wrote the code to implement it, I guess you
> > have at least some initial design ideas?  I hoped you could describe
> > those ideas, so we could better understand what you have in mind, and
> > provide a more useful feedback about possible problems, if any, with
> > those ideas.
> 
> The idea is that the main thread, after initialization, never calls into
> MPS itself.

Thanks.  I will ask some questions below to understand better what you
suggest.

> Instead, we create an allocation thread, reacting to messages from the
> main thread.
> 
> The allocation thread never actually does anything in parallel with the
> main thread: its purpose is to provide a separate stack, not
> parallelization.

Why is it important to have a separate stack when MPS allocates
memory?

> All redirected MPS calls wait synchronously for the allocation thread to
> respond.
> 
> This includes the MPS SIGSEGV handler, which calls into MPS, so it must
> be directed to another thread.

MPS SIGSEGV handler is invoked when the Lisp machine touches objects
which were relocated by MPS, right?  What exactly does the allocation
thread do when that happens?

> All this makes the previously fast allocation path very slow, and we
> need a workaround for that:
> 
> We ensure that we allocate at least 1MB (magic number here) at a time,
> then split the area into MPS objects when we need to.  The assumption
> that we can split MPS allocations is significant but justifiable,
> because MPS will be in the same state after two successful back-to-back
> allocations and a single allocation combining the two.

This seems to rely on some knowledge of MPS internals?

But more worrisome: what about "sudden" needs for more than 1MB of
memory?  For example, C-w in a large buffer needs to allocate a Lisp
string for the killed text.

> 1. there is no other thread which might trigger a memory barrier (the
> allocation thread doesn't)

So the allocation thread doesn't GC?  If so, who does?

If the allocation thread does GC, then how can you ensure it doesn't
trigger a barrier?

> 3. we don't allocate memory

Why can't GC happen when we don't allocate memory?

> 4. we don't trigger memory barriers

Same question here.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-27 15:58                                                 ` Eli Zaretskii
@ 2024-12-27 16:42                                                   ` Pip Cet via Emacs development discussions.
  2024-12-28 18:02                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-27 16:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Fri, 27 Dec 2024 14:34:22 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> > OK, but still, since you wrote the code to implement it, I guess you
>> > have at least some initial design ideas?  I hoped you could describe
>> > those ideas, so we could better understand what you have in mind, and
>> > provide a more useful feedback about possible problems, if any, with
>> > those ideas.
>>
>> The idea is that the main thread, after initialization, never calls into
>> MPS itself.
>
> Thanks.  I will ask some questions below to understand better what you
> suggest.

Thanks!

>> Instead, we create an allocation thread, reacting to messages from the
>> main thread.
>>
>> The allocation thread never actually does anything in parallel with the
>> main thread: its purpose is to provide a separate stack, not
>> parallelization.
>
> Why is it important to have a separate stack when MPS allocates
> memory?

Because that way, signal handlers can wait for the MPS allocation to
finish.  A signal handler waiting for the thread it interrupted
deadlocks.  A signal handler waiting for another thread works.
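
A minimal sketch of the synchronous handshake, assuming POSIX threads
and unnamed semaphores (which rules out macOS as written).  All names
are hypothetical, and malloc stands in for the real MPS call.  Note
that only sem_post is on the POSIX async-signal-safe list, so a real
implementation would have to be more careful about how a signal
handler waits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request slot shared by the two threads.  */
struct alloc_request
{
  size_t nbytes;   /* in: size requested by the main thread */
  void *result;    /* out: filled in by the allocation thread */
};

static struct alloc_request request;
static sem_t request_ready;    /* posted by the main thread */
static sem_t response_ready;   /* posted by the allocation thread */
static pthread_t alloc_thread;

/* Runs on its own stack and serves one request at a time.  */
static void *
allocation_thread (void *arg)
{
  (void) arg;
  for (;;)
    {
      while (sem_wait (&request_ready) != 0)
        continue;                       /* retry on EINTR */
      if (request.nbytes == 0)
        break;                          /* size 0 means "shut down" */
      request.result = malloc (request.nbytes);
      sem_post (&response_ready);
    }
  return NULL;
}

static int
start_allocation_thread (void)
{
  if (sem_init (&request_ready, 0, 0) != 0
      || sem_init (&response_ready, 0, 0) != 0)
    return -1;
  return pthread_create (&alloc_thread, NULL, allocation_thread, NULL);
}

/* Called on the main thread; blocks until the other thread answers,
   so the two threads never run in parallel.  */
static void *
redirected_alloc (size_t nbytes)
{
  request.nbytes = nbytes;
  request.result = NULL;
  sem_post (&request_ready);
  while (sem_wait (&response_ready) != 0)
    continue;                           /* retry on EINTR */
  return request.result;
}

static void
stop_allocation_thread (void)
{
  request.nbytes = 0;
  sem_post (&request_ready);
  pthread_join (alloc_thread, NULL);
}
```

The caller never touches the allocator's stack: it posts a request and
waits, which is the property that matters here.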

>> All redirected MPS calls wait synchronously for the allocation thread to
>> respond.
>>
>> This includes the MPS SIGSEGV handler, which calls into MPS, so it must
>> be directed to another thread.
>
> MPS SIGSEGV handler is invoked when the Lisp machine touches objects
> which were relocated by MPS, right?
> What exactly does the allocation thread do when that happens?

Attempt to trigger another fault at the same address, which calls into
MPS, which eventually does whatever is necessary to advance to a state
where there is no longer a memory barrier.  Of course we could call the
MPS signal handler directly from the allocation thread rather than
triggering another fault.  (MPS allows for the possibility that the
memory barrier is no longer in place by the time the arena lock has been
acquired, and it has to, for multi-threaded operation.)

What precisely MPS does is an implementation detail, and may be
complicated (the instruction emulation code which causes so much trouble
for weak objects, for example).

I also think it's an implementation detail what MPS uses memory barriers
for: I don't think the current code uses superfluous memory barriers to
gather statistics, for example, but we certainly cannot assume that will
never happen.

>> All this makes the previously fast allocation path very slow, and we
>> need a workaround for that:
>>
>> We ensure that we allocate at least 1MB (magic number here) at a time,
>> then split the area into MPS objects when we need to.  The assumption
>> that we can split MPS allocations is significant but justifiable,
>> because MPS will be in the same state after two successful back-to-back
>> allocations and a single allocation combining the two.
>
> This seems to rely on some knowledge of MPS internals?

Yes.  The assumption is that object sizes are determined by the skip
function, not fixed at allocation time.  This must be spelled out
clearly in our code, and ideally it's something which the MPS
documentation should guarantee (AFAIK, it doesn't right now).

> But more worrisome: what about "sudden" needs for more that 1MB of
> memory?  For example, C-w in a large buffer needs to allocate a Lisp
> string for the killed text.

That's why I said "at least".  If we need more than 1MB we'll allocate
as much as we need.
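
As a rough sketch of that policy (hypothetical names; malloc stands in
for the slow, redirected MPS allocation, and the unused tail of an old
chunk is simply abandoned here, where real code would have to account
for it):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_MIN ((size_t) 1 << 20)   /* the 1MB magic number */
#define ALIGNMENT ((size_t) 8)

static char *chunk_cur;      /* next free byte in the current chunk */
static char *chunk_end;      /* one past the last byte of the chunk */
static unsigned slow_calls;  /* how often the slow path was taken */

/* Stand-in for the expensive call into the allocation thread.  */
static void *
slow_alloc (size_t nbytes)
{
  slow_calls++;
  return malloc (nbytes);
}

/* Carve NBYTES out of the current chunk, fetching a new chunk of at
   least CHUNK_MIN bytes (or more, if the request is larger) when the
   current one cannot satisfy the request.  */
static void *
chunk_alloc (size_t nbytes)
{
  nbytes = (nbytes + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
  if (chunk_cur == NULL || (size_t) (chunk_end - chunk_cur) < nbytes)
    {
      size_t chunk_size = nbytes > CHUNK_MIN ? nbytes : CHUNK_MIN;
      char *chunk = slow_alloc (chunk_size);
      if (chunk == NULL)
        return NULL;
      chunk_cur = chunk;
      chunk_end = chunk + chunk_size;
    }
  void *obj = chunk_cur;
  chunk_cur += nbytes;
  return obj;
}
```

Small requests share one slow call; a request above 1MB gets a chunk
of exactly the size it needs.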

>> 1. there is no other thread which might trigger a memory barrier (the
>> allocation thread doesn't)
>
> So the allocation thread doesn't GC?  If so, who does?

It does GC.  It doesn't trigger memory barriers on its own.

> If the allocation thread does GC, then how can you ensure it doesn't
> trigger a barrier?

MPS never triggers memory barriers from MPS code.

>> 3. we don't allocate memory
>
> Why can't GC happen when we don't allocate memory?
>
>> 4. we don't trigger memory barriers
>
> Same question here.

I meant all four conditions are necessary, not that any one of them
would be sufficient.

GC can happen if another thread triggers a memory barrier OR another
thread allocates OR we hit a memory barrier OR we allocate.  The
question is whether it is ever useful to assume that GC can happen ONLY
in these four cases.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-27 16:42                                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-28 18:02                                                     ` Eli Zaretskii
  2024-12-28 21:05                                                       ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-28 18:02 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Fri, 27 Dec 2024 16:42:48 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> >> All redirected MPS calls wait synchronously for the allocation thread to
> >> respond.
> >>
> >> This includes the MPS SIGSEGV handler, which calls into MPS, so it must
> >> be directed to another thread.
> >
> > MPS SIGSEGV handler is invoked when the Lisp machine touches objects
> > which were relocated by MPS, right?
> > What exactly does the allocation thread do when that happens?
> 
> Attempt to trigger another fault at the same address, which calls into
> MPS, which eventually does whatever is necessary to advance to a state
> where there is no longer a memory barrier.  Of course we could call the
> MPS signal handler directly from the allocation thread rather than
> triggering another fault.  (MPS allows for the possibility that the
> memory barrier is no longer in place by the time the arena lock has been
> acquired, and it has to, for multi-threaded operation.)
> 
> What precisely MPS does is an implementation detail, and may be
> complicated (the instruction emulation code which causes so much trouble
> for weak objects, for example).
> 
> I also think it's an implementation detail what MPS uses memory barriers
> for: I don't think the current code uses superfluous memory barriers to
> gather statistics, for example, but we certainly cannot assume that will
> never happen.

I think you lost me.  Let me try to explain what I was asking about.

MPS SIGSEGV is triggered when the main thread touches memory that was
relocated by MPS.  With the current way we interface with MPS, where
the main thread calls MPS directly, MPS sets up SIGSEGV to invoke its
(MPS's) own handler, which then handles the memory access.  By
contrast, under your proposal, MPS should be called from a separate
thread.  However, the way we currently process signals, the signals
are delivered to the main thread.  So we should install our own
SIGSEGV handler which will run in the context of the main thread, and
should somehow redirect the handling of this SIGSEGV to the MPS
thread, right?  So now the main thread calls pthread_kill to deliver
the SIGSEGV to the MPS thread, but what will the MPS thread do with
that? how will it know which MPS function to call?

> >> 3. we don't allocate memory
> >
> > Why can't GC happen when we don't allocate memory?
> >
> >> 4. we don't trigger memory barriers
> >
> > Same question here.
> 
> I meant all four conditions are necessary, not that any one of thew
> mould be sufficient.
> 
> GC can happen if another thread triggers a memory barrier OR another
> thread allocates OR we hit a memory barrier OR we allocate.  The
> question is whether it is ever useful to assume that GC can happen ONLY
> in these four cases.

GC can also happen when Emacs is idle, and at that time there's no
allocations.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-28 18:02                                                     ` Eli Zaretskii
@ 2024-12-28 21:05                                                       ` Pip Cet via Emacs development discussions.
  2024-12-29  6:15                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-28 21:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Fri, 27 Dec 2024 16:42:48 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> >> All redirected MPS calls wait synchronously for the allocation thread to
>> >> respond.
>> >>
>> >> This includes the MPS SIGSEGV handler, which calls into MPS, so it must
>> >> be directed to another thread.
>> >
>> > MPS SIGSEGV handler is invoked when the Lisp machine touches objects
>> > which were relocated by MPS, right?
>> > What exactly does the allocation thread do when that happens?
>>
>> Attempt to trigger another fault at the same address, which calls into
>> MPS, which eventually does whatever is necessary to advance to a state
>> where there is no longer a memory barrier.  Of course we could call the
>> MPS signal handler directly from the allocation thread rather than
>> triggering another fault.  (MPS allows for the possibility that the
>> memory barrier is no longer in place by the time the arena lock has been
>> acquired, and it has to, for multi-threaded operation.)
>>
>> What precisely MPS does is an implementation detail, and may be
>> complicated (the instruction emulation code which causes so much trouble
>> for weak objects, for example).
>>
>> I also think it's an implementation detail what MPS uses memory barriers
>> for: I don't think the current code uses superfluous memory barriers to
>> gather statistics, for example, but we certainly cannot assume that will
>> never happen.
>
> I think you lost me.  Let me try to explain what I was asking about.
>
> MPS SIGSEGV is triggered when the main thread touches memory that was
> relocated by MPS.

(well, the memory hasn't been relocated, it just contains invalid
pointers).

> With the current way we interface with MPS, where
> the main thread calls MPS directly, MPS sets up SIGSEGV to invoke its
> (MPS's) own handler, which then handles the memory access.

It removes the barrier and returns, making the main thread try again.

> By
> contrast, under your proposal, MPS should be called from a separate
> thread.  However, the way we currently process signals, the signals
> are delivered to the main thread.  So we should install our own
> SIGSEGV handler which will run in the context of the main thread, and
> should somehow redirect the handling of this SIGSEGV to the MPS
> thread, right?

Correct.

> So now the main thread calls pthread_kill to deliver
> the SIGSEGV to the MPS thread,

No, that wouldn't work.  We need the signal handler to have access to
the siginfo_t data, and pthread_kill provides no way to include that
information.

Instead, we call the SIGSEGV handler directly on the other thread,
passing in the same siginfo structure.

(My original code simply accessed a byte at the fault address; however,
reading the byte isn't sufficient, and writing it risks exposing
inadmissible intermediate states to other threads, so now we call the
sa_sigaction function directly).

> but what will the MPS thread do with that?

Call the MPS SIGSEGV handler, which knows what to do based (currently)
only on the address.

> how will it know which MPS function to call?

The MPS SIGSEGV handler is obtained by calling sigaction.
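
A sketch of that lookup, assuming POSIX sigaction and a handler
installed with SA_SIGINFO.  fake_mps_segv_handler is a stand-in, not
the real MPS handler, and the sketch assumes nobody has replaced the
handler since it was installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static void *volatile last_fault_addr;

/* Stand-in for the handler MPS installs; it only records the
   faulting address.  */
static void
fake_mps_segv_handler (int sig, siginfo_t *info, void *ucontext)
{
  (void) sig;
  (void) ucontext;
  last_fault_addr = info->si_addr;
}

static int
install_fake_handler (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = fake_mps_segv_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset (&sa.sa_mask);
  return sigaction (SIGSEGV, &sa, NULL);
}

/* Query the currently installed SIGSEGV action without changing it,
   then call its handler as an ordinary function with INFO.
   Returns 0 on success, -1 if there is no three-argument handler.  */
static int
call_installed_segv_handler (siginfo_t *info, void *ucontext)
{
  struct sigaction current;
  if (sigaction (SIGSEGV, NULL, &current) != 0)
    return -1;
  if (!(current.sa_flags & SA_SIGINFO))
    return -1;
  current.sa_sigaction (SIGSEGV, info, ucontext);
  return 0;
}
```

Passing a null new action to sigaction only reads the current
disposition, so the lookup itself has no side effects.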

>> >> 3. we don't allocate memory
>> >
>> > Why can't GC happen when we don't allocate memory?
>> >
>> >> 4. we don't trigger memory barriers
>> >
>> > Same question here.
>>
>> I meant all four conditions are necessary, not that any one of thew
>> mould be sufficient.
>>
>> GC can happen if another thread triggers a memory barrier OR another
>> thread allocates OR we hit a memory barrier OR we allocate.  The
>> question is whether it is ever useful to assume that GC can happen ONLY
>> in these four cases.
>
> GC can also happen when Emacs is idle, and at that time there's no
> allocations.

If you want to spell it out, sure, but there's no way for the main
thread to become idle without potentially allocating memory, so this
fifth condition is redundant.

Pip




^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-28 21:05                                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-29  6:15                                                         ` Eli Zaretskii
  0 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-29  6:15 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Sat, 28 Dec 2024 21:05:43 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: gerd.moellmann@gmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> > By
> > contrast, under your proposal, MPS should be called from a separate
> > thread.  However, the way we currently process signals, the signals
> > are delivered to the main thread.  So we should install our own
> > SIGSEGV handler which will run in the context of the main thread, and
> > should somehow redirect the handling of this SIGSEGV to the MPS
> > thread, right?
> 
> Correct.
> 
> > So now the main thread calls pthread_kill to deliver
> > the SIGSEGV to the MPS thread,
> 
> No, that wouldn't work.  We need the signal handler to have access to
> the siginfo_t data, and pthread_kill provides no way to include that
> information.
> 
> Instead, we call the SIGSEGV handler directly on the other thread,
> passing in the same siginfo structure.

How can we call the SIGSEGV handler directly from another thread?  And
how will that thread know it needs to call the handler in the first
place?

> (My original code simply accessed a byte at the fault address; however,
> reading the byte isn't sufficient, and writing it risks exposing
> inadmissible intermediate states to other threads, so now we call the
> sa_sigaction function directly).
> 
> > but what will the MPS thread do with that?
> 
> Call the MPS SIGSEGV handler, which knows what to do based (currently)
> only on the address.
> 
> > how will it know which MPS function to call?
> 
> The MPS SIGSEGV handler is obtained by calling sigaction.

That's unreliable: it assumes that no one else calls sigaction after
MPS, or changes the chain of handlers in some other way.

Also, on MS-Windows MPS doesn't use a signal handler (because there
are no signals on Windows, really).  It uses an exception handler, and
I don't think there's a documented method of getting at the handler
after installing it.  So we'd probably need to call the MPS function
directly, leveraging our knowledge of the MPS internals.  That's
assuming we can call the exception handler outside of the context of a
real exception, which I'm not at all sure we can.

In sum, this part of your idea, of simulating a segfault/exception due
to access to protected memory, sounds to me to be quite fragile and
unreliable.

> >> >> 4. we don't trigger memory barriers
> >> >
> >> > Same question here.
> >>
> >> I meant all four conditions are necessary, not that any one of thew
> >> mould be sufficient.
> >>
> >> GC can happen if another thread triggers a memory barrier OR another
> >> thread allocates OR we hit a memory barrier OR we allocate.  The
> >> question is whether it is ever useful to assume that GC can happen ONLY
> >> in these four cases.
> >
> > GC can also happen when Emacs is idle, and at that time there's no
> > allocations.
> 
> If you want to spell it out, sure, but there's no way for the main
> thread to become idle without potentially allocating memory, so this
> fifth condition is redundant.

I don't understand why the assertion "there's no way for the main
thread to become idle without potentially allocating memory" is true.
Emacs becomes idle because there are no input events to process, and
that has nothing to do with allocating memory.  The GC we trigger when
Emacs is idle might find that there's no garbage to collect, but AFAIU
it will nevertheless take the arena lock.

But if this is not relevant to those conditions (which I frankly don't
understand why they are needed), then we could stop arguing about that
aspect.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 17:40                                       ` Pip Cet via Emacs development discussions.
  2024-12-25 17:51                                         ` Eli Zaretskii
@ 2024-12-26  5:27                                         ` Gerd Möllmann
  2024-12-26  5:29                                         ` Gerd Möllmann
  2 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26  5:27 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>> DEFUN ("function-equal", Ffunction_equal, Sfunction_equal, 2, 2, 0,
>>        doc: /* Return non-nil if F1 and F2 come from the same source.
>> Used to determine if different closures are just different instances of
>> the same lambda expression, or are really unrelated function.  */)
>>      (Lisp_Object f1, Lisp_Object f2)
>> {
>>   bool res;
>>   if (EQ (f1, f2))
>
> This EQ can also trip.  Sorry to insist on that, but I think it's an
> important point: if we change Lisp internals (such as the slow EQ
> thing), the "we're not dereferencing it, just looking at the bit
> representation of the pointer" approach will fail again, in unexpected
> places.
>
> I haven't seen a technical argument against using separate stacks for
> MPS and signals (I don't consider "it's a change and we'd need to test
> it" to be any more true for this change than for any other proposed
> change, or for what's in scratch/igc now).  It would get us on par with
> master.  (Both versions need to add the best memory barrier we have to
> the specpdl_ptr++ code)
>
> (I don't think MPS works on multi-threaded systems if word stores aren't
> atomic.  If thread A is in the middle of updating an mps_word
> referencing another object, and thread B triggers a GC, thread A is
> stopped and thread B might scan the segment in the inconsistent state.)
>
> Miraculously, everything can be made to work out in the WIDE_EMACS_INT
> case, even though 64-bit words are stored in two insns: we only look at
> the LSB 32-bit word when fixing (because we USE_LSB_TAG), so that'll
> just work.  Late exthdr creation needed to be changed a little, and now
> assumes changing a 64-bit value to another 64-bit value which differs
> only in one 32-bit half is atomic.
>
> Here's a snapshot of the current code.  It still assumes strong memory
> ordering between threads because I'm pretty sure MPS needs that, too, so
> it's just asm volatile ("" ::: "memory") for now.

Thanks. 



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 17:40                                       ` Pip Cet via Emacs development discussions.
  2024-12-25 17:51                                         ` Eli Zaretskii
  2024-12-26  5:27                                         ` Gerd Möllmann
@ 2024-12-26  5:29                                         ` Gerd Möllmann
  2 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26  5:29 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>> DEFUN ("function-equal", Ffunction_equal, Sfunction_equal, 2, 2, 0,
>>        doc: /* Return non-nil if F1 and F2 come from the same source.
>> Used to determine if different closures are just different instances of
>> the same lambda expression, or are really unrelated function.  */)
>>      (Lisp_Object f1, Lisp_Object f2)
>> {
>>   bool res;
>>   if (EQ (f1, f2))
>
> This EQ can also trip.  Sorry to insist on that, but I think it's an
> important point: if we change Lisp internals (such as the slow EQ
> thing), the "we're not dereferencing it, just looking at the bit
> representation of the pointer" approach will fail again, in unexpected
> places.
>
> I haven't seen a technical argument against using separate stacks for
> MPS and signals (I don't consider "it's a change and we'd need to test
> it" to be any more true for this change than for any other proposed
> change, or for what's in scratch/igc now).  It would get us on par with
> master.  (Both versions need to add the best memory barrier we have to
> the specpdl_ptr++ code)
>
> (I don't think MPS works on multi-threaded systems if word stores aren't
> atomic.  If thread A is in the middle of updating an mps_word
> referencing another object, and thread B triggers a GC, thread A is
> stopped and thread B might scan the segment in the inconsistent state.)
>
> Miraculously, everything can be made to work out in the WIDE_EMACS_INT
> case, even though 64-bit words are stored in two insns: we only look at
> the LSB 32-bit word when fixing (because we USE_LSB_TAG), so that'll
> just work.  Late exthdr creation needed to be changed a little, and now
> assumes changing a 64-bit value to another 64-bit value which differs
> only in one 32-bit half is atomic.
>
> Here's a snapshot of the current code.  It still assumes strong memory
> ordering between threads because I'm pretty sure MPS needs that, too, so
> it's just asm volatile ("" ::: "memory") for now.

Thanks.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 14:12                             ` Gerd Möllmann
  2024-12-24 14:40                               ` Eli Zaretskii
@ 2024-12-24 21:18                               ` Pip Cet via Emacs development discussions.
  2024-12-25  5:23                                 ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-24 21:18 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
> We're coming from the problem that MPS uses signals for memory barriers.
> On platforms != macOS. And I am proposing a solution for that.

I don't think that's the problem.  The problem is that signals can
interrupt MPS, on all platforms.  We can't have MPS-signal-MPS stacks,
ever.  The best way to ensure that is to keep signals on one stack, and
MPS on another stack.  macOS already does that for their SIGSEGV
equivalent, but we need to do it for all entry points into MPS.

If they don't have separate stacks, and we interrupt MPS, the signal
handler cannot look at any MPS-modifiable memory (including roots, which
may be in an inconsistent state mid-GC), ever.  This includes the
specpdl.  We can't write to MPS-known memory, ever.  This includes any
area we might want to copy the backtrace or specpdl to.

> The SIGPROF handler does two things: (1) get the current backtrace,
> which does not trip on memory barriers, and

Even if the specpdl were an ambiguous root, we'd be making very
permanent and far-reaching assumptions about how MPS handles such roots
if we assumed that we could even look at such roots during GC.  This
goes doubly for assuming that we can extract references to
ambiguously-rooted objects and put them into other areas of MPS-visible
memory.  Even if this worked perfectly with current MPS on all
platforms, it would still be unreasonable for us to rely on such
implementation details.

We can't do (1).

>> Doing this from another thread raises the problem I describe above: we
>> need the Lisp thread(s) stopped, because you cannot examine the data
>> of the Lisp machine while the machine is running.  And if we stop the
>> Lisp threads, why do we need the other thread at all?

Because MPS can continue and reach an MPS-consistent state only if it
has its own stack.  In practice, this means an extra thread.

Or we re-raise signals (scratch/igc right now; this will delay signals
unpredictably), or we block them for the allocation fast path
(significant slowdown on some systems) *and* in the SIGSEGV handler
(which we'd need to "steal" from MPS, calling it from our real signal
handler by extracting sa_sigaction and calling that pointer.  Recipe for
disaster).
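
For reference, the "block them" alternative looks roughly like this.
It is a sketch with hypothetical helper names, using SIGPROF as the
example signal; the two pthread_sigmask calls per critical section are
where the slowdown comes from.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Return 1 if SIGPROF is blocked in the calling thread, else 0.  */
static int
sigprof_is_blocked (void)
{
  sigset_t current;
  sigemptyset (&current);
  /* With a null new set, this only reads the current mask.  */
  pthread_sigmask (SIG_BLOCK, NULL, &current);
  return sigismember (&current, SIGPROF) == 1;
}

/* Run CRITICAL (ARG) with SIGPROF blocked, then restore the previous
   mask.  Returns 0 on success, -1 on failure.  */
static int
with_sigprof_blocked (void (*critical) (void *), void *arg)
{
  sigset_t block, saved;
  sigemptyset (&block);
  sigaddset (&block, SIGPROF);
  if (pthread_sigmask (SIG_BLOCK, &block, &saved) != 0)
    return -1;
  critical (arg);
  return pthread_sigmask (SIG_SETMASK, &saved, NULL) == 0 ? 0 : -1;
}

/* Example critical section: record whether SIGPROF is blocked.  */
static void
record_blocked (void *arg)
{
  *(int *) arg = sigprof_is_blocked ();
}
```

A SIGPROF that arrives inside the critical section stays pending and
is delivered once the saved mask is restored.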

I'm still convinced the extra thread is the least painful option,
followed by what we have now.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-24 21:18                               ` Pip Cet via Emacs development discussions.
@ 2024-12-25  5:23                                 ` Gerd Möllmann
  2024-12-25 10:48                                   ` Pip Cet via Emacs development discussions.
                                                     ` (2 more replies)
  0 siblings, 3 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25  5:23 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet <pipcet@protonmail.com> writes:

> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Eli Zaretskii <eliz@gnu.org> writes:
>> We're coming from the problem that MPS uses signals for memory barriers.
>> On platforms != macOS. And I am proposing a solution for that.
>
> I don't think that's the problem.  The problem is that signals can
> interrupt MPS, on all platforms.  We can't have MPS-signal-MPS stacks,
> ever.  The best way to ensure that is to keep signals on one stack, and
> MPS on another stack.  MacOS already does that for their SIGSEGV
> equivalent, but we need to do it for all entry points into MPS.
>
> If they don't have separate stacks, and we interrupt MPS, the signal
> handler cannot look at any MPS-modifiable memory (including roots, which
> may be in an inconsistent state mid-GC), ever.  This includes the
> specpdl.  We can't write to MPS-known memory, ever.  This includes any
> area we might want to copy the backtrace or specpdl to.

And I don't think that's right :-). It's completely right that in the
SIGPROF handler everything can be inconsistent. That's true both for MPS
and Emacs. For example, the bindings stack (specpdl) may be inconsistent
when SIGPROF arrives. Literally everything we do in the SIGPROF handler runs the
risk of encountering inconsistencies.

I think that's already true for the old GC. There is nothing
guaranteeing that the contents of the binding stack is consistent, for
example. But we get away with it well enough that the profiler is
useful.

With MPS, from my POV, the situation is pretty similar. Try to get away
with it by not triggering MPS while in a state that we must assume is
inconsistent.

>> The SIGPROF handler does two things: (1) get the current backtrace,
>> which does not trip on memory barriers, and
>
> Even if the specpdl were an ambiguous root, we'd be making very
> permanent and far-reaching assumptions about how MPS handles such roots
> if we assumed that we could even look at such roots during GC.  This
> goes doubly for assuming that we can extract references to
> ambiguously-rooted objects and put them into other areas of MPS-visible
> memory.  Even if this worked perfectly with current MPS on all
> platforms, it would still be unreasonable for us to rely on such
> implementation details.
>
> We can't do (1).

I disagree, obviously :-)

For me, it's not about a theoretical or even practical solution that
somehow ensures a consistent state in MPS, or some future changes in MPS
or something. It's about getting away with what we do in the profiler
_now_, as we do with the old GC, which is already seeing potentially
inconsistent state in Emacs' memory.

I think the _now_ is also important. From my POV, we could discuss
better solutions later.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25  5:23                                 ` Gerd Möllmann
@ 2024-12-25 10:48                                   ` Pip Cet via Emacs development discussions.
  2024-12-25 13:40                                     ` Stefan Kangas
  2024-12-25 11:48                                   ` Helmut Eller
  2024-12-25 12:31                                   ` Eli Zaretskii
  2 siblings, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-25 10:48 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>> We're coming from the problem that MPS uses signals for memory barriers.
>>> On platforms != macOS. And I am proposing a solution for that.
>>
>> I don't think that's the problem.  The problem is that signals can
>> interrupt MPS, on all platforms.  We can't have MPS-signal-MPS stacks,
>> ever.  The best way to ensure that is to keep signals on one stack, and
>> MPS on another stack.  MacOS already does that for their SIGSEGV
>> equivalent, but we need to do it for all entry points into MPS.
>>
>> If they don't have separate stacks, and we interrupt MPS, the signal
>> handler cannot look at any MPS-modifiable memory (including roots, which
>> may be in an inconsistent state mid-GC), ever.  This includes the
>> specpdl.  We can't write to MPS-known memory, ever.  This includes any
>> area we might want to copy the backtrace or specpdl to.
>
> And I don't think that's right :-). It's completely right that in the
> SIGPROF handler everything can be inconsistent. That's true both for MPS
> and Emacs. For example, the bindings stack (specpdl) may be inconsistent
> when SIGPROF arrives. Literally everything we do in the SIGPROF handler runs the
> risk of encountering inconsistencies.

This is getting into technical details, but I think the specpdl code
was, at one point, carefully written so the specpdl stack would always
look consistent, making some assumptions about the compiler in use.
Then compilers changed to make this impossible (automatic inlining,
LTO), then they changed to make this possible again (stdatomic.h, memory
ordering), and we also introduced an unfortunate feature which breaks
consistency.  Now, we can (and should) restore the consistency
assumption, at least if we drop that unfortunate feature (as we should).

Inconsistency of the specpdl stack is avoidable, because we control both
the mutator and the inspection code.  Inconsistency of MPS data is not,
unless we take over control of the entire MPS library so we can make
far-reaching assumptions there.

> I think that's already true for the old GC. There is nothing
> guaranteeing that the contents of the binding stack is consistent, for
> example. But we get away with it well enough that the profiler is
> useful.

My understanding is that was true at one point, before C caught up to
memory ordering between a thread and its signal handlers, but with C11,
we have everything we need to ensure consistency, at least on systems
that store words atomically (we don't use memcpy for modifying the
specpdl stack).

And about the usefulness thing: I really want SIGPROF specifically to
improve MPS performance, which means we need to do something in the
"we've interrupted MPS" situation.  Or at least I want the option,
rather than make "signal handlers can't do anything useful if
igc_busy_p()" an axiom.  And if we start declaring huge chunks of Emacs
data (the entire specpdl, a large area of storage for storing the
"backtrace", all thread stacks, why not the pdumper area while we're at
it?) as ambiguous roots, we risk ending up with AMS pools everywhere and
no copying.

> With MPS, from my POV, the situation is pretty similar. Try to get away
> with it by not triggering MPS while in a state that we must assume is
> inconsistent.

That's one approach.  The other approach is to keep arguing about this
until we get a SIGPROF that we're actually happy with, and then we can
tell people interested in other signal handlers to copy that code :-)

> For me, it's not about a theoretical or even practical solution that
> somehow ensures a consistent state in MPS, or some future changes in MPS
> or something. It's about getting away with what we do in the profiler
> _now_, as we do with the old GC, which is already seeing potentially
> inconsistent state in Emacs' memory.
>
> I think the _now_ is also important. From my POV, we could discuss
> better solutions later.

I misunderstood.

We've got a solution that I'm convinced we can get away with, on
scratch/igc, now.  It's not pretty or permanent, and I don't think it's
"good enough"; but I don't think splitting SIGPROF handlers improves it
enough to make it "good enough", either.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 10:48                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-25 13:40                                     ` Stefan Kangas
  2024-12-25 17:03                                       ` Pip Cet via Emacs development discussions.
  0 siblings, 1 reply; 203+ messages in thread
From: Stefan Kangas @ 2024-12-25 13:40 UTC (permalink / raw)
  To: Pip Cet, Gerd Möllmann
  Cc: Eli Zaretskii, ofv, emacs-devel, eller.helmut, acorallo

Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
writes:

>> I think that's already true for the old GC. There is nothing
>> guaranteeing that the contents of the binding stack is consistent, for
>> example. But we get away with it well enough that the profiler is
>> useful.
>
> My understanding is that was true at one point, before C caught up to
> memory ordering between a thread and its signal handlers, but with C11,
> we have everything we need to ensure consistency, at least on systems
> that store words atomically (we don't use memcpy for modifying the
> specpdl stack).

Which parts of C11 help us?

I'm probably missing something obvious, but aren't we using C99 on the
scratch/igc branch also?

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 13:40                                     ` Stefan Kangas
@ 2024-12-25 17:03                                       ` Pip Cet via Emacs development discussions.
  2024-12-26  5:22                                         ` Gerd Möllmann
  2024-12-26 16:12                                         ` Stefan Kangas
  0 siblings, 2 replies; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-25 17:03 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, eller.helmut,
	acorallo

"Stefan Kangas" <stefankangas@gmail.com> writes:

> Pip Cet via "Emacs development discussions." <emacs-devel@gnu.org>
> writes:
>
>>> I think that's already true for the old GC. There is nothing
>>> guaranteeing that the contents of the binding stack is consistent, for
>>> example. But we get away with it well enough that the profiler is
>>> useful.
>>
>> My understanding is that was true at one point, before C caught up to
>> memory ordering between a thread and its signal handlers, but with C11,
>> we have everything we need to ensure consistency, at least on systems
>> that store words atomically (we don't use memcpy for modifying the
>> specpdl stack).
>
> Which parts of C11 help us?

stdatomic.h, in this case.

Please take the rest of this email with a grain of salt: as far as I
know, compilers and I have reached the promised land of explicit memory
barriers, and I'm actively trying to forget about the precise rules
surrounding implicit memory barriers in the old days.

The summary is we should explicitly indicate thread/signal data
dependencies (atomic_signal_fence) in new or modified code, and
explicitly indicate thread/thread dependencies in new code
(atomic_thread_fence).  If the compiler doesn't support it, we must
assume it's too old to reorder stores.
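
To make the thread/signal case concrete, here is a minimal sketch,
assuming a toy binding stack (hypothetical names, not the real specpdl
code) and assuming that atomic pointers are lock-free on the target, as
they are on mainstream platforms.  The writer fills in a slot, issues a
signal fence, and only then publishes the slot; a handler running on the
same thread reads the top pointer first and only then the entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for a binding stack; not the real specpdl.  */
struct binding { int symbol; int old_value; };

enum { STACK_SIZE = 16 };
static struct binding stack[STACK_SIZE];
static _Atomic (struct binding *) stack_top = stack;  /* first free slot */

static volatile sig_atomic_t seen_entries;
static volatile sig_atomic_t seen_sum;

static void
push_binding (int symbol, int old_value)
{
  struct binding *slot
    = atomic_load_explicit (&stack_top, memory_order_relaxed);
  assert (slot < stack + STACK_SIZE);
  slot->symbol = symbol;
  slot->old_value = old_value;
  /* Compiler-level barrier: the stores above may not be moved below
     the store that publishes the slot.  */
  atomic_signal_fence (memory_order_release);
  atomic_store_explicit (&stack_top, slot + 1, memory_order_relaxed);
}

static void
pop_binding (void)
{
  struct binding *top
    = atomic_load_explicit (&stack_top, memory_order_relaxed);
  assert (top > stack);
  atomic_store_explicit (&stack_top, top - 1, memory_order_relaxed);
}

/* Walks the stack the way a profiler might; touches only published
   entries.  */
static void
sample_handler (int sig)
{
  (void) sig;
  struct binding *top
    = atomic_load_explicit (&stack_top, memory_order_relaxed);
  atomic_signal_fence (memory_order_acquire);
  int n = 0, sum = 0;
  for (struct binding *p = stack; p < top; p++)
    {
      n++;
      sum += p->symbol;
    }
  seen_entries = n;
  seen_sum = sum;
}

static int
install_sample_handler (int sig)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = sample_handler;
  sigemptyset (&sa.sa_mask);
  return sigaction (sig, &sa, NULL);
}
```

atomic_signal_fence only constrains the compiler; ordering between
different threads would need atomic_thread_fence or stronger atomics.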

https://github.com/rmind/stdc/blob/master/c11memfences.md seems to
mostly agree with my memory, though.

IIRC, C99 doesn't have usable memory barriers, not even for signal
handler/main thread races such as this one.

Of course almost every compiler that supports C99, and certainly all
compilers usable for compiling Emacs, provides (or doesn't need, in the
case of TinyCC) ways of implementing them.  In the case of GCC, that
used to be asm volatile ("" : : : "memory").
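
A rough sketch of how one might select between those spellings at
compile time follows.  SIGNAL_BARRIER, publish, payload and ready are
hypothetical names, and the last branch encodes the assumption stated
above (a compiler without any barrier support is too old to reorder
stores) rather than a guarantee.

```c
#include <assert.h>
#include <signal.h>

#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L \
    && !defined __STDC_NO_ATOMICS__
# include <stdatomic.h>
# define SIGNAL_BARRIER() atomic_signal_fence (memory_order_seq_cst)
#elif defined __GNUC__
/* The pre-C11 GCC idiom: an empty asm that clobbers memory.  */
# define SIGNAL_BARRIER() __asm__ __volatile__ ("" : : : "memory")
#else
/* Assumption: a compiler this old does not reorder stores.  */
# define SIGNAL_BARRIER() ((void) 0)
#endif

static int payload;                  /* data a handler would read */
static volatile sig_atomic_t ready;  /* set once payload is valid */

/* Store the data first, then the flag; the barrier keeps the compiler
   from swapping the two stores.  */
static void
publish (int value)
{
  assert (!ready);
  payload = value;
  SIGNAL_BARRIER ();
  ready = 1;
}
```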

> I'm probably missing something obvious, but aren't we using C99 on the
> scratch/igc branch also?

We're using C99 + a lot of assumptions (most importantly, conservative
scanning lore).

If my memory is correct, this particular assumption (as far as the
compiler goes) changed from "C won't reorder stores" to "C won't reorder
stores across a non-inlined function" (where "non-inlined" changed from
"not explicitly inlined", then "in another CU so it can't be inlined",
then to "explicitly defined as non-inlined"), and finally to "you have
to tell C explicitly not to reorder stores".

Emacs hasn't caught up with the final change, so we're currently in a
"nothing guarantees that it'll work, but it kind of does" stage with
regard to the specpdl and profiling.  Very old compilers don't support
or require explicit memory barriers; old compilers support but don't
require them; current/future compilers may require them.

Note that Gerd appeared to be disagreeing about some part of this, but
I'm not sure which part precisely.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 17:03                                       ` Pip Cet via Emacs development discussions.
@ 2024-12-26  5:22                                         ` Gerd Möllmann
  2024-12-26  7:33                                           ` Eli Zaretskii
  2024-12-26 16:12                                         ` Stefan Kangas
  1 sibling, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26  5:22 UTC (permalink / raw)
  To: Pip Cet
  Cc: Stefan Kangas, Eli Zaretskii, ofv, emacs-devel, eller.helmut,
	acorallo

Pip Cet <pipcet@protonmail.com> writes:

> Note that Gerd appeared to be disagreeing about some part of this, but
> I'm not sure which part precisely.

Then I gave a false impression, sorry. What you write is likely correct.
And I'm only saying likely because there's always a chance that
something is wrong.

I'm coming to all this from a completely different angle. My
understanding is (1) the signal handling/MPS thing, is the only thing
preventing landing in master, and (2) the problems with
reordering/consistency and so on basically already exists in master. Add
(3) that I believe landing in master ASAP is desirable. If these
premises are not valid, especially (1) I'd like to be corrected, BTW.

My approach is "focus!" :-). Get a signal handling/MPS thing into igc
that is good enough to be accepted, land in master, and only then
proceed with anything else that has come up. Admittedly, neither the
"focus", nor my "good enough solution" seems to work, so far.

If people find that the wrong approach, or not helpful, please
tell! I'm not married to the idea. I can as well wait for things to
settle by themselves, because everything works just fine already on the
system in my office :-).

I don't know. Maintainers please take over, I guess.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26  5:22                                         ` Gerd Möllmann
@ 2024-12-26  7:33                                           ` Eli Zaretskii
  2024-12-26  8:02                                             ` Gerd Möllmann
  2024-12-26 15:50                                             ` Stefan Kangas
  0 siblings, 2 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26  7:33 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: pipcet, stefankangas, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Stefan Kangas <stefankangas@gmail.com>,  Eli Zaretskii <eliz@gnu.org>,
>   ofv@wanadoo.es,  emacs-devel@gnu.org,  eller.helmut@gmail.com,
>   acorallo@gnu.org
> Date: Thu, 26 Dec 2024 06:22:17 +0100
> 
> I'm coming to all this from a completely different angle. My
> understanding is (1) the signal handling/MPS thing, is the only thing
> preventing landing in master

That's not so.  It is not the only thing we need to figure out and
solve before we can consider landing this on master.  At the very
least, we have unresolved issues with patches to MPS for some
platforms, whereby we considered forking MPS or some other course of
actions.  Also, there are several FIXMEs in igc.c itself.  For the
MS-Windows build, we have the issue of registering some threads with
MPS (see our discussion Re: "MPS: w32 threads" back in May).  So we
still have a way to go.

> My approach is "focus!" :-). Get a signal handling/MPS thing into igc
> that is good enough to be accepted, land in master, and only then
> proceed with anything else that has come up.

The "focus!" approach is correct, IMO, but landing the feature on
master is only possible if we believe the branch is stable enough,
because there are enough people who use master for production to
consider its being reasonably stable a necessary requirement.  I
believe we still have unresolved reports about freezes on GNU/Linux,
so we are not there yet.  I also don't have a clear idea of which
Emacs configurations (in terms of toolkits, PGTK yes/no,
native-compilation yes/no, etc.) were or are being tested on
GNU/Linux -- this is also relevant to assessing the stability.

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26  7:33                                           ` Eli Zaretskii
@ 2024-12-26  8:02                                             ` Gerd Möllmann
  2024-12-26 15:50                                             ` Stefan Kangas
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26  8:02 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: pipcet, stefankangas, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Stefan Kangas <stefankangas@gmail.com>,  Eli Zaretskii <eliz@gnu.org>,
>>   ofv@wanadoo.es,  emacs-devel@gnu.org,  eller.helmut@gmail.com,
>>   acorallo@gnu.org
>> Date: Thu, 26 Dec 2024 06:22:17 +0100
>> 
>> I'm coming to all this from a completely different angle. My
>> understanding is (1) the signal handling/MPS thing, is the only thing
>> preventing landing in master
>
> That's not so.  It is not the only thing we need to figure out and
> solve before we can consider landing this on master.  At the very
> least, we have unresolved issues with patches to MPS for some
> platforms, whereby we considered forking MPS or some other course of
> actions.  Also, there are several FIXMEs in igc.c itself.  For the
> MS-Windows build, we have the issue of registering some threads with
> MPS (see our discussion Re: "MPS: w32 threads" back in May).  So we
> still have a way to go.
>
>> My approach is "focus!" :-). Get a signal handling/MPS thing into igc
>> that is good enough to be accepted, land in master, and only then
>> proceed with anything else that has come up.
>
> The "focus!" approach is correct, IMO, but landing the feature on
> master is only possible if we believe the branch is stable enough,
> because there are enough people who use master for production to
> consider its being reasonably stable a necessary requirement.  I
> believe we still have unresolved reports about freezes on GNU/Linux,
> so we are not there yet.  I also don't have a clear idea of which
> Emacs configurations (in terms of toolkits, PGTK yes/no,
> native-compilation yes/no, etc.) were or are being tested on
> GNU/Linux -- this is also relevant to assessing the stability.

Hm. If my assumption (1) is not true, I think it's best for me to just
wait and do my other stuff meanwhile.


^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26  7:33                                           ` Eli Zaretskii
  2024-12-26  8:02                                             ` Gerd Möllmann
@ 2024-12-26 15:50                                             ` Stefan Kangas
  2024-12-26 16:13                                               ` Eli Zaretskii
  2024-12-26 17:01                                               ` Pip Cet via Emacs development discussions.
  1 sibling, 2 replies; 203+ messages in thread
From: Stefan Kangas @ 2024-12-26 15:50 UTC (permalink / raw)
  To: Eli Zaretskii, Gerd Möllmann
  Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> I'm coming to all this from a completely different angle. My
>> understanding is (1) the signal handling/MPS thing, is the only thing
>> preventing landing in master
>
> That's not so.  It is not the only thing we need to figure out and
> solve before we can consider landing this on master.

Thanks.  Should we perhaps make a list of these items somewhere, e.g. in
README-IGC on the scratch/igc branch?

> we have unresolved issues with patches to MPS for some platforms,
> whereby we considered forking MPS or some other course of actions.

Forking MPS is obviously better to avoid, if at all possible.

Do we have a complete list of these patches, or can we assemble one now?
Are all of these open pull requests to Ravenbrook, so that we are in
effect only waiting for them, or do we need to do more work on our end?

> Also, there are several FIXMEs in igc.c itself.

Are all of these important enough to be considered as blockers for
merging to master, or only some of them?

If the latter, how about making a list of the ones that are considered
blockers, or perhaps just marking them as such in the FIXME comment in
the source code?

> For the MS-Windows build, we have the issue of registering some
> threads with MPS (see our discussion Re: "MPS: w32 threads" back in
> May).

In the long run, it is highly desirable to have support for (reasonably
modern) MS-Windows, of course.  There is no doubt about that.

But could you elaborate on why you think this is a blocker for merging
this to master?  My understanding is that, from the point of view of
GNU, we maintainers can choose to take features even if they only work
on GNU/Linux.  They can then be made to work on other systems later.

Maybe I'm missing something.

> The "focus!" approach is correct, IMO, but landing the feature on
> master is only possible if we believe the branch is stable enough,
> because there are enough people who use master for production to
> consider its being reasonably stable a necessary requirement.

I agree that more stability and testing is always better.

However, if we have the fundamental issues worked out to such an extent
that we can demonstrate that the MPS branch is viable, I personally
don't consider "not enough testing" to be a blocker for merging the
branch to master.  For example, we could put the MPS GC behind a feature
flag, and mark it as experimental.

I don't suggest that we should rush to merge or anything like that, but
let's keep in mind the benefits of merging also: it could help lead to
faster stabilization, as it will be easier to test than building a
feature branch.  It will also be a clear indicator that we consider the
basic approach to be viable enough.

We will also get more testing of the combined result "for free"
(i.e. using the old GC even in the presence of the changes needed for
the new one).

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 15:50                                             ` Stefan Kangas
@ 2024-12-26 16:13                                               ` Eli Zaretskii
  2024-12-26 19:40                                                 ` Gerd Möllmann
  2024-12-26 17:01                                               ` Pip Cet via Emacs development discussions.
  1 sibling, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26 16:13 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: gerd.moellmann, pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Stefan Kangas <stefankangas@gmail.com>
> Date: Thu, 26 Dec 2024 15:50:38 +0000
> Cc: pipcet@protonmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, 
> 	eller.helmut@gmail.com, acorallo@gnu.org
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> I'm coming to all this from a completely different angle. My
> >> understanding is (1) the signal handling/MPS thing, is the only thing
> >> preventing landing in master
> >
> > That's not so.  It is not the only thing we need to figure out and
> > solve before we can consider landing this on master.
> 
> Thanks.  Should we perhaps make a list of these items somewhere, e.g. in
> README-IGC on the scratch/igc branch?

Feel free to do that, sure.

> > we have unresolved issues with patches to MPS for some platforms,
> > whereby we considered forking MPS or some other course of actions.
> 
> Forking MPS is obviously better to avoid, if at all possible.

But if we have patches that we must have there, and if the MPS folks
do not intend to accept them upstream, for some reason, then forking
is the only practical way for us.

> Do we have a complete list of these patches, or can we assemble one now?

They were mentioned in these discussions, so they are in the archives.
I don't have such a list handy, sorry (my builds of the igc branch
use the stock 1.118.0 release of MPS with a couple of patches specific
to 32-bit MinGW).

> Are all of these open pull requests to Ravenbrook, so that we are in
> effect only waiting for them, or do we need to do more work on our end?

I don't think we submitted the changes to Ravenbrook, with the single
exception of the one Gerd posted there.  But I'm not sure my memory is
accurate.

> > Also, there are several FIXMEs in igc.c itself.
> 
> Are all of these important enough to be considered as blockers for
> merging to master, or only some of them?

I don't know.  Maybe Gerd can answer that.

> If the latter, how about making a list of the ones that are considered
> blockers, or perhaps just marking them as such in the FIXME comment in
> the source code?

Let's first see which of those FIXMEs are important enough to worry
about them.

> > For the MS-Windows build, we have the issue of registering some
> > threads with MPS (see our discussion Re: "MPS: w32 threads" back in
> > May).
> 
> In the long run, it is highly desirable to have support for (reasonably
> modern) MS-Windows, of course.  There is no doubt about that.
> 
> But could you elaborate on why you think this is a blocker for merging
> this to master?

Because that's what I run here.  I cannot do my job if Emacs doesn't
compile and work well on MS-Windows.

> My understanding is that, from the point of view of
> GNU, we maintainers can choose to take features even if they only work
> on GNU/Linux.  They can then be made to work on other systems later.

It's semi-okay if some optional feature cannot be compiled here, but I
cannot imagine being responsible for the master branch if igc cannot
be compiled or doesn't work well on Windows, because this is a major
feature of Emacs, and any serious bugs in it must be reproducible on
my machine.

> > The "focus!" approach is correct, IMO, but landing the feature on
> > master is only possible if we believe the branch is stable enough,
> > because there are enough people who use master for production to
> > consider its being reasonably stable a necessary requirement.
> 
> I agree that more stability and testing is always better.
> 
> However, if we have the fundamental issues worked out to such an extent
> that we can demonstrate that the MPS branch is viable, I personally
> don't consider "not enough testing" to be a blocker for merging the
> branch to master.  For example, we could put the MPS GC behind a feature
> flag, and mark it as experimental.

I don't see how it makes sense to merge a feature and mark it
experimental.  It isn't different from leaving it on a branch, IMO.

> I don't suggest that we should rush to merge or anything like that, but
> let's keep in mind the benefits of merging also: it could help lead to
> faster stabilization, as it will be easier to test than building a
> feature branch.  It will also be a clear indicator that we consider the
> basic approach to be viable enough.

If we want faster stabilization, we cannot mark it experimental.  We
need to ask people to use it.  And for that, it must be stable enough,
because people will not seriously use Emacs if it crashes many times a
day and they lose their edits.

> We will also get more testing of the combined result "for free"
> (i.e. using the old GC even in the presence of the changes needed for
> the new one).

That's much less important, IMO.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 16:13                                               ` Eli Zaretskii
@ 2024-12-26 19:40                                                 ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26 19:40 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Stefan Kangas, pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> Are all of these important enough to be considered as blockers for
>> merging to master, or only some of them?
>
> I don't know.  Maybe Gerd can answer that.

Sorry, I have no idea what these patches could be.

On macOS, everything just works for me, with MPS from Homebrew, and with
an MPS that I built from git. I never had a blocker. The only thing I
ever needed was

  https://github.com/Ravenbrook/mps/issues/281

which is meanwhile fixed, I think.


^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 15:50                                             ` Stefan Kangas
  2024-12-26 16:13                                               ` Eli Zaretskii
@ 2024-12-26 17:01                                               ` Pip Cet via Emacs development discussions.
  2024-12-26 19:45                                                 ` Gerd Möllmann
  1 sibling, 1 reply; 203+ messages in thread
From: Pip Cet via Emacs development discussions. @ 2024-12-26 17:01 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: Eli Zaretskii, Gerd Möllmann, ofv, emacs-devel, eller.helmut,
	acorallo

"Stefan Kangas" <stefankangas@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> I'm coming to all this from a completely different angle. My
>>> understanding is (1) the signal handling/MPS thing, is the only thing
>>> preventing landing in master
>>
>> That's not so.  It is not the only thing we need to figure out and
>> solve before we can consider landing this on master.

While I think we should keep trying for a while, I still think it's
possible we'll just have to conclude that the current approach is the
best we can do for a quick merge.

It's ugly, but the user-visible problems seem manageable.

1. Delayed signals have to wait for the next explicit check for pending
signals.

On GNU/Linux, we could spawn a helper thread to take the arena lock and
release it again, then send a signal back to the main thread so we can
check for pending signal handlers and run them, assuming we're lucky and
the arena is no longer locked.  This exposes quite a few MPS details,
and there's a risk of an unusable Emacs if the helper thread saturates
the CPU.  The price we pay is that once a signal is delayed, all further
signals have to go to the ping-pong thread until the main thread
restores the sigmask, which it cannot do in a signal handler.

2. One flag per signal hard-codes the number of signals usable by Emacs.

I don't think this is a blocker.

3. We restore the "original" sigmask, so changing the sigmask from other
parts of Emacs won't work reliably.  Neither will any checks for whether
a signal is logically blocked, because delayed signals will be masked
once in a while, causing false positives.

This seems fixable, and should be fixed.
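
For illustration, the "one flag per signal" scheme from points 1 and 2
could be sketched roughly as follows.  This is a simplified model with
hypothetical names, not the code on scratch/igc: collector_busy stands
in for "MPS is in the middle of something", and the delayed work runs at
an explicit check instead of being re-raised.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

enum { MAX_SIGNALS = 64 };   /* hard-coded limit, as in point 2 */

/* Stand-in for "the collector holds its lock"; hypothetical.  */
static volatile sig_atomic_t collector_busy;
static volatile sig_atomic_t pending[MAX_SIGNALS];
static volatile sig_atomic_t handled[MAX_SIGNALS];

/* Whatever the signal is really supposed to do.  */
static void
real_work (int sig)
{
  handled[sig]++;
}

/* Installed for every signal we care about: run the work now if that
   is safe, otherwise just remember that the signal arrived.  */
static void
deferring_handler (int sig)
{
  if (sig < 0 || sig >= MAX_SIGNALS)
    return;
  if (collector_busy)
    pending[sig] = 1;
  else
    real_work (sig);
}

/* The "explicit check for pending signals" from point 1; call it
   only while the collector is not busy.  */
static void
process_pending_signals (void)
{
  for (int sig = 0; sig < MAX_SIGNALS; sig++)
    if (pending[sig])
      {
        pending[sig] = 0;
        real_work (sig);
      }
}

static int
install_deferring_handler (int sig)
{
  struct sigaction sa;
  assert (sig >= 0 && sig < MAX_SIGNALS);
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = deferring_handler;
  sigfillset (&sa.sa_mask);
  return sigaction (sig, &sa, NULL);
}
```

With a single flag per signal, several deliveries of the same signal
during one busy period collapse into one, which matches the usual
behaviour of standard (non-realtime) signals.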

>> we have unresolved issues with patches to MPS for some platforms,
>> whereby we considered forking MPS or some other course of actions.
>
> Forking MPS is obviously better to avoid, if at all possible.

Forking MPS on GNU/Linux x86-64, in particular.  I'm not sure how Gerd
feels about macOS.

>> Also, there are several FIXMEs in igc.c itself.
>
> Are all of these important enough to be considered as blockers for
> merging to master, or only some of them?

Very few of them, I think.  My impression is most of them are about the
remaining places where we use ambiguous marking unnecessarily, which we
should fix before concluding MPS slows us down too much, but not
necessarily before we merge.  (However, we should be very, very clear on
that point: MPS works now, but it's almost entirely unoptimized.  Code
which relies heavily on GC may be very slow.)

Bytecode stack marking looks like a huge problem in the source code, but
in reality I've never seen a crash because of it.  My proposed fix is to
limit the maximum stack depth of bytecode objects we read, which will
turn a pernicious problem into an obvious one.

And, yes, we should be careful and register all threads with MPS.  This
will slow down things, but I don't know by how much.

That's all I can remember today, pre-grep.  (Which isn't enough for me:
I'd like to take the time to go through the entire diff for this).

> If the latter, how about making a list of the ones that are considered
> blockers, or perhaps just marking them as such in the FIXME comment in
> the source code?

Excellent idea.  As some of the FIXMEs will go away when what is
currently scratch/no-purespace is merged, maybe we should do that first.

Pip

^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 17:01                                               ` Pip Cet via Emacs development discussions.
@ 2024-12-26 19:45                                                 ` Gerd Möllmann
  2024-12-26 20:05                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26 19:45 UTC (permalink / raw)
  To: Pip Cet
  Cc: Stefan Kangas, Eli Zaretskii, ofv, emacs-devel, eller.helmut,
	acorallo

Pip Cet <pipcet@protonmail.com> writes:

>> Forking MPS is obviously better to avoid, if at all possible.
>
> Forking MPS on GNU/Linux x86-64, in particular.  I'm not sure how Gerd
> feels about macOS.

My Emacs just works, but I've meanwhile been running --without-ns
exclusively, because NS has been unstable for me for years, and I've
given up on it.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 19:45                                                 ` Gerd Möllmann
@ 2024-12-26 20:05                                                   ` Eli Zaretskii
  2024-12-26 20:12                                                     ` Gerd Möllmann
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26 20:05 UTC (permalink / raw)
  To: Gerd Möllmann
  Cc: pipcet, stefankangas, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Stefan Kangas <stefankangas@gmail.com>,  Eli Zaretskii <eliz@gnu.org>,
>  ofv@wanadoo.es,  emacs-devel@gnu.org,  eller.helmut@gmail.com,
>  acorallo@gnu.org
> Date: Thu, 26 Dec 2024 20:45:30 +0100
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> >> Forking MPS is obviously better to avoid, if at all possible.
> >
> > Forking MPS on GNU/Linux x86-64, in particular.  I'm not sure how Gerd
> > feels about macOS.
> 
> My Emacs just works, but I'm meanwhile exclusively running --without-ns
> only, because NS is unstable for me for years, and I've given up on it.

So we will probably be interested to hear from someone who runs the
branch built --with-ns.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 20:05                                                   ` Eli Zaretskii
@ 2024-12-26 20:12                                                     ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-26 20:12 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: pipcet, stefankangas, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Stefan Kangas <stefankangas@gmail.com>,  Eli Zaretskii <eliz@gnu.org>,
>>  ofv@wanadoo.es,  emacs-devel@gnu.org,  eller.helmut@gmail.com,
>>  acorallo@gnu.org
>> Date: Thu, 26 Dec 2024 20:45:30 +0100
>> 
>> Pip Cet <pipcet@protonmail.com> writes:
>> 
>> >> Forking MPS is obviously better to avoid, if at all possible.
>> >
>> > Forking MPS on GNU/Linux x86-64, in particular.  I'm not sure how Gerd
>> > feels about macOS.
>> 
>> My Emacs just works, but I'm meanwhile exclusively running --without-ns
>> only, because NS is unstable for me for years, and I've given up on it.
>
> So we will probably be interested to hear from someone who runs the
> branch build --with-ns.

Yes, for long-term testing. The NS version with igc worked fine for me
until I ditched NS.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 17:03                                       ` Pip Cet via Emacs development discussions.
  2024-12-26  5:22                                         ` Gerd Möllmann
@ 2024-12-26 16:12                                         ` Stefan Kangas
  2024-12-26 17:05                                           ` Eli Zaretskii
  1 sibling, 1 reply; 203+ messages in thread
From: Stefan Kangas @ 2024-12-26 16:12 UTC (permalink / raw)
  To: Pip Cet
  Cc: Gerd Möllmann, Eli Zaretskii, ofv, emacs-devel, eller.helmut,
	acorallo

Pip Cet <pipcet@protonmail.com> writes:

> "Stefan Kangas" <stefankangas@gmail.com> writes:
>
>> Which parts of C11 help us?
>
> stdatomic.h, in this case.

Thanks for the explanation.

> IIRC, C99 doesn't have usable memory barriers, not even for signal
> handler/main thread races such as this one.
>
> Of course almost every compiler that supports C99, and certainly all
> compilers usable for compiling Emacs, provides (or doesn't need, in the
> case of TinyCC) ways of implementing them.  In the case of GCC, that
> used to be asm volatile ("" : : : "memory").

I don't think I understand what this means in practice.

Can we use stdatomic.h with C99, or do you propose that we require C11?

FWIW, the Linux kernel has used -std=gnu11 for a few years already, but
their job is probably easier than ours since they only target GCC and
clang.
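
To make the question concrete, here is a minimal sketch of the two
spellings under discussion.  It is an illustration only, not code from
the branch: COMPILER_BARRIER, payload and ready are hypothetical names.
With C11 the barrier is atomic_signal_fence from stdatomic.h; without
it, the sketch falls back to the GCC-style empty asm quoted above.
Note that ISO C is stricter than this about which objects a handler may
touch; the sketch shows only the ordering idiom.

```c
#include <signal.h>

#if defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L \
    && !defined __STDC_NO_ATOMICS__
# include <stdatomic.h>
/* C11: orders accesses between a thread and a signal handler running
   in that same thread; emits no hardware fence.  */
# define COMPILER_BARRIER() atomic_signal_fence (memory_order_seq_cst)
#elif defined __GNUC__
/* Pre-C11 GCC idiom: an empty asm that clobbers memory.  */
# define COMPILER_BARRIER() __asm__ volatile ("" : : : "memory")
#else
# error "no known compiler barrier for this compiler"
#endif

static int payload;                   /* plain data */
static volatile sig_atomic_t ready;   /* publication flag */

/* Writer side: fill in the data, then publish it.  The barrier keeps
   the compiler from sinking the payload store below the flag store.  */
static void
publish (int value)
{
  payload = value;
  COMPILER_BARRIER ();
  ready = 1;
}

/* Reader side: read the data only if it has been published.
   Returns -1 when nothing is available yet.  */
static int
consume (void)
{
  if (!ready)
    return -1;
  COMPILER_BARRIER ();
  return payload;
}
```

Either branch only constrains the compiler, which is all that is needed
when the handler runs on the same thread it interrupted.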



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-26 16:12                                         ` Stefan Kangas
@ 2024-12-26 17:05                                           ` Eli Zaretskii
  0 siblings, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-26 17:05 UTC (permalink / raw)
  To: Stefan Kangas
  Cc: pipcet, gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> From: Stefan Kangas <stefankangas@gmail.com>
> Date: Thu, 26 Dec 2024 16:12:52 +0000
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, 
> 	Eli Zaretskii <eliz@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, 
> 	acorallo@gnu.org
> 
> Pip Cet <pipcet@protonmail.com> writes:
> 
> > "Stefan Kangas" <stefankangas@gmail.com> writes:
> >
> >> Which parts of C11 help us?
> >
> > stdatomic.h, in this case.
> 
> Thanks for the explanation.
> 
> > IIRC, C99 doesn't have usable memory barriers, not even for signal
> > handler/main thread races such as this one.
> >
> > Of course almost every compiler that supports C99, and certainly all
> > compilers usable for compiling Emacs, provides (or doesn't need, in the
> > case of TinyCC) ways of implementing them.  In the case of GCC, that
> > used to be asm volatile ("" : : : "memory").
> 
> I don't think I understand what this means in practice.
> 
> Can we use stdatomic.h with C99, or do you propose that we require C11?

We will not require C11.  We should see which of the platforms we care
about, and which can use MPS, lack stdatomic.h, and if any do, find
solutions for them.  All of that assumes we need atomics, which is not
yet established, AFAIU.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25  5:23                                 ` Gerd Möllmann
  2024-12-25 10:48                                   ` Pip Cet via Emacs development discussions.
@ 2024-12-25 11:48                                   ` Helmut Eller
  2024-12-25 11:58                                     ` Gerd Möllmann
  2024-12-25 12:52                                     ` Eli Zaretskii
  2024-12-25 12:31                                   ` Eli Zaretskii
  2 siblings, 2 replies; 203+ messages in thread
From: Helmut Eller @ 2024-12-25 11:48 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: Pip Cet, Eli Zaretskii, ofv, emacs-devel, acorallo

On Wed, Dec 25 2024, Gerd Möllmann wrote:

> Pip Cet <pipcet@protonmail.com> writes:
>
>> I don't think that's the problem.  The problem is that signals can
>> interrupt MPS, on all platforms.
[...]
> And I don't think that's right :-). It's completely right that in the
> SIGPROF handler everything can be inconsistent. That's true both for MPS
> and Emacs. For example, the bindings stack (specpdl) may be inconsistent
> when SIGPROF arrives. Literally everything we do in the SIGPROF runs the
> risk of encountering inconsistencies.

The SIGPROF handler copies part of the potentially inconsistent state to
the profiler log.  That same potentially inconsistent profiler log is
used later, outside the signal handler.  Sounds like a problem to me.
Is it not?  Or is the probability for inconsistencies being copied so low
that we ignore it?

Helmut



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 11:48                                   ` Helmut Eller
@ 2024-12-25 11:58                                     ` Gerd Möllmann
  2024-12-25 12:52                                     ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 11:58 UTC (permalink / raw)
  To: Helmut Eller; +Cc: Pip Cet, Eli Zaretskii, ofv, emacs-devel, acorallo

Helmut Eller <eller.helmut@gmail.com> writes:

> On Wed, Dec 25 2024, Gerd Möllmann wrote:
>
>> Pip Cet <pipcet@protonmail.com> writes:
>>
>>> I don't think that's the problem.  The problem is that signals can
>>> interrupt MPS, on all platforms.
> [...]
>> And I don't think that's right :-). It's completely right that in the
>> SIGPROF handler everything can be inconsistent. That's true both for MPS
>> and Emacs. For example, the bindings stack (specpdl) may be inconsistent
>> when SIGPROF arrives. Literally everything we do in the SIGPROF runs the
>> risk of encountering inconsistencies.
>
> The SIGPROF handler copies part of the potentially inconsistent state to
> the profiler log.  That same potentially inconsistent profiler log is
> used later, outside the signal handler.  Sounds like a problem to me.
> Is it not?  Or is the probability for inconsistencies being copied so low
> that we ignore it?
>
> Helmut

I think the latter, i.e. we ignore it. I think, but I can't prove
anything, that the probability is good that we get away with it. For
example, we're only using the backtrace_p binding stack entries, so the
SIGPROF would have to arrive while some code is in the middle of putting
them on the stack for us to be in danger, so to speak. That's not so
likely, I think.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 11:48                                   ` Helmut Eller
  2024-12-25 11:58                                     ` Gerd Möllmann
@ 2024-12-25 12:52                                     ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 12:52 UTC (permalink / raw)
  To: Helmut Eller; +Cc: gerd.moellmann, pipcet, ofv, emacs-devel, acorallo

> From: Helmut Eller <eller.helmut@gmail.com>
> Cc: Pip Cet <pipcet@protonmail.com>,  Eli Zaretskii <eliz@gnu.org>,
>   ofv@wanadoo.es,  emacs-devel@gnu.org,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 12:48:44 +0100
> 
> On Wed, Dec 25 2024, Gerd Möllmann wrote:
> 
> > Pip Cet <pipcet@protonmail.com> writes:
> >
> >> I don't think that's the problem.  The problem is that signals can
> >> interrupt MPS, on all platforms.
> [...]
> > And I don't think that's right :-). It's completely right that in the
> > SIGPROF handler everything can be inconsistent. That's true both for MPS
> > and Emacs. For example, the bindings stack (specpdl) may be inconsistent
> > when SIGPROF arrives. Literally everything we do in the SIGPROF runs the
> > risk of encountering inconsistencies.
> 
> The SIGPROF handler copies part of the potentially inconsistent state to
> the profiler log.  That same potentially inconsistent profiler log is
> used later, outside the signal handler.  Sounds like a problem to me.
> Is it not?  Or is the probability for inconsistencies being copied so low
> that we ignore it?

Or maybe the profiling code is robust in the face of these
inconsistencies?



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25  5:23                                 ` Gerd Möllmann
  2024-12-25 10:48                                   ` Pip Cet via Emacs development discussions.
  2024-12-25 11:48                                   ` Helmut Eller
@ 2024-12-25 12:31                                   ` Eli Zaretskii
  2024-12-25 12:54                                     ` Gerd Möllmann
  2 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-25 12:31 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>   eller.helmut@gmail.com,  acorallo@gnu.org
> Date: Wed, 25 Dec 2024 06:23:49 +0100
> 
> > If they don't have separate stacks, and we interrupt MPS, the signal
> > handler cannot look at any MPS-modifiable memory (including roots, which
> > may be in an inconsistent state mid-GC), ever.  This includes the
> > specpdl.  We can't write to MPS-known memory, ever.  This includes any
> > area we might want to copy the backtrace or specpdl to.
> 
> And I don't think that's right :-). It's completely right that in the
> SIGPROF handler everything can be inconsistent. That's true both for MPS
> and Emacs. For example, the bindings stack (specpdl) may be inconsistent
> when SIGPROF arrives.

Theoretically, maybe.  But in practice, you'd need to identify the
code which manipulates specpdl and could leave it in an inconsistent
state if interrupted at some inopportune point.  Can you identify such
places in the code?

> Literally everything we do in the SIGPROF runs the
> risk of encountering inconsistencies.

Only if we interrupt code which leaves the global state inconsistent,
and if what the SIGPROF handler does involves accessing those
potentially-inconsistent data.

> I think that's already true for the old GC. There is nothing
> guaranteeing that the contents of the binding stack is consistent, for
> example. But we get away with it well enough that the profiler is
> useful.

With the old GC, we have special code to deal with this:

  /* Signal handler for sampling profiler.  */

  static void
  add_sample (struct profiler_log *plog, EMACS_INT count)
  {
    if (EQ (backtrace_top_function (), QAutomatic_GC)) /* bug#60237 */
      /* Special case the time-count inside GC because the hash-table
	 code is not prepared to be used while the GC is running.
	 More specifically it uses ASIZE at many places where it does
	 not expect the ARRAY_MARK_FLAG to be set.  We could try and
	 harden the hash-table code, but it doesn't seem worth the
	 effort.  */
      plog->gc_count = saturated_add (plog->gc_count, count);

So all we need is for backtrace_top_function to be safe when SIGPROF
arrives while we are in GC.  Are you saying backtrace_top_function is
unsafe in that case?

> With MPS, from my POV, the situation is pretty similar. Try to get away
> with it by not triggering MPS while in a state that we must assume is
> inconsistent.

The difference with MPS is that the old GC is synchronous with the
Lisp machine, so it couldn't possibly start while we are modifying
specpdl.  That is no longer true with MPS, AFAIU, because MPS could
start GC asynchronously.

> >> The SIGPROF handler does two things: (1) get the current backtrace,
> >> which does not trip on memory barriers, and
> >
> > Even if the specpdl were an ambiguous root, we'd be making very
> > permanent and far-reaching assumptions about how MPS handles such roots
> > if we assumed that we could even look at such roots during GC.  This
> > goes doubly for assuming that we can extract references to
> > ambiguously-rooted objects and put them into other areas of MPS-visible
> > memory.  Even if this worked perfectly with current MPS on all
> > platforms, it would still be unreasonable for us to rely on such
> > implementation details.
> >
> > We can't do (1).
> 
> I disagree, obviously :-)
> 
> For me, it's not about a theoretical or even practical solution that
> somehow ensures a consistent state in MPS, or some future changes in MPS
> or something. It's about getting away with what we do in the profiler
> _now_, as we do with the old GC, which is already seeing potentially
> inconsistent state in Emacs' memory.

See above: there's a difference.  So I would really like to hear why
you think accessing specpdl from a SIGPROF handler in an igc build is
safe.

> I think the _now_ is also important. From my POV, we could discuss
> better solutions later.

If you are right in your conclusions, certainly.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-25 12:31                                   ` Eli Zaretskii
@ 2024-12-25 12:54                                     ` Gerd Möllmann
  0 siblings, 0 replies; 203+ messages in thread
From: Gerd Möllmann @ 2024-12-25 12:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: pipcet, ofv, emacs-devel, eller.helmut, acorallo

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann@gmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org,
>>   eller.helmut@gmail.com,  acorallo@gnu.org
>> Date: Wed, 25 Dec 2024 06:23:49 +0100
>> 
>> > If they don't have separate stacks, and we interrupt MPS, the signal
>> > handler cannot look at any MPS-modifiable memory (including roots, which
>> > may be in an inconsistent state mid-GC), ever.  This includes the
>> > specpdl.  We can't write to MPS-known memory, ever.  This includes any
>> > area we might want to copy the backtrace or specpdl to.
>> 
>> And I don't think that's right :-). It's completely right that in the
>> SIGPROF handler everything can be inconsistent. That's true both for MPS
>> and Emacs. For example, the bindings stack (specpdl) may be inconsistent
>> when SIGPROF arrives.
>
> Theoretically, maybe.  But in practice, you'd need to identify the
> code which manipulates specpdl that could have specpdl in inconsistent
> state if interrupted at some opportune point.  Can you identify such
> places in the code?

Which is basically what I answered to Helmut in another sub-thread.

...

>> For me, it's not about a theoretical or even practical solution that
>> somehow ensures a consistent state in MPS, or some future changes in MPS
>> or something. It's about getting away with what we do in the profiler
>> _now_, as we do with the old GC, which is already seeing potentially
>> inconsistent state in Emacs' memory.
>
> See above: there's a difference.  So I would really like to hear why
> you think accessing specpdl from a SIGPROF handler in an igc build is
> safe.

I tried to answer your questions in a different reply sent a few minutes
ago.



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-23 23:37                   ` Some experience with the igc branch Pip Cet via Emacs development discussions.
  2024-12-24  4:03                     ` Gerd Möllmann
@ 2024-12-24 12:11                     ` Eli Zaretskii
  1 sibling, 0 replies; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-24 12:11 UTC (permalink / raw)
  To: Pip Cet; +Cc: gerd.moellmann, ofv, emacs-devel, eller.helmut, acorallo

> Date: Mon, 23 Dec 2024 23:37:13 +0000
> From: Pip Cet <pipcet@protonmail.com>
> Cc: Gerd Möllmann <gerd.moellmann@gmail.com>, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org
> 
> "Eli Zaretskii" <eliz@gnu.org> writes:
> 
> 1. which signal handlers want to read Lisp data
> 2. which signal handlers want to write Lisp data
> 3. which signal handlers want to allocate Lisp objects temporarily,
> while guaranteeing no references to those objects survive when the
> signal handler returns.
> 4. which signal handlers want to allocate Lisp objects permanently,
> storing references to the new objects in "old" data
> 4a. ... and are willing to call a special transformation function to do
> so
> 4b. ... and want to do so implicitly, expecting memory manipulation to
> "just work".
> 
> 1: definitely works
> 2: should work, but may hit a write barrier
> 3: could be made to work if there's interest
> 4a: if we must
> 4b: see the other thread.  If we have both make_object_writable
> (formerly CHECK_IMPURE) and commit_object_changes functions and call
> them consistently, it might be possible to find a way.
> 
> > SIGPROF does (it's the basis for our Lisp profiler).
> 
> That's 1, 2, but not 3 or 4, right?

I don't think I understand your categories well enough, and anyway
didn't look at the code to find out where it stops in that scale.

> > SIGCHLD doesn't run Lisp (I think), but it examines objects and data
> > structures of the Lisp machine (those related to child processes).
> 
> Just 1, then?

Ditto.  It calls various functions, which I didn't trace into.

> >> One thing I've seen done elsewhere is to publish a message to a message
> >> board so that it can be handled outside of the signal handler. Something
> >> like that, you know what I mean.
> >
> > This is tricky for the profiler, because you want to sample the
> > function in which you are right there and then, not some time later.
> 
> But would it be so bad to use a copy of the specpdl stack, placed in a
> prepared area which is a GC root so we'd guarantee survival (but not
> immutability; I don't think that matters in practice) of entries?
> memcpy is safe to call from a signal handler, and then we could do all
> of the processing safely.

How will you ensure that the copied specpdl stack faithfully tells the
profile info?  It will most probably introduce bias into the profile.
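
For reference, the proposal quoted above amounts to something like the
following sketch.  All names in it are hypothetical (struct frame_entry
stands in for a specpdl entry, and live_stack/live_depth for the real
stack), and registering the area as a GC root is not shown.  It only
illustrates the handler doing a bounded memcpy into a preallocated area
and deferring all processing to a safe point.

```c
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one binding-stack (specpdl) entry.  */
struct frame_entry
{
  const void *function;
  ptrdiff_t nargs;
};

enum { SNAPSHOT_MAX = 256 };

/* Preallocated area the handler copies into.  */
static struct frame_entry snapshot[SNAPSHOT_MAX];
static volatile sig_atomic_t snapshot_len;    /* entries copied */
static volatile sig_atomic_t snapshot_ready;  /* set last by handler */

/* The live stack the handler samples (hypothetical globals).  */
static struct frame_entry *live_stack;
static volatile sig_atomic_t live_depth;

/* Signal handler: copy at most SNAPSHOT_MAX innermost entries and
   flag the snapshot for later processing.  No allocation, no Lisp.  */
static void
sample_handler (int sig)
{
  (void) sig;
  if (snapshot_ready)
    return;                     /* previous sample not consumed yet */
  int depth = live_depth;
  int n = depth < SNAPSHOT_MAX ? depth : SNAPSHOT_MAX;
  if (n > 0)
    memcpy (snapshot, live_stack + (depth - n), n * sizeof *snapshot);
  snapshot_len = n;
  snapshot_ready = 1;
}

/* Main thread, at a safe point: hand the snapshot to RECORD (if
   non-null) and release the buffer.  Returns the number of entries
   in the snapshot, or 0 if there was none.  */
static int
process_snapshot (void (*record) (const struct frame_entry *, int))
{
  if (!snapshot_ready)
    return 0;
  int n = snapshot_len;
  if (record)
    record (snapshot, n);
  snapshot_ready = 0;
  return n;
}
```

Note that this version drops a sample whenever the previous one has not
been consumed yet, which is one concrete way such a scheme could skew
the profile.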



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
@ 2024-12-27  9:49 Sean Devlin
  2024-12-27 12:34 ` Eli Zaretskii
  0 siblings, 1 reply; 203+ messages in thread
From: Sean Devlin @ 2024-12-27  9:49 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

> > From: Gerd Möllmann <gerd.moellmann@gmail.com>
> > Cc: Stefan Kangas <stefankangas@gmail.com>,  Eli Zaretskii <eliz@gnu.org>,
> >  ofv@wanadoo.es,  emacs-devel@gnu.org,  eller.helmut@gmail.com,
> >  acorallo@gnu.org
> > Date: Thu, 26 Dec 2024 20:45:30 +0100
> > 
> > Pip Cet <pipcet@protonmail.com> writes:
> > 
> > >> Forking MPS is obviously better to avoid, if at all possible.
> > >
> > > Forking MPS on GNU/Linux x86-64, in particular.  I'm not sure how Gerd
> > > feels about macOS.
> > 
> > My Emacs just works, but I'm meanwhile exclusively running --without-ns
> > only, because NS is unstable for me for years, and I've given up on it.
> 
> So we will probably be interested to hear from someone who runs the
> branch build --with-ns.

I use the NS build as my daily driver, and I’ve been running the scratch/igc branch for a couple weeks now. I’m happy to perform tests and report any problems. It’s been stable so far. (I did report bug#74966, but it sounds from the investigation that it is unrelated to GC.)

Thanks, and let me know how I can help!


^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-27  9:49 Sean Devlin
@ 2024-12-27 12:34 ` Eli Zaretskii
  2024-12-28  1:55   ` Sean Devlin
  0 siblings, 1 reply; 203+ messages in thread
From: Eli Zaretskii @ 2024-12-27 12:34 UTC (permalink / raw)
  To: Sean Devlin; +Cc: emacs-devel

> From: Sean Devlin <spd@toadstyle.org>
> Date: Fri, 27 Dec 2024 18:49:08 +0900
> Cc: emacs-devel@gnu.org
> 
> > > From: Gerd Möllmann <gerd.moellmann@gmail.com>
> > > Cc: Stefan Kangas <stefankangas@gmail.com>,  Eli Zaretskii <eliz@gnu.org>,
> > >  ofv@wanadoo.es,  emacs-devel@gnu.org,  eller.helmut@gmail.com,
> > >  acorallo@gnu.org
> > > Date: Thu, 26 Dec 2024 20:45:30 +0100
> > > 
> > > Pip Cet <pipcet@protonmail.com> writes:
> > > 
> > > >> Forking MPS is obviously better to avoid, if at all possible.
> > > >
> > > > Forking MPS on GNU/Linux x86-64, in particular.  I'm not sure how Gerd
> > > > feels about macOS.
> > > 
> > > My Emacs just works, but I'm meanwhile exclusively running --without-ns
> > > only, because NS is unstable for me for years, and I've given up on it.
> > 
> > So we will probably be interested to hear from someone who runs the
> > branch build --with-ns.
> 
> I use the NS build as my daily driver, and I’ve been running the scratch/igc branch for a couple weeks now. I’m happy to perform tests and report any problems. It’s been stable so far. (I did report bug#74966, but it sounds from the investigation that it is unrelated to GC.)

Thanks, that's very good news!

> Thanks, and let me know how I can help!

Perhaps try to make more use of the features which require signals,
like the profiler and sub-processes?

Also, if you are used to starting Emacs anew frequently, try instead to
leave a session running for a long time (and use emacsclient for
one-off jobs, if you need that).

TIA



^ permalink raw reply	[flat|nested] 203+ messages in thread

* Re: Some experience with the igc branch
  2024-12-27 12:34 ` Eli Zaretskii
@ 2024-12-28  1:55   ` Sean Devlin
  0 siblings, 0 replies; 203+ messages in thread
From: Sean Devlin @ 2024-12-28  1:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Hi Eli,

> On Dec 27, 2024, at 9:34 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> …
> 
> 
> Perhaps try to use more the features which require signals, like
> profiler and sub-processes?

Sounds good, will do.

I ran the profiler for a short while a few days ago and saw no issues. I’ll try to experiment with subprocesses.

> 
> Also, if you are used to start Emacs anew frequently, try instead to
> leave a session running for a long time (and use emacsclient for
> one-off jobs, if you need that).

Yeah, I typically favor long-running sessions. I’ll try to avoid unnecessary restarts.

> 
> TIA

Cheers.

^ permalink raw reply	[flat|nested] 203+ messages in thread

end of thread, other threads:[~2024-12-29  6:15 UTC | newest]

Thread overview: 203+ messages
2024-12-22 15:40 Some experience with the igc branch Óscar Fuentes
2024-12-22 17:18 ` Gerd Möllmann
2024-12-22 17:29 ` Gerd Möllmann
2024-12-22 17:41 ` Pip Cet via Emacs development discussions.
2024-12-22 17:56   ` Gerd Möllmann
2024-12-22 19:11   ` Óscar Fuentes
2024-12-23  0:05     ` Pip Cet via Emacs development discussions.
2024-12-23  1:00       ` Óscar Fuentes
2024-12-24 22:34         ` Pip Cet via Emacs development discussions.
2024-12-25  4:25           ` Freezing frame with igc Gerd Möllmann
2024-12-25 11:19             ` Pip Cet via Emacs development discussions.
2024-12-25 11:55             ` Óscar Fuentes
2024-12-23  3:42       ` Some experience with the igc branch Gerd Möllmann
2024-12-23  6:27     ` Jean Louis
2024-12-22 20:29   ` Helmut Eller
2024-12-22 20:50   ` Gerd Möllmann
2024-12-22 22:26     ` Pip Cet via Emacs development discussions.
2024-12-23  3:23       ` Gerd Möllmann
     [not found]         ` <m234ieddeu.fsf_-_@gmail.com>
     [not found]           ` <87ttaueqp9.fsf@protonmail.com>
     [not found]             ` <m2frme921u.fsf@gmail.com>
     [not found]               ` <87ldw6ejkv.fsf@protonmail.com>
     [not found]                 ` <m2bjx2h8dh.fsf@gmail.com>
2024-12-23 14:45                   ` Make Signal handling patch platform-dependent? Pip Cet via Emacs development discussions.
2024-12-23 14:54                     ` Gerd Möllmann
2024-12-23 15:11                       ` Eli Zaretskii
2024-12-23 13:35       ` Some experience with the igc branch Eli Zaretskii
2024-12-23 14:03         ` Discussion with MPS people Gerd Möllmann
2024-12-23 14:04           ` Gerd Möllmann
2024-12-23 15:07         ` Some experience with the igc branch Pip Cet via Emacs development discussions.
2024-12-23 15:26           ` Gerd Möllmann
2024-12-23 16:03             ` Pip Cet via Emacs development discussions.
2024-12-23 16:44               ` Eli Zaretskii
2024-12-23 17:16                 ` Pip Cet via Emacs development discussions.
2024-12-23 18:35                   ` Eli Zaretskii
2024-12-23 18:48                     ` Gerd Möllmann
2024-12-23 19:25                       ` Eli Zaretskii
2024-12-23 20:30                     ` Benjamin Riefenstahl
2024-12-23 23:39                       ` Pip Cet via Emacs development discussions.
2024-12-24 12:14                         ` Eli Zaretskii
2024-12-24 13:18                           ` Pip Cet via Emacs development discussions.
2024-12-24 13:42                           ` Benjamin Riefenstahl
2024-12-24  3:37                       ` Eli Zaretskii
2024-12-24  8:48                         ` Benjamin Riefenstahl
2024-12-24 13:52                           ` Eli Zaretskii
2024-12-24 13:54                             ` Benjamin Riefenstahl
2024-12-23 17:44               ` Gerd Möllmann
2024-12-23 19:00                 ` Eli Zaretskii
2024-12-23 19:37                   ` Eli Zaretskii
2024-12-23 20:49                   ` Gerd Möllmann
2024-12-23 21:43                     ` Helmut Eller
2024-12-23 21:49                       ` Pip Cet via Emacs development discussions.
2024-12-23 21:58                         ` Helmut Eller
2024-12-23 23:20                           ` Pip Cet via Emacs development discussions.
2024-12-24  5:38                             ` Helmut Eller
2024-12-24  6:27                               ` Gerd Möllmann
2024-12-24 10:09                               ` Pip Cet via Emacs development discussions.
2024-12-24  4:05                       ` Gerd Möllmann
2024-12-24  8:50                         ` Gerd Möllmann
2024-12-24  6:03                     ` SIGPROF + SIGCHLD and igc Gerd Möllmann
2024-12-24  8:23                       ` Helmut Eller
2024-12-24  8:39                         ` Gerd Möllmann
2024-12-25  9:22                           ` Helmut Eller
2024-12-25  9:43                             ` Gerd Möllmann
2024-12-24 13:05                         ` Eli Zaretskii
2024-12-25 10:46                           ` Helmut Eller
2024-12-25 12:45                             ` Eli Zaretskii
2024-12-24 12:54                       ` Eli Zaretskii
2024-12-24 12:59                         ` Gerd Möllmann
2024-12-27  8:08                       ` Helmut Eller
2024-12-27  8:51                         ` Eli Zaretskii
2024-12-27 14:53                           ` Helmut Eller
2024-12-27 15:09                             ` Pip Cet via Emacs development discussions.
2024-12-27 15:19                             ` Eli Zaretskii
2024-12-27  8:55                         ` Gerd Möllmann
2024-12-27 15:40                           ` Helmut Eller
2024-12-27 15:53                             ` Gerd Möllmann
2024-12-27 11:36                         ` Pip Cet via Emacs development discussions.
2024-12-27 16:14                           ` Helmut Eller
2024-12-28 10:02                             ` Helmut Eller
2024-12-28 10:50                               ` Eli Zaretskii
2024-12-28 13:52                                 ` Helmut Eller
2024-12-28 14:25                                   ` Eli Zaretskii
2024-12-28 16:46                                     ` Helmut Eller
2024-12-28 17:35                                       ` Eli Zaretskii
2024-12-28 18:08                                         ` Helmut Eller
2024-12-28 19:00                                           ` Eli Zaretskii
2024-12-28 19:28                                             ` Helmut Eller
2024-12-28 19:32                                       ` Pip Cet via Emacs development discussions.
2024-12-28 19:51                                         ` Helmut Eller
2024-12-28 20:43                                           ` Pip Cet via Emacs development discussions.
2024-12-29  5:44                                             ` Eli Zaretskii
2024-12-23 23:37                   ` Some experience with the igc branch Pip Cet via Emacs development discussions.
2024-12-24  4:03                     ` Gerd Möllmann
2024-12-24 10:25                       ` Pip Cet via Emacs development discussions.
2024-12-24 10:50                         ` Gerd Möllmann
2024-12-24 13:15                         ` Eli Zaretskii
2024-12-24 12:26                       ` Eli Zaretskii
2024-12-24 12:56                         ` Gerd Möllmann
2024-12-24 13:19                           ` Pip Cet via Emacs development discussions.
2024-12-24 13:38                             ` Gerd Möllmann
2024-12-24 13:46                           ` Eli Zaretskii
2024-12-24 14:12                             ` Gerd Möllmann
2024-12-24 14:40                               ` Eli Zaretskii
2024-12-25  4:56                                 ` Gerd Möllmann
2024-12-25 12:19                                   ` Eli Zaretskii
2024-12-25 12:50                                     ` Gerd Möllmann
2024-12-25 13:00                                       ` Eli Zaretskii
2024-12-25 13:08                                         ` Gerd Möllmann
2024-12-25 13:26                                           ` Eli Zaretskii
2024-12-25 14:07                                             ` Gerd Möllmann
2024-12-25 14:43                                               ` Helmut Eller
2024-12-25 14:59                                                 ` Eli Zaretskii
2024-12-25 20:44                                                   ` Helmut Eller
2024-12-26  6:29                                                     ` Eli Zaretskii
2024-12-26  8:02                                                       ` Helmut Eller
2024-12-26  9:32                                                         ` Eli Zaretskii
2024-12-26 12:24                                                           ` Helmut Eller
2024-12-26 15:23                                                             ` Eli Zaretskii
2024-12-26 23:29                                                               ` Paul Eggert
2024-12-27  7:57                                                                 ` Eli Zaretskii
2024-12-27 19:34                                                                   ` Paul Eggert
2024-12-28  8:06                                                                     ` Eli Zaretskii
2024-12-28 20:44                                                                       ` Paul Eggert
2024-12-29  5:47                                                                         ` Eli Zaretskii
2024-12-25 15:02                                                 ` Gerd Möllmann
2024-12-25 13:09                                       ` Eli Zaretskii
2024-12-25 13:46                                         ` Gerd Möllmann
2024-12-25 14:37                                           ` Eli Zaretskii
2024-12-25 14:57                                             ` Gerd Möllmann
2024-12-25 15:28                                               ` Eli Zaretskii
2024-12-25 15:49                                                 ` Gerd Möllmann
2024-12-25 17:26                                                   ` Eli Zaretskii
2024-12-26  5:25                                                     ` Gerd Möllmann
2024-12-26  7:43                                                       ` Eli Zaretskii
2024-12-26  7:57                                                         ` Gerd Möllmann
2024-12-26 11:56                                                           ` Eli Zaretskii
2024-12-26 15:27                                                           ` Stefan Kangas
2024-12-26 19:51                                                             ` Gerd Möllmann
2024-12-27  9:45                                                               ` Gerd Möllmann
2024-12-27 13:56                                                                 ` Gerd Möllmann
2024-12-27 15:01                                                                   ` Pip Cet via Emacs development discussions.
2024-12-27 15:28                                                                     ` Eli Zaretskii
2024-12-27 15:47                                                                       ` Pip Cet via Emacs development discussions.
2024-12-27 16:18                                                                       ` Gerd Möllmann
2024-12-28  9:10                                                                         ` Stefan Kangas
2024-12-28  9:20                                                                           ` Gerd Möllmann
2024-12-28  9:24                                                                             ` Gerd Möllmann
2024-12-27 16:05                                                                     ` Gerd Möllmann
2024-12-27 17:00                                                                       ` Pip Cet via Emacs development discussions.
2024-12-27 16:37                                                                   ` Eli Zaretskii
2024-12-27 17:26                                                                     ` Pip Cet via Emacs development discussions.
2024-12-27 19:12                                                                       ` Gerd Möllmann
2024-12-28  7:36                                                                       ` Eli Zaretskii
2024-12-28 12:35                                                                         ` Pip Cet via Emacs development discussions.
2024-12-28 12:51                                                                           ` Gerd Möllmann
2024-12-28 13:13                                                                           ` Eli Zaretskii
2024-12-28  9:29                                                                       ` Eli Zaretskii
2024-12-28 13:12                                                                         ` Pip Cet via Emacs development discussions.
2024-12-28 14:08                                                                           ` Eli Zaretskii
2024-12-27 18:21                                                                     ` Gerd Möllmann
2024-12-27 19:23                                                                       ` Pip Cet via Emacs development discussions.
2024-12-27 20:28                                                                         ` Gerd Möllmann
2024-12-28 10:39                                                                       ` Eli Zaretskii
2024-12-28 11:07                                                                         ` Gerd Möllmann
2024-12-28 11:23                                                                           ` Gerd Möllmann
2024-12-28 14:04                                                                           ` Pip Cet via Emacs development discussions.
2024-12-28 14:25                                                                             ` Gerd Möllmann
2024-12-28 16:27                                                                             ` Eli Zaretskii
2024-12-28  6:08                                                                     ` Gerd Möllmann
2024-12-25 17:40                                       ` Pip Cet via Emacs development discussions.
2024-12-25 17:51                                         ` Eli Zaretskii
2024-12-26 15:24                                           ` Pip Cet via Emacs development discussions.
2024-12-26 15:57                                             ` Eli Zaretskii
2024-12-27 14:34                                               ` Pip Cet via Emacs development discussions.
2024-12-27 15:58                                                 ` Eli Zaretskii
2024-12-27 16:42                                                   ` Pip Cet via Emacs development discussions.
2024-12-28 18:02                                                     ` Eli Zaretskii
2024-12-28 21:05                                                       ` Pip Cet via Emacs development discussions.
2024-12-29  6:15                                                         ` Eli Zaretskii
2024-12-26  5:27                                         ` Gerd Möllmann
2024-12-26  5:29                                         ` Gerd Möllmann
2024-12-24 21:18                               ` Pip Cet via Emacs development discussions.
2024-12-25  5:23                                 ` Gerd Möllmann
2024-12-25 10:48                                   ` Pip Cet via Emacs development discussions.
2024-12-25 13:40                                     ` Stefan Kangas
2024-12-25 17:03                                       ` Pip Cet via Emacs development discussions.
2024-12-26  5:22                                         ` Gerd Möllmann
2024-12-26  7:33                                           ` Eli Zaretskii
2024-12-26  8:02                                             ` Gerd Möllmann
2024-12-26 15:50                                             ` Stefan Kangas
2024-12-26 16:13                                               ` Eli Zaretskii
2024-12-26 19:40                                                 ` Gerd Möllmann
2024-12-26 17:01                                               ` Pip Cet via Emacs development discussions.
2024-12-26 19:45                                                 ` Gerd Möllmann
2024-12-26 20:05                                                   ` Eli Zaretskii
2024-12-26 20:12                                                     ` Gerd Möllmann
2024-12-26 16:12                                         ` Stefan Kangas
2024-12-26 17:05                                           ` Eli Zaretskii
2024-12-25 11:48                                   ` Helmut Eller
2024-12-25 11:58                                     ` Gerd Möllmann
2024-12-25 12:52                                     ` Eli Zaretskii
2024-12-25 12:31                                   ` Eli Zaretskii
2024-12-25 12:54                                     ` Gerd Möllmann
2024-12-24 12:11                     ` Eli Zaretskii
  -- strict thread matches above, loose matches on Subject: below --
2024-12-27  9:49 Sean Devlin
2024-12-27 12:34 ` Eli Zaretskii
2024-12-28  1:55   ` Sean Devlin
