unofficial mirror of emacs-devel@gnu.org 
From: "Gerd Möllmann" <gerd.moellmann@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: MPS experiment successful
Date: Wed, 17 Apr 2024 16:30:42 +0200
Message-ID: <m2pluogiy5.fsf@Pro.fritz.box>
In-Reply-To: <86le5cgmp3.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 17 Apr 2024 16:09:44 +0300")

[-- Attachment #1: Type: text/plain, Size: 2428 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> That's all. There is nothing more. And I'm currently undecided how to
>> proceed with this.
>
> The next step would be to push a feature branch into the Emacs Git
> repository and let people try the branch on the other supported
> platforms.  If the code is not yet mature enough for that, please
> suggest how to get from here to there.

I think it could technically be done, but some things would have to be
done first. Off the top of my head:

- CL packages vs. obarrays (I do have packages but no obarrays)
- Handling of pure space
- Handling the new JSON stuff, which I don't have in my branch, I think.
- ...

And then, of course, there is the general work of turning what is
basically a fork back into a branch of the original repository, and
I'm not so sure how to go about that.

But my indecision comes more from what could come after that. I'm not
sure how willing I am to invest what could be years to get things
finished, especially if no one else starts working on it. That happened
to me one time too often in the 90s, as some might remember ;-).

> Would you like to describe in a few words what would be the advantages
> and disadvantages of this GC for Emacs?

Advantages for users:

For the user, it finally means the end of GC pauses. MPS runs in a
separate thread, which can run on a different core.

What MPS means for the overall speed of Emacs is impossible for me to
say. MPS doesn't seem to be slow at all, although some slowdown can be
expected in the client because of thread-safe allocation.

An example of a confusing result that I noticed today:

Full build without MPS, -O0, checking

real	26:20.01
user	1:32:15.54
sys	3:20.74

Same build with debug MPS (-lmps-debug)

real	14:07.90
user	44:17.94
sys	3:15.99

That's on a 2.3 GHz quad-core Intel Core i7 with 16 GB RAM and an SSD. 🤷
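To put rough numbers on those timings, here is a small C sketch that converts the values above to seconds and divides them. The helper names are hypothetical and not part of any branch. The sketch assumes the [h:]mm:ss.ss format shown and the C locale's decimal point.

```c
#include <assert.h>
#include <stdlib.h>

/* Convert a duration such as "26:20.01" (m:ss) or "1:32:15.54"
   (h:mm:ss) into seconds.  Each colon-separated field is one base-60
   "digit", so the running total is multiplied by 60 before adding.  */
static double
duration_seconds (const char *s)
{
  double total = 0.0;
  for (;;)
    {
      char *end;
      double field = strtod (s, &end);
      assert (end != s);        /* every field must contain a number */
      total = total * 60.0 + field;
      if (*end != ':')
        break;
      s = end + 1;
    }
  return total;
}

/* How many times longer BEFORE took than AFTER.  */
static double
slowdown_factor (const char *before, const char *after)
{
  return duration_seconds (before) / duration_seconds (after);
}
```

Worked out by hand, real time is 1580.01 s vs. 847.90 s (about 1.86x) and user time is 5535.54 s vs. 2657.94 s (about 2.08x). Sys time is nearly unchanged at 200.74 s vs. 195.99 s.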

Possible advantages for developers of Emacs:

- MPS is thread-safe, so it's a tiny step toward a thread-safe Emacs.
- I think igc.c is easier to understand than alloc.c.

Disadvantages:

As long as there are 2 GCs, there are 2 instead of 1 GC :-).

In its current state, it will increase memory usage; part of that
could be optimized away.

Some configurations/platforms might not be possible to support. Hard to
say.

...

Let me also attach a badly written and curated Org file that I have in
that branch. Maybe it has some additional answers.


[-- Attachment #2: igc.org --]
[-- Type: text/x-org, Size: 6450 bytes --]

#+title: MPS garbage collection for Emacs

* What is MPS?

The MPS (Memory Pool System) is a GC library developed by Ravenbrook
Ltd. MPS is available from [[https://github.com/Ravenbrook/mps?tab=readme-ov-file][GitHub]] under a BSD license. See the
[[https://memory-pool-system.readthedocs.io/en/latest/][documentation]].

In short, MPS implements incremental, generational, concurrent, copying,
thread-safe garbage collection on a large variety of platforms. It has
been around for a long time, is stable, and well documented.

* What is this branch?

This [[https://github.com/gerd-moellmann/emacs-with-cl-packages/tree/igc][branch]] is an experiment to see whether Emacs can be made to use a GC
based on MPS. I'm doing this for my own entertainment; it's not
"official" in any form.

* Caveats

This is my local Emacs, which is different from mainstream Emacs: It
uses CL packages, doesn't have obarrays, doesn't support pure space,
does not support shorthands and probably some other stuff.

In addition, I'm exclusively using macOS, so it's unlikely to compile or
run on other systems out of the box. It should not be too hard to port,
though.

* Current state

The build succeeds up to and including =compile-first=, i.e., Emacs
pdumps and compiles some =.elc= files.

* Things worth mentioning

** Configuration

There is now a configure switch =--with-mps= with values =no, yes, debug=.
If =debug= is given, Emacs links with the debug version of the MPS
library.

** Building MPS

I built MPS from its Git repo. I had to make two trivial fixes for macOS
for which I submitted issues upstream.

** Every object has a 1 word header

At the moment, every object has a one-word header, which is not visible
to the rest of Emacs. See ~struct igc_header~.

This means in particular that conses are 50% larger than they would
normally be: three words (header, car, cdr) instead of two. I did this
because it is less work:

- All objects can be handled by one set of MPS callback functions.

- It simplifies the implementation of eq hash tables considerably by
  storing an address-independent hash in the header.
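Below is a minimal C sketch of the size arithmetic. It assumes a Lisp object is one machine word. The ~demo_~ types and the split of the header word (type tag in the low 8 bits, hash in the rest) are hypothetical and are not the real ~struct igc_header~.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a Lisp object is one machine word.  */
typedef uintptr_t demo_lisp_object;

/* A cons without a header: two words.  */
struct demo_cons
{
  demo_lisp_object car, cdr;
};

/* Hypothetical one-word header: the low bits hold a type tag, the
   remaining bits an address-independent hash assigned at allocation.  */
struct demo_header
{
  uintptr_t bits;
};

/* A cons as the GC sees it: header plus the two payload words.  */
struct demo_cons_with_header
{
  struct demo_header header;
  struct demo_cons cons;
};

enum { DEMO_TYPE_BITS = 8 };

/* Hashes come from a counter, so they never depend on an address.  */
static uintptr_t demo_next_hash = 1;

static void
demo_header_init (struct demo_header *h, unsigned type)
{
  assert (type < (1u << DEMO_TYPE_BITS));
  h->bits = (demo_next_hash++ << DEMO_TYPE_BITS) | type;
}

static unsigned
demo_header_type (const struct demo_header *h)
{
  return (unsigned) (h->bits & ((1u << DEMO_TYPE_BITS) - 1));
}

static uintptr_t
demo_header_hash (const struct demo_header *h)
{
  return h->bits >> DEMO_TYPE_BITS;
}
```

The hash is stored in the object itself, so it travels with the object when MPS copies it. That is what keeps eq hash tables simple. The cost is the third word per cons.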

The header can be removed from conses (and other objects, if that's
worth it) by writing additional code using additional MPS pools with
their own object formats.

Note that removing the header also means that one has to use MPS's
location dependency feature for implementing eq hash tables.

Also be aware that two calls to ~sxhash-eq~ can then return different
hashes when a concurrent GC happens between the calls, unless something
ensures that the GC does not move the hashed objects for as long as the
hashes are needed.
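The following toy C sketch shows what an eq hash table looks like without a stored hash. It does not use the MPS API. A global move counter (the hypothetical ~demo_move_epoch~) stands in for MPS location dependency, which in reality is more precise. ~demo_gc_move~ simulates a copying collector that moves one object and fixes the table's reference to it. Staleness is checked only after a lookup fails, loosely mirroring the pattern the MPS guide describes for address-based tables, so successful lookups pay nothing extra.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_obj { int payload; };

/* Bumped by the toy collector whenever it moves an object.  */
static unsigned demo_move_epoch = 0;

enum { DEMO_NBUCKETS = 64 };

/* Open-addressing eq table keyed by address; NULL marks an empty slot.  */
struct demo_eq_table
{
  struct demo_obj *keys[DEMO_NBUCKETS];
  unsigned epoch;               /* demo_move_epoch when last hashed */
};

static size_t
demo_addr_hash (const struct demo_obj *o)
{
  return ((uintptr_t) o >> 3) % DEMO_NBUCKETS;
}

/* The caller must keep the table from filling up.  */
static void
demo_insert_raw (struct demo_eq_table *t, struct demo_obj *o)
{
  size_t i = demo_addr_hash (o);
  while (t->keys[i] != NULL)
    i = (i + 1) % DEMO_NBUCKETS;
  t->keys[i] = o;
}

/* Recompute every bucket from the objects' current addresses.  */
static void
demo_rehash (struct demo_eq_table *t)
{
  struct demo_obj *old[DEMO_NBUCKETS];
  for (size_t i = 0; i < DEMO_NBUCKETS; i++)
    {
      old[i] = t->keys[i];
      t->keys[i] = NULL;
    }
  for (size_t i = 0; i < DEMO_NBUCKETS; i++)
    if (old[i] != NULL)
      demo_insert_raw (t, old[i]);
  t->epoch = demo_move_epoch;
}

static void
demo_put (struct demo_eq_table *t, struct demo_obj *o)
{
  if (t->epoch != demo_move_epoch)
    demo_rehash (t);            /* never mix stale and fresh hashes */
  demo_insert_raw (t, o);
}

static int
demo_probe (const struct demo_eq_table *t, const struct demo_obj *o)
{
  size_t i = demo_addr_hash (o);
  for (size_t n = 0; n < DEMO_NBUCKETS && t->keys[i] != NULL; n++)
    {
      if (t->keys[i] == o)
        return 1;
      i = (i + 1) % DEMO_NBUCKETS;
    }
  return 0;
}

/* Only a failed lookup pays for the staleness check and the rehash.  */
static int
demo_contains (struct demo_eq_table *t, const struct demo_obj *o)
{
  if (demo_probe (t, o))
    return 1;
  if (t->epoch != demo_move_epoch)
    {
      demo_rehash (t);
      return demo_probe (t, o);
    }
  return 0;
}

/* Toy copying step: move O, fix the table's reference, bump the epoch.  */
static struct demo_obj *
demo_gc_move (struct demo_eq_table *t, struct demo_obj *o)
{
  struct demo_obj *copy = malloc (sizeof *copy);
  assert (copy != NULL);
  memcpy (copy, o, sizeof *copy);
  for (size_t i = 0; i < DEMO_NBUCKETS; i++)
    if (t->keys[i] == o)
      t->keys[i] = copy;
  free (o);
  demo_move_epoch++;
  return copy;
}
```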

** MPS In-band headers

I tried to use MPS in-band headers at first, but couldn't get them to
work. I don't claim they don't work, though. After all, I was and still
am learning MPS.

** Weak hash tables

I didn't think that weak hash tables were important enough for my
experiment, so I didn't implement them to save work.

Weak tables can be implemented using the AWL pool that is already
present in =igc.c= and its allocation points, and then using MPS's
dependent objects in the hash table implementation. There are examples
of how to do this in the MPS documentation and in an example Scheme
interpreter.

To prepare for that, keys and values of a hash table are already split
into two vectors. Two vectors are necessary because objects in an AWL
pool must either contain weak references only, or strong references
only. The currently malloc'd vectors would have to be replaced with
special vectors allocated from the AWL pool.

** Handling of a loaded pdump

The hot part of a loaded pdump (ca. 18 MB) is currently used as an
ambiguous root for MPS. A number of alternatives could be investigated:

- Use a root with barrier (~MPS_RM_PROT~)

- Copy objects from the dump to an MPS pool that uses ~MPS_KEY_GEN~ to
  allocate objects in an old generation.

  It is unclear to me from the docs if the AMC pool supports that, but
  one could use an AMS pool.

  After loading a dump we would copy the whole object graph to MPS,
  starting from static roots.  After that, the dump itself would no
  longer be used.

  Costs some load time, though.
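The copying option amounts to a copying collection run once, with the static roots as roots. Below is a toy Cheney-style C sketch. It assumes two-field objects and uses a plain array as the destination, where the real thing would use an MPS pool. All ~demo_~ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-field object, e.g. a cons.  */
struct demo_obj
{
  struct demo_obj *forward;     /* set once the object has been copied */
  struct demo_obj *field[2];
};

/* Copy OBJ to TO[*NEXT] unless it was already copied; return the
   new address.  */
static struct demo_obj *
demo_copy (struct demo_obj *obj, struct demo_obj *to, size_t *next,
           size_t cap)
{
  if (obj == NULL)
    return NULL;
  if (obj->forward != NULL)
    return obj->forward;
  assert (*next < cap);
  struct demo_obj *copy = &to[(*next)++];
  *copy = *obj;
  copy->forward = NULL;
  obj->forward = copy;
  return copy;
}

/* Breadth-first copy of everything reachable from ROOTS into TO.
   ROOTS are updated in place.  Returns the number of objects copied.  */
static size_t
demo_copy_graph (struct demo_obj **roots, size_t nroots,
                 struct demo_obj *to, size_t cap)
{
  size_t next = 0, scan = 0;
  for (size_t i = 0; i < nroots; i++)
    roots[i] = demo_copy (roots[i], to, &next, cap);
  /* Copies still point at old objects; fix their fields in order.  */
  while (scan < next)
    {
      struct demo_obj *o = &to[scan++];
      for (int f = 0; f < 2; f++)
        o->field[f] = demo_copy (o->field[f], to, &next, cap);
    }
  return next;
}
```

Forwarding pointers make shared structure and cycles come out right. Each object is copied once, and every later reference to it is redirected to the copy. Unreachable objects are simply left behind.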

There is currently also a slight problem that's a consequence of Emacs
mixing GC'd objects and malloc'd ones. The loaded dump is scanned
conservatively, but if objects in it point to malloc'd data structures
holding references, those references are invisible to MPS, so one has
to jump through hoops.

Examples:

- Hash tables hold keys and values in malloc'd vectors. If the hash
  table is in the dump and the vectors are on the heap, keys and values
  won't be seen by MPS.

- Symbols in the dump may have a Lisp_Buffer_Local_Value that is on the
  heap.

- Buffers have an itree_tree that is malloc'd.
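Below is a toy C model of why this happens. It assumes a word-aligned heap and ignores Lisp_Object tagging. An ambiguous scan treats every word of the dump that looks like an aligned pointer into the GC heap as a reference. A pointer to malloc'd memory leads nowhere the scan knows about, so an object referenced only from a malloc'd vector is missed. All names are hypothetical; MPS does the real scanning itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GC heap: a word array plus one mark byte per word.  */
struct demo_heap
{
  uintptr_t *base;
  size_t nwords;
  unsigned char *marked;
};

/* Does W look like an aligned pointer into the heap?  If so, store
   the word index in *INDEX.  */
static int
demo_heap_index (const struct demo_heap *h, uintptr_t w, size_t *index)
{
  uintptr_t lo = (uintptr_t) h->base;
  uintptr_t hi = lo + h->nwords * sizeof (uintptr_t);
  if (w < lo || w >= hi || (w - lo) % sizeof (uintptr_t) != 0)
    return 0;
  *index = (w - lo) / sizeof (uintptr_t);
  return 1;
}

/* Ambiguously scan AREA[0..NWORDS): every word that could be a heap
   pointer marks its target.  Words pointing anywhere else, e.g. into
   malloc'd memory, are ignored.  Returns the number newly marked.  */
static size_t
demo_scan_ambiguous (struct demo_heap *h, const uintptr_t *area,
                     size_t nwords)
{
  size_t newly_marked = 0;
  for (size_t i = 0; i < nwords; i++)
    {
      size_t idx;
      if (demo_heap_index (h, area[i], &idx) && !h->marked[idx])
        {
          h->marked[idx] = 1;
          newly_marked++;
        }
    }
  return newly_marked;
}
```

The flip side of ambiguity is that a plain integer which happens to equal a heap address also keeps its target alive.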

** Intervals and ~itree_node~

The problem with these two is that there are pointers from Lisp objects
to malloc'd memory and back. Allocating them from MPS makes this easier
to handle, because MPS then triggers the scanning and, not least, an
ambiguous scan of the loaded dump keeps them alive.

** Finalization

Is now implemented.

** Things old GC does except GC

The function ~garbage_collect~ does some things that are not directly
related to GC, simply because it is called every once in a while.

- Compacting buffers and undo lists.

This is currently not done, but could be done in another way, from a
timer, for instance.

** Not Considered

Some things are not implemented because they were out of scope. For
example,

- ~memory-report~: could be done with MPS's pool walk functionality.

- Profiler (~profiler-memory-start~...): no idea, haven't looked at it.

- Anything I don't currently use, either because it doesn't exist on
  macOS (text conversions, for example) or because I didn't consider it
  essential (xwidgets, for example).

** Knobs not tried

- Number of generations
- Size of generations
- Mortality probabilities
- Allocation policies, like ramp allocation
- ...

** Implementation

I think it's not too terrible, but some things should be improved:

- Error handling. It currently aborts in many circumstances, but
  it is also not clear what else to do.

- Idle time use. It does something in this regard, but not much,
  and not always with a time constraint (handling MPS messages).
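Below is one way to give idle-time work a time budget, sketched in C with hypothetical names and not taken from =igc.c=. Pending items, standing in for MPS messages, are processed until a deadline passes, and the rest are left for the next idle period. The clock is passed in as a function so the sketch can be tried deterministically; a real version would read a monotonic clock instead.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the current time in seconds; injected for testability.  */
typedef double (*demo_clock_fn) (void *ctx);

/* Hypothetical queue of pending work items.  */
struct demo_queue
{
  int items[16];
  size_t head, tail;            /* items[head..tail) are pending */
};

/* Process pending items until the queue is empty or BUDGET seconds
   have passed.  Returns the number of items processed.  */
static size_t
demo_drain_until_deadline (struct demo_queue *q, double budget,
                           demo_clock_fn now, void *ctx)
{
  assert (budget >= 0.0);
  double deadline = now (ctx) + budget;
  size_t done = 0;
  while (q->head < q->tail && now (ctx) < deadline)
    {
      /* A real handler would act on q->items[q->head] here.  */
      q->head++;
      done++;
    }
  return done;
}

/* Fake clock for experimenting: each call advances time by one second.  */
static double
demo_fake_clock (void *ctx)
{
  double *t = ctx;
  return (*t)++;
}
```

The deadline is checked before each item, so one slow item can still overrun the budget. Bounding that would require the handlers themselves to be incremental.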

** Debugger

MPS uses memory barriers. In some debugging situations, these have to
be removed before certain things can be done. I've added a command
=xpostmortem= to the LLDB support for that. GDB will need something
similar.
