#+title: MPS garbage collection for Emacs * What is MPS? The MPS (Memory Pool System) is a GC library developed by Ravenbrook Ltd. MPS is available from [[https://github.com/Ravenbrook/mps?tab=readme-ov-file][Github]] under a BSD license. See the [[https://memory-pool-system.readthedocs.io/en/latest/][documentation]]. In short, MPS implements incremental, generational, concurrent, copying, thread-safe garbage collection on a large variety of platforms. It has been around for a long time, is stable, and well documented. * What is this branch? This [[https://github.com/gerd-moellmann/emacs-with-cl-packages/tree/igc][branch]] is an experiment if Emacs can be made to use a GC based on MPS. I'm doing this for my own entertainment, it's not in any form "official". * Caveats This is my local Emacs, which is different from mainstream Emacs: It uses CL packages, doesn't have obarrays, doesn't support pure space, does not support shorthands and probably some other stuff. In addition, I'm exclusively using macOS, so it's unlikely to compile or run on other systems OOTB. It should not be too hard to port, though. * Current state Build succeeds up to and including =compile-first=, i.e. Emacs pdumps, and compiles some =.elc= files. * Things worth mentioning ** Configuration There is a now configure switch =--with-mps= with values =no, yes, debug=. If =debug= is given, Emacs links with the debug version of the MPS library. ** Building MPS I built MPS from its Git repo. I had to make two trivial fixes for macOS for which I submitted issues upstream. ** Every object has a 1 word header At the moment, every object has a one-word header, which is not visible to the rest of Emacs. See ~struct igc_header~. This means in particular that conses are 50% larger than they would normally be. I did this because it is less work: - All objects can be handled by one set of MPS callback functions. - It simplifies the implementation of eq hash tables considerably by storing an address-independent hash in the header. The header can be removed from conses (and other objects, if that's worth it) by writing additional code using additional MPS pools with their own object formats. Note that doing this also means that one has to use MPS's location dependency feature for implementing eq hash tables. Also be aware that two calls to ~sxhash-eq~ can then return different hashes when a concurrent GC happens between calls, unless something is done to ensure that the hashed objects aren't moved by the GC for long enough. ** MPS In-band headers I have tried to use MPS in-band headers at first, but couldn't get it to work. I don't claim they don't work, though. After all I was and still am learning MPS. ** Weak hash tables I didn't think that weak hash tables were important enough for my experiment, so I didn't implement them to save work. Weak tables can be implemented using the already present in =igc.c= AWL pool and its allocation points, and then using MPS's dependent objects in the hash table implementation. There are examples how to do this in the MPS documentation, and in an example Scheme interpreter. To prepare for that, keys and values of a hash table are already split into two vectors. Two vectors are necessary because objects in an AWL pool must either contain weak references only, or strong references only. The currently malloc'd vectors would have to be replaced with special vectors allocated from the AWL pool. ** Handling of a loaded pdump The hot part of a loaded pdump (ca. 18 MB) is currently used as an ambiguous root for MPS. A number of things could be investigated - Use a root with barrier (~MPS_RM_PROT~) - Copy objects from the dump to an MPS pool that uses ~MPS_KEY_GEN~ to allocate objects in an old generation. It is unclear to me from the docs if the AMC pool supports that, but one could use an AMS pool. After loading a dump we would copy the whole object graph to MPS, starting from static roots. After that, the dump itself would no longer be used. Costs some load time, though. There is also a slight problem currently that's a consequence of Emacs mixing GC'd objects and malloc'd ones. The loaded dump is scanned conservativly, but if such objects contain malloc'd data structures holding references, these are invisble to MPS, so one has to jump through hoops. Examples: - Hash tables hold keys and values in malloc'd vectors. If the hash table is in the dump, and the vectors are on the heap, keys and values won't be seen be MPS. - Symbols in the dump may have a Lisp_Buffer_Local_Value that is on the heap. - Buffers have a itree_tree that is malloc'd. ** Intervals and ~itree_node~ Problem with these two is that there are pointers from Lisp objects to malloc'd memory and back. This is easier to handle if allocated from MPS. Moving these to MPS makes things easier because MPS triggers the scanning, and, not the least, makes an ambiguous scan of the loaded dump keep things alive. ** Finalization Is now implemented. ** Things old GC does except GC The function ~garbage_collect~ does some things that are not directly related to GC, simply because it is called every once in a while. - compact buffers, undo-list. This is currently not done, but could be done in another way, from a timer, for instance. ** Not Considered Some things are not implemented because they were out of scope. For example, - ~memory-report~ Could be done with MPS's pool walk functionality. - profiler (~profiler-memory-start~...) No idea, haven't looked at it. - Anything I don't currently use either because it doesn't exist on macOS (text conversions, for example), or because I didn't think it being essiential (xwidgets, for example). ** Knobs not tried - Number of generations - Size of generations - Mortality probabilities - Allocation policies, like ramp allocation - ... ** Implementation I think it's not too terrible, but some things should be improved - Error handling. It currently aborts in many circumstances, but it is also not clear what else to do. - Idle time use. It does something in this regard, but not much, and not always with a time constraint (handling MPS messages). ** Debugger MPS uses memory barriers. In certain situations it is necessary to remove these to be able to do certain things. I've added a command =xpostmortem= to the LLDB support for that. GDB will need something similar.