On Sunday, June 30th, 2024 at 13:15, Gerd Möllmann <gerd.moellmann@gmail.com> wrote:
> Pip Cet pipcet@protonmail.com writes:
> 
> > > True. I forgot to mention an important thing: When something is splat,
> > > set flag(s) in the dependent hash table indicating that something must
> > > be done because of that splatting. In gethash and so, check the flag and
> > > do what's necessary. (I did something similar for the weak hash tables
> > > in CMUCL, and it wasn't entirely bad. And weak tables should be rare.)
> > 
> > Not necessarily rare, particularly not if we turn BUF_MARKERS into a
> > weak hash table (I still don't see why we shouldn't do that, maybe I
> > missed it).
> 
> 
> Hm, don't know. On the one hand, there's Stefan's gap buffer data
> structure, and on the other hand add_marker and remove_marker are now
> O(1) in igc, modulo bugs. So the pressure has decreased.

Well, I needed a weak hash table to test things on, which is why I've included the change in the attached patch.

> > But, yes, that's a good idea and requires far fewer changes to the
> > hash table code than I've now made locally. However, I've decided to
> > go through with it and have just successfully splatted my first weak
> > hash table entry.
> 
> Congrats! :-). I have to say, my ideas are idle musings, that I'll
> probably never realize. Working code wins :-).

I've implemented weak-key hash tables for the scratch/igc branch. Like
existing hash tables, the tables themselves never shrink automatically,
but can grow; entries can be removed by GC as well as manually, and the
table's apparent size will change immediately.

Unfortunately, much of the hash table code had to be duplicated: weak
hash tables are now composed of two objects which, in MPS language, are
"dependent objects" of each other: this means that when an entry in a
weak hash table is being removed during GC, we must not modify anything
except non-MPS memory and at most one object in addition to the one being
scanned.

It's possible to make strong hash tables a (boring) special case of weak
hash tables, but that seems backwards to me: strong hash tables need to
be fast, and they're the normal case, so extra indirection to make them
behave more like the limited weak hash table case is worse than
duplicating some code.

There's a lot of boring TODO work:
 * make them readable in lread.c
 * restore them properly from the dump rather than, as the current code does, making and keeping them strong
 * user-defined hash functions
 * weak-value hash tables
 * stop wasting quite as much memory by overallocating everything

Some intermediately-interesting TODO work:
 * key-and-value hash table weakness

And one very challenging item:
 * key-or-value hash table weakness

My understanding of the latter is that a reference to, say, the value
would need to keep the entire key/value pair alive. I'm not sure how to
do that with MPS, and I'm afraid the best course of action may be to
deprecate this rare feature.

(Of course, one possible implementation is to add another word to the
IGC header to keep alive a list of value-or-keys associated with this
key-or-value, but that seems disproportionately expensive).

There is a special optimization that MPS performs on 32-bit x86 systems which
requires all words of objects in our weak pool to be either misaligned
or valid pointers to (in our case) a struct igc_header. There's some
initial work toward supporting that, but the patch will probably not
work on 32-bit x86.

More TODO:
 * iterating over a weak hash table may be interrupted by GC. What happens then?
 * restore the weak vector code for Lisp_Marker, as Gerd indicated he prefers that to using a weak hash table. I removed it so we'd actually be working with weak hash tables.

Anyway, just thought sharing a snapshot of the current code might make discussing things easier.

Patch is against b278c7ede95816ba972f64647b68d05d88cc9f18.

Pip