On Sunday, June 30th, 2024 at 13:15, Gerd Möllmann wrote: > Pip Cet pipcet@protonmail.com writes: > > > > True. I forgot to mention an important thing: When something is splat, > > > set flag(s) in the dependent hash table indicating that something must > > > be done because of that splatting. In gethash and so, check the flag and > > > do what's necessary. (I did something similar for the weak hash tables > > > in CMUCL, and it wasn't entirely bad. And weak tables should be rare.) > > > > Not necessarily rare, particularly not if we turn BUF_MARKERS into a > > weak hash table (I still don't see why we shouldn't do that, maybe I > > missed it). > > > Hm, don't know. On the one hand, there's Stefan's gap buffer data > structure, and on the other hand add_marker and remove_marker are now > O(1) in igc, modulo bugs. So the pressure has decreased. Well, I needed a weak hash table to test things on, which is why I've included the change in the attached patch. > > But, yes, that's a good idea and requires far fewer changes to the > > hash table code than I've now made locally. However, I've decided to > > go through with it and have just successfully splatted my first weak > > hash table entry. > > Congrats! :-). I have to say, my ideas are idle musings, that I'll > probably never realize. Working code wins :-). I've implemented weak-key hash tables for the scratch/igc branch. Like existing hash tables, the tables themselves never shrink automatically, but can grow; entries can be removed by GC as well as manually, and the table's apparent size will change immediately. Unfortunately, much of the hash table code had to be duplicated: weak hash tables are now composed of two objects which, in MPS language, are "dependent objects" of each other: this means that when an entry in a weak hash table is being removed during GC, we must not modify anything except non-MPS memory and at most one object in addition to the one being scanned. It's possible to make strong hash tables a (boring) special case of weak hash tables, but that seems backwards to me: strong hash tables need to be fast, and they're the normal case, so extra indirection to make them behave more like the limited weak hash table case is worse than duplicating some code. There's a lot of boring TODO work: * make them readable in lread.c * restore them properly from the dump rather than, as the current code does, making and keeping them strong * user-defined hash functions * weak-value hash tables * stop wasting quite as much memory by overallocating everything Some intermediately-interesting TODO work: * key-and-value hash table weakness And one very challenging item: * key-or-value hash table weakness My understanding of the latter is that a reference to, say, the value would need to keep the entire key/value pair alive. I'm not sure how to do that with MPS, and I'm afraid the best course of action may be to deprecate this rare feature. (Of course, one possible implementation is to add another word to the IGC header to keep alive a list of value-or-keys associated with this key-or-value, but that seems disproportionately expensive). There is a special optimization that MPS performs on 32-bit x86 systems which requires all words of objects in our weak pool to be either misaligned or valid pointers to (in our case) a struct igc_header. There's some initial work toward supporting that, but the patch will probably not work on 32-bit x86. More TODO: * iterating over a weak hash table may be interrupted by GC. What happens then? * restore the weak vector code for Lisp_Marker, as Gerd indicated he prefers that to using a weak hash table. I removed it so we'd actually be working with weak hash tables. Anyway, just thought sharing a snapshot of the current code might make discussing things easier. Patch is against b278c7ede95816ba972f64647b68d05d88cc9f18. Pip