unofficial mirror of emacs-devel@gnu.org 
From: Steve Fink <sfink@mozilla.com>
To: Pip Cet <pipcet@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: [EXPERIMENT] Emacs with the SpiderMonkey garbage collector
Date: Sat, 2 Dec 2017 21:24:50 -0800
Message-ID: <ed5aeab5-9394-4c65-fcbb-05ce61cea4ba@mozilla.com>
In-Reply-To: <CAOqdjBcdoKZ=Jdv3eFCGogHp1t7dVf=DG-cYi+7g8No=6F=7rw@mail.gmail.com>

On 12/2/17 4:37 PM, Pip Cet wrote:
> On Fri, Dec 1, 2017 at 5:57 PM, Steve Fink <sfink@mozilla.com> wrote:
>
>> The one big thing that the analysis doesn't handle particularly well is
>> internal pointers. For now, it sounds like you're mostly safe from these
>> moving because all your data is hanging off of the JS_GetPrivate
>> indirection, so you wouldn't have any pointers internal to GC things. But
>> you can still run into issues keeping things alive. We mostly have this
>> problem with strings, since small strings are stored inline in the GC things
>> and so it's very easy to end up with a char* that will move. We tend to work
>> around this by either (1) passing around the GC pointer instead of the
>> char*; (2) making functions that accept such a char* to also declare that
>> they won't GC by requiring a reference to an 'AutoRequireCannotGC&' token,
>> which is validated by the static analysis; or (3) forcing the contents to be
>> stored in the malloc heap if they aren't already, and just being careful
>> with keeping the owning JSString* in a Rooted<JSString*> somewhere higher on
>> the stack.
>>
>> Collections are also an issue, if you want to index them by GC pointer
>> value.
> It seems I have two options: rewrite all hashed collections whenever
> something moves, or make up a hash value and store it in a private
> slot for each object upon creation. My understanding is SpiderMonkey
> does the former for WeakMaps, and those seem to perform okay, so that
> might be the better option long-term, but I haven't given much thought
> to this and the made-up hash value seems easier to implement...

We've done it two ways, gradually shifting almost everything over to the 
second.

The first was what we called "rekeying", which is just removing and
re-inserting any entry whose pointer changed (or, more generally, any
entry where a pointer forming part of the key changed). We had to do
some careful dancing in our hashtable implementation to be certain that
rekeying can't cause the table to grow (we have tombstones, so the naive
remove-and-reinsert approach accumulates them). Most things are now
switching over to the second approach, which is to key tables off of
unique ids. We have a separate table mapping anything that needs one to
a unique id. But that separate table is just a space optimization; if
all of your objects are going to end up needing a unique id, then you're
paying the extra memory cost for everything anyway and you may as well
store a hash (or unique id) in each object, as you say.
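
To make the unique-id idea concrete, here is a minimal sketch using only
the standard library. It is not SpiderMonkey code: Cell, UniqueIds,
NameTable and lookupAfterMove are hypothetical names, and the "move" is
simulated by copying a struct. The point it illustrates is that only the
side table is touched when something moves; tables keyed by id are not.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a GC thing; not a SpiderMonkey type.
struct Cell { int payload; };

// Side table handing out stable ids. Only cells that are actually used
// as keys somewhere get an entry here.
class UniqueIds {
  std::unordered_map<const Cell*, uint64_t> ids_;
  uint64_t next_ = 1;

 public:
  uint64_t getOrCreate(const Cell* c) {
    auto result = ids_.emplace(c, next_);
    if (result.second) ++next_;  // a fresh id was consumed
    return result.first->second;
  }

  // Would be called by the (hypothetical) collector when a cell moves.
  // This is the only table that needs updating on a move.
  void moved(const Cell* from, const Cell* to) {
    assert(to != nullptr);
    auto it = ids_.find(from);
    if (it == ids_.end()) return;  // the cell never needed an id
    uint64_t id = it->second;
    ids_.erase(it);
    ids_.emplace(to, id);
  }
};

// A user-level table keyed by id; it is untouched when cells move.
using NameTable = std::unordered_map<uint64_t, std::string>;

inline std::string lookupAfterMove() {
  UniqueIds ids;
  NameTable names;
  Cell a{1};
  names[ids.getOrCreate(&a)] = "a";
  Cell a2 = a;         // simulate the collector relocating the cell
  ids.moved(&a, &a2);
  return names.at(ids.getOrCreate(&a2));
}
```

If every object ends up with an id, the side table could be replaced by
a field stored in the object itself, which is the trade-off described
above.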

If these tables aren't holding strong references (if being a key in the 
table shouldn't keep the key alive), then you need to sweep them too, to 
throw out all of the dead stuff. Even if nothing ever looks up those 
keys again, hash collisions will have you comparing their dead memory 
with lookup keys. And if you have code to sweep the table, I guess you 
can always reuse it during the moving part of the collection to rekey 
everything that is moving. (And if they *are* holding strong references, 
they should be traced.)
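
A rough sketch of a sweep pass that doubles as a rekeying pass, again
with the standard library only. Everything here is an assumption made
for illustration: the marked and forwardedTo fields stand in for
whatever liveness and forwarding information a real collector exposes,
and WeakTable, sweepAndRekey and demoSweep are made-up names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical GC thing: 'marked' means it survived this collection,
// 'forwardedTo' is non-null if the collector relocated it.
struct Cell {
  bool marked = false;
  Cell* forwardedTo = nullptr;
};

// A table whose keys are weak: being a key does not keep a cell alive.
using WeakTable = std::unordered_map<Cell*, std::string>;

// One pass after marking: drop entries whose key died, and re-insert
// entries whose key moved under the new address.
inline void sweepAndRekey(WeakTable& table) {
  std::vector<std::pair<Cell*, std::string>> moved;
  for (auto it = table.begin(); it != table.end();) {
    Cell* key = it->first;
    if (!key->marked) {
      it = table.erase(it);  // dead key: throw the entry out
    } else if (key->forwardedTo) {
      moved.emplace_back(key->forwardedTo, std::move(it->second));
      it = table.erase(it);  // re-inserted below under the new address
    } else {
      ++it;
    }
  }
  // Re-insert after the loop so we never insert while iterating.
  for (auto& entry : moved) {
    bool inserted = table.emplace(entry.first, std::move(entry.second)).second;
    assert(inserted);  // a new address should not already be a key
    (void)inserted;
  }
}

inline bool demoSweep() {
  Cell live, dead, oldLoc, newLoc;
  live.marked = true;
  oldLoc.marked = true;
  oldLoc.forwardedTo = &newLoc;
  newLoc.marked = true;
  WeakTable t{{&live, "live"}, {&dead, "dead"}, {&oldLoc, "moved"}};
  sweepAndRekey(t);
  return t.size() == 2 && t.count(&live) == 1 && t.count(&dead) == 0 &&
         t.count(&oldLoc) == 0 && t.at(&newLoc) == "moved";
}
```

For a table holding strong references the first branch would never
fire, since tracing the table would have marked every key.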

The embedder API to hook into this can be seen at 
https://searchfox.org/mozilla-central/source/js/public/GCAPI.h#926

We use GC-aware data structures within SpiderMonkey (GCHashMap,
GCHashSet, GCVector) to automate most of this. See e.g.
https://searchfox.org/mozilla-central/source/js/public/GCHashTable.h#28
though even those still require something to call their sweep() methods.
The WeakCache at
https://searchfox.org/mozilla-central/source/js/public/SweepingAPI.h#54
sets itself up to be swept automatically at the right time.

And like I said, those generally use unique IDs. 
https://searchfox.org/mozilla-central/source/js/public/GCHashTable.h#105 
is the hashtable that rekeys instead. (Note that our hashtable 
implementation isn't great. It uses double hashing, and probably ought 
to be replaced with one of the Robin Hood variants, which would change 
rekeying.)
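
To illustrate the tombstone concern with rekeying, here is a toy
fixed-capacity set using double hashing. It is a sketch under stated
assumptions, not SpiderMonkey's hashtable and not necessarily how
SpiderMonkey avoids growth: PtrSet and its hash functions are invented
for the example, keys are plain integers standing in for addresses, and
the table simply lets insert() reuse the first tombstone on the probe
path so that remove-then-insert never needs more slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class PtrSet {
  enum class State : uint8_t { Free, Live, Removed };  // Removed = tombstone
  struct Slot { State state; uint64_t key; };
  std::vector<Slot> slots_;
  size_t live_ = 0, tombstones_ = 0;

  size_t mask() const { return slots_.size() - 1; }
  static size_t start(uint64_t k) {
    return static_cast<size_t>((k * 0x9E3779B97F4A7C15ull) >> 32);
  }
  // An odd step visits every slot of a power-of-two table.
  static size_t step(uint64_t k) { return static_cast<size_t>(k >> 3) | 1; }

  // Index of k's slot, or slots_.size() if k is absent.
  size_t find(uint64_t k) const {
    size_t i = start(k) & mask();
    for (size_t n = 0; n < slots_.size(); ++n, i = (i + step(k)) & mask()) {
      if (slots_[i].state == State::Free) break;
      if (slots_[i].state == State::Live && slots_[i].key == k) return i;
    }
    return slots_.size();
  }

 public:
  explicit PtrSet(size_t capacityPow2)
      : slots_(capacityPow2, Slot{State::Free, 0}) {
    assert(capacityPow2 != 0 && (capacityPow2 & (capacityPow2 - 1)) == 0);
  }
  bool contains(uint64_t k) const { return find(k) != slots_.size(); }
  size_t size() const { return live_; }
  size_t tombstones() const { return tombstones_; }

  void insert(uint64_t k) {
    const size_t none = slots_.size();
    size_t firstRemoved = none, target = none;
    size_t i = start(k) & mask();
    for (size_t n = 0; n < slots_.size(); ++n, i = (i + step(k)) & mask()) {
      if (slots_[i].state == State::Live) {
        if (slots_[i].key == k) return;  // already present
      } else if (slots_[i].state == State::Removed) {
        if (firstRemoved == none) firstRemoved = i;
      } else {
        target = i;  // Free slot ends the probe sequence
        break;
      }
    }
    if (firstRemoved != none) { target = firstRemoved; --tombstones_; }
    assert(target != none);  // the toy table must never be entirely live
    slots_[target].state = State::Live;
    slots_[target].key = k;
    ++live_;
  }

  void remove(uint64_t k) {
    size_t i = find(k);
    if (i == slots_.size()) return;
    slots_[i].state = State::Removed;
    --live_;
    ++tombstones_;
  }

  // Naive rekey: remove then insert, within the fixed capacity.
  void rekey(uint64_t oldKey, uint64_t newKey) { remove(oldKey); insert(newKey); }
};

inline bool rekeyManyTimes() {
  PtrSet set(8);
  uint64_t addr = 0x1000;
  set.insert(addr);
  for (int i = 0; i < 1000; ++i) { set.rekey(addr, addr + 16); addr += 16; }
  return set.contains(addr) && set.size() == 1 &&
         set.size() + set.tombstones() <= 8;
}
```

Tombstones still pile up until the free slots run out, which makes
failed lookups slower; a real table would presumably also rehash in
place at some point.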




