From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gerd Möllmann
Newsgroups: gmane.emacs.devel
Subject: Re: MPS experiment successful
Date: Wed, 17 Apr 2024 16:30:42 +0200
References: <86le5cgmp3.fsf@gnu.org>
In-Reply-To: <86le5cgmp3.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 17 Apr 2024 16:09:44 +0300")
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cc: emacs-devel@gnu.org
To: Eli Zaretskii
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:317771

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Eli Zaretskii writes:

>> That's all.  There is nothing more.  And I'm currently undecided how to
>> proceed with this.
>
> The next step would be to push a feature branch into the Emacs Git
> repository and let people try the branch on the other supported
> platforms.  If the code is not yet mature enough for that, please
> suggest how to get from here to there.

I think technically it could be done, but there are some things that
would need to be done first.  Off the top of my head:

- CL packages vs. obarrays (I do have packages but no obarrays)
- Handling of pure space
- Handling the new JSON stuff, which I don't have in my branch, I think.
- ...

And then of course there is the general work of getting a branch from
what is basically a fork back into the forked repo, which I'm not sure
how to go about.

But my undecidedness comes more from what could come after that.  I'm
not sure how willing I am to invest what could be years to get things
finished, esp. if no one else starts working on it.  Had that one time
too often in the 90s, some might remember ;-).

> Would you like to describe in a few words what would be the advantages
> and disadvantages of this GC for Emacs?

Advantages for users:

For the user it means the final end to GC pauses.  MPS runs in a
different thread, which can be on a different core.

What MPS means for the overall speed of Emacs I find impossible to say.
MPS doesn't seem to be slow at all, although some slowdown can be
expected in the client because of thread-safe allocations.

An example of a confusing fact that I noticed today:

Full build without MPS, -O0, checking

  real 26:20.01
  user 1:32:15.54
  sys 3:20.74

Same build with debug MPS (-lmps-debug)

  real 14:07.90
  user 44:17.94
  sys 3:15.99

That's on a 2.3 GHz Quad-Core Intel Core i7, 16G RAM, SSD. 🤷

Possible advantages for developers of Emacs:

- MPS is thread-safe, i.e. it's a tiny step in that direction.
- I think igc.c is easier to understand than alloc.c.

Disadvantages:

As long as there are 2 GCs, there are 2 instead of 1 GC :-).

Will increase memory usage in its current state, part of which could be
optimized.

Some configurations/platforms might not be possible to support.  Hard to
say.

...

Let me also attach a badly written and curated Org file that I have in
that branch.  Maybe that has some additional answers.

--=-=-=
Content-Type: text/x-org
Content-Disposition: attachment; filename=igc.org
Content-Description: igc.org

#+title: MPS garbage collection for Emacs

* What is MPS?

The MPS (Memory Pool System) is a GC library developed by Ravenbrook Ltd.
MPS is available from [[https://github.com/Ravenbrook/mps?tab=readme-ov-file][GitHub]] under a BSD license.  See the [[https://memory-pool-system.readthedocs.io/en/latest/][documentation]].

In short, MPS implements incremental, generational, concurrent, copying,
thread-safe garbage collection on a large variety of platforms.  It has
been around for a long time, is stable, and well documented.

* What is this branch?

This [[https://github.com/gerd-moellmann/emacs-with-cl-packages/tree/igc][branch]] is an experiment to see whether Emacs can be made to use a
GC based on MPS.  I'm doing this for my own entertainment; it's not in
any form "official".

* Caveats

This is my local Emacs, which is different from mainstream Emacs: It
uses CL packages, doesn't have obarrays, doesn't support pure space,
does not support shorthands, and probably differs in some other ways.

In addition, I'm exclusively using macOS, so it's unlikely to compile or
run on other systems OOTB.  It should not be too hard to port, though.

* Current state

Build succeeds up to and including =compile-first=, i.e. Emacs pdumps,
and compiles some =.elc= files.

* Things worth mentioning

** Configuration

There is now a configure switch =--with-mps= with values =no=, =yes=,
and =debug=.  If =debug= is given, Emacs links with the debug version of
the MPS library.

** Building MPS

I built MPS from its Git repo.  I had to make two trivial fixes for
macOS, for which I submitted issues upstream.

** Every object has a 1 word header

At the moment, every object has a one-word header, which is not visible
to the rest of Emacs.  See ~struct igc_header~.  This means in particular
that conses are 50% larger than they would normally be.

I did this because it is less work:

- All objects can be handled by one set of MPS callback functions.

- It simplifies the implementation of eq hash tables considerably by
  storing an address-independent hash in the header.
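
A minimal C sketch of the header idea, not the code in =igc.c=: the
layout of ~struct sketch_header~, the tag width, the counter-based hash,
and every ~sketch_~ name are assumptions made for illustration.  It shows
why a hash carried in the header stays the same when a copying collector
moves an object, and where the 50% size increase for conses comes from
(three words instead of two).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-word header.  Field layout and names are
   illustrative assumptions, not the real struct igc_header.  */
struct sketch_header
{
  uintptr_t bits;   /* low 8 bits: type tag, remaining bits: hash */
};

enum { SKETCH_TAG_BITS = 8 };
enum sketch_type { SKETCH_CONS = 1 };

/* A cons as the rest of the program would see it: two words.  */
struct sketch_cons_payload
{
  uintptr_t car, cdr;
};

/* The same cons as the allocator sees it: header plus payload.  */
struct sketch_cons
{
  struct sketch_header header;
  struct sketch_cons_payload payload;
};

/* Hashes come from a counter, so they do not depend on addresses.  */
static uintptr_t sketch_next_hash = 1;

static struct sketch_cons *
sketch_cons_alloc (uintptr_t car, uintptr_t cdr)
{
  struct sketch_cons *c = malloc (sizeof *c);
  if (c == NULL)
    abort ();
  c->header.bits = (sketch_next_hash++ << SKETCH_TAG_BITS) | SKETCH_CONS;
  c->payload.car = car;
  c->payload.cdr = cdr;
  return c;
}

/* Read the hash back out of the header word.  */
static uintptr_t
sketch_hash (const struct sketch_cons *c)
{
  assert (c != NULL);
  return c->header.bits >> SKETCH_TAG_BITS;
}

/* Stand-in for a copying collector: the object gets a new address,
   and the header, including the hash, travels with it.  */
static struct sketch_cons *
sketch_move (struct sketch_cons *old)
{
  struct sketch_cons *copy = malloc (sizeof *copy);
  if (copy == NULL)
    abort ();
  memcpy (copy, old, sizeof *copy);
  free (old);
  return copy;
}
```

Calling ~sketch_hash~ before and after ~sketch_move~ yields the same
value, and ~sizeof (struct sketch_cons)~ is three words against two for
the payload alone.  A hash derived from the object's address would
instead change on every move.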

The header can be removed from conses (and other objects, if that's
worth it) by writing additional code using additional MPS pools with
their own object formats.

Note that doing this also means that one has to use MPS's location
dependency feature for implementing eq hash tables.  Also be aware that
two calls to ~sxhash-eq~ can then return different hashes when a
concurrent GC happens between calls, unless something is done to ensure
that the hashed objects aren't moved by the GC for long enough.

** MPS In-band headers

I tried to use MPS in-band headers at first, but couldn't get them to
work.  I don't claim they don't work, though.  After all, I was and still
am learning MPS.

** Weak hash tables

I didn't think that weak hash tables were important enough for my
experiment, so I didn't implement them, to save work.

Weak tables can be implemented using the AWL pool already present in
=igc.c= and its allocation points, and then using MPS's dependent
objects in the hash table implementation.  There are examples of how to
do this in the MPS documentation, and in an example Scheme interpreter.

To prepare for that, keys and values of a hash table are already split
into two vectors.  Two vectors are necessary because objects in an AWL
pool must either contain weak references only, or strong references
only.  The currently malloc'd vectors would have to be replaced with
special vectors allocated from the AWL pool.

** Handling of a loaded pdump

The hot part of a loaded pdump (ca. 18 MB) is currently used as an
ambiguous root for MPS.  A number of things could be investigated:

- Use a root with barrier (~MPS_RM_PROT~)

- Copy objects from the dump to an MPS pool that uses ~MPS_KEY_GEN~ to
  allocate objects in an old generation.  It is unclear to me from the
  docs if the AMC pool supports that, but one could use an AMS pool.
  After loading a dump we would copy the whole object graph to MPS,
  starting from static roots.  After that, the dump itself would no
  longer be used.
  Costs some load time, though.

There is also a slight problem currently that's a consequence of Emacs
mixing GC'd objects and malloc'd ones.  The loaded dump is scanned
conservatively, but if such objects contain malloc'd data structures
holding references, these are invisible to MPS, so one has to jump
through hoops.

Examples:

- Hash tables hold keys and values in malloc'd vectors.  If the hash
  table is in the dump, and the vectors are on the heap, keys and values
  won't be seen by MPS.

- Symbols in the dump may have a ~Lisp_Buffer_Local_Value~ that is on
  the heap.

- Buffers have an ~itree_tree~ that is malloc'd.

** Intervals and ~itree_node~

The problem with these two is that there are pointers from Lisp objects
to malloc'd memory and back.  This is easier to handle if they are
allocated from MPS.

Moving these to MPS makes things easier because MPS triggers the
scanning and, not least, because an ambiguous scan of the loaded dump
then keeps things alive.

** Finalization

Is now implemented.

** Things the old GC does besides GC

The function ~garbage_collect~ does some things that are not directly
related to GC, simply because it is called every once in a while.

- compact buffers, undo-list.

This is currently not done, but could be done in another way, from a
timer, for instance.

** Not Considered

Some things are not implemented because they were out of scope.  For
example,

- ~memory-report~

  Could be done with MPS's pool walk functionality.

- profiler (~profiler-memory-start~...)

  No idea, haven't looked at it.

- Anything I don't currently use, either because it doesn't exist on
  macOS (text conversions, for example), or because I didn't think it
  essential (xwidgets, for example).

** Knobs not tried

- Number of generations
- Size of generations
- Mortality probabilities
- Allocation policies, like ramp allocation
- ...

** Implementation

I think it's not too terrible, but some things should be improved:

- Error handling.
  It currently aborts in many circumstances, but it is also not clear
  what else to do.

- Idle time use.

  It does something in this regard, but not much, and not always with a
  time constraint (handling MPS messages).

** Debugger

MPS uses memory barriers.  In certain situations it is necessary to
remove these to be able to do certain things.  I've added a command
=xpostmortem= to the LLDB support for that.  GDB will need something
similar.

--=-=-=--