From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: =?utf-8?Q?Gerd_M=C3=B6llmann?= Newsgroups: gmane.emacs.devel Subject: Re: Some experience with the igc branch Date: Thu, 26 Dec 2024 08:57:17 +0100 Message-ID: References: <87o713wwsi.fsf@telefonica.net> <864j2u442i.fsf@gnu.org> <87ldw6as5f.fsf@protonmail.com> <86o7112rnq.fsf@gnu.org> <867c7p2nz4.fsf@gnu.org> <861pxx2lh7.fsf@gnu.org> <86ldw40xbo.fsf@gnu.org> <868qs329kj.fsf@gnu.org> <864j2r25hg.fsf@gnu.org> <861pxv234y.fsf@gnu.org> <86zfkjznat.fsf@gnu.org> <86ldw2zy6s.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="6022"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Cc: pipcet@protonmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, eller.helmut@gmail.com, acorallo@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Thu Dec 26 08:58:16 2024 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1tQil6-0001OX-Aq for ged-emacs-devel@m.gmane-mx.org; Thu, 26 Dec 2024 08:58:16 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tQikJ-0002ev-3w; Thu, 26 Dec 2024 02:57:27 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tQikI-0002en-Eh for emacs-devel@gnu.org; Thu, 26 Dec 2024 02:57:26 -0500 Original-Received: from mail-ed1-x52e.google.com ([2a00:1450:4864:20::52e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1tQikE-0006JL-L9; Thu, 26 Dec 2024 02:57:26 -0500 Original-Received: by mail-ed1-x52e.google.com with SMTP id 4fb4d7f45d1cf-5d3e6f6cf69so10153952a12.1; Wed, 25 Dec 2024 23:57:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1735199840; x=1735804640; darn=gnu.org; h=content-transfer-encoding:mime-version:user-agent:message-id:date :references:in-reply-to:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=AfHPoslNKEr6kmLinmZ2SyoKeMihjhUBF3GMq8nVmrM=; b=DyqlRB5QJylym59zulcviOo1HL+fALOj9FXNDiOOrxVDfqEIatpwQKYOuj29iu3PMs 7H/QMtqzUEvNmK8cypkSwDdpbDw0OqRcML1zpw6ivYGYzsDiq/4/ynpRxVxcdUDZszg9 dUnSec6EQnOxm6xkn+fUFAu7OXLfMkFZ6TAfGfEM5mgL2y7xSngB4AZRv+XS9v/E9NoH 30ux7vrmZ3D9pq39RLZwUY/d3dzg1W4E6R/Itz2AHvdebXmufULymawIAYV5WgRLm9Q/ 30jRKI6o4WrV3yZ/wy7xSpq4zb2EOdCq5CjYSgtDIvcpNprTEn2OOcrddpoKDULxcHFS GoMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1735199840; x=1735804640; h=content-transfer-encoding:mime-version:user-agent:message-id:date :references:in-reply-to:subject:cc:to:from:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=AfHPoslNKEr6kmLinmZ2SyoKeMihjhUBF3GMq8nVmrM=; b=dmPvOqCB41ethbYtNTIDYR3lFATxIviLo885bTxiR3gutwniS9d23wQ7v1J+Tt2t4V 2udoQre1JoQrnKKcklkLZ8RVtZX089Nn8vnoeUcpESvu7BZ6CKRrl9bckCALTny4tGMR q/0/wVLam27o3Sw9AGh9C8sXidybS4AXCn7atNsotxbLVjfPvFL8Jh08KSWUjH+nuLfB S4oeDWJB0Em811ryy9p7Pdf6PSdCQEdJmSbvg/GWWOdoFsNcWoQ7uQR1h9MJsCeUTjge auKCrkntyOtFgzH+FaA8aw2cnuCLNDjZTwbMt5kOy6FgPM7mn9TTYQYP4oAZfLdTWtKA gIuA== X-Forwarded-Encrypted: i=1; AJvYcCUYLfYbtoaqHvp0ng3AR/W5MRbT72H/szcZ+MTosFK0fZ5PAdRtcTkUegjWXRn4H464XhqwqHDzpQ==@gnu.org, AJvYcCVJZSbAVn1YPB7r48Dy7c7omaNd7doKe+FuHLP9RVGzH+pZwpocdCtaZq5lZGso9P4bE/qwBBD4YR4wcPk=@gnu.org X-Gm-Message-State: AOJu0YyUbp+8g3BNujoskJMN1KJcBFQgP0kAG9OpTPTgWSIfBuflZQF/ ze2D4bdi/q6EJAXDDieDrn2XzXOW6GnjblRTTzlehy5ZC2MjKyXHWed58g== X-Gm-Gg: ASbGncsxNmmcDEDWoSweVZkUZ0yjbeljML9keSU6TeNAJX93kAospmDwF0N2bNON8AD YSE56LOA2B7XH18hV187BoTs1Dv8hyvx1vWjbWAlEXcSQ32sDOjnS0qcN9sWfBMzay/1i2O+cel bpRW4uEO9aRCah9usgv30uxTdMb/wb4rIurt2tnEaL7Xx7BfGiYL4M0btJlKLEenMReQsKsL/cu eRxWJ90MukxtCs5qp6qPO3HF4T/g2ffwIVFragFXubkHkinyG/X4vSGeJ6QHi48kCBBp6R/+CJb MgxEAQ86SVNLsXTTCDqjtRMblUHJx3hnvqtTy0Tyzt57BadNRiJRVEAFY7XnM8rX2A== X-Google-Smtp-Source: AGHT+IGlCO4H4q0GA4gC8o7WBEyc05xGedMwIwLQ38KMPhu40jkOxh/Jqeu4eAeX2R+9/r2WqROaOQ== X-Received: by 2002:a05:6402:388a:b0:5d0:e3fa:17e2 with SMTP id 4fb4d7f45d1cf-5d81dda9517mr22748103a12.11.1735199839676; Wed, 25 Dec 2024 23:57:19 -0800 (PST) Original-Received: from pro2 (p200300e0b747910049c6927f9e8bb023.dip0.t-ipconnect.de. [2003:e0:b747:9100:49c6:927f:9e8b:b023]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5d80701abdcsm8678175a12.74.2024.12.25.23.57.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Dec 2024 23:57:18 -0800 (PST) In-Reply-To: <86ldw2zy6s.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 26 Dec 2024 09:43:39 +0200") Received-SPF: pass client-ip=2a00:1450:4864:20::52e; envelope-from=gerd.moellmann@gmail.com; helo=mail-ed1-x52e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:327138 Archived-At: Eli Zaretskii writes: >> From: Gerd M=C3=B6llmann >> Cc: pipcet@protonmail.com, ofv@wanadoo.es, emacs-devel@gnu.org, >> eller.helmut@gmail.com, acorallo@gnu.org >> Date: Thu, 26 Dec 2024 06:25:33 +0100 >>=20 >> Eli Zaretskii writes: >>=20 >> >> >> https://memory-pool-system.readthedocs.io/en/latest/guide/index.= html# >> >> >>=20 >> >> >> help? >> >> > >> >> > Not really. Their documentation is (a) too vague/abstract/circular >> >> > (i.e., stuff is not explained for itself, but instead they >> >> > cross-reference to some other stuff -- e.g., try to understand what= is >> >> > a "root"), and (b) not specific to Emacs, so the relation between M= PS >> >> > terminology and Emacs objects is missing. >> >>=20 >> >> Too bad. In that case, I guess I should consider to begin to think ab= out >> >> planning to start to try write something up. That's a bit like pulling >> >> teeth, TBH. Don't know how long that will take. >> > >> > Thanks. No matter how long it will take, it will be shorter than >> > eternity. >>=20 >> Did what I wrote in this thread help you understand/judge what I >> proposed better? Please ask further questions if you want. > > It does help, to some extent. But I'm still in the dark regarding > some aspects of this. I'll keep asking questions, but there's a limit > to which I feel myself entitled to ask questions without annoying too > much. Which is why I suggested to write commentary instead, to get > some of the basics out of the way. It doesn't have to be you > personally who writes the commentary, btw: anyone who knows the > answers to the questions I posted could do that. With that in mind, > let me again post the questions which I'm currently struggling with: > > . what are "roots"? > . what is the purpose of each root_create_SOMETHING function in igc.c? > . what is the difference between "exact" and "ambiguous" roots, and > when should we use each one in Emacs? Coincidentally, I wrote this today, which might help, and would also be interesting to get some feedback for.=20 * Introduction / Garbage collection Implementing a programming language like Lisp requires automatic memory management which frees the memory of Lisp objects like conses, strings etc. when they are no longer in use. The automatic memory management used for Lisp is called garbage collection (GC), which is performed by a garbage collector. Objects that are no longer used are called garbage or dead, objects that are in use or called live objects. The process of getting rid of garbage is called the GC. A large number of GC algorithms and implementations exist which differ in various dimensions. Emacs has two GC implementations which can be chosen at compile-time. The traditional (old) GC, which was the only one until recently, is a so-called mark-sweep, non-copying collector. The new GC implementation in this file is an incremental, generational, concurrent (igc) collector based on the MPS library from Ravenbrook. It is a so-called copying collector. The terms used here will become clearer in the following. Emacs' traditional mark-sweep GC works in two phases: 1. The mark phase Start with a set of so-called (GC) roots that are known to contain live objects. Examples of roots in Emacs are - the bytecode stacks=20 - the specpdl (binding) stacks - Lisp variables defined in C with DEFVAR - the C stacks (aka control stacks) - ... Roots can be either exact or ambiguous. If we know that a root always contains a valid Lisp object reference it is called exact. Example for an exact root would be the root for a DEFVAR. The DEFVAR variable always contains a valid Lisp object. If we only know that a root might potentially reference a Lisp object, that root is called ambiguous. An example for an ambigupus root is the C stack. The C stack can contain integers that look like they are a reference to a Lisp object, but actually look like that only by happenstance. Ambiguous roots are said to be scanned conservatively. We make the conservative assumption that we actually have seen a valid reference, even if it is actually not. GC scans all roots, conservatively or exact, and marks the Lisp objects found in them live, plus all Lisp objects reachable from them, recursively. In the end, all live objects are marked, and all objects not marked are dead, i.e. garbage. 2. The sweep phase. Sweeping means to free all objects that are not marked live. The collector iterates over all allocated objects, live or dead, and frees the ones that are dead so that their memory can be reused. The traditional mark-sweep GC implementation is - Not concurrent. Emacs calls GC explicitly in various places, and proceeds only when the GC is done. - Not incremental. The GC is not done in steps. - Not generational. The GC doesn't take advantage of the so-called generational hypothesis, which says that most objects used by program die young, i.e. are only used for a short period of time. Other GCs take this hypothesis into account to reduce pause times. - Not copying. Lisp objects are never copied by the GC. They stay at the addresses where they their initial allocation puts them. Special facilities like allocation in blocks are implemented to avoid memory fragmentation. In contrast, the new igc collector, using MPS, is - Concurrent. The GC runs in its own thread. There are no explicit calls to start GC, and Emacs doesn't have to wait for the GC to end. - Incremental. The GC is done in steps. - Generational. The GC takes advantage of the so-called generational hypothesis. - Copying. Lisp objects are copied by the GC, avoiding memory fragmentation. Special care must be taken to take into account that the same Lisp object can have different addresses at different times. The goal of igc is to reduce pause times when using Emacs interactively. The properties listed above which come from leveraging MPS make that possible. * MPS A guide and a reference manual for MPS can be found at https://memory-pool-system.readthedocs.io. It is recommended to read at least the Guide. And what else to write I don't know. * Registry MPS provides an API of C functions that an application can use to interface with the MPS. For example, MPS cannot know what the GC roots of an application are, or what threads the application considers relevant for GC. So, MPS provides functions that an application can use to define its roots and threads. The MPS functions typically return a handle to an MPS object created for what is defined by them. For example, when we define a root with mps_root_create_area, MPS gives us a mps_root_t handle back. The handles returned can later be used to change or delete what we created. For example, we use the handle we got from MPS for a root to destroy the root with mps_root_destroy. All MPS handles we create, we collect in a global registry of type struct igc which is stored in the global variable global_igc and can be accessed from amywhere.=20 * Initialization - Roots - Threads - Pools, Object formats * Lisp Object Allocation Object Header? * Malloc with roots * Memory barriers